feat: LRU cache eviction for PDF page decoders by dpantaleoni · Pull Request #282 · docling-project/docling-parse

dpantaleoni · 2026-06-25T18:32:44Z

Description

This PR adds a least-recently-used (LRU) cache eviction policy for PDF page decoders, preventing unbounded memory growth when processing many large PDF documents in succession. This resolves the bug described in this comment on docling-serve issue #366.

Fix

Replacing the unbounded accumulation of page decoders with an LRU-bounded cache of at most 16 live page decoders eliminated the steady RSS growth observed when converting many large PDFs in succession.
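For readers unfamiliar with the pattern, here is a minimal sketch of an LRU-bounded decoder cache. It is not the code from this PR: the member names `page_decoders`, `page_access_order` and `max_cached_pages` are taken from the diff under review, while `PageDecoder`, `PageDecoderCache`, `get`, `contains` and `size` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <memory>

// Hypothetical stand-in for the real page decoder type.
struct PageDecoder {
    explicit PageDecoder(int page) : page_number(page) {}
    int page_number;
};
using page_decoder_ptr = std::shared_ptr<PageDecoder>;

// Illustrative LRU cache: the map owns the decoders, the list tracks recency
// (front = most recently used, back = least recently used).
class PageDecoderCache {
public:
    explicit PageDecoderCache(std::size_t max_pages) : max_cached_pages(max_pages) {}

    // Return the decoder for `page`, creating it on a miss,
    // and mark it as the most recently used entry.
    page_decoder_ptr get(int page) {
        auto it = page_decoders.find(page);
        if (it != page_decoders.end()) {
            touch(page);
            return it->second;
        }
        auto decoder = std::make_shared<PageDecoder>(page);
        page_decoders[page] = decoder;
        page_access_order.push_front(page);
        evict_if_needed();
        return decoder;
    }

    bool contains(int page) const { return page_decoders.count(page) > 0; }
    std::size_t size() const { return page_decoders.size(); }

private:
    // Move `page` to the front of the recency list.
    // std::list::remove is linear, which is acceptable for a small cache.
    void touch(int page) {
        page_access_order.remove(page);
        page_access_order.push_front(page);
    }

    // Drop least recently used entries until the cache fits the limit.
    void evict_if_needed() {
        while (page_decoders.size() > max_cached_pages) {
            int victim = page_access_order.back();
            page_access_order.pop_back();
            page_decoders.erase(victim);
        }
    }

    std::map<int, page_decoder_ptr> page_decoders;
    std::list<int> page_access_order;
    std::size_t max_cached_pages;
};
```

Because the decoders are held through `std::shared_ptr` in this sketch, a caller that still holds a pointer keeps its decoder alive after eviction; the cache only drops its own reference.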

Before:

After:

Testing

The script below was run within a docling dev environment with the locally edited docling-parse dependency installed. A folder of diverse real-world PDFs ranging from ~1 MB to ~20 MB was used. The test was done on a MacBook Pro (M4, 36 GB).
check_mem.py

github-actions · 2026-06-25T18:32:55Z

✅ DCO Check Passed

Thanks @dpantaleoni, all your commits are properly signed off. 🎉

mergify · 2026-06-25T18:33:20Z

Merge Protections

🟢 Merge protection satisfied — ready to merge.

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

PeterStaar-IBM · 2026-06-26T06:59:21Z

    // New: Persistent page decoders for typed API
    std::map<int, page_decoder_ptr> page_decoders;
+    std::list<int> page_access_order;
+    size_t max_cached_pages = 16;


This needs to be a configurable parameter for sure.

dpantaleoni · Jun 26, 2026

After making max_cached_pages configurable and setting its default to -1 (no limit, to follow set conventions), I believe both docling and docling-serve will need a few changes to allow the user to set max_cached_pages to a limit. Would this be okay?

PeterStaar-IBM · Jun 27, 2026

@dpantaleoni we need to set sensible default values. We do not necessarily need to propagate these variables to docling/docling-serve.
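One way to read the "configurable, with a no-limit sentinel" idea from this thread is sketched here. This is an assumption-laden illustration, not the PR's code: `CacheConfig`, `kNoLimit` and `needs_eviction` are hypothetical names, and the default of 16 simply mirrors the value in the diff rather than any decision reached in the review. Note that the diff declares `max_cached_pages` as `size_t`, so a literal -1 would wrap to the largest representable value; the sketch uses a signed field to keep the sentinel explicit.

```cpp
#include <cstddef>

// Hypothetical sentinel: any negative value disables the cap.
constexpr long kNoLimit = -1;

// Hypothetical configuration holder; the default of 16 mirrors the diff.
struct CacheConfig {
    long max_cached_pages = 16;
};

// Decide whether a cache currently holding `current_size` decoders
// must evict entries under the given configuration.
inline bool needs_eviction(std::size_t current_size, const CacheConfig& cfg) {
    if (cfg.max_cached_pages < 0) {
        return false;  // no limit configured, never evict
    }
    return current_size > static_cast<std::size_t>(cfg.max_cached_pages);
}
```

With this shape, the eviction loop only needs to consult `needs_eviction`, and whether the setting is exposed further up in docling or docling-serve stays a separate question.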

PeterStaar-IBM reviewed Jun 26, 2026

PeterStaar-IBM requested a review from dolfim-ibm June 26, 2026 07:00

feat: implement LRU cache for page decoders with automatic eviction

42a3031

Signed-off-by: Dominik Pantaleoni <dominikpantaleoniibm@Dominiks-MacBook-Pro.local>

dpantaleoni force-pushed the lru-cache-page-decoders branch from 1dc778f to 42a3031 · June 26, 2026 17:04

dpantaleoni wants to merge 1 commit into docling-project:main from dpantaleoni:lru-cache-page-decoders
