PDF Reader Tool

Overview

Extracts text and metadata from a PDF, given either a local file path or an HTTP(S) URL. Supports page-range filtering (e.g. 1-5, 3, 1,3,5-7) and exposes per-page text plus the concatenated full text. Backed by pypdf — pure Python, no native binaries needed.

Prerequisites and Dependencies

Python 3.12 or later
atomic-agents
pydantic
pypdf
requests

Installation

Use the Atomic Assembler CLI: run atomic and pick pdf_reader.
Or copy the tool/ folder directly into your project.

Input & Output Structure

Input Schema

source (str): a local file path or HTTP(S) URL pointing at a PDF.
page_range (str, optional): '1', '1-5', '1,3,5-7', etc. (1-based). Out-of-range pages are silently skipped.
include_metadata (bool): default True.

Output Schema

source (str): echo of the input source.
text (str): concatenated text for the extracted pages, separated by \f (form feed).
pages (list[PdfPage]): per-page extracted text with 1-based page_number.
page_count (int): number of pages extracted (after page_range filter).
total_page_count (int): total pages in the source document.
metadata (PdfMetadata, optional): document info dictionary.
error (str, optional): set when the operation failed.

Usage

from tool.pdf_reader import PdfReaderTool, PdfReaderToolInputSchema

tool = PdfReaderTool()

# Read remote PDF, only first two pages
out = tool.run(PdfReaderToolInputSchema(
    source="https://arxiv.org/pdf/1706.03762",
    page_range="1-2",
))
print(out.text[:1000])

# Read a local PDF, every page
out = tool.run(PdfReaderToolInputSchema(source="./paper.pdf"))

Notes

The tool fails gracefully and returns error set in the output rather than raising.
For URL sources, the download is hard-capped by max_size_bytes in the config (default 50 MB).
Encrypted PDFs are not currently supported.

Contributing

PRs welcome — see the main repo CONTRIBUTING.md.

License

Same as the main Atomic Agents project.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PDF Reader Tool

Overview

Prerequisites and Dependencies

Installation

Input & Output Structure

Input Schema

Output Schema

Usage

Notes

Contributing

License

Uh oh!

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

PDF Reader Tool

Overview

Prerequisites and Dependencies

Installation

Input & Output Structure

Input Schema

Output Schema

Usage

Notes

Contributing

License