Add llms.txt generator for LLM-friendly documentation#2912

Merged
koxudaxi merged 6 commits into main from
feature/llms-txt-generator
Jan 3, 2026

@koxudaxi
Owner

@koxudaxi koxudaxi commented Jan 3, 2026

Summary by CodeRabbit

  • New Features

    • Automated generation of llms.txt and llms-full.txt from the site navigation, plus a test task to produce/validate these files.
    • New workflow to automatically build and commit updated llms documentation when relevant docs or build scripts change.
  • Chores

    • Updated spelling-check configuration to exclude the generated llms documentation files.

@coderabbitai

coderabbitai Bot commented Jan 3, 2026

Warning

Rate limit exceeded

@koxudaxi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7b1f9e7 and d9c1e57.

⛔ Files ignored due to path filters (1)
  • docs/llms-full.txt is excluded by none and included by none
📒 Files selected for processing (1)
  • tox.ini
📝 Walkthrough

Adds a new script to generate docs/llms.txt and docs/llms-full.txt from the site nav, a GitHub Actions workflow to run and commit those outputs, and related test/config updates; also excludes the generated files from codespell checks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Workflows: .github/workflows/codespell.yaml, .github/workflows/llms-txt.yaml | Expanded codespell skip list to include generated llms files; added new "Update llms.txt" workflow that conditionally runs on docs/build changes, checks out code (handles forked PRs), runs the build script, and commits/pushes updated files when changed. |
| Build script: scripts/build_llms_txt.py | New script that parses zensical.toml, flattens nav into sections, extracts metadata and content from Markdown pages, marks optional pages, and generates docs/llms.txt (hierarchical links + descriptions) and docs/llms-full.txt (detailed page contents). Includes CLI (--check, --output-dir) and dataclasses (PageInfo, NavSection). |
| Config / Test matrix: pyproject.toml, tox.ini | Added docs/llms.txt and docs/llms-full.txt to codespell skip patterns in pyproject.toml; added [testenv:llms-txt] in tox.ini to run the build script (supports --check). |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant GH as GitHub Actions
    participant Repo as Repository
    participant Runner as Action Runner
    participant Script as build_llms_txt.py
    participant FS as File System
    participant Git as Git

    GH->>Repo: Trigger on push / PR (docs/build changes)
    Repo->>Runner: Checkout code (handles fork PRs)
    Runner->>Runner: Install deps (uv, tox, python)
    Runner->>Script: Execute build_llms_txt.py
    Script->>FS: Read zensical.toml and Markdown files
    FS-->>Script: Return nav, page content, metadata
    Script->>Script: Generate llms.txt and llms-full.txt
    Script->>FS: Write output files under docs/
    FS-->>Git: Detect changes
    alt Changes detected
        Runner->>Git: Configure user & commit files
        Git->>Repo: Push updates
    else No changes
        Note right of Runner: No commit/push performed
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰📜 I nibble nav trees and hop through each page,
Flattening branches to tidy the stage.
Two texts I assemble — concise and full,
Pushed by a workflow, neat and not dull.
Hooray for docs that dance off the page!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a new llms.txt generator script for creating LLM-friendly documentation, which is directly supported by the addition of scripts/build_llms_txt.py and related workflow/configuration changes. |

@github-actions
Contributor

github-actions Bot commented Jan 3, 2026

📚 Docs Preview: https://pr-2912.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Jan 3, 2026

CodSpeed Performance Report

Merging #2912 will not alter performance

Comparing feature/llms-txt-generator (d9c1e57) with main (e12cd96)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped [1]

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov Bot commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.40%. Comparing base (e12cd96) to head (d9c1e57).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2912   +/-   ##
=======================================
  Coverage   99.40%   99.40%           
=======================================
  Files          95       95           
  Lines       16905    16905           
  Branches     1990     1990           
=======================================
  Hits        16804    16804           
  Misses         52       52           
  Partials       49       49           
| Flag | Coverage Δ |
| --- | --- |
| unittests | 99.40% <ø> (ø) |

@koxudaxi
Owner Author

koxudaxi commented Jan 3, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/llms-txt.yaml (1)

63-72: Consider handling concurrent push failures.

The commit and push step may fail if another workflow run pushes changes concurrently (e.g., multiple PRs merged in quick succession). While unlikely, you might want to add retry logic or accept that subsequent runs will fix any missed updates.

🔎 Optional: Add retry logic for push
       - name: Commit and push if changed
         if: github.event_name == 'push' || github.event_name == 'pull_request_target' || github.event.pull_request.head.repo.full_name == github.repository
         run: |
           git config user.name "github-actions[bot]"
           git config user.email "github-actions[bot]@users.noreply.github.com"
           git add docs/llms.txt docs/llms-full.txt
-          git diff --staged --quiet || git commit -m "docs: update llms.txt files
+          if git diff --staged --quiet; then
+            echo "No changes to commit"
+            exit 0
+          fi
+          git commit -m "docs: update llms.txt files

           Generated by GitHub Actions"
-          git push
+          # Retry push in case of concurrent updates; fail the step if every attempt fails
+          for i in 1 2 3; do
+            git push && exit 0
+            git pull --rebase
+          done
+          echo "Push failed after 3 attempts" >&2
+          exit 1
scripts/build_llms_txt.py (1)

57-68: Unused field is_section.

The is_section field in PageInfo is defined but never set to True or checked anywhere in the code. Consider removing it if it's not needed.

🔎 Remove unused field
 @dataclass
 class PageInfo:
     """Information about a documentation page."""

     title: str
     path: str
     url: str
     description: str
     content: str
     depth: int = 0
-    is_section: bool = False
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2859a6 and 7d9f3c9.

⛔ Files ignored due to path filters (2)
  • docs/llms-full.txt is excluded by none and included by none
  • docs/llms.txt is excluded by none and included by none
📒 Files selected for processing (5)
  • .github/workflows/codespell.yaml
  • .github/workflows/llms-txt.yaml
  • pyproject.toml
  • scripts/build_llms_txt.py
  • tox.ini
🔇 Additional comments (11)
pyproject.toml (1)

210-210: LGTM!

The codespell skip configuration correctly excludes the newly generated llms.txt and llms-full.txt files from spell checking, which is appropriate for auto-generated documentation files.

.github/workflows/codespell.yaml (1)

30-30: LGTM!

The workflow skip configuration aligns with the pyproject.toml codespell settings, ensuring the generated documentation files are excluded from spell checking in CI.

.github/workflows/llms-txt.yaml (1)

28-37: Security model looks reasonable.

The job condition properly restricts execution:

  • Push events to main branch
  • Non-fork PRs (safe since they're from collaborators)
  • Specific trusted actors for fork PRs
  • pull_request_target requires both safe-to-fix label AND trusted sender

This mitigates the typical pull_request_target security risks while allowing the workflow to push commits.

Verify that the trusted actors list (koxudaxi, gaborbernat, ilovelinux) represents the current maintainers who should have commit access via this workflow.

scripts/build_llms_txt.py (8)

21-24: LGTM!

The tomllib/tomli fallback pattern correctly handles Python 3.10, where tomllib is not yet available in the standard library (it was added in Python 3.11).
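
A minimal sketch of the fallback pattern this comment refers to. It assumes the script prefers the standard-library `tomllib` and falls back to the third-party `tomli` backport. The exact import form used in scripts/build_llms_txt.py is not shown in this thread, and `SAMPLE` is made-up data.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    # Third-party backport with the same API; assumed to be installed on 3.10.
    import tomli as tomllib

# Hypothetical configuration snippet, not the project's real zensical.toml.
SAMPLE = """
[project]
site_name = "Example Docs"
"""

config = tomllib.loads(SAMPLE)
print(config["project"]["site_name"])
```

Aliasing the backport to `tomllib` keeps the rest of the script identical on both interpreter versions.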


80-99: LGTM!

Error handling is appropriate for a CLI script, with clear error messages for missing files and TOML parsing errors.


102-115: LGTM!

The recursive flattening logic correctly handles both leaf pages (string paths) and nested sections (list of children).
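
The flattening described here can be sketched roughly as follows. This is an illustration only: the nav shape (a list whose items are either a path string or a one-key mapping of title to path or child list) and the `title` field are assumptions, while `path` and `children` mirror the attribute names that appear in the review diffs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NavSection:
    """Simplified stand-in for the script's NavSection dataclass."""

    title: str
    path: str | None = None
    children: list[NavSection] = field(default_factory=list)


def flatten_nav(items: list) -> list[NavSection]:
    """Convert a nested nav list into NavSection objects, recursing into sections."""
    sections: list[NavSection] = []
    for item in items:
        if isinstance(item, str):
            # Bare path with no explicit title: reuse the path as the title.
            sections.append(NavSection(title=item, path=item))
            continue
        for title, value in item.items():
            if isinstance(value, str):
                # Leaf page: the value is a Markdown path.
                sections.append(NavSection(title=title, path=value))
            else:
                # Nested section: the value is a list of child items.
                sections.append(NavSection(title=title, children=flatten_nav(value)))
    return sections


nav = [
    {"Home": "index.md"},
    {"Guide": [{"Install": "guide/install.md"}, "guide/usage.md"]},
]
sections = flatten_nav(nav)
print([s.title for s in sections])
```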


143-179: LGTM!

The Markdown parsing logic handles common constructs well, including code blocks, headings, admonitions, and images. The description extraction stops at the first empty line after content or at a level-2 heading, which is a sensible heuristic.

Minor note: The code block detection (startswith("```")) won't catch fenced blocks using tildes (~~~), but this is unlikely to cause issues in practice.
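
As a rough illustration of the heuristic and of the tilde-fence gap noted above, the sketch below extracts a description and treats both backtick and tilde fences as code blocks. The function name, the contents of `SKIP_PREFIXES`, and the handling order are assumptions, not the script's actual code.

```python
# Hypothetical prefixes: headings, admonition markers, and images.
# Indented admonition bodies are not handled in this sketch.
SKIP_PREFIXES = ("#", "!!!", "???", "![")


def extract_description(markdown: str) -> str:
    """Return the first paragraph of prose, skipping code blocks and markup lines."""
    desc_lines: list[str] = []
    fence = None  # marker that opened the current fenced block, if any
    for line in markdown.splitlines():
        s = line.strip()
        if fence is not None:
            if s.startswith(fence):
                fence = None  # closing fence reached
            continue
        if s.startswith(("```", "~~~")):
            fence = s[:3]  # remember which marker opened the block
            continue
        if s.startswith("## "):
            break  # a level-2 heading ends the search
        if s.startswith(SKIP_PREFIXES):
            continue
        if not s:
            if desc_lines:
                break  # first empty line after content ends the description
            continue
        desc_lines.append(s)
    return " ".join(desc_lines)


doc = "# Title\n\n~~~python\nprint('x')\n~~~\n\nFirst line\nsecond line.\n\nLater paragraph.\n"
print(extract_description(doc))
```

Tracking the opening marker means a tilde-fenced block is only closed by a tilde line, so a stray backtick fence inside it is ignored.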


202-224: LGTM!

The URL generation logic correctly handles index pages and constructs clean URLs with trailing slashes.
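
One way the index-page and trailing-slash handling could look, as a sketch under assumptions: `page_url` and `BASE_URL` are hypothetical names, and the base URL is a placeholder rather than the project's real documentation site.

```python
from pathlib import PurePosixPath

BASE_URL = "https://docs.example.invalid"  # placeholder, not the real site URL


def page_url(md_path: str, base_url: str = BASE_URL) -> str:
    """Map a Markdown path to a clean URL that ends with a slash."""
    path = PurePosixPath(md_path).with_suffix("")  # drop the ".md" extension
    if path.name == "index":
        path = path.parent  # index pages are served at their directory URL
    rel = path.as_posix()
    base = base_url.rstrip("/")
    if rel in ("", "."):
        return base + "/"  # root index page
    return f"{base}/{rel}/"


print(page_url("index.md"))
print(page_url("guide/index.md"))
print(page_url("cli/options.md"))
```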


250-306: LGTM!

The generate_llms_txt function correctly separates main and optional content, renders hierarchical sections with appropriate heading levels, and ensures consistent trailing newline handling.
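
The behaviour summarised here might be sketched as below. The `Page` class, its `optional` flag, and the signature of `generate_llms_txt` are simplified assumptions for illustration; only the `fmt` helper name appears in the review diffs, and its body here is a guess at the link format.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical, reduced stand-in for the script's PageInfo."""

    title: str
    url: str
    description: str = ""
    optional: bool = False


def fmt(page: Page) -> str:
    """Render one page as a Markdown link with an optional description."""
    link = f"- [{page.title}]({page.url})"
    return f"{link}: {page.description}" if page.description else link


def generate_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Emit main sections first, then collect optional pages under one heading."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    optional = []
    for heading, pages in sections.items():
        main = [p for p in pages if not p.optional]
        optional.extend(p for p in pages if p.optional)
        if main:
            lines += [f"## {heading}", "", *map(fmt, main), ""]
    if optional:
        lines += ["## Optional", "", *map(fmt, optional), ""]
    # Normalise the ending so the file always has exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


sections = {
    "Guide": [
        Page("Install", "https://docs.example.invalid/install/", "How to install."),
        Page("Changelog", "https://docs.example.invalid/changelog/", optional=True),
    ]
}
text = generate_llms_txt("Example", "Short summary.", sections)
print(text)
```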


309-324: LGTM!

The function correctly strips the initial H1 heading from content to avoid duplication with the generated title line. The separator pattern (---) between pages follows common conventions.
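
A small sketch of the H1 stripping and the `---` separator mentioned in this comment. The function names, the tuple-based page representation, and the "Source:" line are assumptions made for the example.

```python
import re

# Matches a leading level-1 heading ("# Title") at the very start of the text.
H1_RE = re.compile(r"\A\s*# [^\n]*\n?")


def strip_leading_h1(content: str) -> str:
    """Remove an initial H1 so it is not duplicated by the generated title line."""
    return H1_RE.sub("", content, count=1).lstrip("\n")


def generate_full_text(pages: list) -> str:
    """Join pages, given as (title, url, content) tuples, with '---' separators."""
    blocks = []
    for title, url, content in pages:
        body = strip_leading_h1(content).rstrip()
        blocks.append(f"# {title}\n\nSource: {url}\n\n{body}\n")
    return "\n---\n\n".join(blocks)


full = generate_full_text([("A", "u1", "# A\n\nalpha\n"), ("B", "u2", "beta")])
print(full)
```

A level-2 heading at the top of a page is left untouched because the pattern requires a single `#` followed by a space.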


327-386: LGTM!

The main function provides a clean CLI interface with helpful --check mode for CI validation. Error messages clearly indicate how to regenerate out-of-date files.
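
For illustration, a `--check` mode of this kind could be structured as in the sketch below. The `--check` and `--output-dir` flags come from the walkthrough above; the helper name `check_files`, the placeholder file contents, and the messages are assumptions.

```python
import argparse
import sys
from pathlib import Path


def check_files(expected: dict, output_dir: Path) -> int:
    """Return 1 if any generated file is missing or differs from the expected text."""
    stale = [
        name
        for name, content in expected.items()
        if not (output_dir / name).exists()
        or (output_dir / name).read_text(encoding="utf-8") != content
    ]
    for name in stale:
        print(f"{name} is missing or out of date; re-run without --check", file=sys.stderr)
    return 1 if stale else 0


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Generate llms.txt files (sketch)")
    parser.add_argument("--check", action="store_true", help="verify files instead of writing")
    parser.add_argument("--output-dir", type=Path, default=Path("docs"))
    args = parser.parse_args(argv)

    # Placeholder content; the real script would build this from the site nav.
    expected = {"llms.txt": "# Example\n", "llms-full.txt": "# Example (full)\n"}

    if args.check:
        return check_files(expected, args.output_dir)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    for name, content in expected.items():
        (args.output_dir / name).write_text(content, encoding="utf-8")
    return 0
```

Returning an exit code from `main` lets a caller such as `raise SystemExit(main())` fail a CI job when the committed files are stale.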

Comment thread tox.ini

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
scripts/build_llms_txt.py (3)

99-107: Consider separating print and return statements.

The pattern return print(...) works (returns None) but is non-idiomatic. Consider:

if not md_path.exists():
    print(f"Warning: {md_path} not found, skipping", file=sys.stderr)
    return None

119-122: Simplify the nested conditional logic.

The inner if re-tests `not s and desc_lines`, which is already part of the outer condition, making the flow harder to follow.

🔎 Clearer alternative
-            if s.startswith(SKIP_PREFIXES) or (not s and desc_lines):
-                if not s and desc_lines:
-                    break
-                continue
+            if s.startswith(SKIP_PREFIXES):
+                continue
+            if not s and desc_lines:
+                break
+            if not s:
+                continue

181-186: Consider breaking down the complex comprehension.

The generator expression, with two for clauses, a conditional iterable, and a walrus operator, is functionally correct but dense. For improved readability:

🔎 More readable alternative
-        lines.extend(
-            fmt(p)
-            for s in optional
-            for item in ([s] if s.path else s.children)
-            if item.path and (p := page_map.get(item.path))
-        )
+        for s in optional:
+            items = [s] if s.path else s.children
+            for item in items:
+                if item.path and (p := page_map.get(item.path)):
+                    lines.append(fmt(p))
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d9f3c9 and 7b1f9e7.

📒 Files selected for processing (2)
  • pyproject.toml
  • scripts/build_llms_txt.py
🔇 Additional comments (9)
pyproject.toml (1)

210-210: LGTM!

Correctly excludes the generated llms.txt files from codespell checks.

scripts/build_llms_txt.py (8)

1-36: LGTM!

The module is well-documented, imports are appropriate, and the tomllib/tomli fallback correctly handles Python 3.10, where tomllib is not yet in the standard library.


39-68: LGTM!

Type definitions are clear and use appropriate constructs (TypedDict for configuration, dataclasses for data models).


70-84: LGTM!

Appropriate error handling for a CLI script with clear error messages.


87-96: LGTM!

The recursive navigation flattening logic correctly handles both leaf pages and nested sections.


136-147: LGTM!

The page collection logic correctly handles URL construction for both root index pages and nested documentation paths, with appropriate recursion for sections.


194-202: LGTM!

The full text generation correctly strips the initial H1 heading to avoid duplication and formats each page with proper separators.


205-218: LGTM!

The check mode implementation correctly validates file existence and content, with appropriate error messages for CI workflows.


221-247: LGTM!

The main function and entry point are well-structured, handle both check and generation modes appropriately, and return proper exit codes for CI integration.

@koxudaxi koxudaxi merged commit c83470e into main Jan 3, 2026
38 checks passed
@koxudaxi koxudaxi deleted the feature/llms-txt-generator branch January 3, 2026 09:07
@github-actions
Contributor

github-actions Bot commented Jan 3, 2026

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR adds new documentation generation tooling (llms.txt generator) without modifying any core code generator functionality. The changes are: 1) A new script to generate llms.txt files from documentation, 2) A new GitHub Actions workflow, 3) Generated documentation files, 4) A new tox environment, and 5) Codespell configuration updates. No changes were made to the datamodel_code_generator package, CLI options, Python API, templates, default behaviors, or error handling. This is purely additive infrastructure for LLM-friendly documentation and does not affect users of the library in any way.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 3, 2026

🎉 Released in 0.52.1

This PR is now available in the latest release. See the release notes for details.
