Add llms.txt generator for LLM-friendly documentation by koxudaxi · Pull Request #2912 · koxudaxi/datamodel-code-generator

koxudaxi · 2026-01-03T06:24:12Z

Summary by CodeRabbit

New Features
- Automated generation of llms.txt and llms-full.txt from the site navigation, plus a test task to produce/validate these files.
- New workflow to automatically build and commit updated llms documentation when relevant docs or build scripts change.
Chores
- Updated spelling-check configuration to exclude the generated llms documentation files.


coderabbitai · 2026-01-03T06:24:17Z

Warning

Rate limit exceeded

@koxudaxi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between 7b1f9e7 and d9c1e57.

⛔ Files ignored due to path filters (1)

docs/llms-full.txt is excluded by none and included by none

📒 Files selected for processing (1)

tox.ini

📝 Walkthrough

Walkthrough

Adds a new script to generate docs/llms.txt and docs/llms-full.txt from the site nav, a GitHub Actions workflow to run and commit those outputs, and related test/config updates; also excludes the generated files from codespell checks.

Changes

Cohort / File(s)	Summary
Workflows `.github/workflows/codespell.yaml`, `.github/workflows/llms-txt.yaml`	Expanded codespell skip list to include generated llms files; added new "Update llms.txt" workflow that conditionally runs on docs/build changes, checks out code (handles forked PRs), runs the build script, and commits/pushes updated files when changed.
Build script `scripts/build_llms_txt.py`	New script that parses `zensical.toml`, flattens nav into sections, extracts metadata and content from Markdown pages, marks optional pages, and generates `docs/llms.txt` (hierarchical links + descriptions) and `docs/llms-full.txt` (detailed page contents). Includes CLI (--check, --output-dir) and dataclasses (`PageInfo`, `NavSection`).
Config / Test matrix `pyproject.toml`, `tox.ini`	Added `docs/llms.txt` and `docs/llms-full.txt` to codespell skip patterns in `pyproject.toml`; added `[testenv:llms-txt]` in `tox.ini` to run the build script (supports `--check`).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant GH as GitHub Actions
    participant Repo as Repository
    participant Runner as Action Runner
    participant Script as build_llms_txt.py
    participant FS as File System
    participant Git as Git

    GH->>Repo: Trigger on push / PR (docs/build changes)
    Repo->>Runner: Checkout code (handles fork PRs)
    Runner->>Runner: Install deps (uv, tox, python)
    Runner->>Script: Execute build_llms_txt.py
    Script->>FS: Read zensical.toml and Markdown files
    FS-->>Script: Return nav, page content, metadata
    Script->>Script: Generate llms.txt and llms-full.txt
    Script->>FS: Write output files under docs/
    FS-->>Git: Detect changes
    alt Changes detected
        Runner->>Git: Configure user & commit files
        Git->>Repo: Push updates
    else No changes
        Note right of Runner: No commit/push performed
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰📜 I nibble nav trees and hop through each page,
Flattening branches to tidy the stage.
Two texts I assemble — concise and full,
Pushed by a workflow, neat and not dull.
Hooray for docs that dance off the page!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding a new llms.txt generator script for creating LLM-friendly documentation, which is directly supported by the addition of scripts/build_llms_txt.py and related workflow/configuration changes.

github-actions · 2026-01-03T06:24:57Z

📚 Docs Preview: https://pr-2912.datamodel-code-generator.pages.dev

codspeed-hq · 2026-01-03T06:26:57Z

CodSpeed Performance Report

Merging #2912 will not alter performance

_{Comparing feature/llms-txt-generator (d9c1e57) with main (e12cd96)}

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped¹

¹ 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

codecov · 2026-01-03T06:27:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.40%. Comparing base (e12cd96) to head (d9c1e57).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2912   +/-   ##
=======================================
  Coverage   99.40%   99.40%           
=======================================
  Files          95       95           
  Lines       16905    16905           
  Branches     1990     1990           
=======================================
  Hits        16804    16804           
  Misses         52       52           
  Partials       49       49

Flag	Coverage Δ
unittests	`99.40% <ø> (ø)`

koxudaxi · 2026-01-03T07:17:12Z

@coderabbitai review

coderabbitai · 2026-01-03T07:17:17Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.github/workflows/llms-txt.yaml (1)

63-72: Consider handling concurrent push failures.

The commit and push step may fail if another workflow run pushes changes concurrently (e.g., multiple PRs merged in quick succession). While unlikely, you might want to add retry logic or accept that subsequent runs will fix any missed updates.

🔎 Optional: Add retry logic for push

       - name: Commit and push if changed
         if: github.event_name == 'push' || github.event_name == 'pull_request_target' || github.event.pull_request.head.repo.full_name == github.repository
         run: |
           git config user.name "github-actions[bot]"
           git config user.email "github-actions[bot]@users.noreply.github.com"
           git add docs/llms.txt docs/llms-full.txt
-          git diff --staged --quiet || git commit -m "docs: update llms.txt files
+          if git diff --staged --quiet; then
+            echo "No changes to commit"
+            exit 0
+          fi
+          git commit -m "docs: update llms.txt files

           Generated by GitHub Actions"
-          git push
+          # Retry push in case of concurrent updates
+          for i in 1 2 3; do
+            git push && break
+            git pull --rebase
+          done
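
For comparison, the retry idea can be sketched in Python, the language of the build script. This is only an illustration under assumptions: `push_with_retry`, `_git`, and the injectable `push`/`rebase` callables are hypothetical names, not part of the PR. One difference from a bare shell loop is that the sketch reports failure when every attempt fails rather than ending silently.

```python
import subprocess
from typing import Callable


def _git(*args: str) -> bool:
    """Run a git command and report whether it exited with status 0."""
    return subprocess.run(["git", *args], check=False).returncode == 0


def push_with_retry(
    push: Callable[[], bool] = lambda: _git("push"),
    rebase: Callable[[], bool] = lambda: _git("pull", "--rebase"),
    attempts: int = 3,
) -> bool:
    """Try to push; after a failed push, rebase onto the remote and retry.

    Returns True as soon as one push succeeds, False if all attempts fail.
    """
    for attempt in range(attempts):
        if push():
            return True
        # Only rebase when another push attempt will follow.
        if attempt < attempts - 1:
            rebase()
    return False
```

The callables are injectable so the loop can be exercised without touching a real repository.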

scripts/build_llms_txt.py (1)

57-68: Unused field is_section.

The is_section field in PageInfo is defined but never set to True or checked anywhere in the code. Consider removing it if it's not needed.
🔎 Remove unused field
 @dataclass
 class PageInfo:
     """Information about a documentation page."""

     title: str
     path: str
     url: str
     description: str
     content: str
     depth: int = 0
-    is_section: bool = False

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2859a6 and 7d9f3c9.

⛔ Files ignored due to path filters (2)

docs/llms-full.txt is excluded by none and included by none
docs/llms.txt is excluded by none and included by none

📒 Files selected for processing (5)

.github/workflows/codespell.yaml
.github/workflows/llms-txt.yaml
pyproject.toml
scripts/build_llms_txt.py
tox.ini

🔇 Additional comments (11)

pyproject.toml (1)

210-210: LGTM!

The codespell skip configuration correctly excludes the newly generated llms.txt and llms-full.txt files from spell checking, which is appropriate for auto-generated documentation files.

.github/workflows/codespell.yaml (1)

30-30: LGTM!

The workflow skip configuration aligns with the pyproject.toml codespell settings, ensuring the generated documentation files are excluded from spell checking in CI.

.github/workflows/llms-txt.yaml (1)

28-37: Security model looks reasonable.

The job condition properly restricts execution:

- Push events to main branch
- Non-fork PRs (safe since they're from collaborators)
- Specific trusted actors for fork PRs
- pull_request_target requires both safe-to-fix label AND trusted sender

This mitigates the typical pull_request_target security risks while allowing the workflow to push commits.

Verify that the trusted actors list (koxudaxi, gaborbernat, ilovelinux) represents the current maintainers who should have commit access via this workflow.

scripts/build_llms_txt.py (8)

21-24: LGTM!

The tomllib/tomli fallback pattern correctly handles Python 3.10 compatibility where tomllib isn't available in the standard library.
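
For readers unfamiliar with the pattern, one common form of the fallback looks like the sketch below. The script's exact lines are not shown in this thread, so treat this as an assumption; the sample TOML is hypothetical and the real `zensical.toml` layout may differ.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport exposing the same API

# Hypothetical config fragment, used only to show that parsing works.
SAMPLE = """
[project]
site_name = "Example Docs"
nav = [{ Home = "index.md" }, { Guide = [{ Install = "guide/install.md" }] }]
"""

config = tomllib.loads(SAMPLE)
nav = config["project"]["nav"]
```

Because `tomli` is imported under the name `tomllib`, the rest of the code does not need to know which module was loaded.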

80-99: LGTM!

Error handling is appropriate for a CLI script, with clear error messages for missing files and TOML parsing errors.

102-115: LGTM!

The recursive flattening logic correctly handles both leaf pages (string paths) and nested sections (list of children).
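
A minimal sketch of this kind of recursion, assuming a MkDocs-style nav where each entry is a one-key mapping from a title to either a path string or a list of child entries. `NavSection` is named in the walkthrough, but its fields here and the `flatten_nav` signature are guesses for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NavSection:
    """One nav entry: a leaf page (path set) or a section (children set)."""

    title: str
    path: Optional[str] = None
    children: list["NavSection"] = field(default_factory=list)
    depth: int = 0


def flatten_nav(nav: list, depth: int = 0) -> list[NavSection]:
    """Convert nested nav entries into NavSection objects, recursing into lists."""
    sections = []
    for entry in nav:
        for title, value in entry.items():
            if isinstance(value, str):
                # Leaf page: the value is the Markdown path.
                sections.append(NavSection(title, path=value, depth=depth))
            else:
                # Nested section: the value is a list of child entries.
                children = flatten_nav(value, depth + 1)
                sections.append(NavSection(title, children=children, depth=depth))
    return sections
```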

143-179: LGTM!

The Markdown parsing logic handles common constructs well, including code blocks, headings, admonitions, and images. The description extraction stops at the first empty line after content or at a level-2 heading, which is a sensible heuristic.

Minor note: The code block detection (startswith("```")) won't catch fenced blocks using tildes (~~~), but this is unlikely to cause issues in practice.
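
The heuristic described above, extended to recognize tilde fences as well, might be sketched as follows. This is not the script's code: `extract_description` and the contents of `SKIP_PREFIXES` are assumptions chosen to match the behavior the comment describes.

```python
# Hypothetical prefixes for headings, admonitions, and images.
SKIP_PREFIXES = ("#", "!!!", "???", "![")


def extract_description(markdown: str) -> str:
    """Return the first paragraph of prose, ignoring code fences and markup lines."""
    desc_lines = []
    fence = None  # fence character ("`" or "~") while inside a fenced block
    for line in markdown.splitlines():
        s = line.strip()
        if fence is None and s.startswith(("```", "~~~")):
            fence = s[0]  # opening fence: remember which marker opened it
            continue
        if fence is not None:
            if s.startswith(fence * 3):
                fence = None  # matching closing fence
            continue
        if s.startswith("## "):
            break  # a level-2 heading ends the search
        if s.startswith(SKIP_PREFIXES):
            continue
        if not s:
            if desc_lines:
                break  # first blank line after content ends the description
            continue
        desc_lines.append(s)
    return " ".join(desc_lines)
```

Tracking which marker opened the block avoids closing a backtick fence on a line of tildes, or the reverse.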

202-224: LGTM!

The URL generation logic correctly handles index pages and constructs clean URLs with trailing slashes.
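
As an illustration of that mapping, here is a small sketch. The function name `page_url` and the example base URL are hypothetical; the script's real helper is not shown in this thread.

```python
def page_url(base_url: str, md_path: str) -> str:
    """Map a Markdown path to a clean URL that ends with a trailing slash."""
    base = base_url.rstrip("/")
    stem = md_path.removesuffix(".md")
    if stem == "index":
        return f"{base}/"  # root index page maps to the site root
    if stem.endswith("/index"):
        stem = stem[: -len("/index")]  # section index maps to its directory
    return f"{base}/{stem}/"
```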

250-306: LGTM!

The generate_llms_txt function correctly separates main and optional content, renders hierarchical sections with appropriate heading levels, and ensures consistent trailing newline handling.
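
To make the output shape concrete, the sketch below renders a small llms.txt in the layout the llms.txt convention uses: an H1 title, a blockquote summary, headed link lists, and an `## Optional` section. The function name, argument shapes, and the depth-to-heading mapping are assumptions and may not match `generate_llms_txt`.

```python
def render_llms_txt(title, summary, sections, optional):
    """Render an llms.txt document.

    sections: list of (heading, depth, pages); pages and optional are lists of
    (title, url, description) tuples.
    """

    def fmt(page):
        page_title, url, desc = page
        link = f"- [{page_title}]({url})"
        return f"{link}: {desc}" if desc else link

    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, depth, pages in sections:
        # Top-level sections become H2; nested sections get deeper headings.
        lines.append(f"{'#' * (depth + 2)} {heading}")
        lines.append("")
        lines.extend(fmt(p) for p in pages)
        lines.append("")
    if optional:
        lines += ["## Optional", ""]
        lines.extend(fmt(p) for p in optional)
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"
```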

309-324: LGTM!

The function correctly strips the initial H1 heading from content to avoid duplication with the generated title line. The separator pattern (---) between pages follows common conventions.
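
A rough sketch of that behavior, assuming each page is a `(title, url, content)` tuple. The names `strip_leading_h1` and `render_full`, and the `Source:` line, are illustrative rather than taken from the script.

```python
def strip_leading_h1(content: str) -> str:
    """Drop a leading '# Title' line so the generated title is not duplicated."""
    lines = content.lstrip("\n").splitlines()
    if lines and lines[0].startswith("# "):
        lines = lines[1:]
    return "\n".join(lines).strip()


def render_full(pages) -> str:
    """Join pages into one document, separated by '---' rules."""
    parts = [
        f"# {title}\n\nSource: {url}\n\n{strip_leading_h1(content)}"
        for title, url, content in pages
    ]
    return "\n\n---\n\n".join(parts) + "\n"
```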

327-386: LGTM!

The main function provides a clean CLI interface with helpful --check mode for CI validation. Error messages clearly indicate how to regenerate out-of-date files.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

scripts/build_llms_txt.py (3)

99-107: Consider separating print and return statements.

The pattern return print(...) works (returns None) but is non-idiomatic. Consider:

if not md_path.exists():
    print(f"Warning: {md_path} not found, skipping", file=sys.stderr)
    return None

119-122: Simplify the nested conditional logic.

The nested if-statement checks the same condition as part of the outer clause, making the flow harder to follow.

🔎 Clearer alternative

-            if s.startswith(SKIP_PREFIXES) or (not s and desc_lines):
-                if not s and desc_lines:
-                    break
-                continue
+            if s.startswith(SKIP_PREFIXES):
+                continue
+            if not s and desc_lines:
+                break
+            if not s:
+                continue

181-186: Consider breaking down the complex list comprehension.

The triple-nested comprehension with a walrus operator is functionally correct but dense. For improved readability:

🔎 More readable alternative

-        lines.extend(
-            fmt(p)
-            for s in optional
-            for item in ([s] if s.path else s.children)
-            if item.path and (p := page_map.get(item.path))
-        )
+        for s in optional:
+            items = [s] if s.path else s.children
+            for item in items:
+                if item.path and (p := page_map.get(item.path)):
+                    lines.append(fmt(p))

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d9f3c9 and 7b1f9e7.

📒 Files selected for processing (2)

pyproject.toml
scripts/build_llms_txt.py

🔇 Additional comments (9)

pyproject.toml (1)

210-210: LGTM!

Correctly excludes the generated llms.txt files from codespell checks.

scripts/build_llms_txt.py (8)

1-36: LGTM!

The module is well-documented, imports are appropriate, and the tomllib/tomli fallback correctly handles Python 3.10 compatibility.

39-68: LGTM!

Type definitions are clear and use appropriate constructs (TypedDict for configuration, dataclasses for data models).

70-84: LGTM!

Appropriate error handling for a CLI script with clear error messages.

87-96: LGTM!

The recursive navigation flattening logic correctly handles both leaf pages and nested sections.

136-147: LGTM!

The page collection logic correctly handles URL construction for both root index pages and nested documentation paths, with appropriate recursion for sections.

194-202: LGTM!

The full text generation correctly strips the initial H1 heading to avoid duplication and formats each page with proper separators.

205-218: LGTM!

The check mode implementation correctly validates file existence and content, with appropriate error messages for CI workflows.
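
A check mode of this kind typically compares freshly generated content with what is on disk and returns a non-zero status on any mismatch. The sketch below shows the idea; `check_outputs` and its messages are hypothetical and not copied from the script.

```python
import sys
import tempfile
from pathlib import Path


def check_outputs(output_dir: Path, expected: dict[str, str]) -> int:
    """Return 0 if every expected file exists with matching content, else 1."""
    status = 0
    for name, content in expected.items():
        path = output_dir / name
        if not path.exists():
            print(f"Error: {path} is missing", file=sys.stderr)
            status = 1
        elif path.read_text(encoding="utf-8") != content:
            print(f"Error: {path} is out of date; regenerate it", file=sys.stderr)
            status = 1
    return status


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    expected = {"llms.txt": "# Docs\n"}
    missing_status = check_outputs(out, expected)  # file absent
    (out / "llms.txt").write_text("# Docs\n", encoding="utf-8")
    fresh_status = check_outputs(out, expected)  # file matches
```

Returning the status instead of calling `sys.exit` directly keeps the function easy to test and lets the entry point decide the exit code.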

221-247: LGTM!

The main function and entry point are well-structured, handle both check and generation modes appropriately, and return proper exit codes for CI integration.

github-actions · 2026-01-03T09:08:20Z

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR adds new documentation generation tooling (llms.txt generator) without modifying any core code generator functionality. The changes are: 1) A new script to generate llms.txt files from documentation, 2) A new GitHub Actions workflow, 3) Generated documentation files, 4) A new tox environment, and 5) Codespell configuration updates. No changes were made to the datamodel_code_generator package, CLI options, Python API, templates, default behaviors, or error handling. This is purely additive infrastructure for LLM-friendly documentation and does not affect users of the library in any way.

This analysis was performed by Claude Code Action

github-actions · 2026-01-03T17:52:54Z

🎉 Released in 0.52.1

This PR is now available in the latest release. See the release notes for details.

Add llms.txt generator for LLM-friendly documentation

94afac7

Fix codespell skip for llms.txt files

7d9f3c9

coderabbitai Bot reviewed Jan 3, 2026

Comment thread tox.ini

koxudaxi and others added 3 commits January 3, 2026 07:24

Refactor build_llms_txt.py: DRY, TypedDict, walrus operator

84afdcd

Merge branch 'main' into feature/llms-txt-generator

7b1f9e7

docs: update llms.txt files

e710c80

Generated by GitHub Actions

coderabbitai Bot reviewed Jan 3, 2026

Add Python 3.11+ requirement note to llms-txt env

d9c1e57

koxudaxi merged commit c83470e into main Jan 3, 2026
38 checks passed

koxudaxi deleted the feature/llms-txt-generator branch January 3, 2026 09:07

github-actions Bot added the breaking-change-analyzed label Jan 3, 2026
