Add llms.txt generator for LLM-friendly documentation #2912
📝 Walkthrough
Adds a new script to generate docs/llms.txt and docs/llms-full.txt from the site nav, a GitHub Actions workflow to run and commit those outputs, and related test/config updates; also excludes the generated files from codespell checks.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant GH as GitHub Actions
participant Repo as Repository
participant Runner as Action Runner
participant Script as build_llms_txt.py
participant FS as File System
participant Git as Git
GH->>Repo: Trigger on push / PR (docs/build changes)
Repo->>Runner: Checkout code (handles fork PRs)
Runner->>Runner: Install deps (uv, tox, python)
Runner->>Script: Execute build_llms_txt.py
Script->>FS: Read zensical.toml and Markdown files
FS-->>Script: Return nav, page content, metadata
Script->>Script: Generate llms.txt and llms-full.txt
Script->>FS: Write output files under docs/
FS-->>Git: Detect changes
alt Changes detected
Runner->>Git: Configure user & commit files
Git->>Repo: Push updates
else No changes
Note right of Runner: No commit/push performed
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📚 Docs Preview: https://pr-2912.datamodel-code-generator.pages.dev |
CodSpeed Performance Report
Merging #2912 will not alter performance
|
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2912   +/-   ##
=======================================
  Coverage   99.40%   99.40%
=======================================
  Files          95       95
  Lines       16905    16905
  Branches     1990     1990
=======================================
  Hits        16804    16804
  Misses         52       52
  Partials       49       49
Flags with carried forward coverage won't be shown.
|
|
@coderabbitai review |
✅ Actions performed
Review triggered.
|
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/llms-txt.yaml (1)
63-72: Consider handling concurrent push failures.
The commit and push step may fail if another workflow run pushes changes concurrently (e.g., multiple PRs merged in quick succession). While unlikely, you might want to add retry logic or accept that subsequent runs will fix any missed updates.
🔎 Optional: Add retry logic for push
 - name: Commit and push if changed
   if: github.event_name == 'push' || github.event_name == 'pull_request_target' || github.event.pull_request.head.repo.full_name == github.repository
   run: |
     git config user.name "github-actions[bot]"
     git config user.email "github-actions[bot]@users.noreply.github.com"
     git add docs/llms.txt docs/llms-full.txt
-    git diff --staged --quiet || git commit -m "docs: update llms.txt files
+    if git diff --staged --quiet; then
+      echo "No changes to commit"
+      exit 0
+    fi
+    git commit -m "docs: update llms.txt files
     Generated by GitHub Actions"
-    git push
+    # Retry push in case of concurrent updates
+    for i in 1 2 3; do
+      git push && break
+      git pull --rebase
+    done
scripts/build_llms_txt.py (1)
57-68: Unused field `is_section`.
The `is_section` field in `PageInfo` is defined but never set to `True` or checked anywhere in the code. Consider removing it if it's not needed.
🔎 Remove unused field
 @dataclass
 class PageInfo:
     """Information about a documentation page."""
     title: str
     path: str
     url: str
     description: str
     content: str
     depth: int = 0
-    is_section: bool = False
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `docs/llms-full.txt` is excluded by none and included by none
- `docs/llms.txt` is excluded by none and included by none
📒 Files selected for processing (5)
- `.github/workflows/codespell.yaml`
- `.github/workflows/llms-txt.yaml`
- `pyproject.toml`
- `scripts/build_llms_txt.py`
- `tox.ini`
🔇 Additional comments (11)
pyproject.toml (1)
210-210: LGTM! The codespell skip configuration correctly excludes the newly generated `llms.txt` and `llms-full.txt` files from spell checking, which is appropriate for auto-generated documentation files.
.github/workflows/codespell.yaml (1)
30-30: LGTM! The workflow skip configuration aligns with the `pyproject.toml` codespell settings, ensuring the generated documentation files are excluded from spell checking in CI.
.github/workflows/llms-txt.yaml (1)
28-37: Security model looks reasonable.
The job condition properly restricts execution:
- Push events to main branch
- Non-fork PRs (safe since they're from collaborators)
- Specific trusted actors for fork PRs
- `pull_request_target` requires both `safe-to-fix` label AND trusted sender
This mitigates the typical `pull_request_target` security risks while allowing the workflow to push commits.
Verify that the trusted actors list (`koxudaxi`, `gaborbernat`, `ilovelinux`) represents the current maintainers who should have commit access via this workflow.
scripts/build_llms_txt.py (8)
21-24: LGTM! The `tomllib`/`tomli` fallback pattern correctly handles Python 3.10 compatibility where `tomllib` isn't available in the standard library.
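A minimal sketch of the fallback pattern discussed above, assuming `tomli` is installed on interpreters older than 3.11 (the release in which `tomllib` joined the standard library). The `load_config` helper and the sample TOML are illustrative and not taken from the script.

```python
import sys

# tomllib is part of the standard library from Python 3.11 onward;
# on older interpreters, fall back to the API-compatible tomli backport.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # assumption: tomli is installed on Python 3.10


def load_config(text: str) -> dict:
    """Parse TOML text into a dict (hypothetical helper for illustration)."""
    return tomllib.loads(text)


config = load_config('[project]\nsite_name = "docs"\n')
print(config["project"]["site_name"])
```

Aliasing the backport to the standard-library name keeps the rest of the module free of version checks.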
80-99: LGTM! Error handling is appropriate for a CLI script, with clear error messages for missing files and TOML parsing errors.
102-115: LGTM! The recursive flattening logic correctly handles both leaf pages (string paths) and nested sections (list of children).
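The recursive flattening described above might look roughly like the sketch below. It assumes an MkDocs-style nav (a list of single-key dicts whose value is either a path string or a list of children); the name `flatten_nav` and the tuple layout are hypothetical, not the script's actual API.

```python
from typing import Union

# Assumed nav shape: a list of single-key dicts whose value is either
# a page path (str) or a list of child entries of the same shape.
NavItem = dict[str, Union[str, list]]


def flatten_nav(nav: list[NavItem], depth: int = 0) -> list[tuple[str, str, int]]:
    """Return (title, path, depth) for every leaf page, in nav order."""
    pages: list[tuple[str, str, int]] = []
    for item in nav:
        for title, value in item.items():
            if isinstance(value, str):
                # Leaf page: the value is the Markdown path.
                pages.append((title, value, depth))
            else:
                # Section: recurse into the children one level deeper.
                pages.extend(flatten_nav(value, depth + 1))
    return pages


nav = [
    {"Home": "index.md"},
    {"Usage": [{"CLI": "cli.md"}, {"Advanced": [{"Templates": "templates.md"}]}]},
]
for title, path, depth in flatten_nav(nav):
    print("  " * depth + f"{title}: {path}")
```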
143-179: LGTM! The Markdown parsing logic handles common constructs well, including code blocks, headings, admonitions, and images. The description extraction stops at the first empty line after content or at a level-2 heading, which is a sensible heuristic.
Minor note: The code block detection (startswith("```")) won't catch fenced blocks using tildes (`~~~`), but this is unlikely to cause issues in practice.
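One way to apply the heuristic above while also recognising tilde fences is sketched here. The function name, the `SKIP_PREFIXES` contents, and the simplification that any fence marker toggles code mode are assumptions for illustration, not the script's actual implementation.

```python
FENCE_MARKERS = ("`" * 3, "~" * 3)  # backtick fences and tilde fences
SKIP_PREFIXES = ("#", "!!!", "![")  # headings, admonition openers, images (assumed set)


def extract_description(markdown: str) -> str:
    """Return the first prose paragraph of a Markdown page as one line."""
    desc_lines: list[str] = []
    in_code = False
    for line in markdown.splitlines():
        s = line.strip()
        if s.startswith(FENCE_MARKERS):
            in_code = not in_code  # simplification: any fence marker toggles
            continue
        if in_code:
            continue
        if s.startswith("## "):
            break  # a level-2 heading ends the introduction
        if not s:
            if desc_lines:
                break  # first blank line after content ends the paragraph
            continue
        if s.startswith(SKIP_PREFIXES):
            continue
        desc_lines.append(s)
    return " ".join(desc_lines)


fence = "`" * 3
sample = "\n".join([
    "# Title",
    "",
    fence + "bash",
    "pip install example",
    fence,
    "",
    "Generate models from schemas.",
    "Supports several inputs.",
    "",
    "## Usage",
])
print(extract_description(sample))
```

Passing a tuple to `str.startswith` keeps the fence check to a single call, so adding the tilde form costs no extra branching.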
202-224: LGTM! The URL generation logic correctly handles index pages and constructs clean URLs with trailing slashes.
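A small sketch of the URL mapping described above, under the assumption that pages are addressed by Markdown paths relative to the docs root. The name `page_url` and the example base URL are hypothetical.

```python
def page_url(base_url: str, md_path: str) -> str:
    """Map a Markdown path to a clean URL with a trailing slash."""
    base = base_url.rstrip("/")
    stem = md_path[:-3] if md_path.endswith(".md") else md_path
    if stem == "index":
        return f"{base}/"  # root index page maps to the site root
    if stem.endswith("/index"):
        stem = stem[: -len("/index")]  # section index maps to its directory
    return f"{base}/{stem}/"


for path in ("index.md", "cli/index.md", "usage.md"):
    print(page_url("https://example.com/docs/", path))
```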
250-306: LGTM! The `generate_llms_txt` function correctly separates main and optional content, renders hierarchical sections with appropriate heading levels, and ensures consistent trailing newline handling.
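For readers unfamiliar with the output shape, here is a simplified sketch of rendering an llms.txt file: an H1 title, a blockquote summary, H2 sections of links, and an "Optional" section. The `Page` class, `render_llms_txt`, and its signature are hypothetical, and the sketch flattens every section to H2 rather than reproducing the hierarchical heading levels mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str = ""


def fmt(page: Page) -> str:
    """Render one page as an llms.txt link line."""
    suffix = f": {page.description}" if page.description else ""
    return f"- [{page.title}]({page.url}){suffix}"


def render_llms_txt(title: str, summary: str,
                    sections: dict[str, list[Page]],
                    optional: list[Page]) -> str:
    """Build llms.txt text from main sections and optional pages."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", "", *map(fmt, pages), ""]
    if optional:
        lines += ["## Optional", "", *map(fmt, optional), ""]
    # Normalise the ending so the file always has exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


print(render_llms_txt(
    "Example Project",
    "Illustrative summary.",
    {"Docs": [Page("Home", "https://example.com/", "Start here")]},
    [Page("Changelog", "https://example.com/changelog/")],
))
```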
309-324: LGTM! The function correctly strips the initial H1 heading from content to avoid duplication with the generated title line. The separator pattern (`---`) between pages follows common conventions.
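The H1 stripping and `---` separation can be sketched as follows. Both function names and the `(title, content)` tuple input are assumptions made for this example; the actual script works with its own page objects.

```python
def strip_leading_h1(content: str) -> str:
    """Drop the first non-blank line if it is an H1 heading."""
    lines = content.lstrip("\n").splitlines()
    if lines and lines[0].startswith("# "):
        lines = lines[1:]
    return "\n".join(lines).strip("\n")


def render_full(pages: list[tuple[str, str]]) -> str:
    """Join (title, content) pairs into one document separated by '---'."""
    blocks = [f"# {title}\n\n{strip_leading_h1(body)}" for title, body in pages]
    return "\n\n---\n\n".join(blocks).rstrip("\n") + "\n"


print(render_full([
    ("Home", "# Home\n\nWelcome.\n"),
    ("Usage", "Run it.\n"),
]))
```

Stripping the page's own H1 before prepending the generated title avoids two consecutive top-level headings per page.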
327-386: LGTM! The `main` function provides a clean CLI interface with helpful `--check` mode for CI validation. Error messages clearly indicate how to regenerate out-of-date files.
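A hedged sketch of how a `--check` mode like the one described above can be structured: generation produces the expected text, and check mode compares it against the files on disk instead of writing. The helper `check_or_write`, this `main` signature, and the messages are illustrative assumptions; only the `--check` flag itself comes from the review.

```python
import argparse
import sys
import tempfile
from pathlib import Path


def check_or_write(outputs: dict[Path, str], check: bool) -> int:
    """Return 0 on success; in check mode, return 1 if any file is stale."""
    stale = []
    for path, expected in outputs.items():
        if check:
            if not path.exists() or path.read_text(encoding="utf-8") != expected:
                stale.append(path)
        else:
            path.write_text(expected, encoding="utf-8")
    for path in stale:
        print(f"{path} is out of date; rerun without --check", file=sys.stderr)
    return 1 if stale else 0


def main(argv: list[str], outputs: dict[Path, str]) -> int:
    parser = argparse.ArgumentParser(description="Generate llms.txt files")
    parser.add_argument("--check", action="store_true", help="verify only, do not write")
    args = parser.parse_args(argv)
    return check_or_write(outputs, args.check)


with tempfile.TemporaryDirectory() as tmp:
    target = {Path(tmp) / "llms.txt": "# Docs\n"}
    print(main(["--check"], target))  # file is missing, so the check fails
    print(main([], target))           # writes the file
    print(main(["--check"], target))  # file now matches the expected text
```

Returning an exit code rather than calling `sys.exit` inside the helper keeps the logic easy to test and lets CI fail on a nonzero status.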
Actionable comments posted: 0
🧹 Nitpick comments (3)
scripts/build_llms_txt.py (3)
99-107: Consider separating print and return statements.
The pattern `return print(...)` works (returns None) but is non-idiomatic. Consider:
if not md_path.exists():
    print(f"Warning: {md_path} not found, skipping", file=sys.stderr)
    return None
119-122: Simplify the nested conditional logic.
The nested if-statement checks the same condition as part of the outer clause, making the flow harder to follow.
🔎 Clearer alternative
- if s.startswith(SKIP_PREFIXES) or (not s and desc_lines):
-     if not s and desc_lines:
-         break
-     continue
+ if s.startswith(SKIP_PREFIXES):
+     continue
+ if not s and desc_lines:
+     break
+ if not s:
+     continue
181-186: Consider breaking down the complex list comprehension.
The triple-nested comprehension with a walrus operator is functionally correct but dense. For improved readability:
🔎 More readable alternative
- lines.extend(
-     fmt(p)
-     for s in optional
-     for item in ([s] if s.path else s.children)
-     if item.path and (p := page_map.get(item.path))
- )
+ for s in optional:
+     items = [s] if s.path else s.children
+     for item in items:
+         if item.path and (p := page_map.get(item.path)):
+             lines.append(fmt(p))
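To illustrate that the two forms in this suggestion behave the same, the sketch below runs both over stand-in data. The `Section` class, `fmt`, and the `page_map` contents are hypothetical placeholders for the script's real types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    title: str
    path: Optional[str] = None
    children: list["Section"] = field(default_factory=list)


def fmt(page: str) -> str:
    return f"- {page}"


page_map = {"a.md": "Page A", "b.md": "Page B"}  # "c.md" is deliberately missing
optional = [
    Section("A", path="a.md"),
    Section("Group", children=[Section("B", path="b.md"), Section("C", path="c.md")]),
]

# Comprehension form, as in the current code.
dense = [
    fmt(p)
    for s in optional
    for item in ([s] if s.path else s.children)
    if item.path and (p := page_map.get(item.path))
]

# Explicit loop form, as in the suggestion.
explicit = []
for s in optional:
    items = [s] if s.path else s.children
    for item in items:
        if item.path and (p := page_map.get(item.path)):
            explicit.append(fmt(p))

print(dense == explicit, dense)
```

In both forms, pages missing from `page_map` are skipped because `dict.get` returns `None`, which fails the filter.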
📒 Files selected for processing (2)
- `pyproject.toml`
- `scripts/build_llms_txt.py`
🔇 Additional comments (9)
pyproject.toml (1)
210-210: LGTM! Correctly excludes the generated llms.txt files from codespell checks.
scripts/build_llms_txt.py (8)
1-36: LGTM! The module is well-documented, imports are appropriate, and the tomllib/tomli fallback correctly handles Python 3.10 compatibility.
39-68: LGTM! Type definitions are clear and use appropriate constructs (TypedDict for configuration, dataclasses for data models).
70-84: LGTM! Appropriate error handling for a CLI script with clear error messages.
87-96: LGTM! The recursive navigation flattening logic correctly handles both leaf pages and nested sections.
136-147: LGTM! The page collection logic correctly handles URL construction for both root index pages and nested documentation paths, with appropriate recursion for sections.
194-202: LGTM! The full text generation correctly strips the initial H1 heading to avoid duplication and formats each page with proper separators.
205-218: LGTM! The check mode implementation correctly validates file existence and content, with appropriate error messages for CI workflows.
221-247: LGTM! The main function and entry point are well-structured, handle both check and generation modes appropriately, and return proper exit codes for CI integration.
Breaking Change Analysis
Result: No breaking changes detected
Reasoning: This PR adds new documentation generation tooling (llms.txt generator) without modifying any core code generator functionality. The changes are: 1) A new script to generate llms.txt files from documentation, 2) A new GitHub Actions workflow, 3) Generated documentation files, 4) A new tox environment, and 5) Codespell configuration updates. No changes were made to the datamodel_code_generator package, CLI options, Python API, templates, default behaviors, or error handling. This is purely additive infrastructure for LLM-friendly documentation and does not affect users of the library in any way.
This analysis was performed by Claude Code Action
|
🎉 Released in 0.52.1 This PR is now available in the latest release. See the release notes for details. |