chore(docs): add link checker + fix Voltron/polarity-axis broken images #414
Merged
New scripts/development/lint/link_check.py walks docs/ (or any path
argument), parses every markdown link and image reference, and reports
local targets that don't resolve on disk. External URLs are counted but
not network-verified. Fenced code blocks are skipped so `[brackets](and
parens)` in examples don't fire false positives. Anchor-only links
(`#section`) are ignored — verifying them needs a heading-slug parser.
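To illustrate why anchor verification needs its own parser, here is a minimal sketch of a GitHub-style heading slugger. The function name and the slug rules are assumptions for illustration; they are not part of link_check.py, and renderers differ (mkdocs and GitHub do not slug headings identically, and duplicate headings typically get numeric suffixes that this sketch does not handle).

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor slug for a markdown heading.

    Assumed rules (a simplification): lowercase the text, drop punctuation
    other than hyphens, and turn each space into a hyphen.
    """
    slug = heading.strip().lower()
    # Keep word characters, hyphens, and spaces; drop everything else.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")
```

A checker built on this would collect the slugs of every heading in the target file and test whether the fragment after `#` is among them.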
Usage:
python3 scripts/development/lint/link_check.py # scan docs/
python3 scripts/development/lint/link_check.py docs/architecture
python3 scripts/development/lint/link_check.py --fail-on-broken # CI
First run found 70 broken local links across docs/. This PR fixes the
3-link cohort that shares the same root cause:
- ADR-022 Voltron image
- ADR-058 polarity-axis triangulation overview
- ADR-058 polarity-axis before/after comparison
All three pointed at `media/ADR-XXX/file.png` (nested subdir convention)
but the actual on-disk paths are `media-ADR-XXX/file.png` (hyphen, flat).
The on-disk convention wins — three 1-line link-target fixes.
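The nested-to-flat rewrite described above can be sketched as a single substitution. This is only an illustration of the pattern: the regex, the function name, and the file names in the usage note are hypothetical, and the three fixes in this commit were plain one-line edits.

```python
import re

# Assumed shape of the broken targets: media/ADR-<number>/<file>
NESTED_MEDIA = re.compile(r"\bmedia/(ADR-\d+)/")


def flatten_media_path(target: str) -> str:
    """Rewrite media/ADR-XXX/file.png to the on-disk form media-ADR-XXX/file.png."""
    return NESTED_MEDIA.sub(r"media-\1/", target)
```

For example, `flatten_media_path("media/ADR-058/overview.png")` yields `media-ADR-058/overview.png`, while a target that already uses the flat form is returned unchanged.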
The remaining 67 broken links are renamed/moved/deleted targets
(QUICKSTART.md, ARCHITECTURE.md, ADR-027, ADR-016, ADR-068, ADR-076,
ADR-071a, polarity-axis-analysis/, RECURSIVE_UPSERT_ARCHITECTURE.md,
etc.). Each is a 1-line fix but the right destination requires per-link
research, so they're left for a separate triage pass — not wired into CI
yet to avoid blocking PRs on the backlog.
… vendored .venv
Second pass after the initial 3 image fixes. Used the new link_check.py
report as input to a small fixer script that, for each broken local
target, finds candidates by basename under docs/ and applies the fix when
exactly one candidate exists.
- 37 fixes via basename-uniqueness: ADRs that moved into their category
subdirs (authentication-security/, ai-embeddings/, database-schema/,
vocabulary-relationships/, infrastructure/, user-interfaces/,
query-search/, ingestion-content/) plus top-level docs that moved to
reference/ (ARCHITECTURE_OVERVIEW.md, RECURSIVE_UPSERT_ARCHITECTURE.md)
and guides/ (exploring.md, understanding-grounding.md).
- 7 fixes via historical-rename mapping (recorded inline in the fixer):
ADR-027 authentication → user-management-api, ADR-031 api-key-management
→ encrypted-api-key-storage, ADR-028 rbac → dynamic-rbac-system, ADR-032
→ ADR-032.1 (split into .1/.2 over time), ADR-048
query-facade-namespace-safety → vocabulary-metadata-as-graph (renamed to
reflect the ADR's primary subject).
- link_check.py: skip vendored directories (.venv, node_modules, .git,
__pycache__, venv) so dependency package METADATA files don't surface as
scan targets.
Broken count: 70 → 23 over the two passes. The remaining 23 are
deleted-and-not-replaced (QUICKSTART.md ×6, ARCHITECTURE.md ×4,
polarity-axis-analysis/ ×3, ../guides/README.md, ../deployment/README.md,
../installation.md, VOCABULARY_CATEGORIES.md) or never-created (ADR-071a
"parallel implementation findings" ×3, ADR-068
unified-embedding-regeneration). Each needs source-context judgment
(rewrite the surrounding text, remove the link, or create the missing
doc), so they're left for a separate triage.
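The basename-uniqueness rule can be sketched roughly as follows. The fixer script itself is not shown in this PR, so every name here (`index_by_basename`, `resolve_unique`, `relative_link`, the `renames` parameter) is a hypothetical stand-in, and the rename mapping is passed in rather than spelled out because the exact file names are not given above.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Optional


def index_by_basename(docs_root: Path) -> dict[str, list[Path]]:
    """Map each file name under docs_root to every path that carries it."""
    index: dict[str, list[Path]] = defaultdict(list)
    for path in docs_root.rglob("*"):
        if path.is_file():
            index[path.name].append(path)
    return index


def resolve_unique(
    target: str,
    index: dict[str, list[Path]],
    renames: Optional[dict[str, str]] = None,
) -> Optional[Path]:
    """Return the new location for a broken target, or None if ambiguous.

    `renames` is an assumed old-basename -> new-basename mapping, standing in
    for the historical-rename table mentioned in the commit message.
    """
    name = Path(target.split("#", 1)[0]).name
    name = (renames or {}).get(name, name)
    candidates = index.get(name, [])
    # Apply a fix only when exactly one file has this basename.
    return candidates[0] if len(candidates) == 1 else None


def relative_link(source_file: Path, new_target: Path) -> str:
    """Express new_target relative to the directory of the linking file."""
    rel = os.path.relpath(new_target, start=source_file.parent)
    return Path(rel).as_posix()
```

Refusing to act on zero or multiple candidates is what keeps the automated pass safe: anything ambiguous falls through to manual triage.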
Final pass — broken local link count: 23 → 0.
**Defensible redirects** (14 sites across 5 docs) for deleted-but-replaceable
targets, batched via the same fixer pattern as the earlier pass:
- ARCHITECTURE.md → reference/ARCHITECTURE_OVERVIEW.md (system-design entry
point in the published mkdocs site)
- QUICKSTART.md → manual/01-getting-started/02-CLI_USAGE.md (first existing
chapter of the getting-started manual; original QUICKSTART.md was deleted
without a 1:1 replacement)
- using/README.md → manual/README.md ("Using the System" maps to the user
manual)
- deployment/README.md → guides/DEPLOYMENT.md (deployment guide, which exists)
- ADR-068-unified-embedding-regeneration.md →
ai-embeddings/ADR-068-source-text-embeddings.md (the actual ADR-068; the
referenced title was a planning-stage name)
**Text rewrites** (8 sites) where no redirect target was honest:
- docs/README.md: drop VOCABULARY_CATEGORIES.md from the doc inventory and
the New Users numbered list (guide was deleted; renumbered subsequent
items).
- ADR-070 §References: prototype directory under
features/polarity-axis-analysis/ was removed when the feature shipped;
replaced the dead bullets with a pointer to api/app/lib/polarity_axis.py
and the surviving research findings.
- ADR-071 §Actual Performance Results + Related ADRs: the planned standalone
ADR-071a was never split out; findings live inline in this ADR. Removed
the self-referential see-also link and the duplicate bullet, added a
one-sentence note explaining the inlining.
- ADR-072 §Related: same; pointed to the inline
#actual-performance-results-adr-071a anchor in ADR-071.
- macvlan-dedicated-ip.md §Related: dropped the link to the never-created
installation.md; kept the prose describing what install.sh's SSL flags do.
**New stub:** docs/guides/README.md — small TOC for the guides directory,
grouped by topic. Closes features/README.md:45's "User Guides" link and
gives mkdocs awesome-pages a clean H1 title for the section.
**CI wiring:** new docs-link-check job in .github/workflows/lint.yml runs
the checker with --fail-on-broken. With the backlog at zero, any new PR
that introduces a broken local link will fail CI.
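A minimal sketch of how a `--fail-on-broken` flag might translate into a CI-visible exit status. Only the flag name and the default `docs/` scan path come from the usage shown earlier; the argument parser layout and the `exit_code` helper are assumptions, not the actual link_check.py code.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the assumed command line: optional paths plus --fail-on-broken."""
    parser = argparse.ArgumentParser(description="Check local markdown links.")
    # With no path argument, fall back to scanning docs/.
    parser.add_argument("paths", nargs="*", default=["docs"])
    parser.add_argument("--fail-on-broken", action="store_true")
    return parser.parse_args(argv)


def exit_code(broken_count: int, fail_on_broken: bool) -> int:
    """Non-zero only when the flag is set and at least one link is broken."""
    return 1 if (fail_on_broken and broken_count > 0) else 0
```

Keeping the default run at exit status 0 lets the report be used locally while a backlog exists; the CI job opts into the strict behavior.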
What
New `scripts/development/lint/link_check.py` walks markdown files under `docs/` (or any path argument), parses every link/image reference, and reports local targets that don't resolve on disk.
```
python3 scripts/development/lint/link_check.py # scan docs/
python3 scripts/development/lint/link_check.py docs/architecture
python3 scripts/development/lint/link_check.py --fail-on-broken # CI mode
```
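For readers who want a feel for the approach, the scan described above could look roughly like this. It is a simplified sketch, not the contents of link_check.py: the regex, the function names, and the fence-toggle logic are assumptions, and the skip list reflects the vendored directories named in the second commit.

```python
import re
from pathlib import Path

# Matches [text](target) and ![alt](target); an optional title is ignored.
LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Fence markers are built at runtime so this example nests cleanly in markdown.
FENCE_MARKERS = ("`" * 3, "~" * 3)
SKIP_DIRS = {".venv", "venv", "node_modules", ".git", "__pycache__"}


def extract_targets(markdown: str) -> list[str]:
    """Return link and image targets, skipping fenced code blocks."""
    targets: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # simplistic toggle on every fence line
            continue
        if not in_fence:
            targets.extend(LINK_RE.findall(line))
    return targets


def classify(target: str) -> str:
    """Label a target as 'anchor', 'external', or 'local'."""
    if target.startswith("#"):
        return "anchor"
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target):
        return "external"  # has a URL scheme such as https: or mailto:
    return "local"


def broken_links(root: Path) -> list[tuple[Path, str]]:
    """List (file, target) pairs whose local target is missing on disk."""
    broken: list[tuple[Path, str]] = []
    for md in sorted(root.rglob("*.md")):
        if SKIP_DIRS & set(md.relative_to(root).parts):
            continue
        for target in extract_targets(md.read_text(encoding="utf-8")):
            if classify(target) != "local":
                continue
            path_part = target.split("#", 1)[0]  # drop any #fragment
            if not (md.parent / path_part).exists():
                broken.append((md, target))
    return broken
```

Local targets are resolved relative to the directory of the file that contains the link, which is how relative markdown links behave both on GitHub and in mkdocs.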
Fixes shipped here
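First run found 70 broken local links. This PR fixes the 3-link cohort with a single root cause:
- ADR-022 Voltron image
- ADR-058 polarity-axis triangulation overview
- ADR-058 polarity-axis before/after comparison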
All three pointed at `media/ADR-XXX/` (nested subdir convention) but the actual on-disk paths are `media-ADR-XXX/` (hyphen, flat). On-disk convention wins — 3 one-line link-target fixes.
Remaining 67 broken links
Not fixed in this PR — separate triage pass:
Each is a 1-line fix but the right destination needs per-link research, so they're left for a separate pass. The checker is not wired into CI yet — doing so before clearing the backlog would block every PR on these. Once the backlog is down, `--fail-on-broken` can be added to `.github/workflows/lint.yml`.
Test plan