chore(docs): add link checker + fix Voltron/polarity-axis broken images #414

Merged: aaronsb merged 3 commits into main from chore/docs-link-checker on May 25, 2026

@aaronsb aaronsb commented May 25, 2026

What

New `scripts/development/lint/link_check.py` walks markdown files under `docs/` (or any path argument), parses every link/image reference, and reports local targets that don't resolve on disk.

  • External URLs (http://, https://, mailto:, etc.) are counted but not network-verified — that needs a different tool.
  • Fenced code blocks are skipped so `[brackets](and parens)` in code samples don't false-positive.
  • Pure anchor links (`#section`) are ignored — verifying them needs a heading-slug parser that knows the project's rules.

```
python3 scripts/development/lint/link_check.py # scan docs/
python3 scripts/development/lint/link_check.py docs/architecture
python3 scripts/development/lint/link_check.py --fail-on-broken # CI mode
```
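The three rules above (skip fenced code, count but do not verify external URLs, ignore pure anchors) can be sketched roughly as follows. This is a minimal illustration and not the actual contents of `link_check.py`: the names `LINK_RE`, `SCHEME_RE`, `iter_targets`, and `classify` are hypothetical, and the regex only handles simple inline links and images.

```python
import re
from pathlib import Path

# All names here are hypothetical; the real link_check.py may be organized differently.
FENCE = "`" * 3  # opening/closing marker of a fenced code block (tilde fences not handled)
LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")  # [text](target) or ![alt](target)
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")     # http:, https:, mailto:, ...


def iter_targets(markdown_text):
    """Yield (line_number, target) for links and images outside fenced code blocks."""
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue
        for match in LINK_RE.finditer(line):
            yield lineno, match.group(1)


def classify(target, source_file):
    """Return 'external', 'anchor', 'ok', or 'broken' for one link target."""
    if SCHEME_RE.match(target):
        return "external"  # counted, never fetched
    if target.startswith("#"):
        return "anchor"    # ignored: needs a heading-slug parser
    path_part = target.split("#", 1)[0]  # drop any trailing #fragment
    resolved = Path(source_file).parent / path_part
    return "ok" if resolved.exists() else "broken"
```

Local targets are resolved relative to the directory of the file that contains the link, which is how relative markdown links behave on disk.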

Fixes shipped here

First run found 70 broken local links. This PR fixes the 3-link cohort with a single root cause:

| ADR | Image | Was | Should be |
|---|---|---|---|
| ADR-022 | Voltron / Porter-stemmer matcher | `media/ADR-022/voltron-porter-stemmer.jpg` | `media-ADR-022/voltron-porter-stemmer.jpg` |
| ADR-058 | Polarity-axis overview | `media/ADR-058/polarity_axis_documentation.png` | `media-ADR-058/polarity_axis_documentation.png` |
| ADR-058 | Polarity-axis before/after | `media/ADR-058/polarity_axis_before_after.png` | `media-ADR-058/polarity_axis_before_after.png` |

All three pointed at `media/ADR-XXX/` (nested subdir convention) but the actual on-disk paths are `media-ADR-XXX/` (hyphen, flat). On-disk convention wins — 3 one-line link-target fixes.
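The rewrite from the nested to the flat convention is mechanical. The three fixes in this PR are described as one-line edits, so the helper below is only a hypothetical sketch of the same transformation, assuming ADR identifiers are always `ADR-` followed by digits.

```python
import re

# Hypothetical helper; not part of this PR.
NESTED_MEDIA_RE = re.compile(r"\bmedia/(ADR-\d+)/")


def flatten_media_path(target):
    """Rewrite media/ADR-XXX/file to media-ADR-XXX/file; leave other targets alone."""
    return NESTED_MEDIA_RE.sub(r"media-\1/", target)
```

Targets that already use the flat `media-ADR-XXX/` form do not match the pattern and pass through unchanged.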

Remaining 67 broken links

Not fixed in this PR — separate triage pass:

| Pattern | Examples |
|---|---|
| Renamed/moved guide | `QUICKSTART.md`, `ARCHITECTURE.md`, `ARCHITECTURE_OVERVIEW.md`, `VOCABULARY_CATEGORIES.md` |
| Renamed ADR | ADR-027 (user mgmt), ADR-016 (AGE migration), ADR-068 (source embeddings), ADR-076 (pathfinding) |
| Never-created ADR | `ADR-071a-parallel-implementation-findings.md` (referenced 3×) |
| Deleted experimental dir | `polarity-axis-analysis/` (referenced from ADR-070) |
| Deleted top-level file | `RECURSIVE_UPSERT_ARCHITECTURE.md` |

Each is a 1-line fix but the right destination needs per-link research, so they're left for a separate pass. The checker is not wired into CI yet — doing so before clearing the backlog would block every PR on these. Once the backlog is down, `--fail-on-broken` can be added to `.github/workflows/lint.yml`.

Test plan

  • `python3 scripts/development/lint/link_check.py` runs cleanly (exits 0 without `--fail-on-broken`, even with broken links).
  • Re-run after the 3 fixes drops the count from 70 → 67.
  • Visual smoke after merge: the Voltron image renders on the ADR-022 page.
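The first test-plan item depends on the exit code being gated by the flag. A minimal sketch of that gating is below, assuming an `argparse`-based CLI. The function names, and the choice of `1` as the failing exit code, are assumptions; the PR text only confirms that the script exits 0 without `--fail-on-broken`.

```python
import argparse


def build_parser():
    """Hypothetical CLI shape matching the usage shown in this PR."""
    parser = argparse.ArgumentParser(description="Check local markdown links.")
    parser.add_argument("paths", nargs="*", default=["docs"],
                        help="files or directories to scan (default: docs/)")
    parser.add_argument("--fail-on-broken", action="store_true",
                        help="exit non-zero when any broken local link is found")
    return parser


def exit_code(broken_count, fail_on_broken):
    """Report-only mode always returns 0; CI mode fails on any broken link."""
    return 1 if (fail_on_broken and broken_count > 0) else 0
```

Keeping the default run at exit 0 lets the report be used locally while a backlog exists, with the flag reserved for CI.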

aaronsb added 3 commits May 25, 2026 00:07
New scripts/development/lint/link_check.py walks docs/ (or any path
argument), parses every markdown link and image reference, and reports
local targets that don't resolve on disk. External URLs are counted but
not network-verified. Fenced code blocks are skipped so `[brackets](and
parens)` in examples don't fire false positives. Anchor-only links
(`#section`) are ignored — verifying them needs a heading-slug parser.

Usage:
    python3 scripts/development/lint/link_check.py            # scan docs/
    python3 scripts/development/lint/link_check.py docs/architecture
    python3 scripts/development/lint/link_check.py --fail-on-broken  # CI

First run found 70 broken local links across docs/. This PR fixes the
3-link cohort that shares the same root cause:

- ADR-022 Voltron image
- ADR-058 polarity-axis triangulation overview
- ADR-058 polarity-axis before/after comparison

All three pointed at `media/ADR-XXX/file.png` (nested subdir convention)
but the actual on-disk paths are `media-ADR-XXX/file.png` (hyphen, flat).
The on-disk convention wins — three 1-line link-target fixes.

The remaining 67 broken links are renamed/moved/deleted targets
(QUICKSTART.md, ARCHITECTURE.md, ADR-027, ADR-016, ADR-068, ADR-076,
ADR-071a, polarity-axis-analysis/, RECURSIVE_UPSERT_ARCHITECTURE.md,
etc.). Each is a 1-line fix but the right destination requires per-link
research, so they're left for a separate triage pass — not wired into CI
yet to avoid blocking PRs on the backlog.
… vendored .venv

Second pass after the initial 3 image fixes. Used the new link_check.py
report as input to a small fixer script that, for each broken local
target, finds candidates by basename under docs/ and applies the fix when
exactly one candidate exists.
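
The fixer script itself is not shown in this PR. A minimal sketch of the basename-uniqueness rule, with hypothetical function names, might look like this:

```python
import os
from pathlib import Path


def find_unique_candidate(broken_target, docs_root):
    """Return the only file under docs_root sharing the target's basename, else None."""
    basename = Path(broken_target.split("#", 1)[0]).name
    if not basename:
        return None
    candidates = [p for p in Path(docs_root).rglob(basename) if p.is_file()]
    # Zero or several matches are ambiguous, so leave those for manual triage.
    return candidates[0] if len(candidates) == 1 else None


def relative_fix(source_file, candidate):
    """Express the candidate as a link target relative to the linking file."""
    rel = os.path.relpath(candidate, start=Path(source_file).parent)
    return Path(rel).as_posix()
```

Refusing to guess when more than one candidate exists is what keeps the batch fix safe: only unambiguous moves are rewritten automatically.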

- 37 fixes via basename-uniqueness: ADRs that moved into their category
  subdirs (authentication-security/, ai-embeddings/, database-schema/,
  vocabulary-relationships/, infrastructure/, user-interfaces/,
  query-search/, ingestion-content/) plus top-level docs that moved to
  reference/ (ARCHITECTURE_OVERVIEW.md, RECURSIVE_UPSERT_ARCHITECTURE.md)
  and guides/ (exploring.md, understanding-grounding.md).

- 7 fixes via historical-rename mapping (recorded inline in the fixer):
  ADR-027 authentication → user-management-api, ADR-031
  api-key-management → encrypted-api-key-storage, ADR-028 rbac →
  dynamic-rbac-system, ADR-032 → ADR-032.1 (split into .1/.2 over time),
  ADR-048 query-facade-namespace-safety → vocabulary-metadata-as-graph
  (renamed to reflect the ADR's primary subject).

- link_check.py: skip vendored directories (.venv, node_modules, .git,
  __pycache__, venv) so dependency package METADATA files don't surface
  as scan targets.
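
One common way to skip vendored directories is to prune them during the directory walk. The sketch below assumes an `os.walk`-based scan; `SKIP_DIRS` and `iter_markdown_files` are hypothetical names, and the real script may filter differently.

```python
import os

# Directory names listed in the commit message above.
SKIP_DIRS = {".venv", "venv", "node_modules", ".git", "__pycache__"}


def iter_markdown_files(root):
    """Yield paths of .md files under root, never descending into SKIP_DIRS."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into these.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            if name.endswith(".md"):
                yield os.path.join(dirpath, name)
```

Pruning in place avoids walking large dependency trees at all, rather than walking them and discarding the results afterwards.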

Broken count: 70 → 23 over the two passes.

The remaining 23 are deleted-and-not-replaced (QUICKSTART.md ×6,
ARCHITECTURE.md ×4, polarity-axis-analysis/ ×3, ../guides/README.md,
../deployment/README.md, ../installation.md, VOCABULARY_CATEGORIES.md)
or never-created (ADR-071a "parallel implementation findings" ×3,
ADR-068 unified-embedding-regeneration). Each needs source-context
judgment — rewrite the surrounding text vs remove the link vs create
the missing doc — so they're left for a separate triage.

Final pass. Broken local link count: 23 → 0.

**Defensible redirects** (14 sites across 5 docs) for deleted-but-replaceable
targets, batched via the same fixer pattern as the earlier pass:

- ARCHITECTURE.md → reference/ARCHITECTURE_OVERVIEW.md (system-design entry
  point in the published mkdocs site)
- QUICKSTART.md → manual/01-getting-started/02-CLI_USAGE.md (first existing
  chapter of the getting-started manual; original QUICKSTART.md was deleted
  without a 1:1 replacement)
- using/README.md → manual/README.md ("Using the System" maps to the user
  manual)
- deployment/README.md → guides/DEPLOYMENT.md (deployment guide, which exists)
- ADR-068-unified-embedding-regeneration.md →
  ai-embeddings/ADR-068-source-text-embeddings.md (the actual ADR-068; the
  referenced title was a planning-stage name)

**Text rewrites** (8 sites) where no redirect target was honest:

- docs/README.md: drop VOCABULARY_CATEGORIES.md from the doc inventory and
  the New Users numbered list (guide was deleted; renumbered subsequent
  items).
- ADR-070 §References: the prototype directory under
  features/polarity-axis-analysis/ was removed when the feature shipped.
  Replaced the dead bullets with a pointer to api/app/lib/polarity_axis.py
  and the surviving research findings.
- ADR-071 §Actual Performance Results + Related ADRs: the planned standalone
  ADR-071a was never split out; findings live inline in this ADR. Removed
  the self-referential see-also link and the duplicate bullet, added a
  one-sentence note explaining the inlining.
- ADR-072 §Related: same fix; pointed to the inline
  #actual-performance-results-adr-071a anchor in ADR-071.
- macvlan-dedicated-ip.md §Related: dropped the link to the never-created
  installation.md; kept the prose describing what install.sh's SSL flags do.

**New stub:** docs/guides/README.md — small TOC for the guides directory,
grouped by topic. Closes features/README.md:45's "User Guides" link and
gives mkdocs awesome-pages a clean H1 title for the section.

**CI wiring:** new docs-link-check job in .github/workflows/lint.yml runs
the checker with --fail-on-broken. With the backlog at zero, any new PR
that introduces a broken local link will fail CI.
@aaronsb aaronsb merged commit 0a7bbe4 into main May 25, 2026
4 checks passed