docs: align documentation with code across all sections #415

Open
aaronsb wants to merge 26 commits into main from claude/docs-code-alignment-XVN8b

aaronsb commented May 25, 2026

Summary

Cross-cutting alignment of docs/ against the current code. 26 commits, 64 docs files touched, +1875 / -1462 lines. Treats code as ground truth; conceptual framing and authorial voice preserved.

Approach

Eight parallel sub-agents were launched, each scoped to a disjoint slice of docs/. Each agent read its docs, verified claims against the implementing code (routers, CLI commands, schema, scripts), and fixed verified drift in place. Edits committed per-section so the diff is reviewable as a sequence.

Scope rules per agent:

  • Docs only — no code edits.
  • Preserve conceptual framing; fix only verified factual drift.
  • When unsure, leave the doc alone and flag for human review.

Excluded by design (historical / dated content where drift is acceptable):

  • docs/architecture/ — all ADRs
  • docs/research/ — research snapshots
  • docs/testing/{API_AUTH_AUDIT_*, SCHEMA_MIGRATION_TEST_REPORT, ADR-200-PHASE-3A-EXERCISE-REPORT}.md — dated audit reports
  • docs/PRODUCT_REVIEW_2026-02-07.md
  • docs/guides/adr-{045-046, 068, 074}-*.md — historical ADR implementation plans
  • docs/manual/06-reference/{04-github_project_history, 08-DISTRIBUTED_SHARDING_RESEARCH}.md

Changes by section

testing (3 files)

  • TEST_COVERAGE.md: replaced venv-based pytest with containerized workflow (./tests/run.sh, docker exec kg-api-dev pytest); documented actual tests/ layout (api/, unit/, security/, manual/); flagged empty placeholder test files; corrected security marker registration note.
  • INTEGRATION_TEST_PLAN.md: replaced removed scripts (migrate-db.sh, configure-ai.sh, start-api.sh) with ./operator.sh equivalents; fixed psql user to match .env; fixed client/ → cli/ install path.
  • INTEGRATION_TEST_NOTES.md: updated client/src/* → cli/src/* with historical "formerly" notes.

operating + deployment (8 files + root README)

  • Removed nonexistent commands (./operator.sh backup, ./operator.sh restore, --version flag); pointed at the real operator-container scripts (/workspace/operator/database/{backup,restore,migrate}-database.sh).
  • Corrected container-naming assumptions: standalone install does NOT rename postgres/garage to kg-* (only the prod overlay does).
  • Rewrote compose-file selection table to match actual order in operator.sh get_compose_cmd.
  • Replaced fictitious configure.py subcommands (show, create-user, reset-password, list-users) with real ones (status, admin, embedding, models, oauth).
  • Backup/restore: switched from .sql to .dump (pg_dump custom format); fixed test-restore example to use apache/age image (AGE not in postgres:16).
  • Fixed wrong network name (kg-internal → knowledge-graph-network).
  • Replaced kg auth login / kg config show (don't exist) with kg login / kg config get.
  • Root docs/README.md: ADR count 96 → 107; corrected 00-introduction listing.

features + concepts (3 files modified, 6 verified-current)

  • features/cli.md: replaced kg search details → kg search show; kg source show → kg source get; --format json → --json; fixed kg job approve invocation; fixed config path (~/.config/kg-cli/ → ~/.config/kg/).
  • features/rest-api.md: full endpoint-table rewrite: removed /auth/login, /ingest/file, /ingest/directory, /jobs/{id}/cancel; routed /search/* and /concepts/* to actual /query/* paths; /ontologies → /ontology (singular); replaced PUT with PATCH on graph CRUD.
  • features/mcp-server.md: added missing Programs & Session Tools section (program, session_context, session_ingest, epoch).
  • Concepts (3 files) verified unchanged — conceptual framing was correct.

reference architecture (5 files)

  • ARCHITECTURE_OVERVIEW.md: dropped Phase 1/Phase 2 framing; rewrote topology to actual 5-container layout (kg-{postgres,garage,api,web,operator}); replaced "in-memory + SQLite queue" with PostgreSQLJobQueue (ADR-100); replaced placeholder auth with OAuth 2.0 + JWT (ADR-054); updated AI provider/model lists; corrected concept-matching to two-tier (0.85 strict / 0.75 label-boosted).
  • OPERATOR_ARCHITECTURE.md: corrected container paths /app/operator/* → /workspace/operator/*; expanded operator.sh command matrix (versions, restart, query, garage, self-update, recert); updated file inventories.
  • RECURSIVE_UPSERT_ARCHITECTURE.md: documented the two-tier matcher actually implemented.
  • STORAGE-ARCHITECTURE.md: added kg_logs.* schema; corrected Garage key patterns (sources/{ontology}/{hash32}.{ext}).
  • reference/README.md: client/ → cli/; expanded REST API tag list.
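
For reviewers who want a concrete picture of the two-tier concept matching mentioned above, here is a minimal sketch. Only the two thresholds (0.85 and 0.75) come from this PR. The Concept type, the cosine helper, and the reading of "label-boosted" as "labels agree after normalization" are assumptions for illustration, not the matcher actually implemented in the API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

STRICT_THRESHOLD = 0.85         # from the PR summary
LABEL_BOOSTED_THRESHOLD = 0.75  # from the PR summary


@dataclass
class Concept:
    """Hypothetical stand-in for a stored concept."""
    label: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def labels_agree(a: str, b: str) -> bool:
    """Assumed meaning of 'label-boosted': case-insensitive label equality."""
    return a.strip().lower() == b.strip().lower()


def match_concept(candidate: Concept, existing: Sequence[Concept]) -> Optional[Concept]:
    """Return the best existing concept that clears its tier's threshold."""
    best, best_score = None, 0.0
    for concept in existing:
        score = cosine(candidate.embedding, concept.embedding)
        # Tier 2 applies the lower threshold only when the labels agree.
        threshold = (
            LABEL_BOOSTED_THRESHOLD
            if labels_agree(candidate.label, concept.label)
            else STRICT_THRESHOLD
        )
        if score >= threshold and score > best_score:
            best, best_score = concept, score
    return best
```

With a similarity of about 0.80, a candidate merges only if its label agrees with the existing concept; otherwise it stays a new concept.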

reference: API/CLI/MCP/FUSE (10 files modified, 3 deleted, 24 verified-current)

  • api/ADMIN-ENDPOINTS.md: removed dead /auth/{login,token,device/code}; added /auth/oauth/*, /users/me*, /sources*, /jobs/{id}/stream, POST /users/{id}/reset-password.
  • api/README.md: replaced JWT/API-key blurb with OAuth 2.0.
  • cli/README.md: added missing top-level commands (mcp-config, document, source, vocabulary, concept, edge, batch, admin, polarity, projection, artifact, group, query-def, program, storage).
  • cli/commands/admin.md: backup default json → archive; added --confirm on restore; added workers / lanes (ADR-100); expanded embedding subcommand surface.
  • cli/commands/vocabulary.md: removed deprecated config-update; added search; nested profiles {list,show,create,delete}.
  • fuse/README.md: added missing kg-fuse reset references.
  • mcp/README.md: added session_context, session_ingest.
  • Deleted: mcp/tools/{ingest-file,ingest-directory,inspect-file}.md — these became actions on the unified ingest tool.

guides (9 files modified, 8 verified-current, 1 skipped as research)

  • CLI_DEVELOPMENT.md: client/src/* → cli/src/*; JWT_EXPIRATION_MINUTES → ACCESS_TOKEN_EXPIRE_MINUTES.
  • EPISTEMIC-STATUS-FILTERING.md: major drift fix: replaced AFFIRMATIVE/CONTESTED/CONTRADICTORY/UNCLASSIFIED with current 7-status set (WELL_GROUNDED, MIXED_GROUNDING, WEAK_GROUNDING, POORLY_GROUNDED, CONTRADICTED, HISTORICAL, INSUFFICIENT_DATA); updated thresholds and example values; fixed api/api/lib/ → api/app/lib/.
  • understanding-grounding.md: same epistemic-status rewrite; kg concept details → kg search show.
  • querying.md, exploring.md: fixed --type → --types; replaced fictitious kg auth token with real OAuth client-credentials flow; corrected MCP server path (mcp/dist/index.js → cli/dist/mcp-server.js).
  • DEPLOYMENT.md: removed reference to non-existent operator/setup/configure-db-profile.sh.
  • DOCSTRING_COVERAGE.md: documented the --json flag.
  • QUERY_SAFETY_BASELINE.md: scripts/lint_queries.py → scripts/development/lint/lint_queries.py.
  • SCHEDULED-JOBS.md: api/api/launchers/* → api/app/launchers/*.
  • VOCABULARY_LIFECYCLE.md: default --threshold 0.85 → --threshold 0.90.
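
To make the epistemic-status change concrete, the sketch below models the current 7-status set and rejects the retired names. The status names are taken from this PR. The EpistemicStatus enum, the parse_status_filter helper, and the comma-separated input format are hypothetical; the real API and CLI may parse filters differently.

```python
from enum import Enum


class EpistemicStatus(str, Enum):
    """The 7 current status names listed in this PR."""
    WELL_GROUNDED = "WELL_GROUNDED"
    MIXED_GROUNDING = "MIXED_GROUNDING"
    WEAK_GROUNDING = "WEAK_GROUNDING"
    POORLY_GROUNDED = "POORLY_GROUNDED"
    CONTRADICTED = "CONTRADICTED"
    HISTORICAL = "HISTORICAL"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"


# Names the docs used before this PR and that are no longer valid.
RETIRED_STATUS_NAMES = frozenset(
    {"AFFIRMATIVE", "CONTESTED", "CONTRADICTORY", "UNCLASSIFIED"}
)


def parse_status_filter(raw: str) -> list:
    """Parse a comma-separated filter (assumed format) into statuses."""
    statuses = []
    for name in (part.strip().upper() for part in raw.split(",")):
        if not name:
            continue  # tolerate trailing commas and blank entries
        if name in RETIRED_STATUS_NAMES:
            raise ValueError(f"{name} is a retired status name")
        try:
            statuses.append(EpistemicStatus(name))
        except ValueError:
            raise ValueError(f"unknown epistemic status: {name}") from None
    return statuses
```

A check of this shape would also catch the stale --include-epistemic help text flagged under code-side issues, since that text still lists retired names.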

language (5 files; all already in sync after first pass)

All five GraphProgram DSL docs (README, specification, lifecycle, validation, security) were verified against api/app/{models,services,routes}/program* — implementation locations identified in the report. Fixes (test count → 109; V003 wording; POST /programs/execute body shape; DEFINITION_TYPES includes program; ConditionalOp marked Implemented) were applied in earlier commits and re-verified.

manual (16 files modified, 3 verified-current, 5 flagged as forward-looking)

  • Getting started: rewrote kg job list options; fixed kg ingest subcommands (--no-approve, --parallel, --wait); added missing ingest directory / ingest image sections.
  • Configuration: replaced 30+ instances of ./scripts/services/{start,stop}-api.sh with ./operator.sh restart api; replaced ./scripts/configure-ai.sh with ./operator.sh ai-provider; corrected the wrong "needs restart" claim (extraction config is loaded per-job); documented real embedding subcommand workflow (create → activate → reload).
  • Integration: fixed MCP Claude Desktop example to use actual KG_OAUTH_CLIENT_{ID,SECRET} env vars (not bogus KG_USERNAME/KG_PASSWORD); cd client → cd cli.
  • Security & access: corrected schema table ag_catalog.system_api_keys → kg_api.system_api_keys; pointed password-recovery doc at ./operator/admin/reset-password.sh.
  • Maintenance: replaced python -m src.admin.{backup,restore,prune,stitch,check_integrity} (~14 occurrences) with kg admin equivalents; removed Neo4j references.
  • Schema reference: corrected kg_auth table list (actual: users, roles, user_roles, resources, permissions, oauth_clients); users.role → users.primary_role; removed bogus "admin/admin" default credentials; added platform_admin role.
  • Examples / opencypher: replaced python cli.py (~10 occurrences) with kg search; -U postgres → -U admin; replaced "Neo4j Browser" with Apache AGE / web visualizer.
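
As a rough illustration of the credential change in the Integration chapter, the sketch below builds an OAuth 2.0 client-credentials token request from the KG_OAUTH_CLIENT_ID and KG_OAUTH_CLIENT_SECRET environment variables. Only the variable names come from this PR. The build_token_request helper is hypothetical, the token URL is deliberately left to the caller because the exact endpoint is not stated here, and sending the credentials in the form body (rather than via HTTP Basic auth) is an assumption.

```python
import os
import urllib.parse
import urllib.request


def build_token_request(token_url: str, env=os.environ) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    client_id = env.get("KG_OAUTH_CLIENT_ID")
    client_secret = env.get("KG_OAUTH_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "KG_OAUTH_CLIENT_ID and KG_OAUTH_CLIENT_SECRET must be set"
        )
    # Standard OAuth 2.0 client-credentials form fields (RFC 6749, section 4.4).
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would perform the exchange. Failing fast on missing variables is the point: the old KG_USERNAME/KG_PASSWORD examples would raise here instead of silently producing an unauthenticated client.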

Flagged for human triage (not auto-fixed)

Stale generated artifact

  • docs/reference/openapi.json is stale. Still advertises /auth/login and uses tokenUrl: /auth/login; missing /users/me*, /admin/workers/*, /auth/oauth/* token flows, /admin/embedding/export, /admin/storage/*, /auth/oauth/login-and-authorize. Needs regeneration from the running FastAPI app.
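
Until the spec is regenerated, a small check like the following could confirm the staleness described above. It is a sketch under stated assumptions: the audit_openapi helper and its report keys are hypothetical, the path lists simply restate the findings in this PR, and the traversal assumes the standard OpenAPI 3 layout (paths, components.securitySchemes.*.flows.*.tokenUrl).

```python
# Paths this PR reports as removed from the API but still in the spec.
STALE_PATHS = ["/auth/login"]

# Path prefixes this PR reports as present in the API but missing from the spec.
EXPECTED_PREFIXES = [
    "/users/me",
    "/admin/workers/",
    "/auth/oauth/",
    "/admin/embedding/export",
    "/admin/storage/",
]


def audit_openapi(spec: dict) -> dict:
    """Report stale paths, stale token URLs, and missing path prefixes."""
    paths = list(spec.get("paths", {}))
    stale_paths = [p for p in STALE_PATHS if p in paths]
    missing_prefixes = [
        prefix
        for prefix in EXPECTED_PREFIXES
        if not any(p.startswith(prefix) for p in paths)
    ]

    # Collect every tokenUrl declared by any OAuth flow in the spec.
    token_urls = []
    schemes = spec.get("components", {}).get("securitySchemes", {})
    for scheme in schemes.values():
        for flow in scheme.get("flows", {}).values():
            url = flow.get("tokenUrl")
            if url:
                token_urls.append(url)
    stale_token_urls = [
        u for u in token_urls if u.rstrip("/").endswith("auth/login")
    ]

    return {
        "stale_paths": stale_paths,
        "stale_token_urls": stale_token_urls,
        "missing_prefixes": missing_prefixes,
    }
```

To run it against the checked-in file, load docs/reference/openapi.json with json.load and pass the resulting dict; an all-empty report would suggest the regenerated spec matches the findings listed here.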

Missing doc coverage (code has, docs don't)

  • CLI commands without commands/*.md: mcp-config, document, source, concept, edge, batch, polarity, projection, artifact, group, query-def, program, storage (listed in cli/README.md but no per-command doc).
  • API endpoints undocumented in ADMIN-ENDPOINTS.md: /admin/workers/* (ADR-100), /admin/storage/{health,stats,objects,...}, /admin/models/* (ADR-800), /admin/providers/*, plus the full surface area of artifacts, grants, epochs, programs, query-definitions, edges, concepts, graph, documents, projection routers.
  • FUSE README is a module-level docstring dump; no user-facing command reference.
  • tests/api/test_{concepts,edges,batch,database_counters,programs,ontology_routes,providers_routes}.py exist on disk but contain zero def test_* — placeholders. Now documented as such in TEST_COVERAGE.md.

Code-side issues surfaced (out of scope for this docs PR)

  • cli/src/cli/search.ts help text for --include-epistemic still lists the old status names (AFFIRMATIVE / CONTESTED / CONTRADICTORY / HISTORICAL). API expects the new 7-status set. Likely a code-side fix.
  • pytest.ini markers list omits security even though tests/security/ exists; the marker is registered via tests/conftest.py.
  • tests/api/test_ontology.py is entirely @pytest.mark.skip("Requires database connection") — flagged in coverage table but skip reason may be stale.
  • docs/guides/DEPLOYMENT.md uses low-level operator/lib/*.sh scripts throughout. They still exist but ./operator.sh init|start|stop|upgrade is now canonical. Future pass could switch examples over.

Forward-looking / aspirational content (preserved intentionally)

  • docs/manual/02-configuration/05-LOCAL_INFERENCE_IMPLEMENTATION.md — Phase 1 done, Phases 2-4 planned. Framing is explicit; left as-is.
  • docs/language/specification.md §2.2 metadata enrichment, §7 Text DSL — explicitly forward-looking.
  • docs/guides/SEMANTIC_PATH_GRADIENTS.md — marked "Experimental"; placeholder SQL table names left intact.
  • docs/reference/OPERATOR_ARCHITECTURE.md "Proposed Architecture (Future)" section — Option A is mostly implemented; could be retired in a follow-up.

Commit boundaries

Sections committed individually so reviewers can skim by area. Some sections have multiple commits because in-flight agents added more edits after the first batch was committed. Final commit list:

a946cc4 docs(manual): final alignment pass across remaining chapters
a3d1343 docs(manual): additional security chapter fixes
cf2aebb docs(manual): align MCP setup, vocab consolidation, security
d42c9fd docs(guides): align exploring/querying/grounding with current statuses
aa4fe03 docs(reference): list missing CLI command docs in cli/README.md
9736e44 docs(manual): further embedding configuration fixes
baa79aa docs(guides): final querying.md alignment pass
48b5c7b docs(guides): further querying.md fixes
f659aa8 docs(manual): align embedding configuration chapter
e4f07c5 docs(guides): align querying.md with current query endpoints
5cab3ac docs(reference): additional alignment passes on arch and MCP README
55cd259 docs(manual): align extraction configuration / provider-switching
78108c1 docs(guides): align exploring.md with current query commands
9982773 docs(reference): follow-up arch updates + remove stale MCP tool docs
dc09bce docs(manual): align ingestion and AI providers chapters
e16a767 docs(language): additional spec fixes from second-pass review
93f3efb docs(guides): align query-safety, scheduled-jobs, vocabulary-lifecycle
f0740d4 docs(features): align rest-api and mcp-server with current code
1e46bc9 docs(reference): align API and CLI reference with current code
204cf85 docs(reference): align top-level reference architecture docs
6fa51f4 docs(language): align query language spec with implementation
162ae9d docs(guides): align CLI/deployment/docstring/epistemic guides
3b26ee2 docs(features): align CLI feature doc with current command surface
2064091 docs(operating): align operator/install docs with operator.sh
ec42c90 docs(testing): fix stale paths/commands in integration test docs
bb67c78 docs(testing): align TEST_COVERAGE.md with containerized workflow

Test plan

  • Reviewer skims each section's commit(s) for voice/conceptual fidelity
  • Verify spot-check: ./operator.sh restart api actually works (used as replacement for ~30 scripts/services/* invocations)
  • Decide on regenerating docs/reference/openapi.json (separate task)
  • Decide on creating issue(s) for: missing CLI/API doc coverage, cli/src/cli/search.ts epistemic-status drift, pytest.ini security marker

Generated by Claude Code

claude added 26 commits May 25, 2026 05:59
Update pytest commands to use the canonical container-based invocation
(./tests/run.sh and `docker exec kg-api-dev pytest ...`) and document
the actual test markers and directory layout.

- INTEGRATION_TEST_PLAN.md: replace removed scripts (migrate-db.sh,
  configure-ai.sh, start-api.sh) with current operator.sh equivalents;
  fix psql user to match .env; fix client/ -> cli/ install path.
- INTEGRATION_TEST_NOTES.md: update renamed cli paths
  (client/src/* -> cli/src/*) with historical "formerly" notes.
- TEST_COVERAGE.md: document actual tests/ layout (api/, unit/,
  security/, manual/), flag empty placeholder test files, correct
  the security marker registration note.

Fix stale commands, container names, paths, and links across operating/
and the root docs/README.md to match the current operator.sh workflow.

Update docs/features/cli.md to match current kg commands and flags.

Fix stale paths, commands, and references in the first batch of guides
reviewed; remaining guides will follow as the alignment review continues.

Update language/{README,specification,lifecycle,validation}.md to match
the current parser/validator implementation.

Update reference/{README,ARCHITECTURE_OVERVIEW,OPERATOR_ARCHITECTURE,
RECURSIVE_UPSERT_ARCHITECTURE,STORAGE-ARCHITECTURE}.md to match the
current api/, operator/, schema/, and storage implementations.

Update reference/api/{README,ADMIN-ENDPOINTS}.md and
reference/cli/commands/{admin,vocabulary}.md against current router
and command implementations.

Update features/{cli,rest-api,mcp-server}.md to match current API routes,
MCP tool definitions, and CLI command surface.

Continue alignment of docs/guides/ against current implementations.

Update manual/01-getting-started/03-INGESTION.md and
manual/02-configuration/01-AI_PROVIDERS.md against current ingestion
flow and provider implementations.

- Additional touch-ups to reference/ARCHITECTURE_OVERVIEW.md and
  reference/README.md from the second alignment pass.
- Update reference/fuse/README.md against current fuse/ implementation.
- Remove reference/mcp/tools/{ingest-directory,ingest-file,inspect-file}.md
  for MCP tools no longer present in the server.

Update epistemic-status references to the 7 current names
(WELL_GROUNDED, MIXED_GROUNDING, etc.) and replace stale CLI flags.

Align introduction, getting-started, configuration, security/access,
maintenance, and reference chapters with current code. Key fixes:

- CLI usage: corrected ingest subcommands and flags (--no-approve,
  --parallel, --wait); added directory/image variants.
- AI providers/extraction config: replaced stale script-based workflow
  with ./operator.sh + kg admin equivalents; corrected the "needs
  restart" claim for per-job extraction config.
- Embedding config: documented real subcommand surface
  (create/activate/reload/export/protect/...).
- MCP setup: fixed env var names to actual KG_OAUTH_CLIENT_* used by
  the MCP server.
- Vocabulary consolidation: removed nonexistent --auto flag; restored
  --threshold documentation.
- Auth / password recovery: pointed to actual operator/admin scripts
  and operator.sh init flow.
- Backup/restore + migrations: replaced removed python -m src.admin.*
  invocations with kg admin / operator scripts; removed Neo4j refs.
- Schema reference: corrected kg_auth tables, users.primary_role, and
  removed bogus default admin/admin credentials.
- Examples + concepts + opencypher: replaced python cli.py and Neo4j
  references with kg CLI and Apache AGE equivalents.