docs: align documentation with code across all sections #415
Open
aaronsb wants to merge 26 commits into
Update pytest commands to use the canonical container-based invocation (./tests/run.sh and `docker exec kg-api-dev pytest ...`) and document the actual test markers and directory layout.
- INTEGRATION_TEST_PLAN.md: replace removed scripts (migrate-db.sh, configure-ai.sh, start-api.sh) with current operator.sh equivalents; fix psql user to match .env; fix client/ -> cli/ install path.
- INTEGRATION_TEST_NOTES.md: update renamed cli paths (client/src/* -> cli/src/*) with historical "formerly" notes.
- TEST_COVERAGE.md: document actual tests/ layout (api/, unit/, security/, manual/), flag empty placeholder test files, correct the security marker registration note.
Fix stale commands, container names, paths, and links across operating/ and the root docs/README.md to match the current operator.sh workflow.
Update docs/features/cli.md to match current kg commands and flags.
Fix stale paths, commands, and references in the first batch of guides reviewed; remaining guides will follow as the alignment review continues.
Update language/{README,specification,lifecycle,validation}.md to match
the current parser/validator implementation.
Update reference/{README,ARCHITECTURE_OVERVIEW,OPERATOR_ARCHITECTURE,
RECURSIVE_UPSERT_ARCHITECTURE,STORAGE-ARCHITECTURE}.md to match the
current api/, operator/, schema/, and storage implementations.
Update reference/api/{README,ADMIN-ENDPOINTS}.md and
reference/cli/commands/{admin,vocabulary}.md against current router
and command implementations.
Update features/{cli,rest-api,mcp-server}.md to match current API routes,
MCP tool definitions, and CLI command surface.
Continue alignment of docs/guides/ against current implementations.
Update manual/01-getting-started/03-INGESTION.md and manual/02-configuration/01-AI_PROVIDERS.md against current ingestion flow and provider implementations.
- Additional touch-ups to reference/ARCHITECTURE_OVERVIEW.md and
reference/README.md from the second alignment pass.
- Update reference/fuse/README.md against current fuse/ implementation.
- Remove reference/mcp/tools/{ingest-directory,ingest-file,inspect-file}.md
for MCP tools no longer present in the server.
Update epistemic-status references to the 7 current names (WELL_GROUNDED, MIXED_GROUNDING, etc.) and replace stale CLI flags.
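
For reference, the seven current names and the retired ones, as enumerated in the guides section of the PR summary, can be captured in a small drift check. This is an illustrative sketch only: the class `EpistemicStatus` and the helper `find_retired_names` are hypothetical and are not taken from the project's code. `HISTORICAL` is deliberately left out of the retired set because it remains one of the seven current names.

```python
import re
from enum import Enum


class EpistemicStatus(str, Enum):
    """The seven current status names listed in the PR summary."""

    WELL_GROUNDED = "WELL_GROUNDED"
    MIXED_GROUNDING = "MIXED_GROUNDING"
    WEAK_GROUNDING = "WEAK_GROUNDING"
    POORLY_GROUNDED = "POORLY_GROUNDED"
    CONTRADICTED = "CONTRADICTED"
    HISTORICAL = "HISTORICAL"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"


# Names the summary describes as replaced; HISTORICAL is still current.
RETIRED_NAMES = ("AFFIRMATIVE", "CONTESTED", "CONTRADICTORY", "UNCLASSIFIED")

_RETIRED_RE = re.compile(r"\b(" + "|".join(RETIRED_NAMES) + r")\b")


def find_retired_names(text: str) -> list[tuple[int, str]]:
    """Return (line_number, name) for each retired status name found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _RETIRED_RE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Running `find_retired_names` over a doc or help string gives a quick list of lines that still use the old vocabulary. The word-boundary match keeps `CONTRADICTED` (current) from being confused with `CONTRADICTORY` (retired).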
Align introduction, getting-started, configuration, security/access, maintenance, and reference chapters with current code. Key fixes:
- CLI usage: corrected ingest subcommands and flags (--no-approve, --parallel, --wait); added directory/image variants.
- AI providers/extraction config: replaced stale script-based workflow with ./operator.sh + kg admin equivalents; corrected the "needs restart" claim for per-job extraction config.
- Embedding config: documented real subcommand surface (create/activate/reload/export/protect/...).
- MCP setup: fixed env var names to actual KG_OAUTH_CLIENT_* used by the MCP server.
- Vocabulary consolidation: removed nonexistent --auto flag; restored --threshold documentation.
- Auth / password recovery: pointed to actual operator/admin scripts and operator.sh init flow.
- Backup/restore + migrations: replaced removed python -m src.admin.* invocations with kg admin / operator scripts; removed Neo4j refs.
- Schema reference: corrected kg_auth tables, users.primary_role, and removed bogus default admin/admin credentials.
- Examples + concepts + opencypher: replaced python cli.py and Neo4j references with kg CLI and Apache AGE equivalents.
Summary
Cross-cutting alignment of `docs/` against the current code. 26 commits, 64 docs files touched, +1875 / -1462 lines. Treats code as ground truth; conceptual framing and authorial voice preserved.
Approach
Eight parallel sub-agents were launched, each scoped to a disjoint slice of `docs/`. Each agent read its docs, verified claims against the implementing code (routers, CLI commands, schema, scripts), and fixed verified drift in place. Edits committed per-section so the diff is reviewable as a sequence.
Scope rules per agent:
Excluded by design (historical / dated content where drift is acceptable):
- `docs/architecture/`: all ADRs
- `docs/research/`: research snapshots
- `docs/testing/{API_AUTH_AUDIT_*, SCHEMA_MIGRATION_TEST_REPORT, ADR-200-PHASE-3A-EXERCISE-REPORT}.md`: dated audit reports
- `docs/PRODUCT_REVIEW_2026-02-07.md`
- `docs/guides/adr-{045-046, 068, 074}-*.md`: historical ADR implementation plans
- `docs/manual/06-reference/{04-github_project_history, 08-DISTRIBUTED_SHARDING_RESEARCH}.md`
Changes by section
testing (3 files)
- `TEST_COVERAGE.md`: replaced venv-based pytest with containerized workflow (`./tests/run.sh`, `docker exec kg-api-dev pytest`); documented actual `tests/` layout (`api/`, `unit/`, `security/`, `manual/`); flagged empty placeholder test files; corrected `security` marker registration note.
- `INTEGRATION_TEST_PLAN.md`: replaced removed scripts (`migrate-db.sh`, `configure-ai.sh`, `start-api.sh`) with `./operator.sh` equivalents; fixed psql user to match `.env`; fixed `client/` → `cli/`.
- `INTEGRATION_TEST_NOTES.md`: updated `client/src/*` → `cli/src/*` with historical "formerly" notes.
operating + deployment (8 files + root README)
- […] `./operator.sh backup`, `./operator.sh restore`, `--version` flag); pointed at the real operator-container scripts (`/workspace/operator/database/{backup,restore,migrate}-database.sh`).
- […] `kg-*` (only the prod overlay does).
- […] `operator.sh get_compose_cmd`.
- […] `configure.py` subcommands (`show`, `create-user`, `reset-password`, `list-users`) with real ones (`status`, `admin`, `embedding`, `models`, `oauth`).
- […] `.sql` to `.dump` (pg_dump custom format); fixed test-restore example to use `apache/age` image (AGE not in `postgres:16`).
- […] `kg-internal` → `knowledge-graph-network`).
- […] `kg auth login` / `kg config show` (don't exist) with `kg login` / `kg config get`.
- `docs/README.md`: ADR count 96 → 107; corrected 00-introduction listing.
features + concepts (3 files modified, 6 verified-current)
- `features/cli.md`: replaced `kg search details` → `kg search show`; `kg source show` → `kg source get`; `--format json` → `--json`; fixed `kg job approve` invocation; fixed config path (`~/.config/kg-cli/` → `~/.config/kg/`).
- `features/rest-api.md`: full endpoint-table rewrite: removed `/auth/login`, `/ingest/file`, `/ingest/directory`, `/jobs/{id}/cancel`; routed `/search/*` and `/concepts/*` to actual `/query/*` paths; `/ontologies` → `/ontology` (singular); replaced PUT with PATCH on graph CRUD.
- `features/mcp-server.md`: added missing Programs & Session Tools section (`program`, `session_context`, `session_ingest`, `epoch`).
reference architecture (5 files)
- `ARCHITECTURE_OVERVIEW.md`: dropped Phase 1/Phase 2 framing; rewrote topology to actual 5-container layout (`kg-{postgres,garage,api,web,operator}`); replaced "in-memory + SQLite queue" with `PostgreSQLJobQueue` (ADR-100); replaced placeholder auth with OAuth 2.0 + JWT (ADR-054); updated AI provider/model lists; corrected concept-matching to two-tier (0.85 strict / 0.75 label-boosted).
- `OPERATOR_ARCHITECTURE.md`: corrected container paths `/app/operator/*` → `/workspace/operator/*`; expanded `operator.sh` command matrix (`versions`, `restart`, `query`, `garage`, `self-update`, `recert`); updated file inventories.
- `RECURSIVE_UPSERT_ARCHITECTURE.md`: documented the two-tier matcher actually implemented.
- `STORAGE-ARCHITECTURE.md`: added `kg_logs.*` schema; corrected Garage key patterns (`sources/{ontology}/{hash32}.{ext}`).
- `reference/README.md`: `client/` → `cli/`; expanded REST API tag list.
reference: API/CLI/MCP/FUSE (10 files modified, 3 deleted, 24 verified-current)
- `api/ADMIN-ENDPOINTS.md`: removed dead `/auth/{login,token,device/code}`; added `/auth/oauth/*`, `/users/me*`, `/sources*`, `/jobs/{id}/stream`, `POST /users/{id}/reset-password`.
- `api/README.md`: replaced JWT/API-key blurb with OAuth 2.0.
- `cli/README.md`: added missing top-level commands (`mcp-config`, `document`, `source`, `vocabulary`, `concept`, `edge`, `batch`, `admin`, `polarity`, `projection`, `artifact`, `group`, `query-def`, `program`, `storage`).
- `cli/commands/admin.md`: backup default `json` → `archive`; added `--confirm` on restore; added `workers` / `lanes` (ADR-100); expanded embedding subcommand surface.
- `cli/commands/vocabulary.md`: removed deprecated `config-update`; added `search`; nested `profiles {list,show,create,delete}`.
- `fuse/README.md`: added missing `kg-fuse reset` references.
- `mcp/README.md`: added `session_context`, `session_ingest`.
- `mcp/tools/{ingest-file,ingest-directory,inspect-file}.md`: these became actions on the unified `ingest` tool.
guides (9 files modified, 8 verified-current, 1 skipped as research)
- `CLI_DEVELOPMENT.md`: `client/src/*` → `cli/src/*`; `JWT_EXPIRATION_MINUTES` → `ACCESS_TOKEN_EXPIRE_MINUTES`.
- `EPISTEMIC-STATUS-FILTERING.md`: major drift fix: replaced `AFFIRMATIVE/CONTESTED/CONTRADICTORY/UNCLASSIFIED` with current 7-status set (`WELL_GROUNDED`, `MIXED_GROUNDING`, `WEAK_GROUNDING`, `POORLY_GROUNDED`, `CONTRADICTED`, `HISTORICAL`, `INSUFFICIENT_DATA`); updated thresholds and example values; fixed `api/api/lib/` → `api/app/lib/`.
- `understanding-grounding.md`: same epistemic-status rewrite; `kg concept details` → `kg search show`.
- `querying.md`, `exploring.md`: fixed `--type` → `--types`; replaced fictitious `kg auth token` with real OAuth client-credentials flow; corrected MCP server path (`mcp/dist/index.js` → `cli/dist/mcp-server.js`).
- `DEPLOYMENT.md`: removed reference to non-existent `operator/setup/configure-db-profile.sh`.
- `DOCSTRING_COVERAGE.md`: documented the `--json` flag.
- `QUERY_SAFETY_BASELINE.md`: `scripts/lint_queries.py` → `scripts/development/lint/lint_queries.py`.
- `SCHEDULED-JOBS.md`: `api/api/launchers/*` → `api/app/launchers/*`.
- `VOCABULARY_LIFECYCLE.md`: default `--threshold 0.85` → `--threshold 0.90`.
language (5 files; all already in sync after first pass)
All five GraphProgram DSL docs (`README`, `specification`, `lifecycle`, `validation`, `security`) were verified against `api/app/{models,services,routes}/program*`; implementation locations identified in the report. Fixes (test count → 109; V003 wording; `POST /programs/execute` body shape; `DEFINITION_TYPES` includes `program`; ConditionalOp marked Implemented) were applied in earlier commits and re-verified.
manual (16 files modified, 3 verified-current, 5 flagged as forward-looking)
- […] `kg job list` options; fixed `kg ingest` subcommands (`--no-approve`, `--parallel`, `--wait`); added missing `ingest directory` / `ingest image` sections.
- […] `./scripts/services/{start,stop}-api.sh` with `./operator.sh restart api`; replaced `./scripts/configure-ai.sh` with `./operator.sh ai-provider`; corrected wrong "needs restart" claim: extraction config is loaded per-job; documented real embedding subcommand workflow (`create` → `activate` → `reload`).
- […] `KG_OAUTH_CLIENT_{ID,SECRET}` env vars (not bogus `KG_USERNAME` / `KG_PASSWORD`); `cd client` → `cd cli`.
- […] `ag_catalog.system_api_keys` → `kg_api.system_api_keys`; pointed password-recovery doc at `./operator/admin/reset-password.sh`.
- […] `python -m src.admin.{backup,restore,prune,stitch,check_integrity}` (~14 occurrences) with `kg admin` equivalents; removed Neo4j references.
- […] `kg_auth` table list (actual: `users, roles, user_roles, resources, permissions, oauth_clients`); `users.role` → `users.primary_role`; removed bogus "admin/admin" default credentials; added `platform_admin` role.
- […] `python cli.py` (~10 occurrences) with `kg search`; `-U postgres` → `-U admin`; replaced "Neo4j Browser" with Apache AGE / web visualizer.
Flagged for human triage (not auto-fixed)
Stale generated artifact
`docs/reference/openapi.json` is stale. Still advertises `/auth/login` and uses `tokenUrl: /auth/login`; missing `/users/me*`, `/admin/workers/*`, `/auth/oauth/*` token flows, `/admin/embedding/export`, `/admin/storage/*`, `/auth/oauth/login-and-authorize`. Needs regeneration from the running FastAPI app.
Missing doc coverage (code has, docs don't)
- `commands/*.md`: `mcp-config`, `document`, `source`, `concept`, `edge`, `batch`, `polarity`, `projection`, `artifact`, `group`, `query-def`, `program`, `storage` (listed in `cli/README.md` but no per-command doc).
- `ADMIN-ENDPOINTS.md`: `/admin/workers/*` (ADR-100), `/admin/storage/{health,stats,objects,...}`, `/admin/models/*` (ADR-800), `/admin/providers/*`, plus the full surface area of artifacts, grants, epochs, programs, query-definitions, edges, concepts, graph, documents, projection routers.
- `tests/api/test_{concepts,edges,batch,database_counters,programs,ontology_routes,providers_routes}.py` exist on disk but contain zero `def test_*` (placeholders). Now documented as such in `TEST_COVERAGE.md`.
Code-side issues surfaced (out of scope for this docs PR)
- `cli/src/cli/search.ts` help text for `--include-epistemic` still lists the old status names (AFFIRMATIVE / CONTESTED / CONTRADICTORY / HISTORICAL). API expects the new 7-status set. Likely a code-side fix.
- `pytest.ini` `markers` list omits `security` even though `tests/security/` exists; the marker is registered via `tests/conftest.py`.
- `tests/api/test_ontology.py` is entirely `@pytest.mark.skip("Requires database connection")`; flagged in coverage table but skip reason may be stale.
- `docs/guides/DEPLOYMENT.md` uses low-level `operator/lib/*.sh` scripts throughout. They still exist but `./operator.sh init|start|stop|upgrade` is now canonical. Future pass could switch examples over.
Forward-looking / aspirational content (preserved intentionally)
- `docs/manual/02-configuration/05-LOCAL_INFERENCE_IMPLEMENTATION.md`: Phase 1 done, Phases 2-4 planned. Framing is explicit; left as-is.
- `docs/language/specification.md` §2.2 metadata enrichment, §7 Text DSL: explicitly forward-looking.
- `docs/guides/SEMANTIC_PATH_GRADIENTS.md`: marked "Experimental"; placeholder SQL table names left intact.
- `docs/reference/OPERATOR_ARCHITECTURE.md` "Proposed Architecture (Future)" section: Option A is mostly implemented; could be retired in a follow-up.
Commit boundaries
Sections committed individually so reviewers can skim by area. Some sections have multiple commits because in-flight agents added more edits after the first batch was committed. Final commit list:
Test plan
- […] `./operator.sh restart api` actually works (used as replacement for ~30 `scripts/services/*` invocations)
- […] `docs/reference/openapi.json` (separate task)
- […] `cli/src/cli/search.ts` epistemic-status drift, `pytest.ini` `security` marker
Generated by Claude Code