fix(query-facade): promote execute_raw patterns to typed methods (#355) by aaronsb · Pull Request #413 · aaronsb/knowledge-graph-system

aaronsb · 2026-05-25T04:15:02Z

Closes #355.

Why

GraphQueryFacade (ADR-048) exposes a typed match_* / count_* DSL, but several hot-path callers reached for the execute_raw escape hatch for shapes the DSL didn't yet cover. Every escape-hatch call logs a WARNING — so a standard ingest cycle produced hundreds of false alarms and muddied the signal when something genuinely raw appeared.

What changed in the facade

DSL widened so the real call sites fit. All match_* methods now share the same signature: (where, params, limit, return_clause, order_by). Query construction follows the same order (MATCH → WHERE → RETURN → ORDER BY → LIMIT). Docstrings standardized to describe where / order_by as "without the keyword".

Method	Change
`match_concepts`	`+ order_by` (already had `return_clause`)
`match_sources`	`+ return_clause, order_by`
`match_vocab_types`	`+ return_clause, order_by`
`match_vocab_categories`	`+ limit, return_clause, order_by`
`match_instances`	`+ return_clause, order_by`
`count_sources`	new
`update_source_properties(source_id, props)`	new — first typed mutation primitive; SET clause derived from the property map
`get_vocab_type_stats()`	new — collapses the active/inactive aggregation

find_vocabulary_synonyms and match_concept_relationships stay as-is — they're DSL consumers with task-specific signatures, not primitives.

Callers migrated

workers/source_embedding_worker.py — all 10 escape-hatch sites moved to typed methods. Also fixes a latent bug where the synchronous fetch path indexed the AGE result positionally (source_data[1]) on what was actually a dict from RealDictCursor; switched to source_data[\"full_text\"].
launchers/vocab_consolidation.py — uses get_vocab_type_stats().

Left raw (intentionally)

lib/polarity_axis.py:discover_candidate_concepts — Cypher's variable-length pattern [*1..N] does not accept a bound parameter for the hop count; any typed wrapper would still have to string-substitute. Comment rewritten to explain why instead of just noting "namespace safety." Pole IDs are validated by validate_concept_id above.
routes/database.py admin raw-query endpoint — intentional raw surface per feat: promote execute_raw patterns to safe QueryFacade methods #355.

Test plan

9 new unit tests covering: return-clause + order-by composition for match_sources and parity peers; count_sources (with and without WHERE, empty-result fallback); update_source_properties (SET-clause construction, empty-props short-circuit, missing-match → None); get_vocab_type_stats (aggregate dict + empty case).
tests/test_query_facade.py — 46 passed, 1 skipped (integration, needs real DB).
Full tests/unit/ — 482 passed in ephemeral kg-api container. One unrelated error (test_provider_config.py needs POSTGRES_PASSWORD env var, pre-existing).
Docstring coverage on touched files: query_facade.py 22/23 (the missing one is pre-existing QueryAuditLog.__init__); the three caller files 4/4 + 10/10 + 4/4.
Post-merge: confirm the next ingest cycle no longer prints [QueryFacade] Using escape hatch ... warnings.

The GraphQueryFacade (ADR-048) exposes a `match_*` / `count_*` DSL but several hot-path callers reached for the `execute_raw` escape hatch for shapes the DSL didn't yet cover. Each escape-hatch call logs a WARNING, which made standard ingest cycles produce hundreds of false alarms and muddied the signal when something truly raw appeared. Surface widened so the DSL covers the real call sites: - `match_concepts/match_sources/match_vocab_types/match_vocab_categories/ match_instances` now share a consistent signature: `(where, params, limit, return_clause, order_by)`. Query construction follows the same MATCH → WHERE → RETURN → ORDER BY → LIMIT order; all docstrings now describe `where`/`order_by` as "without the keyword". - New `count_sources(where, params) -> int` mirrors `count_concepts` / `count_vocab_types`. - New `update_source_properties(source_id, properties)` — first typed mutation primitive. Builds a SET clause from the property map and returns the source_id on success (or None when the MATCH misses). - New `get_vocab_type_stats() -> {total, active, inactive}` collapses the vocab-spread aggregation into a single typed call. Callers migrated to the typed methods: - `workers/source_embedding_worker.py` — all 10 escape-hatch sites. Also fixes a latent bug where the synchronous fetch path indexed the result positionally (`source_data[1]`) on what was actually a dict from RealDictCursor; switched to `source_data["full_text"]`. - `launchers/vocab_consolidation.py` — uses `get_vocab_type_stats()`. Left as `execute_raw` with a clearer in-place rationale: - `lib/polarity_axis.py:discover_candidate_concepts` — Cypher's variable-length pattern `[*1..N]` does not accept a bound parameter for N, so any typed wrapper would still have to string-substitute the hop count. The pole IDs are validated by `validate_concept_id` above. - `routes/database.py` admin raw query endpoint — intentional raw surface per #355. Tests: 9 new unit tests covering the new and extended methods (return- clause + order_by composition, count_sources, update_source_properties including empty-props short-circuit and missing-match cases, vocab stats aggregation). Full unit suite: 482 passing in the ephemeral kg-api container; the 1 unrelated error (`test_provider_config.py` needs POSTGRES_PASSWORD env var, pre-existing).

Parametrized scan over every file in api/app/workers/. Two assertions per worker: 1. No facade.execute_raw(...) — the typed DSL on GraphQueryFacade is the only sanctioned shape. New patterns belong as new typed methods. 2. No age_client._execute_cypher(...) — that bypasses the facade audit log and namespace safety entirely. restore_worker.py is allowlisted for (2) with an inline rationale: it is the privileged dataset-authority worker; _clear_database issues DETACH DELETE against Concept/Source/Instance during a full-restore teardown with no parameter substitution. A facade wrapper would add no safety. The allowlist self-validates: if restore_worker.py ever stops calling _execute_cypher, the test fails and forces removal of the stale entry. Raw psycopg2 access to kg_api.* relational tables (job queue, proposals, artifact metadata, embedding storage) is explicitly out of scope — the facade is a graph DSL, not an ORM. 24 tests pass (12 workers × 2 checks).

aaronsb added 2 commits May 24, 2026 23:14

aaronsb merged commit 0722589 into main May 25, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(query-facade): promote execute_raw patterns to typed methods (#355)#413

fix(query-facade): promote execute_raw patterns to typed methods (#355)#413
aaronsb merged 2 commits into
mainfrom
fix/issue-355-promote-execute-raw

aaronsb commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aaronsb commented May 25, 2026

Why

What changed in the facade

Callers migrated

Left raw (intentionally)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant