fix(query-facade): promote execute_raw patterns to typed methods (#355)#413
Merged
Conversation
The GraphQueryFacade (ADR-048) exposes a `match_*` / `count_*` DSL but
several hot-path callers reached for the `execute_raw` escape hatch for
shapes the DSL didn't yet cover. Each escape-hatch call logs a WARNING,
which made standard ingest cycles produce hundreds of false alarms and
muddied the signal when something truly raw appeared.
Surface widened so the DSL covers the real call sites:
- `match_concepts/match_sources/match_vocab_types/match_vocab_categories/
match_instances` now share a consistent signature:
`(where, params, limit, return_clause, order_by)`. Query construction
follows the same MATCH → WHERE → RETURN → ORDER BY → LIMIT order; all
docstrings now describe `where`/`order_by` as "without the keyword".
- New `count_sources(where, params) -> int` mirrors `count_concepts` /
`count_vocab_types`.
- New `update_source_properties(source_id, properties)` — first typed
mutation primitive. Builds a SET clause from the property map and
returns the source_id on success (or None when the MATCH misses).
- New `get_vocab_type_stats() -> {total, active, inactive}` collapses the
vocab-spread aggregation into a single typed call.
Callers migrated to the typed methods:
- `workers/source_embedding_worker.py` — all 10 escape-hatch sites. Also
fixes a latent bug where the synchronous fetch path indexed the result
positionally (`source_data[1]`) on what was actually a dict from
RealDictCursor; switched to `source_data["full_text"]`.
- `launchers/vocab_consolidation.py` — uses `get_vocab_type_stats()`.
Left as `execute_raw` with a clearer in-place rationale:
- `lib/polarity_axis.py:discover_candidate_concepts` — Cypher's
variable-length pattern `[*1..N]` does not accept a bound parameter for
N, so any typed wrapper would still have to string-substitute the hop
count. The pole IDs are validated by `validate_concept_id` above.
- `routes/database.py` admin raw query endpoint — intentional raw surface
per #355.
Tests: 9 new unit tests covering the new and extended methods (return-
clause + order_by composition, count_sources, update_source_properties
including empty-props short-circuit and missing-match cases, vocab stats
aggregation). Full unit suite: 482 passing in the ephemeral kg-api
container; the 1 unrelated error (`test_provider_config.py` needs
POSTGRES_PASSWORD env var, pre-existing).
Parametrized scan over every file in api/app/workers/. Two assertions per worker: 1. No facade.execute_raw(...) — the typed DSL on GraphQueryFacade is the only sanctioned shape. New patterns belong as new typed methods. 2. No age_client._execute_cypher(...) — that bypasses the facade audit log and namespace safety entirely. restore_worker.py is allowlisted for (2) with an inline rationale: it is the privileged dataset-authority worker; _clear_database issues DETACH DELETE against Concept/Source/Instance during a full-restore teardown with no parameter substitution. A facade wrapper would add no safety. The allowlist self-validates: if restore_worker.py ever stops calling _execute_cypher, the test fails and forces removal of the stale entry. Raw psycopg2 access to kg_api.* relational tables (job queue, proposals, artifact metadata, embedding storage) is explicitly out of scope — the facade is a graph DSL, not an ORM. 24 tests pass (12 workers × 2 checks).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #355.
Why
GraphQueryFacade(ADR-048) exposes a typedmatch_* / count_*DSL, but several hot-path callers reached for theexecute_rawescape hatch for shapes the DSL didn't yet cover. Every escape-hatch call logs a WARNING — so a standard ingest cycle produced hundreds of false alarms and muddied the signal when something genuinely raw appeared.What changed in the facade
DSL widened so the real call sites fit. All
match_*methods now share the same signature:(where, params, limit, return_clause, order_by). Query construction follows the same order (MATCH → WHERE → RETURN → ORDER BY → LIMIT). Docstrings standardized to describewhere/order_byas "without the keyword".match_concepts+ order_by(already hadreturn_clause)match_sources+ return_clause, order_bymatch_vocab_types+ return_clause, order_bymatch_vocab_categories+ limit, return_clause, order_bymatch_instances+ return_clause, order_bycount_sourcesupdate_source_properties(source_id, props)get_vocab_type_stats()find_vocabulary_synonymsandmatch_concept_relationshipsstay as-is — they're DSL consumers with task-specific signatures, not primitives.Callers migrated
workers/source_embedding_worker.py— all 10 escape-hatch sites moved to typed methods. Also fixes a latent bug where the synchronous fetch path indexed the AGE result positionally (source_data[1]) on what was actually a dict fromRealDictCursor; switched tosource_data[\"full_text\"].launchers/vocab_consolidation.py— usesget_vocab_type_stats().Left raw (intentionally)
lib/polarity_axis.py:discover_candidate_concepts— Cypher's variable-length pattern[*1..N]does not accept a bound parameter for the hop count; any typed wrapper would still have to string-substitute. Comment rewritten to explain why instead of just noting "namespace safety." Pole IDs are validated byvalidate_concept_idabove.routes/database.pyadmin raw-query endpoint — intentional raw surface per feat: promote execute_raw patterns to safe QueryFacade methods #355.Test plan
match_sourcesand parity peers;count_sources(with and without WHERE, empty-result fallback);update_source_properties(SET-clause construction, empty-props short-circuit, missing-match → None);get_vocab_type_stats(aggregate dict + empty case).tests/test_query_facade.py— 46 passed, 1 skipped (integration, needs real DB).tests/unit/— 482 passed in ephemeral kg-api container. One unrelated error (test_provider_config.pyneedsPOSTGRES_PASSWORDenv var, pre-existing).query_facade.py22/23 (the missing one is pre-existingQueryAuditLog.__init__); the three caller files 4/4 + 10/10 + 4/4.[QueryFacade] Using escape hatch ...warnings.