Skip to content

fix(query-facade): promote execute_raw patterns to typed methods (#355)#413

Merged
aaronsb merged 2 commits into
mainfrom
fix/issue-355-promote-execute-raw
May 25, 2026
Merged

fix(query-facade): promote execute_raw patterns to typed methods (#355)#413
aaronsb merged 2 commits into
mainfrom
fix/issue-355-promote-execute-raw

Conversation

@aaronsb
Copy link
Copy Markdown
Owner

@aaronsb aaronsb commented May 25, 2026

Closes #355.

Why

GraphQueryFacade (ADR-048) exposes a typed match_* / count_* DSL, but several hot-path callers reached for the execute_raw escape hatch for shapes the DSL didn't yet cover. Every escape-hatch call logs a WARNING — so a standard ingest cycle produced hundreds of false alarms and muddied the signal when something genuinely raw appeared.

What changed in the facade

DSL widened so the real call sites fit. All match_* methods now share the same signature: (where, params, limit, return_clause, order_by). Query construction follows the same order (MATCH → WHERE → RETURN → ORDER BY → LIMIT). Docstrings standardized to describe where / order_by as "without the keyword".

Method Change
match_concepts + order_by (already had return_clause)
match_sources + return_clause, order_by
match_vocab_types + return_clause, order_by
match_vocab_categories + limit, return_clause, order_by
match_instances + return_clause, order_by
count_sources new
update_source_properties(source_id, props) new — first typed mutation primitive; SET clause derived from the property map
get_vocab_type_stats() new — collapses the active/inactive aggregation

find_vocabulary_synonyms and match_concept_relationships stay as-is — they're DSL consumers with task-specific signatures, not primitives.

Callers migrated

  • workers/source_embedding_worker.py — all 10 escape-hatch sites moved to typed methods. Also fixes a latent bug where the synchronous fetch path indexed the AGE result positionally (source_data[1]) on what was actually a dict from RealDictCursor; switched to source_data[\"full_text\"].
  • launchers/vocab_consolidation.py — uses get_vocab_type_stats().

Left raw (intentionally)

  • lib/polarity_axis.py:discover_candidate_concepts — Cypher's variable-length pattern [*1..N] does not accept a bound parameter for the hop count; any typed wrapper would still have to string-substitute. Comment rewritten to explain why instead of just noting "namespace safety." Pole IDs are validated by validate_concept_id above.
  • routes/database.py admin raw-query endpoint — intentional raw surface per feat: promote execute_raw patterns to safe QueryFacade methods #355.

Test plan

  • 9 new unit tests covering: return-clause + order-by composition for match_sources and parity peers; count_sources (with and without WHERE, empty-result fallback); update_source_properties (SET-clause construction, empty-props short-circuit, missing-match → None); get_vocab_type_stats (aggregate dict + empty case).
  • tests/test_query_facade.py — 46 passed, 1 skipped (integration, needs real DB).
  • Full tests/unit/ — 482 passed in ephemeral kg-api container. One unrelated error (test_provider_config.py needs POSTGRES_PASSWORD env var, pre-existing).
  • Docstring coverage on touched files: query_facade.py 22/23 (the missing one is pre-existing QueryAuditLog.__init__); the three caller files 4/4 + 10/10 + 4/4.
  • Post-merge: confirm the next ingest cycle no longer prints [QueryFacade] Using escape hatch ... warnings.

aaronsb added 2 commits May 24, 2026 23:14
The GraphQueryFacade (ADR-048) exposes a `match_*` / `count_*` DSL but
several hot-path callers reached for the `execute_raw` escape hatch for
shapes the DSL didn't yet cover. Each escape-hatch call logs a WARNING,
which made standard ingest cycles produce hundreds of false alarms and
muddied the signal when something truly raw appeared.

Surface widened so the DSL covers the real call sites:

- `match_concepts/match_sources/match_vocab_types/match_vocab_categories/
   match_instances` now share a consistent signature:
   `(where, params, limit, return_clause, order_by)`. Query construction
   follows the same MATCH → WHERE → RETURN → ORDER BY → LIMIT order; all
   docstrings now describe `where`/`order_by` as "without the keyword".
- New `count_sources(where, params) -> int` mirrors `count_concepts` /
  `count_vocab_types`.
- New `update_source_properties(source_id, properties)` — first typed
  mutation primitive. Builds a SET clause from the property map and
  returns the source_id on success (or None when the MATCH misses).
- New `get_vocab_type_stats() -> {total, active, inactive}` collapses the
  vocab-spread aggregation into a single typed call.

Callers migrated to the typed methods:

- `workers/source_embedding_worker.py` — all 10 escape-hatch sites. Also
  fixes a latent bug where the synchronous fetch path indexed the result
  positionally (`source_data[1]`) on what was actually a dict from
  RealDictCursor; switched to `source_data["full_text"]`.
- `launchers/vocab_consolidation.py` — uses `get_vocab_type_stats()`.

Left as `execute_raw` with a clearer in-place rationale:

- `lib/polarity_axis.py:discover_candidate_concepts` — Cypher's
  variable-length pattern `[*1..N]` does not accept a bound parameter for
  N, so any typed wrapper would still have to string-substitute the hop
  count. The pole IDs are validated by `validate_concept_id` above.
- `routes/database.py` admin raw query endpoint — intentional raw surface
  per #355.

Tests: 9 new unit tests covering the new and extended methods (return-
clause + order_by composition, count_sources, update_source_properties
including empty-props short-circuit and missing-match cases, vocab stats
aggregation). Full unit suite: 482 passing in the ephemeral kg-api
container; the 1 unrelated error (`test_provider_config.py` needs
POSTGRES_PASSWORD env var, pre-existing).
Parametrized scan over every file in api/app/workers/. Two assertions per
worker:

1. No facade.execute_raw(...) — the typed DSL on GraphQueryFacade is the
   only sanctioned shape. New patterns belong as new typed methods.
2. No age_client._execute_cypher(...) — that bypasses the facade audit
   log and namespace safety entirely.

restore_worker.py is allowlisted for (2) with an inline rationale: it is
the privileged dataset-authority worker; _clear_database issues DETACH
DELETE against Concept/Source/Instance during a full-restore teardown
with no parameter substitution. A facade wrapper would add no safety.

The allowlist self-validates: if restore_worker.py ever stops calling
_execute_cypher, the test fails and forces removal of the stale entry.

Raw psycopg2 access to kg_api.* relational tables (job queue, proposals,
artifact metadata, embedding storage) is explicitly out of scope — the
facade is a graph DSL, not an ORM.

24 tests pass (12 workers × 2 checks).
@aaronsb aaronsb merged commit 0722589 into main May 25, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: promote execute_raw patterns to safe QueryFacade methods

1 participant