Skip to content

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418

Merged
kentwelcome merged 10 commits into
mainfrom
feature/drc-3532-mcp-expose-run-id
Jun 29, 2026
Merged

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418
kentwelcome merged 10 commits into
mainfrom
feature/drc-3532-mcp-expose-run-id

Conversation

@iamcxa

@iamcxa iamcxa commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Closes DRC-3532.

Why (cross-repo, DRC-3532)

The recce-cloud summary agent wants to deep-link each factual statement in its PR
summary to the exact run it executed, via deterministic {{run:<run_id>}} inline
markers (server-side marker replacement already exists in recce-cloud-infra's
recce_task_func). But the agent never received the run_id: the cloud-backend
MCP tools that back the agent's analysis (row_count_diff, profile_diff,
value_diff, query, query_diff, top_k_diff, histogram_diff) route through
_tool_run_backed, which returned only run["result"] and dropped run_id.

Without run_id the agent cannot cite runs deterministically, forcing fragile
server-side fuzzy prose matching (low/unstable coverage, occasional wrong links).

What

Cloud-backend path (_tool_run_backed)

_tool_run_backed now merges run_id into the result dict (additive — existing
result fields preserved). Only added when the response actually carries a run_id
and the result is a dict, so run-less or non-dict responses are untouched (the
agent must never be handed, or synthesize, a run_id it was not given).

Local-mode path (_tool_run_backed_local, DRC-3634)

The larger half of the mcp_server.py diff: in local mode, row_count_diff,
query, query_diff, profile_diff, and value_diff are rerouted through a new
_tool_run_backed_local. When an in-process RecceContext is available
(default_context()), it executes via the same submit_run machinery as the
recce server's run_router, so the Run is persisted to default_context().runs
(and therefore included in the S3-exported state the infra comment-builder reads)
and the returned dict carries a top-level run_id. When default_context() is
None (the cloud-backend SSE path), it falls back to plain in-executor execution
with no run_id — identical to pre-DRC-3634 behaviour.

Error semantics: submit_run.fn records run.status=FAILED / run.error then
returns None on any non-DuckDBExternalAccessBlocked exception, so the awaited
future resolves cleanly. _tool_run_backed_local inspects run.status and
re-raises RecceException on a FAILED run, so call_tool restores
isError=True and _classify_db_error can classify the failure — instead of
collapsing a failed run into a success-shaped {"run_id": ...}. The persisted
FAILED Run stays in context.runs for citation. The 5 run-backed tool
descriptions document the additive top-level run_id field (MCP tool
description = LLM agent contract).

Test

tests/test_mcp_cloud_backend.py:

  • new: run-backed result surfaces run_id (merged, fields preserved)
  • new: run-less response leaves the result untouched (no invented run_id)
  • updated: the existing run-tool routing test now expects the surfaced run_id

tests/test_mcp_server.py (TestLocalModeRunBacked):

  • run-backed local mode surfaces run_id for each of the 5 tools; out-of-scope
    tools (e.g. value_diff_detail) do not
  • new: a failing run surfaces the error (message preserved for
    _classify_db_error), NOT a success-shaped {"run_id": ...}
  • new: a failing run still persists the FAILED Run to context.runs for citation

137 local mcp_server tests + 40 cloud-backend tests green.

Paired change

recce-cloud-infra: the summary agent prompt emits {{run:<run_id>}} markers
using this run_id; fuzzy linkify is demoted to a legacy fallback. Tracked under
DRC-3532. Durable structured-citation design is DRC-3634.

🤖 Generated with Claude Code


Consumed by (cross-repo)

recce-cloud-infra PR DataRecce/recce-cloud-infra#1427 (Summary v2: trust architecture — value-grounding + run citations) depends on this PR. The instance-launcher image bakes the recce build from this branch; #1427's run-citation deep links ([value](…/runs/<id>)) and profile_diff-grounded row counts require the changes here. Suggested merge order: this PR → recce release/image → #1427.

Also on this branch — profile_diff lowercase-column fix (DRC-3674 B)

Commit 889a01b7 (fix(profile): resolve lowercase columns to physical names on case-folding warehouses): ProfileDiffTask / ProfileTask filtered requested columns with a case-sensitive column.name in selected_columns test. On Snowflake (physical names UPPERCASE) the agent's lowercase column names matched nothing → the filter dropped every column → empty profile → the cloud summary suppressed the whole analysis. Fix lowercases both sides of the membership test (no-op on duckdb; SQL path unchanged). Regression test reproduces the empty-profile case with an uppercase-header duckdb table. This unblocks #1427's CLV/profile grounding (which was intermittently empty in UAT depending on the LLM's column casing).

…C-3532)

The cloud-backend MCP tools (row_count_diff, profile_diff, value_diff, query,
query_diff, top_k_diff, histogram_diff) routed through _tool_run_backed returned
only run["result"], dropping run_id. The recce-cloud summary agent therefore
never saw the run_id and could not emit deterministic {{run:<run_id>}} citation
markers, forcing fragile server-side fuzzy prose matching.

Merge run_id into the result dict (additive; existing result fields preserved).
Only added when the response carries a run_id and the result is a dict, so
run-less or non-dict responses are untouched (never synthesize a run_id).

Cross-repo (DRC-3532): the recce-cloud summary agent prompt is updated to emit
the markers; server-side marker replacement already exists in recce_task_func.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.58333% with 13 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/mcp_server.py 70.45% 13 Missing ⚠️
Files with missing lines Coverage Δ
recce/tasks/profile.py 83.33% <100.00%> (+0.28%) ⬆️
tests/tasks/test_profile.py 99.13% <100.00%> (+0.21%) ⬆️
tests/test_mcp_cloud_backend.py 100.00% <100.00%> (ø)
tests/test_mcp_server.py 99.87% <100.00%> (+0.01%) ⬆️
recce/mcp_server.py 90.45% <70.45%> (-1.08%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

iamcxa and others added 2 commits June 13, 2026 00:05
…s state

Local-mode ad-hoc diff tools (row_count_diff, profile_diff, query,
query_diff, value_diff) now route through _tool_run_backed_local, which
uses the same submit_run machinery as the recce server's run_router to
persist Runs to default_context().runs.  The tool result carries run_id
alongside the diff output, matching the CloudBackend._tool_run_backed
shape (commit 98670f9).

When default_context() is None (cloud SSE path already handled by
CloudBackend), the method falls back to direct task execution without
run_id — identical to pre-DRC-3634 behaviour, no double execution.

The MCP server remains the sole state exporter; runs created mid-session
via _tool_run_backed_local are included in the exported payload because
they land in the same in-process RecceContext.runs list that
export_state() serialises.

Tests: 12 new tests in TestLocalModeRunBacked covering run_id presence,
run persistence in context.runs, run type mapping (query/query_base), and
the isolation boundary (value_diff_detail stays outside run-backed scope).
175 MCP tests green (135 server + 40 cloud backend); ruff clean.

Signed-off-by: Kent <iamcxa@gmail.com>
…ding warehouses

profile_diff/profile filtered the requested `columns` with a case-sensitive
membership test (`column.name in selected_columns`). get_columns() returns
physical catalog names, which case-folding warehouses (e.g. Snowflake) store
UPPERCASE, while the Recce Cloud summary agent supplies lowercase
manifest-convention names. The exact-case filter dropped every column, so the
per-column profiling loop never ran and the tool returned a completely empty
result ({"base":{"columns":[],"data":[]},"current":{"columns":[],"data":[]}}).
This made the cloud summary non-deterministic: lowercase column names yielded an
empty profile that the downstream trust gate suppressed.

Fix: lowercase both sides of the membership test before filtering, mirroring the
case-insensitive normalisation in valuediff's _build_column_case_lookup. The SQL
path is unchanged — it still profiles via the physical column.name — so lowercase
input now resolves to the same physical (uppercase) columns and returns the same
non-empty profile as uppercase input. No-op on already-lowercase duckdb models and
on quoted/case-sensitive identifiers; applied to both ProfileDiffTask and
ProfileTask.

Adds regression tests using an UPPERCASE duckdb CSV header (duckdb preserves
header casing, reproducing Snowflake physical-name casing end-to-end with no
mocking): asserts lowercase and uppercase column requests return identical
non-empty profiles, plus an adapter-safety no-op test on lowercase physical names.

DRC-3674

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa iamcxa requested a review from kentwelcome June 15, 2026 04:05
iamcxa added 2 commits June 15, 2026 12:08
…sertions

test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact
equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local)
now merges a run_id key into the tool result. Compare on the result fields
ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing.
Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green.

Signed-off-by: Kent <iamcxa@gmail.com>

@kentwelcome kentwelcome left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review: 1 blocker (silent error swallowing in local run-backed tools), 2 issues. See review comment.

@kentwelcome

kentwelcome commented Jun 15, 2026

Copy link
Copy Markdown
Member

Code Review: PR #1418

SHA 2e8dd51d · Verdict NO-GO

Blockers

  1. recce/mcp_server.py:613-624 — Failed runs return a success-shaped result, silently swallowing the error.
    Evidence: submit_run.fn (recce/apis/run_func.py:245-258) catches every exception except DuckDBExternalAccessBlocked, stores it in run.error, and return None — so await asyncio.wrap_future(future) (line 611) never raises. _tool_run_backed_local then reads only run.result (None → {}) and never inspects run.status/run.error, returning {"run_id": ...}. Before this PR, task.execute() raised and call_tool re-raised (mcp_server.py:1473) so the MCP SDK set isError=True. Now a failing query/query_diff/profile_diff/value_diff/row_count_diff in local mode (bad SQL, missing model, permission denied) returns an empty success — the agent loses both the error and the _classify_db_error classification (TABLE_NOT_FOUND / PERMISSION_DENIED / SYNTAX_ERROR), and a citation agent can report "empty/no diff" for a query that actually errored.
    Pass D.

    Concrete example — agent calls query_diff with a non-existent column:

    query_diff({"sql_template": "SELECT bad_col FROM {{ ref('orders') }}", "primary_keys": ["id"]})
    
    • Before: QueryDiffTask.execute() raises duckdb BinderException('Referenced column "bad_col" not found'); call_tool re-raises → MCP response isError=True carrying the message. Agent knows it failed.
    • After: submit_run.fn catches the BinderException (it is not DuckDBExternalAccessBlocked), sets run.status=FAILED / run.error="Referenced column ... not found" / run.result=None, and returns None. Future resolves cleanly; _tool_run_backed_local maps run.result is None → {} and returns a success result:
      { "run_id": "a1b2c3d4-..." }
      No data, no error. The summary agent reads this as "ran fine, empty diff" and can emit {{run:a1b2c3d4}} citing a run that actually errored.

    Exact culprit lines (the failure is the absence of a status check across three lines):

    609    run, future = _submit_run_fn(run_type, params, triggered_by="recce_ai")
    610    # Await the executor future so run.result is populated before we return.
    611    await asyncio.wrap_future(future)        # <-- discards return value, never raises:
                                                    #     submit_run.fn returns None on error,
                                                    #     so the future resolves "successfully"
    612    # Normalise run.result to a plain dict (tasks may return Pydantic models).
    613    raw_result = run.result
    614    if raw_result is None:                    # <-- FAILED run (result=None, msg in run.error)
    615        result: Dict[str, Any] = {}           #     collapses into empty {}, status/error ignored
    ...
    624    return {**result, "run_id": str(run.run_id)}   # <-- success-shaped dict returned unconditionally
    • Line 611 awaits but never raises — the error signal is lost here (the line 610 comment assumes success).
    • Lines 614-615 treat a FAILED run identically to a genuinely-empty result; nothing reads run.status/run.error.
    • Line 624 returns {"run_id": ...} regardless of failure.

    Lines 614-615 are not wrong in isolation — a successful task can legitimately return empty. The bug is that nothing distinguishes "succeeded empty" from "failed," because line 611 already swallowed that signal.

    Suggested fix — add a guard between 611 and 613:

    await asyncio.wrap_future(future)
    if run.status == RunStatus.FAILED:
        raise RecceException(run.error or f"{run_type} run failed")
    raw_result = run.result

    This re-raises so call_tool (mcp_server.py:1450-1474) restores isError=True, while still leaving the persisted Run in the context for citation.

Issues

  1. recce/mcp_server.py (tool descriptions) — The 5 run-backed tools now return a top-level run_id, but none of their Tool(...description=...) blocks document it.
    Evidence: grep run_id over the descriptions finds only run_check (line 1207, unrelated). CLAUDE.md → "MCP tool description = LLM agent contract. Description MUST match actual response format" and "Prefer additive (_meta) over modifying existing field types." A new top-level field is a response-shape change the contract doesn't declare.
    Pass C.

  2. PR description omits the DRC-3634 local-mode rewrite — the largest change in the diff.
    Evidence: the "What" section describes only CloudBackend._tool_run_backed merging run_id (additive). The actual diff reroutes row_count_diff/query/query_diff/profile_diff/value_diff through a new _tool_run_backed_local that executes via submit_run, persists Runs to the in-process context, and changes return shape + error semantics. That behavior change is documented only in inline # DRC-3634 comments — exactly where the Blocker hides.
    Pass F.

Notes

  • tests/test_mcp_server.py::TestLocalModeRunBacked covers only happy paths (every task.execute is mocked to return a model). No test exercises a raising task, so the error-swallowing regression above is uncaught by the suite. Adding a "task raises → tool surfaces error (isError), not {run_id}" case would have flagged it. Pass E.
  • _tool_run_backed_local fallback branch (default_context() is None, lines 591-606) preserves the old raise-on-error behavior, but in live local mode default_context() is set, so the swallowing path (lines 609-624) is the one that runs.

Limits

  • Local environment is missing the mcp and dbt modules (pytest.importorskip("mcp") skipped the MCP suites; tests/tasks conftest fails on No module named 'dbt'). Could not execute the test suite here — author reports 40 cloud-backend + 123 mcp_server green; CI gates. Findings above are from static tracing, not a failed test run.

…DRC-3634)

Address PR #1418 review (Claude Code Review, NO-GO).

Blocker -- _tool_run_backed_local swallowed failed local runs. submit_run.fn
records run.status=FAILED / run.error then returns None on every exception
except DuckDBExternalAccessBlocked, so `await asyncio.wrap_future(future)`
resolves cleanly and a FAILED run (result=None) collapsed into a
success-shaped {"run_id": ...}. A failing query / query_diff / profile_diff /
value_diff / row_count_diff (bad SQL, missing model, permission denied) thus
returned an empty success -- losing the error and its _classify_db_error
classification, and letting a citation agent report "no diff" for a run that
actually errored. Add a `run.status == FAILED` guard that re-raises
RecceException so call_tool restores isError=True, while the already-persisted
FAILED run stays in context.runs for citation. Regression tests cover the
raise (message preserved) and the persistence.

Issue 1 -- the 5 run-backed tool descriptions now document the additive
top-level run_id field (MCP tool description = LLM agent contract).

Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa

iamcxa commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@kentwelcome All review points addressed in d45ea317.

Blocker — failed local runs swallowed. Added a run.status == RunStatus.FAILED guard in _tool_run_backed_local immediately after await asyncio.wrap_future(future). It re-raises RecceException(run.error or ...) so call_tool restores isError=True and _classify_db_error still classifies the failure (the original message is preserved in run.error). The already-persisted FAILED Run stays in context.runs for citation, so no run is lost.

Issue 1 — undocumented run_id. All 5 run-backed tool descriptions (row_count_diff, query, query_diff, profile_diff, value_diff) now document the additive top-level run_id field and the {{run:<run_id>}} citation use.

Notes — happy-path-only tests. Added two TestLocalModeRunBacked regression tests: (1) a raising task surfaces the error (message preserved), not a success-shaped {"run_id": ...}; (2) a failing run still persists the FAILED Run to context.runs.

Issue 2 — PR description. Updated the description to document the _tool_run_backed_local local-mode rewrite (the larger half of the diff) and its error semantics.

Re: Limits — could not run the suite. Ran it locally with mcp/dbt available: tests/test_mcp_server.py 137 passed (including the two new failure-path tests), ruff check clean. The previously-uncovered "task raises → tool surfaces error, not {run_id}" path is now exercised.

@iamcxa iamcxa requested a review from kentwelcome June 26, 2026 16:29
@iamcxa iamcxa requested a review from wcchang1115 June 26, 2026 16:46

@wcchang1115 wcchang1115 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #1418 (incremental)

SHA 53f05c96 · Verdict NO-GO · Incremental (2e8dd51d → HEAD)

Scope: branch was rebased since the anchor; PR-authored work since the last review touches recce/mcp_server.py + the two MCP test files only.

Blockers

None.

Issues

  1. recce/mcp_server.py — Rationale comments duplicate the _tool_run_backed_local docstring (mcp_server.py:606) instead of living there once:

    • :1734,1741,1753,1760,1767 — the same 2-line # DRC-3634: route through _tool_run_backed_local … comment is copy-pasted verbatim at all 5 run-backed call sites. The call await self._tool_run_backed_local(...) is self-documenting; editing the rationale means updating 6 places or leaving 5 stale.
    • :663 — the # Surface run_id … matching CloudBackend shape (commit 98670f9e) comment restates the docstring and narrates the obvious next line.

    Fix: delete all 6; let the docstring be the single source of truth. Keep the distinct base→run_type mapping comment in _tool_query. Removes ~12 lines, no technical content lost.

Notes

  1. recce/mcp_server.py:51VIEW_ALL_MAX_NODES comment claims the cap makes the result "cannot exceed" the 25k-token SDK cap, but only node count is bounded, not edges. The claim is technically false. Fix is wording only — e.g. "node-count bounded; edge count assumed to scale with nodes (holds for sparse dbt DAGs)". Do not add edge capping: disproportionate change, the trigger (~44k edges on 300 nodes) doesn't occur in dbt DAGs, and dropping edges would orphan nodes into false roots — worse than the current honest output.

Verification: pytest tests/test_mcp_server.py tests/test_mcp_cloud_backend.py → 180 passed.

Issue 1 is the only NO-GO driver and is follow-up-acceptable (zero runtime risk; pure maintainability).

@iamcxa iamcxa left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamcxa Code review for PR #1418 at 53f05c96.

Verdict: NO-GO as-is. CI is green, but I found two concrete response-shape / selection regressions that should be fixed before merge.

Findings

  1. recce/mcp_server.py line 202, and local mirror at line 665 — top-level run_id can overwrite real row-count data.

Both cloud and local paths merge citation metadata with {**result, "run_id": ...}. That is not strictly additive for row_count_diff, whose response is a model-name-keyed dict. If a project has a dbt model named run_id, its row-count result is replaced by the UUID string, so the MCP response silently drops actual diff data while claiming existing result fields are preserved.

Suggested fix: add a collision regression test and choose a non-colliding shape for citation metadata, or otherwise preserve an existing run_id result key instead of overwriting it.

  1. recce/tasks/profile.py line 193, and ProfileTask line 301 — case-insensitive matching broadens exact selections on case-sensitive schemas.

The new filter lowercases every physical column name before matching. That fixes Snowflake-style uppercase physical names, but it regresses quoted/case-sensitive schemas with distinct columns such as name and Name: requesting name now profiles both columns. The new test comment says this is a no-op on quoted/case-sensitive identifiers, but the implementation does not preserve exact-case semantics.

Suggested fix: resolve exact matches first, then fall back to case-insensitive matching only when the lowercase key maps unambiguously to one physical column. Add a regression test with case-distinct physical columns.

Advisory test hardening

  • Add cloud coverage for a non-dict result, for example {"run_id": "run-xyz", "result": [{"ok": true}]}, and assert the list is returned unchanged.
  • Parameterize the local out-of-scope boundary test for value_diff_detail, top_k_diff, and histogram_diff, since tool descriptions only promise run_id for the five rerouted local tools.
  • Consider one handler-level failed-run test that invokes the MCP call_tool handler and asserts the response is isError while the FAILED Run remains persisted.

Verification

  • Freshness: live PR head was rechecked immediately before posting and remained 53f05c96de57fa41ae145911dc5c554ee05fb048.
  • Diff scope: 5 files, 475 additions / 56 deletions.
  • GitHub checks: all reported checks are SUCCESS or intentionally SKIPPED.
  • Local static check: git diff --check origin/main...HEAD passed.
  • I did not run local pytest because this read-only checkout has no .venv, and the repo instructions require venv activation before Python commands.

iamcxa and others added 2 commits June 27, 2026 19:43
…m (DRC-3532)

Address PR #1418 incremental review (wcchang1115, NO-GO driver + note).

- Delete the 2-line "DRC-3634: route through _tool_run_backed_local ..."
  comment copy-pasted verbatim at all 5 run-backed call sites and the
  "Surface run_id ... matching CloudBackend shape" comment at the local
  merge site. The `await self._tool_run_backed_local(...)` call is
  self-documenting and the rationale lives once in the method docstring;
  the duplicates meant editing 6 places or leaving 5 stale. The distinct
  base->run_type mapping comment in _tool_query is kept.
- Reword the VIEW_ALL_MAX_NODES comment: it claimed the node cap makes the
  serialized result "cannot exceed" the 25k-token SDK cap, but only the node
  count is bounded, not edges. State honestly that edge count is assumed to
  scale with nodes (holds for sparse dbt DAGs); no edge capping added.

Signed-off-by: Kent <iamcxa@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tests (DRC-3532)

Address PR #1418 incremental review (response-shape + test hardening).

- run_id collision: row_count_diff returns a model-name-keyed dict, so
  {**result, "run_id": ...} would overwrite the row-count data of a dbt model
  literally named "run_id" while claiming existing fields are preserved. Guard
  both the cloud (_tool_run_backed) and local (_tool_run_backed_local) merge
  sites with `"run_id" not in result` so a real result key is never clobbered by
  the citation UUID. Pathological (no real dbt model is named run_id), but the
  guard is free and keeps the additive contract honest.
- Tests:
  - local + cloud collision regression (a model named "run_id" keeps its data)
  - cloud non-dict result is returned unchanged (isinstance guard)
  - parameterize the out-of-scope boundary test over value_diff_detail,
    top_k_diff, histogram_diff (descriptions promise run_id only for the five
    rerouted local tools)

Signed-off-by: Kent <iamcxa@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@iamcxa

iamcxa commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@wcchang1115 Round-2 review addressed across two commits (58969123, 8b7d4ff0).

From your review:

  • Issue 1 — duplicated rationale comments. Deleted the 2-line # DRC-3634: route through _tool_run_backed_local … comment at all 5 run-backed call sites and the # Surface run_id … matching CloudBackend shape comment at the local merge site (6 places). The _tool_run_backed_local docstring is now the single source of truth; the distinct base→run_type mapping comment in _tool_query is kept. (58969123)
  • Note 1 — VIEW_ALL_MAX_NODES "cannot exceed". Reworded: only the node count is bounded, edge count is assumed to scale with nodes (holds for sparse dbt DAGs). No edge capping added, per your note. (58969123)

From the incremental self-review at 53f05c96:

  • run_id can clobber a model named run_id. Valid: row_count_diff returns a model-name-keyed dict, so {**result, "run_id": ...} would overwrite the row-count data of a dbt model literally named run_id. Guarded both merge sites (cloud _tool_run_backed, local _tool_run_backed_local) with "run_id" not in result, so a real result key is never overwritten by the citation UUID. Added local + cloud collision regression tests. (8b7d4ff0)
  • Test hardening. Added the cloud non-dict-result case (list returned unchanged) and parameterized the out-of-scope boundary test over value_diff_detail, top_k_diff, histogram_diff. (8b7d4ff0)
  • profile.py case-insensitive matching broadens exact selections — deferred. Valid but intentionally not in this PR. The lowercasing was the load-bearing DRC-3674 fix for empty profiles on case-folding warehouses (Snowflake), which feat(value-diff): enable column sorting in summary grid #1427's CLV/profile grounding depends on. The over-broadening only triggers on a schema with two columns differing solely in case (e.g. name and Name) in one model — extremely rare in dbt, and the consequence is an extra column profiled, not data loss or a wrong result. The proper fix (exact-match first, case-insensitive fallback only when unambiguous) reworks a production fix and is better done as a focused follow-up than bundled into the PR that is currently gating feat(value-diff): enable column sorting in summary grid #1427. Will track separately.

Verification: pytest tests/test_mcp_server.py tests/test_mcp_cloud_backend.py tests/tasks/test_profile.py196 passed; ruff check clean.

One advisory left open: a handler-level failed-run test (invoke call_tool and assert isError + persisted FAILED Run). The failure path is covered at the _tool_* level today; happy to add the handler-level case if you'd like it in this PR.

@iamcxa iamcxa requested a review from wcchang1115 June 27, 2026 11:49
…ls (DRC-3532)

Close the last advisory from the PR #1418 incremental review: exercise the full
call_tool handler path (not just _tool_*-level) for a failing run-backed local
tool. Asserts the registered handler returns isError=True with the original
message surfaced (so _classify_db_error / the agent can see it), and the FAILED
Run is still persisted to context.runs for citation. Reuses
TestCallToolHandler._invoke_call_tool.

Signed-off-by: Kent <iamcxa@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@iamcxa

iamcxa commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Added the handler-level failed-run test in 53d2b1d7 (the last open advisory).

It drives the full call_tool handler path for a failing run-backed local tool and asserts: isError=True, the original error message is surfaced in the response content (so _classify_db_error / the agent can read it), and the FAILED Run is still persisted to context.runs for citation. Reuses the existing TestCallToolHandler._invoke_call_tool helper.

@wcchang1115

Copy link
Copy Markdown
Collaborator

Code Review: PR #1418

SHA 53d2b1d · Verdict GO · Incremental (53f05c96..53d2b1d7)

Re-review of the 3 commits since the 53f05c96 self-review. All blocking findings from that round are resolved; one finding remains a tracked follow-up (accepted).

Blockers

None.

Issues

None blocking in the incremental scope.

Resolution of prior findings

  • run_id clobbers model-name-keyed result (mcp_server.py) — Resolved in 8b7d4ff0. Both merge sites guard with "run_id" not in result (local :671, cloud :204), preserving the real key. Collision regression tests added (local + cloud); verified non-vacuous.
  • Advisories — all addressed: cloud non-dict passthrough test, parametrized non-run-backed boundary test (value_diff_detail/top_k/histogram), and the handler-level failed-run test (53d2b1d7, isError=True + FAILED Run persisted).
  • profile.py case-insensitive over-selection (profile.py:191, :300) — Deferred, accepted. Real (a model with name + Name profiles both when only name is requested), but the trigger is pathological for dbt, the consequence is one extra column profiled (no data loss / no diff misalignment), and the lowercasing is the load-bearing DRC-3674 fix that feat(value-diff): enable column sorting in summary grid #1427 depends on. Track the exact-match-first fix as the committed follow-up.

Verification

  • pytest tests/test_mcp_server.py::TestLocalModeRunBacked tests/test_mcp_cloud_backend.py61 passed
  • flake8 on changed files → clean
  • VIEW_ALL_MAX_NODES comment matches code (nodes capped :319/:1602; edges filtered to surviving nodes; dense-graph caveat stated)
  • run_id injection isolated to the MCP return value — persisted run.result not mutated, so summary.py / RowCountDiffResultDiffer consumers unaffected.

@wcchang1115 wcchang1115 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review: GO (incremental 53f05c9..53d2b1d). Prior blockers resolved; profile.py over-selection accepted as tracked follow-up. See review comment.

@kentwelcome kentwelcome left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kentwelcome kentwelcome merged commit 903fe5e into main Jun 29, 2026
22 checks passed
kentwelcome pushed a commit that referenced this pull request Jun 29, 2026
…sertions

test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact
equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local)
now merges a run_id key into the tool result. Compare on the result fields
ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing.
Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green.

Signed-off-by: Kent <iamcxa@gmail.com>
kentwelcome pushed a commit that referenced this pull request Jun 29, 2026
…DRC-3634)

Address PR #1418 review (Claude Code Review, NO-GO).

Blocker -- _tool_run_backed_local swallowed failed local runs. submit_run.fn
records run.status=FAILED / run.error then returns None on every exception
except DuckDBExternalAccessBlocked, so `await asyncio.wrap_future(future)`
resolves cleanly and a FAILED run (result=None) collapsed into a
success-shaped {"run_id": ...}. A failing query / query_diff / profile_diff /
value_diff / row_count_diff (bad SQL, missing model, permission denied) thus
returned an empty success -- losing the error and its _classify_db_error
classification, and letting a citation agent report "no diff" for a run that
actually errored. Add a `run.status == FAILED` guard that re-raises
RecceException so call_tool restores isError=True, while the already-persisted
FAILED run stays in context.runs for citation. Regression tests cover the
raise (message preserved) and the persistence.

Issue 1 -- the 5 run-backed tool descriptions now document the additive
top-level run_id field (MCP tool description = LLM agent contract).

Signed-off-by: Kent <iamcxa@gmail.com>
kentwelcome pushed a commit that referenced this pull request Jun 29, 2026
…m (DRC-3532)

Address PR #1418 incremental review (wcchang1115, NO-GO driver + note).

- Delete the 2-line "DRC-3634: route through _tool_run_backed_local ..."
  comment copy-pasted verbatim at all 5 run-backed call sites and the
  "Surface run_id ... matching CloudBackend shape" comment at the local
  merge site. The `await self._tool_run_backed_local(...)` call is
  self-documenting and the rationale lives once in the method docstring;
  the duplicates meant editing 6 places or leaving 5 stale. The distinct
  base->run_type mapping comment in _tool_query is kept.
- Reword the VIEW_ALL_MAX_NODES comment: it claimed the node cap makes the
  serialized result "cannot exceed" the 25k-token SDK cap, but only the node
  count is bounded, not edges. State honestly that edge count is assumed to
  scale with nodes (holds for sparse dbt DAGs); no edge capping added.

Signed-off-by: Kent <iamcxa@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kentwelcome kentwelcome deleted the feature/drc-3532-mcp-expose-run-id branch June 29, 2026 01:14
kentwelcome pushed a commit that referenced this pull request Jun 29, 2026
…tests (DRC-3532)

Address PR #1418 incremental review (response-shape + test hardening).

- run_id collision: row_count_diff returns a model-name-keyed dict, so
  {**result, "run_id": ...} would overwrite the row-count data of a dbt model
  literally named "run_id" while claiming existing fields are preserved. Guard
  both the cloud (_tool_run_backed) and local (_tool_run_backed_local) merge
  sites with `"run_id" not in result` so a real result key is never clobbered by
  the citation UUID. Pathological (no real dbt model is named run_id), but the
  guard is free and keeps the additive contract honest.
- Tests:
  - local + cloud collision regression (a model named "run_id" keeps its data)
  - cloud non-dict result is returned unchanged (isinstance guard)
  - parameterize the out-of-scope boundary test over value_diff_detail,
    top_k_diff, histogram_diff (descriptions promise run_id only for the five
    rerouted local tools)

Signed-off-by: Kent <iamcxa@gmail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants