feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532) by iamcxa · Pull Request #1418 · DataRecce/recce

iamcxa · 2026-06-03T09:54:52Z

Closes DRC-3532.

Why (cross-repo, DRC-3532)

The recce-cloud summary agent wants to deep-link each factual statement in its PR
summary to the exact run it executed, via deterministic {{run:<run_id>}} inline
markers (server-side marker replacement already exists in recce-cloud-infra's
recce_task_func). But the agent never received the run_id: the cloud-backend
MCP tools that back the agent's analysis (row_count_diff, profile_diff,
value_diff, query, query_diff, top_k_diff, histogram_diff) route through
_tool_run_backed, which returned only run["result"] and dropped run_id.

Without run_id the agent cannot cite runs deterministically, forcing fragile
server-side fuzzy prose matching (low/unstable coverage, occasional wrong links).

What

Cloud-backend path (`_tool_run_backed`)

_tool_run_backed now merges run_id into the result dict (additive — existing
result fields preserved). Only added when the response actually carries a run_id
and the result is a dict, so run-less or non-dict responses are untouched (the
agent must never be handed, or synthesize, a run_id it was not given).

Local-mode path (`_tool_run_backed_local`, DRC-3634)

The larger half of the mcp_server.py diff: in local mode, row_count_diff,
query, query_diff, profile_diff, and value_diff are rerouted through a new
_tool_run_backed_local. When an in-process RecceContext is available
(default_context()), it executes via the same submit_run machinery as the
recce server's run_router, so the Run is persisted to default_context().runs
(and therefore included in the S3-exported state the infra comment-builder reads)
and the returned dict carries a top-level run_id. When default_context() is
None (the cloud-backend SSE path), it falls back to plain in-executor execution
with no run_id — identical to pre-DRC-3634 behaviour.

Error semantics: submit_run.fn records run.status=FAILED / run.error then
returns None on any non-DuckDBExternalAccessBlocked exception, so the awaited
future resolves cleanly. _tool_run_backed_local inspects run.status and
re-raises RecceException on a FAILED run, so call_tool restores
isError=True and _classify_db_error can classify the failure — instead of
collapsing a failed run into a success-shaped {"run_id": ...}. The persisted
FAILED Run stays in context.runs for citation. The 5 run-backed tool
descriptions document the additive top-level run_id field (MCP tool
description = LLM agent contract).

Test

tests/test_mcp_cloud_backend.py:

new: run-backed result surfaces run_id (merged, fields preserved)
new: run-less response leaves the result untouched (no invented run_id)
updated: the existing run-tool routing test now expects the surfaced run_id

tests/test_mcp_server.py (TestLocalModeRunBacked):

run-backed local mode surfaces run_id for each of the 5 tools; out-of-scope
tools (e.g. value_diff_detail) do not
new: a failing run surfaces the error (message preserved for
_classify_db_error), NOT a success-shaped {"run_id": ...}
new: a failing run still persists the FAILED Run to context.runs for citation

137 local mcp_server tests + 40 cloud-backend tests green.

Paired change

recce-cloud-infra: the summary agent prompt emits {{run:<run_id>}} markers
using this run_id; fuzzy linkify is demoted to a legacy fallback. Tracked under
DRC-3532. Durable structured-citation design is DRC-3634.

🤖 Generated with Claude Code

Consumed by (cross-repo)

recce-cloud-infra PR DataRecce/recce-cloud-infra#1427 (Summary v2: trust architecture — value-grounding + run citations) depends on this PR. The instance-launcher image bakes the recce build from this branch; #1427's run-citation deep links ([value](…/runs/<id>)) and profile_diff-grounded row counts require the changes here. Suggested merge order: this PR → recce release/image → #1427.

Also on this branch — `profile_diff` lowercase-column fix (DRC-3674 B)

Commit 889a01b7 (fix(profile): resolve lowercase columns to physical names on case-folding warehouses): ProfileDiffTask / ProfileTask filtered requested columns with a case-sensitive column.name in selected_columns test. On Snowflake (physical names UPPERCASE) the agent's lowercase column names matched nothing → the filter dropped every column → empty profile → the cloud summary suppressed the whole analysis. Fix lowercases both sides of the membership test (no-op on duckdb; SQL path unchanged). Regression test reproduces the empty-profile case with an uppercase-header duckdb table. This unblocks #1427's CLV/profile grounding (which was intermittently empty in UAT depending on the LLM's column casing).

…C-3532) The cloud-backend MCP tools (row_count_diff, profile_diff, value_diff, query, query_diff, top_k_diff, histogram_diff) routed through _tool_run_backed returned only run["result"], dropping run_id. The recce-cloud summary agent therefore never saw the run_id and could not emit deterministic {{run:<run_id>}} citation markers, forcing fragile server-side fuzzy prose matching. Merge run_id into the result dict (additive; existing result fields preserved). Only added when the response carries a run_id and the result is a dict, so run-less or non-dict responses are untouched (never synthesize a run_id). Cross-repo (DRC-3532): the recce-cloud summary agent prompt is updated to emit the markers; server-side marker replacement already exists in recce_task_func. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

codecov · 2026-06-03T10:03:30Z

Codecov Report

❌ Patch coverage is 94.58333% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/mcp_server.py	70.45%	13 Missing ⚠️

Files with missing lines	Coverage Δ
recce/tasks/profile.py	`83.33% <100.00%> (+0.28%)`	⬆️
tests/tasks/test_profile.py	`99.13% <100.00%> (+0.21%)`	⬆️
tests/test_mcp_cloud_backend.py	`100.00% <100.00%> (ø)`
tests/test_mcp_server.py	`99.87% <100.00%> (+0.01%)`	⬆️
recce/mcp_server.py	`90.45% <70.45%> (-1.08%)`	⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…s state Local-mode ad-hoc diff tools (row_count_diff, profile_diff, query, query_diff, value_diff) now route through _tool_run_backed_local, which uses the same submit_run machinery as the recce server's run_router to persist Runs to default_context().runs. The tool result carries run_id alongside the diff output, matching the CloudBackend._tool_run_backed shape (commit 98670f9). When default_context() is None (cloud SSE path already handled by CloudBackend), the method falls back to direct task execution without run_id — identical to pre-DRC-3634 behaviour, no double execution. The MCP server remains the sole state exporter; runs created mid-session via _tool_run_backed_local are included in the exported payload because they land in the same in-process RecceContext.runs list that export_state() serialises. Tests: 12 new tests in TestLocalModeRunBacked covering run_id presence, run persistence in context.runs, run type mapping (query/query_base), and the isolation boundary (value_diff_detail stays outside run-backed scope). 175 MCP tests green (135 server + 40 cloud backend); ruff clean. Signed-off-by: Kent <iamcxa@gmail.com>

…ding warehouses profile_diff/profile filtered the requested `columns` with a case-sensitive membership test (`column.name in selected_columns`). get_columns() returns physical catalog names, which case-folding warehouses (e.g. Snowflake) store UPPERCASE, while the Recce Cloud summary agent supplies lowercase manifest-convention names. The exact-case filter dropped every column, so the per-column profiling loop never ran and the tool returned a completely empty result ({"base":{"columns":[],"data":[]},"current":{"columns":[],"data":[]}}). This made the cloud summary non-deterministic: lowercase column names yielded an empty profile that the downstream trust gate suppressed. Fix: lowercase both sides of the membership test before filtering, mirroring the case-insensitive normalisation in valuediff's _build_column_case_lookup. The SQL path is unchanged — it still profiles via the physical column.name — so lowercase input now resolves to the same physical (uppercase) columns and returns the same non-empty profile as uppercase input. No-op on already-lowercase duckdb models and on quoted/case-sensitive identifiers; applied to both ProfileDiffTask and ProfileTask. Adds regression tests using an UPPERCASE duckdb CSV header (duckdb preserves header casing, reproducing Snowflake physical-name casing end-to-end with no mocking): asserts lowercase and uppercase column requests return identical non-empty profiles, plus an adapter-safety no-op test on lowercase physical names. DRC-3674 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

…sertions test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local) now merges a run_id key into the tool result. Compare on the result fields ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing. Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green. Signed-off-by: Kent <iamcxa@gmail.com>

kentwelcome

Claude Code Review: 1 blocker (silent error swallowing in local run-backed tools), 2 issues. See review comment.

kentwelcome · 2026-06-15T08:27:12Z

Code Review: PR #1418

SHA 2e8dd51d · Verdict NO-GO

Blockers

recce/mcp_server.py:613-624 — Failed runs return a success-shaped result, silently swallowing the error.
Evidence: submit_run.fn (recce/apis/run_func.py:245-258) catches every exception except DuckDBExternalAccessBlocked, stores it in run.error, and return None — so await asyncio.wrap_future(future) (line 611) never raises. _tool_run_backed_local then reads only run.result (None → {}) and never inspects run.status/run.error, returning {"run_id": ...}. Before this PR, task.execute() raised and call_tool re-raised (mcp_server.py:1473) so the MCP SDK set isError=True. Now a failing query/query_diff/profile_diff/value_diff/row_count_diff in local mode (bad SQL, missing model, permission denied) returns an empty success — the agent loses both the error and the _classify_db_error classification (TABLE_NOT_FOUND / PERMISSION_DENIED / SYNTAX_ERROR), and a citation agent can report "empty/no diff" for a query that actually errored.
Pass D.

Concrete example — agent calls query_diff with a non-existent column:
```
query_diff({"sql_template": "SELECT bad_col FROM {{ ref('orders') }}", "primary_keys": ["id"]})
```
- Before: QueryDiffTask.execute() raises duckdb BinderException('Referenced column "bad_col" not found'); call_tool re-raises → MCP response isError=True carrying the message. Agent knows it failed.
- After: submit_run.fn catches the BinderException (it is not DuckDBExternalAccessBlocked), sets run.status=FAILED / run.error="Referenced column ... not found" / run.result=None, and returns None. Future resolves cleanly; _tool_run_backed_local maps run.result is None → {} and returns a success result:
```
{ "run_id": "a1b2c3d4-..." }
```
  No data, no error. The summary agent reads this as "ran fine, empty diff" and can emit {{run:a1b2c3d4}} citing a run that actually errored.
Exact culprit lines (the failure is the absence of a status check across three lines):
```
609    run, future = _submit_run_fn(run_type, params, triggered_by="recce_ai")
610    # Await the executor future so run.result is populated before we return.
611    await asyncio.wrap_future(future)        # <-- discards return value, never raises:
                                                #     submit_run.fn returns None on error,
                                                #     so the future resolves "successfully"
612    # Normalise run.result to a plain dict (tasks may return Pydantic models).
613    raw_result = run.result
614    if raw_result is None:                    # <-- FAILED run (result=None, msg in run.error)
615        result: Dict[str, Any] = {}           #     collapses into empty {}, status/error ignored
...
624    return {**result, "run_id": str(run.run_id)}   # <-- success-shaped dict returned unconditionally
```
- Line 611 awaits but never raises — the error signal is lost here (the line 610 comment assumes success).
- Lines 614-615 treat a FAILED run identically to a genuinely-empty result; nothing reads run.status/run.error.
- Line 624 returns {"run_id": ...} regardless of failure.
Lines 614-615 are not wrong in isolation — a successful task can legitimately return empty. The bug is that nothing distinguishes "succeeded empty" from "failed," because line 611 already swallowed that signal.

Suggested fix — add a guard between 611 and 613:
```
await asyncio.wrap_future(future)
if run.status == RunStatus.FAILED:
    raise RecceException(run.error or f"{run_type} run failed")
raw_result = run.result
```
This re-raises so call_tool (mcp_server.py:1450-1474) restores isError=True, while still leaving the persisted Run in the context for citation.

Issues

recce/mcp_server.py (tool descriptions) — The 5 run-backed tools now return a top-level run_id, but none of their Tool(...description=...) blocks document it.
Evidence: grep run_id over the descriptions finds only run_check (line 1207, unrelated). CLAUDE.md → "MCP tool description = LLM agent contract. Description MUST match actual response format" and "Prefer additive (_meta) over modifying existing field types." A new top-level field is a response-shape change the contract doesn't declare.
Pass C.
PR description omits the DRC-3634 local-mode rewrite — the largest change in the diff.
Evidence: the "What" section describes only CloudBackend._tool_run_backed merging run_id (additive). The actual diff reroutes row_count_diff/query/query_diff/profile_diff/value_diff through a new _tool_run_backed_local that executes via submit_run, persists Runs to the in-process context, and changes return shape + error semantics. That behavior change is documented only in inline # DRC-3634 comments — exactly where the Blocker hides.
Pass F.

Notes

tests/test_mcp_server.py::TestLocalModeRunBacked covers only happy paths (every task.execute is mocked to return a model). No test exercises a raising task, so the error-swallowing regression above is uncaught by the suite. Adding a "task raises → tool surfaces error (isError), not {run_id}" case would have flagged it. Pass E.
_tool_run_backed_local fallback branch (default_context() is None, lines 591-606) preserves the old raise-on-error behavior, but in live local mode default_context() is set, so the swallowing path (lines 609-624) is the one that runs.

Limits

Local environment is missing the mcp and dbt modules (pytest.importorskip("mcp") skipped the MCP suites; tests/tasks conftest fails on No module named 'dbt'). Could not execute the test suite here — author reports 40 cloud-backend + 123 mcp_server green; CI gates. Findings above are from static tracing, not a failed test run.

…DRC-3634) Address PR #1418 review (Claude Code Review, NO-GO). Blocker -- _tool_run_backed_local swallowed failed local runs. submit_run.fn records run.status=FAILED / run.error then returns None on every exception except DuckDBExternalAccessBlocked, so `await asyncio.wrap_future(future)` resolves cleanly and a FAILED run (result=None) collapsed into a success-shaped {"run_id": ...}. A failing query / query_diff / profile_diff / value_diff / row_count_diff (bad SQL, missing model, permission denied) thus returned an empty success -- losing the error and its _classify_db_error classification, and letting a citation agent report "no diff" for a run that actually errored. Add a `run.status == FAILED` guard that re-raises RecceException so call_tool restores isError=True, while the already-persisted FAILED run stays in context.runs for citation. Regression tests cover the raise (message preserved) and the persistence. Issue 1 -- the 5 run-backed tool descriptions now document the additive top-level run_id field (MCP tool description = LLM agent contract). Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa · 2026-06-26T16:29:28Z

@kentwelcome All review points addressed in d45ea317.

Blocker — failed local runs swallowed. Added a run.status == RunStatus.FAILED guard in _tool_run_backed_local immediately after await asyncio.wrap_future(future). It re-raises RecceException(run.error or ...) so call_tool restores isError=True and _classify_db_error still classifies the failure (the original message is preserved in run.error). The already-persisted FAILED Run stays in context.runs for citation, so no run is lost.

Issue 1 — undocumented run_id. All 5 run-backed tool descriptions (row_count_diff, query, query_diff, profile_diff, value_diff) now document the additive top-level run_id field and the {{run:<run_id>}} citation use.

Notes — happy-path-only tests. Added two TestLocalModeRunBacked regression tests: (1) a raising task surfaces the error (message preserved), not a success-shaped {"run_id": ...}; (2) a failing run still persists the FAILED Run to context.runs.

Issue 2 — PR description. Updated the description to document the _tool_run_backed_local local-mode rewrite (the larger half of the diff) and its error semantics.

Re: Limits — could not run the suite. Ran it locally with mcp/dbt available: tests/test_mcp_server.py 137 passed (including the two new failure-path tests), ruff check clean. The previously-uncovered "task raises → tool surfaces error, not {run_id}" path is now exercised.

wcchang1115

Code Review: PR #1418 (incremental)

SHA 53f05c96 · Verdict NO-GO · Incremental (2e8dd51d → HEAD)

Scope: branch was rebased since the anchor; PR-authored work since the last review touches recce/mcp_server.py + the two MCP test files only.

Blockers

None.

Issues

recce/mcp_server.py — Rationale comments duplicate the _tool_run_backed_local docstring (mcp_server.py:606) instead of living there once:
- :1734,1741,1753,1760,1767 — the same 2-line # DRC-3634: route through _tool_run_backed_local … comment is copy-pasted verbatim at all 5 run-backed call sites. The call await self._tool_run_backed_local(...) is self-documenting; editing the rationale means updating 6 places or leaving 5 stale.
- :663 — the # Surface run_id … matching CloudBackend shape (commit 98670f9e) comment restates the docstring and narrates the obvious next line.
Fix: delete all 6; let the docstring be the single source of truth. Keep the distinct base→run_type mapping comment in _tool_query. Removes ~12 lines, no technical content lost.

Notes

recce/mcp_server.py:51 — VIEW_ALL_MAX_NODES comment claims the cap makes the result "cannot exceed" the 25k-token SDK cap, but only node count is bounded, not edges. The claim is technically false. Fix is wording only — e.g. "node-count bounded; edge count assumed to scale with nodes (holds for sparse dbt DAGs)". Do not add edge capping: disproportionate change, the trigger (~44k edges on 300 nodes) doesn't occur in dbt DAGs, and dropping edges would orphan nodes into false roots — worse than the current honest output.

Verification: pytest tests/test_mcp_server.py tests/test_mcp_cloud_backend.py → 180 passed.

Issue 1 is the only NO-GO driver and is follow-up-acceptable (zero runtime risk; pure maintainability).

iamcxa

@iamcxa Code review for PR #1418 at 53f05c96.

Verdict: NO-GO as-is. CI is green, but I found two concrete response-shape / selection regressions that should be fixed before merge.

Findings

recce/mcp_server.py line 202, and local mirror at line 665 — top-level run_id can overwrite real row-count data.

Both cloud and local paths merge citation metadata with {**result, "run_id": ...}. That is not strictly additive for row_count_diff, whose response is a model-name-keyed dict. If a project has a dbt model named run_id, its row-count result is replaced by the UUID string, so the MCP response silently drops actual diff data while claiming existing result fields are preserved.

Suggested fix: add a collision regression test and choose a non-colliding shape for citation metadata, or otherwise preserve an existing run_id result key instead of overwriting it.

recce/tasks/profile.py line 193, and ProfileTask line 301 — case-insensitive matching broadens exact selections on case-sensitive schemas.

The new filter lowercases every physical column name before matching. That fixes Snowflake-style uppercase physical names, but it regresses quoted/case-sensitive schemas with distinct columns such as name and Name: requesting name now profiles both columns. The new test comment says this is a no-op on quoted/case-sensitive identifiers, but the implementation does not preserve exact-case semantics.

Suggested fix: resolve exact matches first, then fall back to case-insensitive matching only when the lowercase key maps unambiguously to one physical column. Add a regression test with case-distinct physical columns.

Advisory test hardening

Add cloud coverage for a non-dict result, for example {"run_id": "run-xyz", "result": [{"ok": true}]}, and assert the list is returned unchanged.
Parameterize the local out-of-scope boundary test for value_diff_detail, top_k_diff, and histogram_diff, since tool descriptions only promise run_id for the five rerouted local tools.
Consider one handler-level failed-run test that invokes the MCP call_tool handler and asserts the response is isError while the FAILED Run remains persisted.

Verification

Freshness: live PR head was rechecked immediately before posting and remained 53f05c96de57fa41ae145911dc5c554ee05fb048.
Diff scope: 5 files, 475 additions / 56 deletions.
GitHub checks: all reported checks are SUCCESS or intentionally SKIPPED.
Local static check: git diff --check origin/main...HEAD passed.
I did not run local pytest because this read-only checkout has no .venv, and the repo instructions require venv activation before Python commands.

…m (DRC-3532) Address PR #1418 incremental review (wcchang1115, NO-GO driver + note). - Delete the 2-line "DRC-3634: route through _tool_run_backed_local ..." comment copy-pasted verbatim at all 5 run-backed call sites and the "Surface run_id ... matching CloudBackend shape" comment at the local merge site. The `await self._tool_run_backed_local(...)` call is self-documenting and the rationale lives once in the method docstring; the duplicates meant editing 6 places or leaving 5 stale. The distinct base->run_type mapping comment in _tool_query is kept. - Reword the VIEW_ALL_MAX_NODES comment: it claimed the node cap makes the serialized result "cannot exceed" the 25k-token SDK cap, but only the node count is bounded, not edges. State honestly that edge count is assumed to scale with nodes (holds for sparse dbt DAGs); no edge capping added. Signed-off-by: Kent <iamcxa@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tests (DRC-3532) Address PR #1418 incremental review (response-shape + test hardening). - run_id collision: row_count_diff returns a model-name-keyed dict, so {**result, "run_id": ...} would overwrite the row-count data of a dbt model literally named "run_id" while claiming existing fields are preserved. Guard both the cloud (_tool_run_backed) and local (_tool_run_backed_local) merge sites with `"run_id" not in result` so a real result key is never clobbered by the citation UUID. Pathological (no real dbt model is named run_id), but the guard is free and keeps the additive contract honest. - Tests: - local + cloud collision regression (a model named "run_id" keeps its data) - cloud non-dict result is returned unchanged (isinstance guard) - parameterize the out-of-scope boundary test over value_diff_detail, top_k_diff, histogram_diff (descriptions promise run_id only for the five rerouted local tools) Signed-off-by: Kent <iamcxa@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

iamcxa · 2026-06-27T11:49:04Z

@wcchang1115 Round-2 review addressed across two commits (58969123, 8b7d4ff0).

From your review:

Issue 1 — duplicated rationale comments. Deleted the 2-line # DRC-3634: route through _tool_run_backed_local … comment at all 5 run-backed call sites and the # Surface run_id … matching CloudBackend shape comment at the local merge site (6 places). The _tool_run_backed_local docstring is now the single source of truth; the distinct base→run_type mapping comment in _tool_query is kept. (58969123)
Note 1 — VIEW_ALL_MAX_NODES "cannot exceed". Reworded: only the node count is bounded, edge count is assumed to scale with nodes (holds for sparse dbt DAGs). No edge capping added, per your note. (58969123)

From the incremental self-review at 53f05c96:

run_id can clobber a model named run_id. Valid: row_count_diff returns a model-name-keyed dict, so {**result, "run_id": ...} would overwrite the row-count data of a dbt model literally named run_id. Guarded both merge sites (cloud _tool_run_backed, local _tool_run_backed_local) with "run_id" not in result, so a real result key is never overwritten by the citation UUID. Added local + cloud collision regression tests. (8b7d4ff0)
Test hardening. Added the cloud non-dict-result case (list returned unchanged) and parameterized the out-of-scope boundary test over value_diff_detail, top_k_diff, histogram_diff. (8b7d4ff0)
profile.py case-insensitive matching broadens exact selections — deferred. Valid but intentionally not in this PR. The lowercasing was the load-bearing DRC-3674 fix for empty profiles on case-folding warehouses (Snowflake), which feat(value-diff): enable column sorting in summary grid #1427's CLV/profile grounding depends on. The over-broadening only triggers on a schema with two columns differing solely in case (e.g. name and Name) in one model — extremely rare in dbt, and the consequence is an extra column profiled, not data loss or a wrong result. The proper fix (exact-match first, case-insensitive fallback only when unambiguous) reworks a production fix and is better done as a focused follow-up than bundled into the PR that is currently gating feat(value-diff): enable column sorting in summary grid #1427. Will track separately.

Verification: pytest tests/test_mcp_server.py tests/test_mcp_cloud_backend.py tests/tasks/test_profile.py → 196 passed; ruff check clean.

One advisory left open: a handler-level failed-run test (invoke call_tool and assert isError + persisted FAILED Run). The failure path is covered at the _tool_* level today; happy to add the handler-level case if you'd like it in this PR.

…ls (DRC-3532) Close the last advisory from the PR #1418 incremental review: exercise the full call_tool handler path (not just _tool_*-level) for a failing run-backed local tool. Asserts the registered handler returns isError=True with the original message surfaced (so _classify_db_error / the agent can see it), and the FAILED Run is still persisted to context.runs for citation. Reuses TestCallToolHandler._invoke_call_tool. Signed-off-by: Kent <iamcxa@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

iamcxa · 2026-06-27T11:55:21Z

Added the handler-level failed-run test in 53d2b1d7 (the last open advisory).

It drives the full call_tool handler path for a failing run-backed local tool and asserts: isError=True, the original error message is surfaced in the response content (so _classify_db_error / the agent can read it), and the FAILED Run is still persisted to context.runs for citation. Reuses the existing TestCallToolHandler._invoke_call_tool helper.

wcchang1115 · 2026-06-27T16:15:51Z

Code Review: PR #1418

SHA 53d2b1d · Verdict GO · Incremental (53f05c96..53d2b1d7)

Re-review of the 3 commits since the 53f05c96 self-review. All blocking findings from that round are resolved; one finding remains a tracked follow-up (accepted).

Blockers

None.

Issues

None blocking in the incremental scope.

Resolution of prior findings

run_id clobbers model-name-keyed result (mcp_server.py) — Resolved in 8b7d4ff0. Both merge sites guard with "run_id" not in result (local :671, cloud :204), preserving the real key. Collision regression tests added (local + cloud); verified non-vacuous.
Advisories — all addressed: cloud non-dict passthrough test, parametrized non-run-backed boundary test (value_diff_detail/top_k/histogram), and the handler-level failed-run test (53d2b1d7, isError=True + FAILED Run persisted).
profile.py case-insensitive over-selection (profile.py:191, :300) — Deferred, accepted. Real (a model with name + Name profiles both when only name is requested), but the trigger is pathological for dbt, the consequence is one extra column profiled (no data loss / no diff misalignment), and the lowercasing is the load-bearing DRC-3674 fix that feat(value-diff): enable column sorting in summary grid #1427 depends on. Track the exact-match-first fix as the committed follow-up.

Verification

pytest tests/test_mcp_server.py::TestLocalModeRunBacked tests/test_mcp_cloud_backend.py → 61 passed
flake8 on changed files → clean
VIEW_ALL_MAX_NODES comment matches code (nodes capped :319/:1602; edges filtered to surviving nodes; dense-graph caveat stated)
run_id injection isolated to the MCP return value — persisted run.result not mutated, so summary.py / RowCountDiffResultDiffer consumers unaffected.

wcchang1115

Claude Code Review: GO (incremental 53f05c9..53d2b1d). Prior blockers resolved; profile.py over-selection accepted as tracked follow-up. See review comment.

kentwelcome

LGTM

…sertions test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local) now merges a run_id key into the tool result. Compare on the result fields ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing. Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green. Signed-off-by: Kent <iamcxa@gmail.com>

…DRC-3634) Address PR #1418 review (Claude Code Review, NO-GO). Blocker -- _tool_run_backed_local swallowed failed local runs. submit_run.fn records run.status=FAILED / run.error then returns None on every exception except DuckDBExternalAccessBlocked, so `await asyncio.wrap_future(future)` resolves cleanly and a FAILED run (result=None) collapsed into a success-shaped {"run_id": ...}. A failing query / query_diff / profile_diff / value_diff / row_count_diff (bad SQL, missing model, permission denied) thus returned an empty success -- losing the error and its _classify_db_error classification, and letting a citation agent report "no diff" for a run that actually errored. Add a `run.status == FAILED` guard that re-raises RecceException so call_tool restores isError=True, while the already-persisted FAILED run stays in context.runs for citation. Regression tests cover the raise (message preserved) and the persistence. Issue 1 -- the 5 run-backed tool descriptions now document the additive top-level run_id field (MCP tool description = LLM agent contract). Signed-off-by: Kent <iamcxa@gmail.com>

…m (DRC-3532) Address PR #1418 incremental review (wcchang1115, NO-GO driver + note). - Delete the 2-line "DRC-3634: route through _tool_run_backed_local ..." comment copy-pasted verbatim at all 5 run-backed call sites and the "Surface run_id ... matching CloudBackend shape" comment at the local merge site. The `await self._tool_run_backed_local(...)` call is self-documenting and the rationale lives once in the method docstring; the duplicates meant editing 6 places or leaving 5 stale. The distinct base->run_type mapping comment in _tool_query is kept. - Reword the VIEW_ALL_MAX_NODES comment: it claimed the node cap makes the serialized result "cannot exceed" the 25k-token SDK cap, but only the node count is bounded, not edges. State honestly that edge count is assumed to scale with nodes (holds for sparse dbt DAGs); no edge capping added. Signed-off-by: Kent <iamcxa@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tests (DRC-3532) Address PR #1418 incremental review (response-shape + test hardening). - run_id collision: row_count_diff returns a model-name-keyed dict, so {**result, "run_id": ...} would overwrite the row-count data of a dbt model literally named "run_id" while claiming existing fields are preserved. Guard both the cloud (_tool_run_backed) and local (_tool_run_backed_local) merge sites with `"run_id" not in result` so a real result key is never clobbered by the citation UUID. Pathological (no real dbt model is named run_id), but the guard is free and keeps the additive contract honest. - Tests: - local + cloud collision regression (a model named "run_id" keeps its data) - cloud non-dict result is returned unchanged (isinstance guard) - parameterize the out-of-scope boundary test over value_diff_detail, top_k_diff, histogram_diff (descriptions promise run_id only for the five rerouted local tools) Signed-off-by: Kent <iamcxa@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

iamcxa and others added 2 commits June 13, 2026 00:05

iamcxa requested a review from kentwelcome June 15, 2026 04:05

iamcxa added 2 commits June 15, 2026 12:08

Merge branch 'main' into feature/drc-3532-mcp-expose-run-id

2e8dd51

kentwelcome requested changes Jun 15, 2026

View reviewed changes

iamcxa requested a review from kentwelcome June 26, 2026 16:29

Merge branch 'main' into feature/drc-3532-mcp-expose-run-id

53f05c9

iamcxa requested a review from wcchang1115 June 26, 2026 16:46

wcchang1115 requested changes Jun 27, 2026

View reviewed changes

iamcxa commented Jun 27, 2026

View reviewed changes

iamcxa and others added 2 commits June 27, 2026 19:43

iamcxa requested a review from wcchang1115 June 27, 2026 11:49

wcchang1115 approved these changes Jun 27, 2026

View reviewed changes

kentwelcome approved these changes Jun 29, 2026

View reviewed changes

kentwelcome merged commit 903fe5e into main Jun 29, 2026
22 checks passed

kentwelcome deleted the feature/drc-3532-mcp-expose-run-id branch June 29, 2026 01:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418
kentwelcome merged 10 commits into
mainfrom
feature/drc-3532-mcp-expose-run-id

iamcxa commented Jun 3, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

kentwelcome left a comment

Uh oh!

kentwelcome commented Jun 15, 2026 •

edited

Loading

Uh oh!

iamcxa commented Jun 26, 2026

Uh oh!

wcchang1115 left a comment

Uh oh!

iamcxa left a comment

Uh oh!

iamcxa commented Jun 27, 2026

Uh oh!

iamcxa commented Jun 27, 2026

Uh oh!

wcchang1115 commented Jun 27, 2026

Uh oh!

wcchang1115 left a comment

Uh oh!

kentwelcome left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

iamcxa commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why (cross-repo, DRC-3532)

What

Cloud-backend path (_tool_run_backed)

Local-mode path (_tool_run_backed_local, DRC-3634)

Test

Paired change

Consumed by (cross-repo)

Also on this branch — profile_diff lowercase-column fix (DRC-3674 B)

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kentwelcome left a comment

Choose a reason for hiding this comment

Uh oh!

kentwelcome commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #1418

Blockers

Issues

Notes

Limits

Uh oh!

iamcxa commented Jun 26, 2026

Uh oh!

wcchang1115 left a comment

Choose a reason for hiding this comment

Code Review: PR #1418 (incremental)

Blockers

Issues

Notes

Uh oh!

iamcxa left a comment

Choose a reason for hiding this comment

Findings

Advisory test hardening

Verification

Uh oh!

iamcxa commented Jun 27, 2026

Uh oh!

iamcxa commented Jun 27, 2026

Uh oh!

wcchang1115 commented Jun 27, 2026

Code Review: PR #1418

Blockers

Issues

Resolution of prior findings

Verification

Uh oh!

wcchang1115 left a comment

Choose a reason for hiding this comment

Uh oh!

kentwelcome left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

iamcxa commented Jun 3, 2026 •

edited

Loading

Cloud-backend path (`_tool_run_backed`)

Local-mode path (`_tool_run_backed_local`, DRC-3634)

Also on this branch — `profile_diff` lowercase-column fix (DRC-3674 B)

codecov Bot commented Jun 3, 2026 •

edited

Loading

kentwelcome commented Jun 15, 2026 •

edited

Loading