Playbook Agent: JSON extractor for LLM output parsing (6/8) by rashmiraghunandan · Pull Request #577 · roostorg/coop

rashmiraghunandan · 2026-05-23T00:10:01Z

depends on Playbook Agent: types and abstract interfaces (1/8) #572
depends on Playbook Agent: playbook loader with validation (2/8) #573
depends on Playbook Agent: query catalog (3/8) #574
depends on Playbook Agent: hard rule engine (4/8) #575
depends on Playbook Agent: grounded confidence engine (5/8) #576

Summary

Robust structured verdict parser that extracts JSON from free-form LLM text. LLMs wrap JSON in markdown fences, prose, or emit multiple objects — this handles all of it.

Part 6 of 8 — depends on #576 (confidence engine).

`jsonExtractor.ts`

extractJsonObjects(): balanced brace counting (more robust than regex)
- Handles nested objects, strings containing braces, escaped characters
Template filtering: rejects unfilled placeholders like [REQUIRED], [if available] (18 patterns)
extractVerdict(): returns the last valid, non-template verdict
Supports both FINAL_VERDICT wrapper and direct root-level fields
validateAgainstContract(): checks required fields from output contract
Defaults u_llm to 0.5 when LLM doesn't report uncertainty

`jsonExtractor.test.ts` — 9 tests

Single / multiple / nested JSON extraction
Handles strings containing braces
Skips invalid JSON
Extracts from FINAL_VERDICT wrapper and root-level fields
Returns last valid verdict (most recent)
Filters out template patterns
Validates against output contract

Test plan

npm test -- --testPathPattern=jsonExtractor passes (9 tests)

Depends on #576 Confidence engine. Merge #572, #573, #574, #575, #576 first.

Summary by CodeRabbit

New Features
- Playbook framework for investigation workflows with validated playbook parsing and versioned catalog references
- Confidence scoring that combines evidence coverage, signal agreement, data quality, and LLM uncertainty into 0–1 and 1–5 outputs
- Deterministic rule evaluation to short-circuit decisions based on query results
- Query catalog for managed SQL templates with parameter binding
- Verdict extraction from free-form model output (JSON extraction and contract validation)
Tests
- Added comprehensive test suites covering playbook parsing, confidence computation, rule evaluation, query catalog, and JSON extraction

Introduce the foundational type system for the playbook agent framework: playbookTypes.ts — TypeScript types for declarative investigation playbooks: - SemanticVersion with parsing and formatting - CatalogReference for versioned query references (no floating versions) - Playbook config: evidence ontology, labels, decision logic, confidence computation weights, output contracts - PlaybookVerdict and ConfidenceBreakdown result types - QueryResult/QueryResults for evidence store responses interfaces.ts — Abstract contracts that organizations implement: - ILLMAdapter: pluggable LLM provider (OpenAI, Anthropic, local, etc.) - IEvidenceStore: execute pre-approved queries by catalog ID - IArtifactStore: insert-only storage for immutable audit trail - IPlaybookRunner: core runner interface These types enforce the key safety property: playbooks reference queries by catalog_id@version and never contain SQL, table names, or field mappings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement the playbook loader that converts raw YAML/JSON playbook configs into validated TypeScript Playbook objects. playbookLoader.ts: - parsePlaybook(): snake_case YAML input → camelCase TypeScript output - Cross-reference validation: - baseline_catalog_refs must be a subset of allowed_catalog_refs - Deep-dive module query refs must be in the allowed set - Hard rule outcomes must reference valid label IDs - Default label must exist in the labels list - Confidence computation weights must sum to ~1.0 - Validates semantic versions, catalog IDs, input field types, severities - Rejects forbidden floating versions (latest, head, main, master, current) playbookLoader.test.ts (7 tests): - Parses valid playbook with correct type conversion - Applies default values for optional confidence fields - Rejects baseline ref not in allowed set - Rejects invalid default label - Rejects confidence weights that don't sum to ~1.0 - Rejects invalid semantic versions - Validates hard rule outcomes against labels Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement the query catalog — the trust boundary between the AI agent and the database. The agent requests evidence by catalog_id@version; it never writes SQL or sees table names. queryCatalog.ts: - parseCatalogEntry(): extract metadata (catalog_id, version, description) from SQL comment headers and discover :param tokens - QueryCatalog class: - resolve(): look up CatalogReference → CatalogEntry - resolveByString(): look up "catalog_id@version" string - isAllowed(): validate a ref is in the playbook's allowed set - renderSql(): substitute :param tokens with positional $N bindings for safe parameterized execution (prevents SQL injection) queryCatalog.test.ts (7 tests): - Parses SQL template with metadata comments - Throws on missing catalog_id or version - Deduplicates repeated parameters - Resolves by CatalogReference and by string - Renders SQL with positional parameters and bindings array - Throws on missing required parameter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement the hard rule engine — deterministic rules evaluated before the LLM runs. When a hard rule fires, its outcome is locked and the LLM verdict cannot override it. This is the key safety mechanism for critical cases. For example, a known CSAM series hash match always results in REPORT_AND_REMOVE regardless of what the LLM produces — no model drift, no hallucination risk. hardRuleEngine.ts: - evaluateHardRules(): evaluate rules in order, return first match - Simple dot-path expression format: "catalog_id.field_name == 'VALUE'" - Supports operators: ==, !=, >=, <=, >, < - Matches against ANY row in the query result set - Returns undefined if no rules match (LLM verdict is used) hardRuleEngine.test.ts (8 tests): - Returns first matching rule (priority order) - Falls through to second rule when first doesn't match - Returns undefined when no rules match - Handles missing query results gracefully - Handles failed queries gracefully - Matches across any row in the result set - Supports != operator - Returns undefined for empty rules list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement the confidence scoring formula: C_final = C_ground × (1 − α × u_llm) The LLM can only REDUCE confidence — it can never inflate it. C_ground is computed entirely from objective, query-backed evidence factors: 1. Evidence completeness — required vs optional query coverage 2. Signal strength — data volume and investigation depth 3. Signal agreement — supporting vs contradicting evidence ratio 4. Data quality — query success rate, empty results, edge cases Each factor is weighted by playbook-configured values that must sum to ~1.0. The LLM's self-reported uncertainty (u_llm) is discounted by the playbook's alpha parameter (typically 0.35–0.5). confidenceEngine.ts: - computeConfidence(): full formula with all 4 C_ground components - Outputs 0–1 raw score and 1–5 bucketed score - Configurable penalties for critical missing evidence, contradictions, empty results, failed queries, and edge cases confidenceEngine.test.ts (6 tests): - High confidence when all evidence present and agrees - Reduces confidence when critical evidence is missing - Penalizes contradictory signals - LLM uncertainty can only reduce confidence (never inflate) - Handles query failures gracefully - Produces valid 1–5 bucket scores across full u_llm range Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-23T00:10:18Z

📝 Walkthrough

Walkthrough

This PR introduces a complete Playbook Agent Service framework for risk investigation playbooks. It defines a comprehensive type system covering playbook configuration (inputs, evidence ontology, decision logic, confidence computation), implements query catalog resolution with parameterized SQL templates, adds playbook validation with cross-reference checking, provides LLM verdict extraction and validation from free-form text, delivers deterministic hard-rule evaluation, and computes confidence scores from evidence factors and uncertainty metrics.

Changes

Playbook Agent Service

Layer / File(s)	Summary
Type system and domain model `server/services/playbookAgentService/playbookTypes.ts`, `server/services/playbookAgentService/interfaces.ts`	Defines semantic versioning, catalog references, input context fields, evidence ontology, evidence request policy, deep-dive modules, label definitions, hard rules, decision logic, confidence computation, phase budgets, output contracts, playbook structure, verdict shapes, confidence breakdown, query results, and playbook results. Establishes core abstraction layer with `ILLMAdapter`, `IEvidenceStore`, `IArtifactStore`, and `IPlaybookRunner` interfaces for pluggable implementations.
Query catalog parsing and resolution `server/services/playbookAgentService/queryCatalog.ts`, `server/services/playbookAgentService/queryCatalog.test.ts`	Parses SQL template metadata from header comments (catalog ID, version, description), extracts parameter names from `:field_name` placeholders, stores entries keyed by `catalogId@version`, resolves templates by `CatalogReference` or string key, validates allowed references against playbook, and renders SQL by substituting named parameters with positional `$N` placeholders while maintaining ordered bindings.
Playbook configuration loading and validation `server/services/playbookAgentService/playbookLoader.ts`, `server/services/playbookAgentService/playbookLoader.test.ts`	Parses raw YAML/JSON playbook definitions into strongly-typed domain objects with semantic version parsing and catalog reference normalization. Validates that baseline catalog refs are a subset of allowed refs, deep-dive query refs exist in the allowed set, hard-rule outcomes and default labels reference valid labels, confidence weights sum to 1.0, and forbidden catalog IDs are rejected. Throws detailed field-addressed error messages for validation failures.
LLM verdict extraction and validation `server/services/playbookAgentService/jsonExtractor.ts`, `server/services/playbookAgentService/jsonExtractor.test.ts`	Extracts JSON objects from mixed text using brace counting with string/escape awareness, filters template/placeholder patterns, validates required fields against output contracts (supporting both root-level and `FINAL_VERDICT`-wrapped formats), converts raw verdicts into `PlaybookVerdict` shape with defaults for optional fields (confidence metrics, executed queries, supporting/contradicting evidence flags, edge-case indicators), and returns the most recent valid verdict.
Hard rule evaluation engine `server/services/playbookAgentService/hardRuleEngine.ts`, `server/services/playbookAgentService/hardRuleEngine.test.ts`	Implements deterministic rule evaluation that parses dot-path conditions (e.g., `catalog_id.field_name == 'VALUE'`), converts expected literals into appropriate JS types (string, number, boolean, null), compares against query result rows using operators (`==`, `!=`, `>=`, `<=`, `>`, `<`), returns the first matching rule with metadata (ruleId, outcome, bypassLlm flag, matched condition/value), or undefined to fall back to LLM verdict.
Confidence scoring engine `server/services/playbookAgentService/confidenceEngine.ts`, `server/services/playbookAgentService/confidenceEngine.test.ts`	Computes confidence from evidence completeness (baseline vs executed query coverage), signal strength (successful row counts with depth bonus for additional queries), signal agreement (supporting vs contradicting evidence counts with extra dampening for contradictions), and data quality (baseline success rate with penalties for empty results and edge cases). Combines factors into grounded confidence, applies LLM uncertainty reduction formula `C_final = C_ground * (1 - alpha * u_llm)`, and produces both 0–1 and bucketed 1–5 scores with consistent rounding.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

vinaysrao1
cassidyjames
julietshen

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.83% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description covers the key changes (jsonExtractor.ts features and test coverage) with clear test instructions, but lacks a comprehensive narrative, context, and testing methodology.	Add a 'Context & Requests for Reviewers' section explaining the overall purpose and integration with the playbook agent system. Expand the 'Tests' section to describe how testing was performed beyond just listing test names. Clarify any deployment considerations.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The PR title clearly and concisely summarizes the main change: adding a JSON extractor component for parsing LLM output verdicts as part of the Playbook Agent system (part 6 of 8).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces core building blocks for a “playbook agent” workflow: loading/validating playbooks, resolving pre-approved query templates, extracting structured verdict JSON from LLM output, evaluating deterministic hard rules, and computing grounded confidence scores.

Changes:

Added QueryCatalog for parsing/resolving SQL templates and rendering parameter bindings.
Added YAML-to-typed playbook parsing with cross-reference validation.
Added JSON extraction + hard rule evaluation + grounded confidence computation, with unit tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
server/services/playbookAgentService/queryCatalog.ts	Adds SQL template parsing, catalog lookup, allowlist enforcement, and parameter binding rendering.
server/services/playbookAgentService/queryCatalog.test.ts	Tests catalog entry parsing, lookup, and rendering behavior.
server/services/playbookAgentService/playbookTypes.ts	Defines playbook schema types, catalog ref formatting/validation, and result types.
server/services/playbookAgentService/playbookLoader.ts	Parses raw YAML objects into typed playbooks and validates cross-references.
server/services/playbookAgentService/playbookLoader.test.ts	Tests playbook parsing defaults and validation failures.
server/services/playbookAgentService/jsonExtractor.ts	Extracts JSON objects/verdicts from LLM output and validates against output contracts.
server/services/playbookAgentService/jsonExtractor.test.ts	Tests JSON extraction, verdict selection, template filtering, and contract validation.
server/services/playbookAgentService/interfaces.ts	Adds framework adapter interfaces for LLM, evidence execution, artifact storage, and runner.
server/services/playbookAgentService/hardRuleEngine.ts	Adds deterministic hard rule evaluation over query results.
server/services/playbookAgentService/hardRuleEngine.test.ts	Tests rule matching, operator handling, and graceful failure behavior.
server/services/playbookAgentService/confidenceEngine.ts	Adds confidence computation combining objective factors with LLM uncertainty.
server/services/playbookAgentService/confidenceEngine.test.ts	Tests confidence behavior across completeness, contradiction, uncertainty, and failures.

+  for (const line of lines) {
+    const match = METADATA_PATTERN.exec(line.trim());
+    if (match) {
+      metadata[match[1]] = match[2].trim();
+    } else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
+      // Once we hit non-comment lines, include everything (including inline comments)
+      sqlLines.push(line);
+    }
+  }


+
+const METADATA_PATTERN =
+  /^--\s*(catalog_id|version|description):\s*(.+)$/;
+const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g;


+   * Uses parameterized substitution (not string interpolation) to prevent injection.
+   * Returns the SQL string with `:param` tokens replaced by their values.


+  get catalogIds(): string[] {
+    return [...this.#entries.keys()];
+  }


+function toCatalogReference(raw: RawCatalogRef): CatalogReference {
+  const ref: CatalogReference = {
+    catalogId: raw.catalog_id,
+    version: parseSemanticVersion(raw.version),
+    catalogType: raw.catalog_type as CatalogType,
+  };
+  validateCatalogReference(ref);
+  return ref;
+}


+  const evidenceOntology: EvidenceOntology = {
+    version: parseSemanticVersion(r.evidence_ontology.version),
+    evidenceTypes: r.evidence_ontology.evidence_types.map((et) => ({


+  isAllowed(refString: string, playbook: Playbook): boolean {
+    const allowedKeys = new Set(
+      playbook.allowedCatalogRefs.map(formatCatalogReference),
+    );
+    return allowedKeys.has(refString) && this.#entries.has(refString);
+  }


+  it('returns empty array for empty rules', () => {
+    const match = evaluateHardRules([], { some_query: makeResult([{ a: 1 }]) });
+    expect(match).toBeUndefined();
+  });


coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (4)

server/services/playbookAgentService/queryCatalog.test.ts (1)

3-50: ⚡ Quick win

Add regression tests for parser/render edge cases.

Please add tests for:

SQL body lines like -- version: ... after first statement (must stay in entry.sql and not overwrite metadata), and
Postgres casts like value::jsonb (must not be treated as :jsonb parameters).

Also applies to: 87-101

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/queryCatalog.test.ts` around lines 3 -
50, Add two regression tests in queryCatalog.test.ts that exercise
parseCatalogEntry: 1) create a SQL string where a comment-looking line `--
version: ...` appears after the first statement (e.g. after a semicolon) and
assert that parseCatalogEntry keeps that line inside entry.sql and does not
treat it as metadata (use parseCatalogEntry and assert entry.sql contains that
comment and metadata version remains from header); 2) create a SQL string
containing a Postgres cast like `value::jsonb` and assert that parseCatalogEntry
does not treat `:jsonb` as a parameter (use parseCatalogEntry and assert
parameters does not include 'jsonb' and sql still contains `::jsonb`). Reference
parseCatalogEntry and queryCatalog.test.ts to add these tests near the existing
cases.

server/services/playbookAgentService/playbookLoader.test.ts (1)

62-157: ⚡ Quick win

Add regression tests for parser validation gaps.

Please add tests that reject invalid evidence_request_policy.query_selection_mode and out-of-range confidence bounded fields (e.g., llm_uncertainty_alpha: -0.1 / 1.5).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.test.ts` around lines 62
- 157, Add two regression tests to parsePlaybook in playbookLoader.test.ts that
assert the parser rejects invalid values: one should clone MINIMAL_PLAYBOOK, set
evidence_request_policy.query_selection_mode to an invalid string (e.g.,
'INVALID_MODE') and expect parsePlaybook(bad) toThrow a message about invalid
query_selection_mode; the other should clone MINIMAL_PLAYBOOK and set
confidence_computation.llm_uncertainty_alpha to out-of-range values (e.g., -0.1
and 1.5) and expect parsePlaybook(bad) toThrow messages about bounds for
llm_uncertainty_alpha (do this as separate it blocks reusing the
MINIMAL_PLAYBOOK and calling parsePlaybook to trigger the validation).

server/services/playbookAgentService/confidenceEngine.test.ts (1)

80-182: ⚡ Quick win

Add regression tests for confidence hardening paths.

Please add cases for: (1) out-of-range/non-finite uLlm and alpha clamping, and (2) ignoring LLM-claimed additional queries that do not exist as successful queryResults.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/confidenceEngine.test.ts` around lines
80 - 182, Add two regression tests in the existing confidenceEngine suite that
exercise the hardening paths of computeConfidence: (1) "clamps uLlm and alpha" —
call computeConfidence with makeVerdict inputs that set uLlm and alpha to
out-of-range and non-finite values (NaN, Infinity, -1, >1) and assert the
returned result normalizes/clamps these (confidenceScore0To1 within 0..1,
confidenceScore1To5 integer 1..5, and cGround/other internal scores are finite
and in expected ranges); (2) "ignores LLM-claimed additional queries that are
not successful" — create a verdict via makeVerdict that lists
additionalQueries/queriesExecuted including names missing or unsuccessful in
makeQueryResults, run computeConfidence and assert those claimed-but-missing
queries do not count toward baseCoverage/evidence counts (e.g., baseCoverage
does not increase, evidenceCompleteness/ dataQuality reflect only successful
queryResults) and overall scores remain within valid bounds; use existing
helpers (makeVerdict, makeQueryResults, makePlaybook, computeConfidence) and
mirror the style of neighboring tests.

server/services/playbookAgentService/jsonExtractor.test.ts (1)

52-103: ⚡ Quick win

Add tests for non-finite numeric inputs in verdict JSON.

Please cover u_llm: NaN/Infinity and confidence_score: NaN/Infinity to lock in rejection (or clamping for u_llm, if intended).

Also applies to: 105-129

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/jsonExtractor.test.ts` around lines 52 -
103, Add unit tests in jsonExtractor.test.ts that feed verdict JSON containing
non-finite numeric values (u_llm: NaN, u_llm: Infinity, confidence_score: NaN,
confidence_score: Infinity) into extractVerdict and assert the code's current
intended handling so the behavior is locked in; specifically add cases near the
existing "extracts verdict..." and "defaults u_llm..." tests (and the similar
block referred to at 105-129) that call extractVerdict with each non-finite
variant and assert the observed outcome (e.g., extractVerdict returns undefined
when confidence_score is non-finite, and for u_llm assert it either gets
clamped/replaced with the default 0.5 or results in undefined—match and assert
the current implementation so the contract is explicit).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/services/playbookAgentService/confidenceEngine.ts`:
- Around line 41-43: Clamp and validate numeric inputs before using them in the
confidence calculations: sanitize cc.llmUncertaintyAlpha (alpha) and
verdict.uLlm by ensuring they are finite numbers and fall inside safe bounds
(e.g., enforce a min>0 and a sensible max, and default to a safe constant if
NaN/Infinity), then use the sanitized values in the formula where alpha and uLlm
are referenced (the occurrences around alpha = cc.llmUncertaintyAlpha / uLlm =
verdict.uLlm and the similar usage at the lines near 171-172). Ensure you handle
zero/negative values that would break math operations and log or fallback
deterministically so the confidence invariants remain valid.
- Around line 55-58: The code is using the LLM-provided verdict.queriesExecuted
to credit additional queries, letting the model over-claim; instead, replace
usage of verdict.queriesExecuted (and the derived queriesExecuted /
additionalQueriesExecuted calculations) with a ground-truth set of executed
query IDs coming from the execution log/results (e.g., an executedQueryIds or
executionResults source you have in the service) and compute
additionalQueriesExecuted by intersecting that trusted executed set with
additionalQueryIds; apply the same change for the other occurrences mentioned
(the blocks around the current queriesExecuted computation at lines 75-79 and
111-117) so all credits are based on the execution log rather than the LLM
claim.

In `@server/services/playbookAgentService/jsonExtractor.ts`:
- Around line 136-149: The code currently checks numeric fields with typeof ===
'number' which allows NaN/Infinity; update the validation in jsonExtractor.ts so
numeric fields (e.g., confidenceScore, source['u_llm'], and the other numeric
occurrence around the later block) are validated with Number.isFinite(value)
instead of typeof checks, returning undefined on non-finite values; ensure the
returned object only includes confidence_score and u_llm when
Number.isFinite(confidenceScore) and Number.isFinite(source['u_llm'])
respectively (and apply the same finite-number guard to the numeric field at the
other occurrence referenced).

In `@server/services/playbookAgentService/playbookLoader.ts`:
- Around line 303-305: The code is hardcoding
evidenceRequestPolicy.querySelectionMode instead of using or validating the
incoming playbook value; update the construction of evidenceRequestPolicy (the
EvidenceRequestPolicy object) to read
r.evidence_request_policy.query_selection_mode, map it from snake_case to the
internal camelCase value (or allowed enum) and validate it against expected
options before assignment, throwing or logging a clear error if the value is
missing/invalid; ensure querySelectionMode is not always set to 'query_id' but
derived and validated from r.evidence_request_policy.
- Around line 363-374: The confidenceComputation object built from cc (in
playbookLoader.ts, variables cc and confidenceComputation) lacks validation of
bounded numeric fields; add range checks for llm_uncertainty_alpha (0..1),
baseCoverageWeight/additionalCoverageWeight (each 0..1 and sum <=1),
evidenceCompletenessWeight/signalStrengthWeight/signalAgreementWeight/dataQualityWeight
(each 0..1), criticalEvidenceMissingPenalty (0..1) and additionalQueryBonusMax
(0..1); if any value is missing use the existing defaults, and if any value is
out of range either clamp to the allowed range or throw a clear error (choose
consistent behavior), and centralize the checks in a small helper (e.g.,
validateConfidenceField or normalizeConfidenceConfig) called where
confidenceComputation is constructed to ensure downstream code receives only
valid bounded values.

In `@server/services/playbookAgentService/queryCatalog.ts`:
- Around line 41-42: The PARAMETER_PATTERN currently matches tokens in Postgres
casts like ::jsonb; update the pattern to only match a single leading colon (not
a double-colon cast) — e.g. replace /:([a-z_][a-z0-9_]*)/g with a pattern that
asserts the colon is not preceded by another colon such as
/(?<!:):([a-z_][a-z0-9_]*)/g or, if lookbehind is unavailable, use
/(^|[^:]):([a-z_][a-z0-9_]*)/g and adjust code to use capture group 2; apply the
same change for the other occurrences that use PARAMETER_PATTERN (the code
around the parsing/rendering helpers referenced in the diff) and, per
guidelines, prefer using the Knex query builder for Postgres access instead of
manual SQL string parameter substitution where feasible.
- Around line 93-97: The constructor for the class builds `#entries` by mapping
entries to keys `${e.catalogId}@${e.version}` and currently silently overwrites
duplicates; modify the constructor to detect duplicate keys when iterating the
provided entries (using the same `${e.catalogId}@${e.version}` key logic), and
if a duplicate is found throw an Error (or reject) that names the conflicting
catalogId and version so callers cannot accidentally supply conflicting
CatalogEntry objects; keep the Map population logic but only insert after
checking the key does not already exist to enforce uniqueness.
- Around line 55-61: The parser currently keeps applying METADATA_PATTERN to
every comment line and can mutate metadata after SQL has started; modify
parseCatalogEntry to introduce a bodyStarted flag (or similar) that is set when
a non-comment line is first pushed into sqlLines, then only attempt to match
METADATA_PATTERN and populate metadata when bodyStarted is false; once
bodyStarted is true, treat all subsequent lines (including lines starting with
'--') as SQL and push them to sqlLines without updating metadata (use the
existing METADATA_PATTERN, match variable, and sqlLines/metadata variables to
locate and change the behavior).

---

Nitpick comments:
In `@server/services/playbookAgentService/confidenceEngine.test.ts`:
- Around line 80-182: Add two regression tests in the existing confidenceEngine
suite that exercise the hardening paths of computeConfidence: (1) "clamps uLlm
and alpha" — call computeConfidence with makeVerdict inputs that set uLlm and
alpha to out-of-range and non-finite values (NaN, Infinity, -1, >1) and assert
the returned result normalizes/clamps these (confidenceScore0To1 within 0..1,
confidenceScore1To5 integer 1..5, and cGround/other internal scores are finite
and in expected ranges); (2) "ignores LLM-claimed additional queries that are
not successful" — create a verdict via makeVerdict that lists
additionalQueries/queriesExecuted including names missing or unsuccessful in
makeQueryResults, run computeConfidence and assert those claimed-but-missing
queries do not count toward baseCoverage/evidence counts (e.g., baseCoverage
does not increase, evidenceCompleteness/ dataQuality reflect only successful
queryResults) and overall scores remain within valid bounds; use existing
helpers (makeVerdict, makeQueryResults, makePlaybook, computeConfidence) and
mirror the style of neighboring tests.

In `@server/services/playbookAgentService/jsonExtractor.test.ts`:
- Around line 52-103: Add unit tests in jsonExtractor.test.ts that feed verdict
JSON containing non-finite numeric values (u_llm: NaN, u_llm: Infinity,
confidence_score: NaN, confidence_score: Infinity) into extractVerdict and
assert the code's current intended handling so the behavior is locked in;
specifically add cases near the existing "extracts verdict..." and "defaults
u_llm..." tests (and the similar block referred to at 105-129) that call
extractVerdict with each non-finite variant and assert the observed outcome
(e.g., extractVerdict returns undefined when confidence_score is non-finite, and
for u_llm assert it either gets clamped/replaced with the default 0.5 or results
in undefined—match and assert the current implementation so the contract is
explicit).

In `@server/services/playbookAgentService/playbookLoader.test.ts`:
- Around line 62-157: Add two regression tests to parsePlaybook in
playbookLoader.test.ts that assert the parser rejects invalid values: one should
clone MINIMAL_PLAYBOOK, set evidence_request_policy.query_selection_mode to an
invalid string (e.g., 'INVALID_MODE') and expect parsePlaybook(bad) toThrow a
message about invalid query_selection_mode; the other should clone
MINIMAL_PLAYBOOK and set confidence_computation.llm_uncertainty_alpha to
out-of-range values (e.g., -0.1 and 1.5) and expect parsePlaybook(bad) toThrow
messages about bounds for llm_uncertainty_alpha (do this as separate it blocks
reusing the MINIMAL_PLAYBOOK and calling parsePlaybook to trigger the
validation).

In `@server/services/playbookAgentService/queryCatalog.test.ts`:
- Around line 3-50: Add two regression tests in queryCatalog.test.ts that
exercise parseCatalogEntry: 1) create a SQL string where a comment-looking line
`-- version: ...` appears after the first statement (e.g. after a semicolon) and
assert that parseCatalogEntry keeps that line inside entry.sql and does not
treat it as metadata (use parseCatalogEntry and assert entry.sql contains that
comment and metadata version remains from header); 2) create a SQL string
containing a Postgres cast like `value::jsonb` and assert that parseCatalogEntry
does not treat `:jsonb` as a parameter (use parseCatalogEntry and assert
parameters does not include 'jsonb' and sql still contains `::jsonb`). Reference
parseCatalogEntry and queryCatalog.test.ts to add these tests near the existing
cases.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 978b00d5-3b43-4b49-90a8-89906a3f6d5d

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6bf65 and fc3a5c2.

📒 Files selected for processing (12)

server/services/playbookAgentService/confidenceEngine.test.ts
server/services/playbookAgentService/confidenceEngine.ts
server/services/playbookAgentService/hardRuleEngine.test.ts
server/services/playbookAgentService/hardRuleEngine.ts
server/services/playbookAgentService/interfaces.ts
server/services/playbookAgentService/jsonExtractor.test.ts
server/services/playbookAgentService/jsonExtractor.ts
server/services/playbookAgentService/playbookLoader.test.ts
server/services/playbookAgentService/playbookLoader.ts
server/services/playbookAgentService/playbookTypes.ts
server/services/playbookAgentService/queryCatalog.test.ts
server/services/playbookAgentService/queryCatalog.ts

coderabbitai · 2026-05-23T00:14:18Z

+  const alpha = cc.llmUncertaintyAlpha;
+  const uLlm = verdict.uLlm;
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clamp and sanitize alpha/uLlm before applying confidence formula.

Unbounded or non-finite values can violate confidence invariants and produce invalid results.

Suggested fix

- const alpha = cc.llmUncertaintyAlpha; - const uLlm = verdict.uLlm; + const alpha = clamp( + Number.isFinite(cc.llmUncertaintyAlpha) ? cc.llmUncertaintyAlpha : 0, + 0, + 1, + ); + const uLlm = clamp(Number.isFinite(verdict.uLlm) ? verdict.uLlm : 0.5, 0, 1);

Also applies to: 171-172

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/confidenceEngine.ts` around lines 41 - 43, Clamp and validate numeric inputs before using them in the confidence calculations: sanitize cc.llmUncertaintyAlpha (alpha) and verdict.uLlm by ensuring they are finite numbers and fall inside safe bounds (e.g., enforce a min>0 and a sensible max, and default to a safe constant if NaN/Infinity), then use the sanitized values in the formula where alpha and uLlm are referenced (the occurrences around alpha = cc.llmUncertaintyAlpha / uLlm = verdict.uLlm and the similar usage at the lines near 171-172). Ensure you handle zero/negative values that would break math operations and log or fallback deterministically so the confidence invariants remain valid.

coderabbitai · 2026-05-23T00:14:18Z

+  const queriesExecuted = new Set(verdict.queriesExecuted);
+  const additionalQueriesExecuted = new Set(
+    [...queriesExecuted].filter((id) => additionalQueryIds.has(id)),
+  );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Additional-query credit is currently LLM-claim based, not execution-grounded.

Using verdict.queriesExecuted directly allows the model to over-claim additional queries and inflate completeness/bonus.

Suggested fix

- const queriesExecuted = new Set(verdict.queriesExecuted); - const additionalQueriesExecuted = new Set( - [...queriesExecuted].filter((id) => additionalQueryIds.has(id)), - ); + const additionalQueriesExecuted = new Set( + Object.entries(queryResults.results) + .filter(([queryId, result]) => additionalQueryIds.has(queryId) && result.success) + .map(([queryId]) => queryId), + );

Also applies to: 75-79, 111-117

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/confidenceEngine.ts` around lines 55 - 58, The code is using the LLM-provided verdict.queriesExecuted to credit additional queries, letting the model over-claim; instead, replace usage of verdict.queriesExecuted (and the derived queriesExecuted / additionalQueriesExecuted calculations) with a ground-truth set of executed query IDs coming from the execution log/results (e.g., an executedQueryIds or executionResults source you have in the service) and compute additionalQueriesExecuted by intersecting that trusted executed set with additionalQueryIds; apply the same change for the other occurrences mentioned (the blocks around the current queriesExecuted computation at lines 75-79 and 111-117) so all credits are based on the execution log rather than the LLM claim.

coderabbitai · 2026-05-23T00:14:18Z

+  const evidenceRequestPolicy: EvidenceRequestPolicy = {
+    querySelectionMode: 'query_id',
+    maxRounds: r.evidence_request_policy.max_rounds,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

query_selection_mode is ignored during parsing.

querySelectionMode is currently hardcoded, so malformed or unexpected playbook input is silently accepted instead of validated/mapped.

Suggested fix

+ const querySelectionMode = r.evidence_request_policy.query_selection_mode; + if (querySelectionMode !== 'query_id') { + throw new Error( + `Invalid query_selection_mode '${querySelectionMode}', expected 'query_id'`, + ); + } + const evidenceRequestPolicy: EvidenceRequestPolicy = { - querySelectionMode: 'query_id', + querySelectionMode, maxRounds: r.evidence_request_policy.max_rounds, targetConfidence: r.evidence_request_policy.target_confidence,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const evidenceRequestPolicy: EvidenceRequestPolicy = {

querySelectionMode: 'query_id',

maxRounds: r.evidence_request_policy.max_rounds,

const querySelectionMode = r.evidence_request_policy.query_selection_mode;

if (querySelectionMode !== 'query_id') {

throw new Error(

`Invalid query_selection_mode '${querySelectionMode}', expected 'query_id'`,

);

}

const evidenceRequestPolicy: EvidenceRequestPolicy = {

querySelectionMode,

maxRounds: r.evidence_request_policy.max_rounds,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/playbookLoader.ts` around lines 303 - 305, The code is hardcoding evidenceRequestPolicy.querySelectionMode instead of using or validating the incoming playbook value; update the construction of evidenceRequestPolicy (the EvidenceRequestPolicy object) to read r.evidence_request_policy.query_selection_mode, map it from snake_case to the internal camelCase value (or allowed enum) and validate it against expected options before assignment, throwing or logging a clear error if the value is missing/invalid; ensure querySelectionMode is not always set to 'query_id' but derived and validated from r.evidence_request_policy.

coderabbitai · 2026-05-23T00:14:18Z

+  const cc = r.confidence_computation;
+  const confidenceComputation: ConfidenceComputation = {
+    evidenceCompletenessWeight: cc.evidence_completeness_weight,
+    signalStrengthWeight: cc.signal_strength_weight,
+    signalAgreementWeight: cc.signal_agreement_weight,
+    dataQualityWeight: cc.data_quality_weight,
+    llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
+    baseCoverageWeight: cc.base_coverage_weight ?? 0.7,
+    additionalCoverageWeight: cc.additional_coverage_weight ?? 0.3,
+    criticalEvidenceMissingPenalty: cc.critical_evidence_missing_penalty ?? 0.3,
+    additionalQueryBonusMax: cc.additional_query_bonus_max ?? 0.2,
+  };


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Confidence config lacks range validation for bounded fields.

llm_uncertainty_alpha, coverage weights, penalties, and bonuses are accepted without bounds checks. Invalid values can produce non-physical confidence behavior downstream.

Suggested fix

const cc = r.confidence_computation; + const boundedFields: Array<[string, number]> = [ + ['llm_uncertainty_alpha', cc.llm_uncertainty_alpha], + ['base_coverage_weight', cc.base_coverage_weight ?? 0.7], + ['additional_coverage_weight', cc.additional_coverage_weight ?? 0.3], + ['critical_evidence_missing_penalty', cc.critical_evidence_missing_penalty ?? 0.3], + ['additional_query_bonus_max', cc.additional_query_bonus_max ?? 0.2], + ]; + for (const [name, value] of boundedFields) { + if (!Number.isFinite(value) || value < 0 || value > 1) { + throw new Error(`Invalid ${name}: ${value}. Expected a number in [0, 1]`); + } + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/playbookLoader.ts` around lines 363 - 374, The confidenceComputation object built from cc (in playbookLoader.ts, variables cc and confidenceComputation) lacks validation of bounded numeric fields; add range checks for llm_uncertainty_alpha (0..1), baseCoverageWeight/additionalCoverageWeight (each 0..1 and sum <=1), evidenceCompletenessWeight/signalStrengthWeight/signalAgreementWeight/dataQualityWeight (each 0..1), criticalEvidenceMissingPenalty (0..1) and additionalQueryBonusMax (0..1); if any value is missing use the existing defaults, and if any value is out of range either clamp to the allowed range or throw a clear error (choose consistent behavior), and centralize the checks in a small helper (e.g., validateConfidenceField or normalizeConfidenceConfig) called where confidenceComputation is constructed to ensure downstream code receives only valid bounded values.

coderabbitai · 2026-05-23T00:14:18Z

+const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g;
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parameter regex breaks PostgreSQL ::type casts.

The current :param matcher also captures :jsonb from ::jsonb, so parsing/rendering can corrupt valid Postgres SQL and throw false missing-parameter errors.

Suggested fix

-const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g; +const PARAMETER_PATTERN = /(?<!:):([a-z_][a-z0-9_]*)/g;

As per coding guidelines, "Use Knex query builder for Postgres data access".

Also applies to: 76-77, 144-153

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/queryCatalog.ts` around lines 41 - 42, The PARAMETER_PATTERN currently matches tokens in Postgres casts like ::jsonb; update the pattern to only match a single leading colon (not a double-colon cast) — e.g. replace /:([a-z_][a-z0-9_]*)/g with a pattern that asserts the colon is not preceded by another colon such as /(?<!:):([a-z_][a-z0-9_]*)/g or, if lookbehind is unavailable, use /(^|[^:]):([a-z_][a-z0-9_]*)/g and adjust code to use capture group 2; apply the same change for the other occurrences that use PARAMETER_PATTERN (the code around the parsing/rendering helpers referenced in the diff) and, per guidelines, prefer using the Knex query builder for Postgres access instead of manual SQL string parameter substitution where feasible.

coderabbitai · 2026-05-23T00:14:18Z

+    const match = METADATA_PATTERN.exec(line.trim());
+    if (match) {
+      metadata[match[1]] = match[2].trim();
+    } else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
+      // Once we hit non-comment lines, include everything (including inline comments)
+      sqlLines.push(line);
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop parsing metadata once SQL body starts.

parseCatalogEntry still treats later -- version:/-- description: lines as header metadata after body parsing begins, which can mutate metadata and drop SQL lines.

Suggested fix

- for (const line of lines) { - const match = METADATA_PATTERN.exec(line.trim()); - if (match) { + for (const line of lines) { + const match = sqlLines.length === 0 ? METADATA_PATTERN.exec(line.trim()) : null; + if (match) { metadata[match[1]] = match[2].trim(); } else if (!line.trim().startsWith('--') || sqlLines.length > 0) { // Once we hit non-comment lines, include everything (including inline comments) sqlLines.push(line); } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const match = METADATA_PATTERN.exec(line.trim());

if (match) {

metadata[match[1]] = match[2].trim();

} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {

// Once we hit non-comment lines, include everything (including inline comments)

sqlLines.push(line);

}

const match = sqlLines.length === 0 ? METADATA_PATTERN.exec(line.trim()) : null;

if (match) {

metadata[match[1]] = match[2].trim();

} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {

// Once we hit non-comment lines, include everything (including inline comments)

sqlLines.push(line);

}

🧰 Tools

🪛 OpenGrep (1.21.0)

[ERROR] 55-55: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/queryCatalog.ts` around lines 55 - 61, The parser currently keeps applying METADATA_PATTERN to every comment line and can mutate metadata after SQL has started; modify parseCatalogEntry to introduce a bodyStarted flag (or similar) that is set when a non-comment line is first pushed into sqlLines, then only attempt to match METADATA_PATTERN and populate metadata when bodyStarted is false; once bodyStarted is true, treat all subsequent lines (including lines starting with '--') as SQL and push them to sqlLines without updating metadata (use the existing METADATA_PATTERN, match variable, and sqlLines/metadata variables to locate and change the behavior).

coderabbitai · 2026-05-23T00:14:18Z

+  constructor(entries: readonly CatalogEntry[]) {
+    this.#entries = new Map(
+      entries.map((e) => [`${e.catalogId}@${e.version}`, e]),
+    );
+  }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject duplicate catalogId@version entries in constructor.

Current map construction silently overwrites earlier entries for the same key, which can hide conflicting catalog definitions.

Suggested fix

constructor(entries: readonly CatalogEntry[]) { - this.#entries = new Map( - entries.map((e) => [`${e.catalogId}@${e.version}`, e]), - ); + this.#entries = new Map(); + for (const entry of entries) { + const key = `${entry.catalogId}@${entry.version}`; + if (this.#entries.has(key)) { + throw new Error(`Duplicate catalog entry '${key}'`); + } + this.#entries.set(key, entry); + } }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/services/playbookAgentService/queryCatalog.ts` around lines 93 - 97, The constructor for the class builds `#entries` by mapping entries to keys `${e.catalogId}@${e.version}` and currently silently overwrites duplicates; modify the constructor to detect duplicate keys when iterating the provided entries (using the same `${e.catalogId}@${e.version}` key logic), and if a duplicate is found throw an Error (or reject) that names the conflicting catalogId and version so callers cannot accidentally supply conflicting CatalogEntry objects; keep the Map population logic but only insert after checking the key does not already exist to enforce uniqueness.

Implement the JSON extractor that parses structured verdicts from free-form LLM output text. LLMs often wrap JSON in markdown fences, prose, or emit multiple JSON objects — this handles all of it. jsonExtractor.ts: - extractJsonObjects(): balanced brace counting for robust extraction (handles nested objects, strings containing braces, escaped characters) - Template filtering: rejects unfilled placeholder patterns like "<NPG | PG | further review>", "[REQUIRED]", "[if available]" - extractVerdict(): returns the last valid, non-template verdict - Supports both FINAL_VERDICT wrapper and direct root-level fields - validateAgainstContract(): checks required fields from output contract - Defaults u_llm to 0.5 when LLM doesn't report uncertainty jsonExtractor.test.ts (9 tests): - Extracts single/multiple/nested JSON objects from text - Handles strings containing braces - Skips invalid JSON - Extracts from FINAL_VERDICT wrapper and root-level fields - Returns last valid verdict (most recent in output) - Filters out template patterns - Validates against output contract - Validates inside FINAL_VERDICT wrapper Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

server/services/playbookAgentService/jsonExtractor.ts (1)

170-178: 💤 Low value

Array element types are not validated.

The arrays are checked with Array.isArray() but elements are cast to string[] without validation. If the LLM provides non-string elements (e.g., queries_executed: [1, 2]), downstream code expecting strings (e.g., Set lookups in confidenceEngine.ts) may behave unexpectedly due to type coercion.

Proposed fix to filter non-string elements

-    queries_executed: Array.isArray(source['queries_executed'])
-      ? (source['queries_executed'] as string[])
+    queries_executed: Array.isArray(source['queries_executed'])
+      ? (source['queries_executed'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,
-    supporting_evidence: Array.isArray(source['supporting_evidence'])
-      ? (source['supporting_evidence'] as string[])
+    supporting_evidence: Array.isArray(source['supporting_evidence'])
+      ? (source['supporting_evidence'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,
-    contradicting_evidence: Array.isArray(source['contradicting_evidence'])
-      ? (source['contradicting_evidence'] as string[])
+    contradicting_evidence: Array.isArray(source['contradicting_evidence'])
+      ? (source['contradicting_evidence'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/jsonExtractor.ts` around lines 170 -
178, The array fields parsed in jsonExtractor.ts (queries_executed,
supporting_evidence, contradicting_evidence) currently use Array.isArray(...)
then cast to string[] without validating element types; update the parsing to
filter each array to only include elements where typeof item === 'string' (e.g.,
Array.isArray(source['queries_executed']) ? (source['queries_executed'] as
any[]).filter(x => typeof x === 'string') : undefined) so downstream code (like
Set lookups in confidenceEngine.ts) receives a true string[]; apply the same
filtering pattern for supporting_evidence and contradicting_evidence and keep
undefined when the source is not an array or yields no string elements.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/services/playbookAgentService/jsonExtractor.ts`:
- Around line 170-178: The array fields parsed in jsonExtractor.ts
(queries_executed, supporting_evidence, contradicting_evidence) currently use
Array.isArray(...) then cast to string[] without validating element types;
update the parsing to filter each array to only include elements where typeof
item === 'string' (e.g., Array.isArray(source['queries_executed']) ?
(source['queries_executed'] as any[]).filter(x => typeof x === 'string') :
undefined) so downstream code (like Set lookups in confidenceEngine.ts) receives
a true string[]; apply the same filtering pattern for supporting_evidence and
contradicting_evidence and keep undefined when the source is not an array or
yields no string elements.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 47e94301-5b6b-42e9-8af4-bd606777cc9c

📥 Commits

Reviewing files that changed from the base of the PR and between fc3a5c2 and 9a2e08f.

📒 Files selected for processing (2)

server/services/playbookAgentService/jsonExtractor.test.ts
server/services/playbookAgentService/jsonExtractor.ts

rashmiraghunandan and others added 5 commits May 22, 2026 16:33

Copilot AI review requested due to automatic review settings May 23, 2026 00:10

rashmiraghunandan requested review from cassidyjames, dom-notion, juanmrad, julietshen and vinaysrao1 as code owners May 23, 2026 00:10

Copilot AI reviewed May 23, 2026

View reviewed changes

rashmiraghunandan mentioned this pull request May 23, 2026

Playbook Agent: runner, service, and barrel exports (7/8) #578

Open

1 task

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

rashmiraghunandan force-pushed the playbook-agent/06-json-extractor branch from fc3a5c2 to 9a2e08f Compare May 23, 2026 01:56

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

cassidyjames changed the title ~~[6/8] Playbook Agent: JSON extractor for LLM output parsing~~ Playbook Agent: JSON extractor for LLM output parsing (6/8) May 26, 2026

This was referenced May 26, 2026

Playbook Agent: examples and component walkthrough (8/8) #579

Open

Playbook Agent: types and abstract interfaces (1/8) #572

Open

		* Uses parameterized substitution (not string interpolation) to prevent injection.
		* Returns the SQL string with `:param` tokens replaced by their values.

		const alpha = cc.llmUncertaintyAlpha;
		const uLlm = verdict.uLlm;

Conversation

rashmiraghunandan commented May 23, 2026 • edited by cassidyjames Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

jsonExtractor.ts

jsonExtractor.test.ts — 9 tests

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rashmiraghunandan commented May 23, 2026 •

edited by cassidyjames

Loading

`jsonExtractor.ts`

`jsonExtractor.test.ts` — 9 tests

coderabbitai Bot commented May 23, 2026 •

edited

Loading