Skip to content

Playbook Agent: JSON extractor for LLM output parsing (6/8)#577

Open
rashmiraghunandan wants to merge 6 commits into
roostorg:mainfrom
rashmiraghunandan:playbook-agent/06-json-extractor
Open

Playbook Agent: JSON extractor for LLM output parsing (6/8)#577
rashmiraghunandan wants to merge 6 commits into
roostorg:mainfrom
rashmiraghunandan:playbook-agent/06-json-extractor

Conversation

@rashmiraghunandan
Copy link
Copy Markdown

@rashmiraghunandan rashmiraghunandan commented May 23, 2026

Summary

Robust structured verdict parser that extracts JSON from free-form LLM text. LLMs wrap JSON in markdown fences, prose, or emit multiple objects — this handles all of it.

Part 6 of 8 — depends on #576 (confidence engine).

jsonExtractor.ts

  • extractJsonObjects(): balanced brace counting (more robust than regex)
    • Handles nested objects, strings containing braces, escaped characters
  • Template filtering: rejects unfilled placeholders like [REQUIRED], [if available] (18 patterns)
  • extractVerdict(): returns the last valid, non-template verdict
  • Supports both FINAL_VERDICT wrapper and direct root-level fields
  • validateAgainstContract(): checks required fields from output contract
  • Defaults u_llm to 0.5 when LLM doesn't report uncertainty

jsonExtractor.test.ts — 9 tests

  • Single / multiple / nested JSON extraction
  • Handles strings containing braces
  • Skips invalid JSON
  • Extracts from FINAL_VERDICT wrapper and root-level fields
  • Returns last valid verdict (most recent)
  • Filters out template patterns
  • Validates against output contract

Test plan

  • npm test -- --testPathPattern=jsonExtractor passes (9 tests)

Depends on #576 Confidence engine. Merge #572, #573, #574, #575, #576 first.

Summary by CodeRabbit

  • New Features

    • Playbook framework for investigation workflows with validated playbook parsing and versioned catalog references
    • Confidence scoring that combines evidence coverage, signal agreement, data quality, and LLM uncertainty into 0–1 and 1–5 outputs
    • Deterministic rule evaluation to short-circuit decisions based on query results
    • Query catalog for managed SQL templates with parameter binding
    • Verdict extraction from free-form model output (JSON extraction and contract validation)
  • Tests

    • Added comprehensive test suites covering playbook parsing, confidence computation, rule evaluation, query catalog, and JSON extraction

Review Change Stack

rashmiraghunandan and others added 5 commits May 22, 2026 16:33
Introduce the foundational type system for the playbook agent framework:

playbookTypes.ts — TypeScript types for declarative investigation playbooks:
  - SemanticVersion with parsing and formatting
  - CatalogReference for versioned query references (no floating versions)
  - Playbook config: evidence ontology, labels, decision logic, confidence
    computation weights, output contracts
  - PlaybookVerdict and ConfidenceBreakdown result types
  - QueryResult/QueryResults for evidence store responses

interfaces.ts — Abstract contracts that organizations implement:
  - ILLMAdapter: pluggable LLM provider (OpenAI, Anthropic, local, etc.)
  - IEvidenceStore: execute pre-approved queries by catalog ID
  - IArtifactStore: insert-only storage for immutable audit trail
  - IPlaybookRunner: core runner interface

These types enforce the key safety property: playbooks reference queries by
catalog_id@version and never contain SQL, table names, or field mappings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the playbook loader that converts raw YAML/JSON playbook configs
into validated TypeScript Playbook objects.

playbookLoader.ts:
  - parsePlaybook(): snake_case YAML input → camelCase TypeScript output
  - Cross-reference validation:
    - baseline_catalog_refs must be a subset of allowed_catalog_refs
    - Deep-dive module query refs must be in the allowed set
    - Hard rule outcomes must reference valid label IDs
    - Default label must exist in the labels list
    - Confidence computation weights must sum to ~1.0
  - Validates semantic versions, catalog IDs, input field types, severities
  - Rejects forbidden floating versions (latest, head, main, master, current)

playbookLoader.test.ts (7 tests):
  - Parses valid playbook with correct type conversion
  - Applies default values for optional confidence fields
  - Rejects baseline ref not in allowed set
  - Rejects invalid default label
  - Rejects confidence weights that don't sum to ~1.0
  - Rejects invalid semantic versions
  - Validates hard rule outcomes against labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the query catalog — the trust boundary between the AI agent and
the database. The agent requests evidence by catalog_id@version; it never
writes SQL or sees table names.

queryCatalog.ts:
  - parseCatalogEntry(): extract metadata (catalog_id, version, description)
    from SQL comment headers and discover :param tokens
  - QueryCatalog class:
    - resolve(): look up CatalogReference → CatalogEntry
    - resolveByString(): look up "catalog_id@version" string
    - isAllowed(): validate a ref is in the playbook's allowed set
    - renderSql(): substitute :param tokens with positional $N bindings
      for safe parameterized execution (prevents SQL injection)

queryCatalog.test.ts (7 tests):
  - Parses SQL template with metadata comments
  - Throws on missing catalog_id or version
  - Deduplicates repeated parameters
  - Resolves by CatalogReference and by string
  - Renders SQL with positional parameters and bindings array
  - Throws on missing required parameter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the hard rule engine — deterministic rules evaluated before the
LLM runs. When a hard rule fires, its outcome is locked and the LLM
verdict cannot override it.

This is the key safety mechanism for critical cases. For example, a known
CSAM series hash match always results in REPORT_AND_REMOVE regardless of
what the LLM produces — no model drift, no hallucination risk.

hardRuleEngine.ts:
  - evaluateHardRules(): evaluate rules in order, return first match
  - Simple dot-path expression format: "catalog_id.field_name == 'VALUE'"
  - Supports operators: ==, !=, >=, <=, >, <
  - Matches against ANY row in the query result set
  - Returns undefined if no rules match (LLM verdict is used)

hardRuleEngine.test.ts (8 tests):
  - Returns first matching rule (priority order)
  - Falls through to second rule when first doesn't match
  - Returns undefined when no rules match
  - Handles missing query results gracefully
  - Handles failed queries gracefully
  - Matches across any row in the result set
  - Supports != operator
  - Returns undefined for empty rules list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement the confidence scoring formula:
  C_final = C_ground × (1 − α × u_llm)

The LLM can only REDUCE confidence — it can never inflate it. C_ground is
computed entirely from objective, query-backed evidence factors:

  1. Evidence completeness — required vs optional query coverage
  2. Signal strength — data volume and investigation depth
  3. Signal agreement — supporting vs contradicting evidence ratio
  4. Data quality — query success rate, empty results, edge cases

Each factor is weighted by playbook-configured values that must sum to ~1.0.
The LLM's self-reported uncertainty (u_llm) is discounted by the playbook's
alpha parameter (typically 0.35–0.5).

confidenceEngine.ts:
  - computeConfidence(): full formula with all 4 C_ground components
  - Outputs 0–1 raw score and 1–5 bucketed score
  - Configurable penalties for critical missing evidence, contradictions,
    empty results, failed queries, and edge cases

confidenceEngine.test.ts (6 tests):
  - High confidence when all evidence present and agrees
  - Reduces confidence when critical evidence is missing
  - Penalizes contradictory signals
  - LLM uncertainty can only reduce confidence (never inflate)
  - Handles query failures gracefully
  - Produces valid 1–5 bucket scores across full u_llm range

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 23, 2026 00:10
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

Walkthrough

This PR introduces a complete Playbook Agent Service framework for risk investigation playbooks. It defines a comprehensive type system covering playbook configuration (inputs, evidence ontology, decision logic, confidence computation), implements query catalog resolution with parameterized SQL templates, adds playbook validation with cross-reference checking, provides LLM verdict extraction and validation from free-form text, delivers deterministic hard-rule evaluation, and computes confidence scores from evidence factors and uncertainty metrics.

Changes

Playbook Agent Service

Layer / File(s) Summary
Type system and domain model
server/services/playbookAgentService/playbookTypes.ts, server/services/playbookAgentService/interfaces.ts
Defines semantic versioning, catalog references, input context fields, evidence ontology, evidence request policy, deep-dive modules, label definitions, hard rules, decision logic, confidence computation, phase budgets, output contracts, playbook structure, verdict shapes, confidence breakdown, query results, and playbook results. Establishes core abstraction layer with ILLMAdapter, IEvidenceStore, IArtifactStore, and IPlaybookRunner interfaces for pluggable implementations.
Query catalog parsing and resolution
server/services/playbookAgentService/queryCatalog.ts, server/services/playbookAgentService/queryCatalog.test.ts
Parses SQL template metadata from header comments (catalog ID, version, description), extracts parameter names from :field_name placeholders, stores entries keyed by catalogId@version, resolves templates by CatalogReference or string key, validates allowed references against playbook, and renders SQL by substituting named parameters with positional $N placeholders while maintaining ordered bindings.
Playbook configuration loading and validation
server/services/playbookAgentService/playbookLoader.ts, server/services/playbookAgentService/playbookLoader.test.ts
Parses raw YAML/JSON playbook definitions into strongly-typed domain objects with semantic version parsing and catalog reference normalization. Validates that baseline catalog refs are a subset of allowed refs, deep-dive query refs exist in the allowed set, hard-rule outcomes and default labels reference valid labels, confidence weights sum to 1.0, and forbidden catalog IDs are rejected. Throws detailed field-addressed error messages for validation failures.
LLM verdict extraction and validation
server/services/playbookAgentService/jsonExtractor.ts, server/services/playbookAgentService/jsonExtractor.test.ts
Extracts JSON objects from mixed text using brace counting with string/escape awareness, filters template/placeholder patterns, validates required fields against output contracts (supporting both root-level and FINAL_VERDICT-wrapped formats), converts raw verdicts into PlaybookVerdict shape with defaults for optional fields (confidence metrics, executed queries, supporting/contradicting evidence flags, edge-case indicators), and returns the most recent valid verdict.
Hard rule evaluation engine
server/services/playbookAgentService/hardRuleEngine.ts, server/services/playbookAgentService/hardRuleEngine.test.ts
Implements deterministic rule evaluation that parses dot-path conditions (e.g., catalog_id.field_name == 'VALUE'), converts expected literals into appropriate JS types (string, number, boolean, null), compares against query result rows using operators (==, !=, >=, <=, >, <), returns the first matching rule with metadata (ruleId, outcome, bypassLlm flag, matched condition/value), or undefined to fall back to LLM verdict.
Confidence scoring engine
server/services/playbookAgentService/confidenceEngine.ts, server/services/playbookAgentService/confidenceEngine.test.ts
Computes confidence from evidence completeness (baseline vs executed query coverage), signal strength (successful row counts with depth bonus for additional queries), signal agreement (supporting vs contradicting evidence counts with extra dampening for contradictions), and data quality (baseline success rate with penalties for empty results and edge cases). Combines factors into grounded confidence, applies LLM uncertainty reduction formula C_final = C_ground * (1 - alpha * u_llm), and produces both 0–1 and bucketed 1–5 scores with consistent rounding.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • vinaysrao1
  • cassidyjames
  • julietshen
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description covers the key changes (jsonExtractor.ts features and test coverage) with clear test instructions, but lacks a comprehensive narrative, context, and testing methodology. Add a 'Context & Requests for Reviewers' section explaining the overall purpose and integration with the playbook agent system. Expand the 'Tests' section to describe how testing was performed beyond just listing test names. Clarify any deployment considerations.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The PR title clearly and concisely summarizes the main change: adding a JSON extractor component for parsing LLM output verdicts as part of the Playbook Agent system (part 6 of 8).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces core building blocks for a “playbook agent” workflow: loading/validating playbooks, resolving pre-approved query templates, extracting structured verdict JSON from LLM output, evaluating deterministic hard rules, and computing grounded confidence scores.

Changes:

  • Added QueryCatalog for parsing/resolving SQL templates and rendering parameter bindings.
  • Added YAML-to-typed playbook parsing with cross-reference validation.
  • Added JSON extraction + hard rule evaluation + grounded confidence computation, with unit tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
server/services/playbookAgentService/queryCatalog.ts Adds SQL template parsing, catalog lookup, allowlist enforcement, and parameter binding rendering.
server/services/playbookAgentService/queryCatalog.test.ts Tests catalog entry parsing, lookup, and rendering behavior.
server/services/playbookAgentService/playbookTypes.ts Defines playbook schema types, catalog ref formatting/validation, and result types.
server/services/playbookAgentService/playbookLoader.ts Parses raw YAML objects into typed playbooks and validates cross-references.
server/services/playbookAgentService/playbookLoader.test.ts Tests playbook parsing defaults and validation failures.
server/services/playbookAgentService/jsonExtractor.ts Extracts JSON objects/verdicts from LLM output and validates against output contracts.
server/services/playbookAgentService/jsonExtractor.test.ts Tests JSON extraction, verdict selection, template filtering, and contract validation.
server/services/playbookAgentService/interfaces.ts Adds framework adapter interfaces for LLM, evidence execution, artifact storage, and runner.
server/services/playbookAgentService/hardRuleEngine.ts Adds deterministic hard rule evaluation over query results.
server/services/playbookAgentService/hardRuleEngine.test.ts Tests rule matching, operator handling, and graceful failure behavior.
server/services/playbookAgentService/confidenceEngine.ts Adds confidence computation combining objective factors with LLM uncertainty.
server/services/playbookAgentService/confidenceEngine.test.ts Tests confidence behavior across completeness, contradiction, uncertainty, and failures.

Comment on lines +54 to +62
for (const line of lines) {
const match = METADATA_PATTERN.exec(line.trim());
if (match) {
metadata[match[1]] = match[2].trim();
} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
// Once we hit non-comment lines, include everything (including inline comments)
sqlLines.push(line);
}
}

const METADATA_PATTERN =
/^--\s*(catalog_id|version|description):\s*(.+)$/;
const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g;
Comment on lines +132 to +133
* Uses parameterized substitution (not string interpolation) to prevent injection.
* Returns the SQL string with `:param` tokens replaced by their values.
Comment on lines +162 to +164
get catalogIds(): string[] {
return [...this.#entries.keys()];
}
Comment on lines +139 to +147
function toCatalogReference(raw: RawCatalogRef): CatalogReference {
const ref: CatalogReference = {
catalogId: raw.catalog_id,
version: parseSemanticVersion(raw.version),
catalogType: raw.catalog_type as CatalogType,
};
validateCatalogReference(ref);
return ref;
}
Comment on lines +286 to +288
const evidenceOntology: EvidenceOntology = {
version: parseSemanticVersion(r.evidence_ontology.version),
evidenceTypes: r.evidence_ontology.evidence_types.map((et) => ({
Comment on lines +122 to +127
isAllowed(refString: string, playbook: Playbook): boolean {
const allowedKeys = new Set(
playbook.allowedCatalogRefs.map(formatCatalogReference),
);
return allowedKeys.has(refString) && this.#entries.has(refString);
}
Comment on lines +115 to +118
it('returns empty array for empty rules', () => {
const match = evaluateHardRules([], { some_query: makeResult([{ a: 1 }]) });
expect(match).toBeUndefined();
});
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (4)
server/services/playbookAgentService/queryCatalog.test.ts (1)

3-50: ⚡ Quick win

Add regression tests for parser/render edge cases.

Please add tests for:

  1. SQL body lines like -- version: ... after first statement (must stay in entry.sql and not overwrite metadata), and
  2. Postgres casts like value::jsonb (must not be treated as :jsonb parameters).

Also applies to: 87-101

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/queryCatalog.test.ts` around lines 3 -
50, Add two regression tests in queryCatalog.test.ts that exercise
parseCatalogEntry: 1) create a SQL string where a comment-looking line `--
version: ...` appears after the first statement (e.g. after a semicolon) and
assert that parseCatalogEntry keeps that line inside entry.sql and does not
treat it as metadata (use parseCatalogEntry and assert entry.sql contains that
comment and metadata version remains from header); 2) create a SQL string
containing a Postgres cast like `value::jsonb` and assert that parseCatalogEntry
does not treat `:jsonb` as a parameter (use parseCatalogEntry and assert
parameters does not include 'jsonb' and sql still contains `::jsonb`). Reference
parseCatalogEntry and queryCatalog.test.ts to add these tests near the existing
cases.
server/services/playbookAgentService/playbookLoader.test.ts (1)

62-157: ⚡ Quick win

Add regression tests for parser validation gaps.

Please add tests that reject invalid evidence_request_policy.query_selection_mode and out-of-range confidence bounded fields (e.g., llm_uncertainty_alpha: -0.1 / 1.5).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.test.ts` around lines 62
- 157, Add two regression tests to parsePlaybook in playbookLoader.test.ts that
assert the parser rejects invalid values: one should clone MINIMAL_PLAYBOOK, set
evidence_request_policy.query_selection_mode to an invalid string (e.g.,
'INVALID_MODE') and expect parsePlaybook(bad) toThrow a message about invalid
query_selection_mode; the other should clone MINIMAL_PLAYBOOK and set
confidence_computation.llm_uncertainty_alpha to out-of-range values (e.g., -0.1
and 1.5) and expect parsePlaybook(bad) toThrow messages about bounds for
llm_uncertainty_alpha (do this as separate it blocks reusing the
MINIMAL_PLAYBOOK and calling parsePlaybook to trigger the validation).
server/services/playbookAgentService/confidenceEngine.test.ts (1)

80-182: ⚡ Quick win

Add regression tests for confidence hardening paths.

Please add cases for: (1) out-of-range/non-finite uLlm and alpha clamping, and (2) ignoring LLM-claimed additional queries that do not exist as successful queryResults.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/confidenceEngine.test.ts` around lines
80 - 182, Add two regression tests in the existing confidenceEngine suite that
exercise the hardening paths of computeConfidence: (1) "clamps uLlm and alpha" —
call computeConfidence with makeVerdict inputs that set uLlm and alpha to
out-of-range and non-finite values (NaN, Infinity, -1, >1) and assert the
returned result normalizes/clamps these (confidenceScore0To1 within 0..1,
confidenceScore1To5 integer 1..5, and cGround/other internal scores are finite
and in expected ranges); (2) "ignores LLM-claimed additional queries that are
not successful" — create a verdict via makeVerdict that lists
additionalQueries/queriesExecuted including names missing or unsuccessful in
makeQueryResults, run computeConfidence and assert those claimed-but-missing
queries do not count toward baseCoverage/evidence counts (e.g., baseCoverage
does not increase, evidenceCompleteness/ dataQuality reflect only successful
queryResults) and overall scores remain within valid bounds; use existing
helpers (makeVerdict, makeQueryResults, makePlaybook, computeConfidence) and
mirror the style of neighboring tests.
server/services/playbookAgentService/jsonExtractor.test.ts (1)

52-103: ⚡ Quick win

Add tests for non-finite numeric inputs in verdict JSON.

Please cover u_llm: NaN/Infinity and confidence_score: NaN/Infinity to lock in rejection (or clamping for u_llm, if intended).

Also applies to: 105-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/jsonExtractor.test.ts` around lines 52 -
103, Add unit tests in jsonExtractor.test.ts that feed verdict JSON containing
non-finite numeric values (u_llm: NaN, u_llm: Infinity, confidence_score: NaN,
confidence_score: Infinity) into extractVerdict and assert the code's current
intended handling so the behavior is locked in; specifically add cases near the
existing "extracts verdict..." and "defaults u_llm..." tests (and the similar
block referred to at 105-129) that call extractVerdict with each non-finite
variant and assert the observed outcome (e.g., extractVerdict returns undefined
when confidence_score is non-finite, and for u_llm assert it either gets
clamped/replaced with the default 0.5 or results in undefined—match and assert
the current implementation so the contract is explicit).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/services/playbookAgentService/confidenceEngine.ts`:
- Around line 41-43: Clamp and validate numeric inputs before using them in the
confidence calculations: sanitize cc.llmUncertaintyAlpha (alpha) and
verdict.uLlm by ensuring they are finite numbers and fall inside safe bounds
(e.g., enforce a min>0 and a sensible max, and default to a safe constant if
NaN/Infinity), then use the sanitized values in the formula where alpha and uLlm
are referenced (the occurrences around alpha = cc.llmUncertaintyAlpha / uLlm =
verdict.uLlm and the similar usage at the lines near 171-172). Ensure you handle
zero/negative values that would break math operations and log or fallback
deterministically so the confidence invariants remain valid.
- Around line 55-58: The code is using the LLM-provided verdict.queriesExecuted
to credit additional queries, letting the model over-claim; instead, replace
usage of verdict.queriesExecuted (and the derived queriesExecuted /
additionalQueriesExecuted calculations) with a ground-truth set of executed
query IDs coming from the execution log/results (e.g., an executedQueryIds or
executionResults source you have in the service) and compute
additionalQueriesExecuted by intersecting that trusted executed set with
additionalQueryIds; apply the same change for the other occurrences mentioned
(the blocks around the current queriesExecuted computation at lines 75-79 and
111-117) so all credits are based on the execution log rather than the LLM
claim.

In `@server/services/playbookAgentService/jsonExtractor.ts`:
- Around line 136-149: The code currently checks numeric fields with typeof ===
'number' which allows NaN/Infinity; update the validation in jsonExtractor.ts so
numeric fields (e.g., confidenceScore, source['u_llm'], and the other numeric
occurrence around the later block) are validated with Number.isFinite(value)
instead of typeof checks, returning undefined on non-finite values; ensure the
returned object only includes confidence_score and u_llm when
Number.isFinite(confidenceScore) and Number.isFinite(source['u_llm'])
respectively (and apply the same finite-number guard to the numeric field at the
other occurrence referenced).

In `@server/services/playbookAgentService/playbookLoader.ts`:
- Around line 303-305: The code is hardcoding
evidenceRequestPolicy.querySelectionMode instead of using or validating the
incoming playbook value; update the construction of evidenceRequestPolicy (the
EvidenceRequestPolicy object) to read
r.evidence_request_policy.query_selection_mode, map it from snake_case to the
internal camelCase value (or allowed enum) and validate it against expected
options before assignment, throwing or logging a clear error if the value is
missing/invalid; ensure querySelectionMode is not always set to 'query_id' but
derived and validated from r.evidence_request_policy.
- Around line 363-374: The confidenceComputation object built from cc (in
playbookLoader.ts, variables cc and confidenceComputation) lacks validation of
bounded numeric fields; add range checks for llm_uncertainty_alpha (0..1),
baseCoverageWeight/additionalCoverageWeight (each 0..1 and sum <=1),
evidenceCompletenessWeight/signalStrengthWeight/signalAgreementWeight/dataQualityWeight
(each 0..1), criticalEvidenceMissingPenalty (0..1) and additionalQueryBonusMax
(0..1); if any value is missing use the existing defaults, and if any value is
out of range either clamp to the allowed range or throw a clear error (choose
consistent behavior), and centralize the checks in a small helper (e.g.,
validateConfidenceField or normalizeConfidenceConfig) called where
confidenceComputation is constructed to ensure downstream code receives only
valid bounded values.

In `@server/services/playbookAgentService/queryCatalog.ts`:
- Around line 41-42: The PARAMETER_PATTERN currently matches tokens in Postgres
casts like ::jsonb; update the pattern to only match a single leading colon (not
a double-colon cast) — e.g. replace /:([a-z_][a-z0-9_]*)/g with a pattern that
asserts the colon is not preceded by another colon such as
/(?<!:):([a-z_][a-z0-9_]*)/g or, if lookbehind is unavailable, use
/(^|[^:]):([a-z_][a-z0-9_]*)/g and adjust code to use capture group 2; apply the
same change for the other occurrences that use PARAMETER_PATTERN (the code
around the parsing/rendering helpers referenced in the diff) and, per
guidelines, prefer using the Knex query builder for Postgres access instead of
manual SQL string parameter substitution where feasible.
- Around line 93-97: The constructor for the class builds `#entries` by mapping
entries to keys `${e.catalogId}@${e.version}` and currently silently overwrites
duplicates; modify the constructor to detect duplicate keys when iterating the
provided entries (using the same `${e.catalogId}@${e.version}` key logic), and
if a duplicate is found throw an Error (or reject) that names the conflicting
catalogId and version so callers cannot accidentally supply conflicting
CatalogEntry objects; keep the Map population logic but only insert after
checking the key does not already exist to enforce uniqueness.
- Around line 55-61: The parser currently keeps applying METADATA_PATTERN to
every comment line and can mutate metadata after SQL has started; modify
parseCatalogEntry to introduce a bodyStarted flag (or similar) that is set when
a non-comment line is first pushed into sqlLines, then only attempt to match
METADATA_PATTERN and populate metadata when bodyStarted is false; once
bodyStarted is true, treat all subsequent lines (including lines starting with
'--') as SQL and push them to sqlLines without updating metadata (use the
existing METADATA_PATTERN, match variable, and sqlLines/metadata variables to
locate and change the behavior).

---

Nitpick comments:
In `@server/services/playbookAgentService/confidenceEngine.test.ts`:
- Around line 80-182: Add two regression tests in the existing confidenceEngine
suite that exercise the hardening paths of computeConfidence: (1) "clamps uLlm
and alpha" — call computeConfidence with makeVerdict inputs that set uLlm and
alpha to out-of-range and non-finite values (NaN, Infinity, -1, >1) and assert
the returned result normalizes/clamps these (confidenceScore0To1 within 0..1,
confidenceScore1To5 integer 1..5, and cGround/other internal scores are finite
and in expected ranges); (2) "ignores LLM-claimed additional queries that are
not successful" — create a verdict via makeVerdict that lists
additionalQueries/queriesExecuted including names missing or unsuccessful in
makeQueryResults, run computeConfidence and assert those claimed-but-missing
queries do not count toward baseCoverage/evidence counts (e.g., baseCoverage
does not increase, evidenceCompleteness/ dataQuality reflect only successful
queryResults) and overall scores remain within valid bounds; use existing
helpers (makeVerdict, makeQueryResults, makePlaybook, computeConfidence) and
mirror the style of neighboring tests.

In `@server/services/playbookAgentService/jsonExtractor.test.ts`:
- Around line 52-103: Add unit tests in jsonExtractor.test.ts that feed verdict
JSON containing non-finite numeric values (u_llm: NaN, u_llm: Infinity,
confidence_score: NaN, confidence_score: Infinity) into extractVerdict and
assert the code's current intended handling so the behavior is locked in;
specifically add cases near the existing "extracts verdict..." and "defaults
u_llm..." tests (and the similar block referred to at 105-129) that call
extractVerdict with each non-finite variant and assert the observed outcome
(e.g., extractVerdict returns undefined when confidence_score is non-finite, and
for u_llm assert it either gets clamped/replaced with the default 0.5 or results
in undefined—match and assert the current implementation so the contract is
explicit).

In `@server/services/playbookAgentService/playbookLoader.test.ts`:
- Around line 62-157: Add two regression tests to parsePlaybook in
playbookLoader.test.ts that assert the parser rejects invalid values: one should
clone MINIMAL_PLAYBOOK, set evidence_request_policy.query_selection_mode to an
invalid string (e.g., 'INVALID_MODE') and expect parsePlaybook(bad) toThrow a
message about invalid query_selection_mode; the other should clone
MINIMAL_PLAYBOOK and set confidence_computation.llm_uncertainty_alpha to
out-of-range values (e.g., -0.1 and 1.5) and expect parsePlaybook(bad) toThrow
messages about bounds for llm_uncertainty_alpha (do this as separate it blocks
reusing the MINIMAL_PLAYBOOK and calling parsePlaybook to trigger the
validation).

In `@server/services/playbookAgentService/queryCatalog.test.ts`:
- Around line 3-50: Add two regression tests in queryCatalog.test.ts that
exercise parseCatalogEntry: 1) create a SQL string where a comment-looking line
`-- version: ...` appears after the first statement (e.g. after a semicolon) and
assert that parseCatalogEntry keeps that line inside entry.sql and does not
treat it as metadata (use parseCatalogEntry and assert entry.sql contains that
comment and metadata version remains from header); 2) create a SQL string
containing a Postgres cast like `value::jsonb` and assert that parseCatalogEntry
does not treat `:jsonb` as a parameter (use parseCatalogEntry and assert
parameters does not include 'jsonb' and sql still contains `::jsonb`). Reference
parseCatalogEntry and queryCatalog.test.ts to add these tests near the existing
cases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 978b00d5-3b43-4b49-90a8-89906a3f6d5d

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6bf65 and fc3a5c2.

📒 Files selected for processing (12)
  • server/services/playbookAgentService/confidenceEngine.test.ts
  • server/services/playbookAgentService/confidenceEngine.ts
  • server/services/playbookAgentService/hardRuleEngine.test.ts
  • server/services/playbookAgentService/hardRuleEngine.ts
  • server/services/playbookAgentService/interfaces.ts
  • server/services/playbookAgentService/jsonExtractor.test.ts
  • server/services/playbookAgentService/jsonExtractor.ts
  • server/services/playbookAgentService/playbookLoader.test.ts
  • server/services/playbookAgentService/playbookLoader.ts
  • server/services/playbookAgentService/playbookTypes.ts
  • server/services/playbookAgentService/queryCatalog.test.ts
  • server/services/playbookAgentService/queryCatalog.ts

Comment on lines +41 to +43
const alpha = cc.llmUncertaintyAlpha;
const uLlm = verdict.uLlm;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clamp and sanitize alpha/uLlm before applying confidence formula.

Unbounded or non-finite values can violate confidence invariants and produce invalid results.

Suggested fix
-  const alpha = cc.llmUncertaintyAlpha;
-  const uLlm = verdict.uLlm;
+  const alpha = clamp(
+    Number.isFinite(cc.llmUncertaintyAlpha) ? cc.llmUncertaintyAlpha : 0,
+    0,
+    1,
+  );
+  const uLlm = clamp(Number.isFinite(verdict.uLlm) ? verdict.uLlm : 0.5, 0, 1);

Also applies to: 171-172

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/confidenceEngine.ts` around lines 41 -
43, Clamp and validate numeric inputs before using them in the confidence
calculations: sanitize cc.llmUncertaintyAlpha (alpha) and verdict.uLlm by
ensuring they are finite numbers and fall inside safe bounds (e.g., enforce a
min>0 and a sensible max, and default to a safe constant if NaN/Infinity), then
use the sanitized values in the formula where alpha and uLlm are referenced (the
occurrences around alpha = cc.llmUncertaintyAlpha / uLlm = verdict.uLlm and the
similar usage at the lines near 171-172). Ensure you handle zero/negative values
that would break math operations and log or fallback deterministically so the
confidence invariants remain valid.

Comment on lines +55 to +58
const queriesExecuted = new Set(verdict.queriesExecuted);
const additionalQueriesExecuted = new Set(
[...queriesExecuted].filter((id) => additionalQueryIds.has(id)),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Additional-query credit is currently LLM-claim based, not execution-grounded.

Using verdict.queriesExecuted directly allows the model to over-claim additional queries and inflate completeness/bonus.

Suggested fix
-  const queriesExecuted = new Set(verdict.queriesExecuted);
-  const additionalQueriesExecuted = new Set(
-    [...queriesExecuted].filter((id) => additionalQueryIds.has(id)),
-  );
+  const additionalQueriesExecuted = new Set(
+    Object.entries(queryResults.results)
+      .filter(([queryId, result]) => additionalQueryIds.has(queryId) && result.success)
+      .map(([queryId]) => queryId),
+  );

Also applies to: 75-79, 111-117

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/confidenceEngine.ts` around lines 55 -
58, The code is using the LLM-provided verdict.queriesExecuted to credit
additional queries, letting the model over-claim; instead, replace usage of
verdict.queriesExecuted (and the derived queriesExecuted /
additionalQueriesExecuted calculations) with a ground-truth set of executed
query IDs coming from the execution log/results (e.g., an executedQueryIds or
executionResults source you have in the service) and compute
additionalQueriesExecuted by intersecting that trusted executed set with
additionalQueryIds; apply the same change for the other occurrences mentioned
(the blocks around the current queriesExecuted computation at lines 75-79 and
111-117) so all credits are based on the execution log rather than the LLM
claim.

Comment thread server/services/playbookAgentService/jsonExtractor.ts Outdated
Comment on lines +303 to +305
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: 'query_id',
maxRounds: r.evidence_request_policy.max_rounds,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

query_selection_mode is ignored during parsing.

querySelectionMode is currently hardcoded, so malformed or unexpected playbook input is silently accepted instead of validated/mapped.

Suggested fix
+  const querySelectionMode = r.evidence_request_policy.query_selection_mode;
+  if (querySelectionMode !== 'query_id') {
+    throw new Error(
+      `Invalid query_selection_mode '${querySelectionMode}', expected 'query_id'`,
+    );
+  }
+
   const evidenceRequestPolicy: EvidenceRequestPolicy = {
-    querySelectionMode: 'query_id',
+    querySelectionMode,
     maxRounds: r.evidence_request_policy.max_rounds,
     targetConfidence: r.evidence_request_policy.target_confidence,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: 'query_id',
maxRounds: r.evidence_request_policy.max_rounds,
const querySelectionMode = r.evidence_request_policy.query_selection_mode;
if (querySelectionMode !== 'query_id') {
throw new Error(
`Invalid query_selection_mode '${querySelectionMode}', expected 'query_id'`,
);
}
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode,
maxRounds: r.evidence_request_policy.max_rounds,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 303 -
305, The code is hardcoding evidenceRequestPolicy.querySelectionMode instead of
using or validating the incoming playbook value; update the construction of
evidenceRequestPolicy (the EvidenceRequestPolicy object) to read
r.evidence_request_policy.query_selection_mode, map it from snake_case to the
internal camelCase value (or allowed enum) and validate it against expected
options before assignment, throwing or logging a clear error if the value is
missing/invalid; ensure querySelectionMode is not always set to 'query_id' but
derived and validated from r.evidence_request_policy.

Comment on lines +363 to +374
const cc = r.confidence_computation;
const confidenceComputation: ConfidenceComputation = {
evidenceCompletenessWeight: cc.evidence_completeness_weight,
signalStrengthWeight: cc.signal_strength_weight,
signalAgreementWeight: cc.signal_agreement_weight,
dataQualityWeight: cc.data_quality_weight,
llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
baseCoverageWeight: cc.base_coverage_weight ?? 0.7,
additionalCoverageWeight: cc.additional_coverage_weight ?? 0.3,
criticalEvidenceMissingPenalty: cc.critical_evidence_missing_penalty ?? 0.3,
additionalQueryBonusMax: cc.additional_query_bonus_max ?? 0.2,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Confidence config lacks range validation for bounded fields.

llm_uncertainty_alpha, coverage weights, penalties, and bonuses are accepted without bounds checks. Invalid values can produce non-physical confidence behavior downstream.

Suggested fix
   const cc = r.confidence_computation;
+  const boundedFields: Array<[string, number]> = [
+    ['llm_uncertainty_alpha', cc.llm_uncertainty_alpha],
+    ['base_coverage_weight', cc.base_coverage_weight ?? 0.7],
+    ['additional_coverage_weight', cc.additional_coverage_weight ?? 0.3],
+    ['critical_evidence_missing_penalty', cc.critical_evidence_missing_penalty ?? 0.3],
+    ['additional_query_bonus_max', cc.additional_query_bonus_max ?? 0.2],
+  ];
+  for (const [name, value] of boundedFields) {
+    if (!Number.isFinite(value) || value < 0 || value > 1) {
+      throw new Error(`Invalid ${name}: ${value}. Expected a number in [0, 1]`);
+    }
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 363 -
374, The confidenceComputation object built from cc (in playbookLoader.ts,
variables cc and confidenceComputation) lacks validation of bounded numeric
fields; add range checks for llm_uncertainty_alpha (0..1),
baseCoverageWeight/additionalCoverageWeight (each 0..1 and sum <=1),
evidenceCompletenessWeight/signalStrengthWeight/signalAgreementWeight/dataQualityWeight
(each 0..1), criticalEvidenceMissingPenalty (0..1) and additionalQueryBonusMax
(0..1); if any value is missing use the existing defaults, and if any value is
out of range either clamp to the allowed range or throw a clear error (choose
consistent behavior), and centralize the checks in a small helper (e.g.,
validateConfidenceField or normalizeConfidenceConfig) called where
confidenceComputation is constructed to ensure downstream code receives only
valid bounded values.

Comment on lines +41 to +42
const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parameter regex breaks PostgreSQL ::type casts.

The current :param matcher also captures :jsonb from ::jsonb, so parsing/rendering can corrupt valid Postgres SQL and throw false missing-parameter errors.

Suggested fix
-const PARAMETER_PATTERN = /:([a-z_][a-z0-9_]*)/g;
+const PARAMETER_PATTERN = /(?<!:):([a-z_][a-z0-9_]*)/g;

As per coding guidelines, "Use Knex query builder for Postgres data access".

Also applies to: 76-77, 144-153

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/queryCatalog.ts` around lines 41 - 42,
The PARAMETER_PATTERN currently matches tokens in Postgres casts like ::jsonb;
update the pattern to only match a single leading colon (not a double-colon
cast) — e.g. replace /:([a-z_][a-z0-9_]*)/g with a pattern that asserts the
colon is not preceded by another colon such as /(?<!:):([a-z_][a-z0-9_]*)/g or,
if lookbehind is unavailable, use /(^|[^:]):([a-z_][a-z0-9_]*)/g and adjust code
to use capture group 2; apply the same change for the other occurrences that use
PARAMETER_PATTERN (the code around the parsing/rendering helpers referenced in
the diff) and, per guidelines, prefer using the Knex query builder for Postgres
access instead of manual SQL string parameter substitution where feasible.

Comment on lines +55 to +61
const match = METADATA_PATTERN.exec(line.trim());
if (match) {
metadata[match[1]] = match[2].trim();
} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
// Once we hit non-comment lines, include everything (including inline comments)
sqlLines.push(line);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop parsing metadata once SQL body starts.

parseCatalogEntry still treats later -- version:/-- description: lines as header metadata after body parsing begins, which can mutate metadata and drop SQL lines.

Suggested fix
-  for (const line of lines) {
-    const match = METADATA_PATTERN.exec(line.trim());
-    if (match) {
+  for (const line of lines) {
+    const match = sqlLines.length === 0 ? METADATA_PATTERN.exec(line.trim()) : null;
+    if (match) {
       metadata[match[1]] = match[2].trim();
     } else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
       // Once we hit non-comment lines, include everything (including inline comments)
       sqlLines.push(line);
     }
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const match = METADATA_PATTERN.exec(line.trim());
if (match) {
metadata[match[1]] = match[2].trim();
} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
// Once we hit non-comment lines, include everything (including inline comments)
sqlLines.push(line);
}
const match = sqlLines.length === 0 ? METADATA_PATTERN.exec(line.trim()) : null;
if (match) {
metadata[match[1]] = match[2].trim();
} else if (!line.trim().startsWith('--') || sqlLines.length > 0) {
// Once we hit non-comment lines, include everything (including inline comments)
sqlLines.push(line);
}
🧰 Tools
🪛 OpenGrep (1.21.0)

[ERROR] 55-55: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/queryCatalog.ts` around lines 55 - 61,
The parser currently keeps applying METADATA_PATTERN to every comment line and
can mutate metadata after SQL has started; modify parseCatalogEntry to introduce
a bodyStarted flag (or similar) that is set when a non-comment line is first
pushed into sqlLines, then only attempt to match METADATA_PATTERN and populate
metadata when bodyStarted is false; once bodyStarted is true, treat all
subsequent lines (including lines starting with '--') as SQL and push them to
sqlLines without updating metadata (use the existing METADATA_PATTERN, match
variable, and sqlLines/metadata variables to locate and change the behavior).

Comment on lines +93 to +97
constructor(entries: readonly CatalogEntry[]) {
this.#entries = new Map(
entries.map((e) => [`${e.catalogId}@${e.version}`, e]),
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject duplicate catalogId@version entries in constructor.

Current map construction silently overwrites earlier entries for the same key, which can hide conflicting catalog definitions.

Suggested fix
   constructor(entries: readonly CatalogEntry[]) {
-    this.#entries = new Map(
-      entries.map((e) => [`${e.catalogId}@${e.version}`, e]),
-    );
+    this.#entries = new Map();
+    for (const entry of entries) {
+      const key = `${entry.catalogId}@${entry.version}`;
+      if (this.#entries.has(key)) {
+        throw new Error(`Duplicate catalog entry '${key}'`);
+      }
+      this.#entries.set(key, entry);
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/queryCatalog.ts` around lines 93 - 97,
The constructor for the class builds `#entries` by mapping entries to keys
`${e.catalogId}@${e.version}` and currently silently overwrites duplicates;
modify the constructor to detect duplicate keys when iterating the provided
entries (using the same `${e.catalogId}@${e.version}` key logic), and if a
duplicate is found throw an Error (or reject) that names the conflicting
catalogId and version so callers cannot accidentally supply conflicting
CatalogEntry objects; keep the Map population logic but only insert after
checking the key does not already exist to enforce uniqueness.

Implement the JSON extractor that parses structured verdicts from free-form
LLM output text. LLMs often wrap JSON in markdown fences, prose, or emit
multiple JSON objects — this handles all of it.

jsonExtractor.ts:
  - extractJsonObjects(): balanced brace counting for robust extraction
    (handles nested objects, strings containing braces, escaped characters)
  - Template filtering: rejects unfilled placeholder patterns like
    "<NPG | PG | further review>", "[REQUIRED]", "[if available]"
  - extractVerdict(): returns the last valid, non-template verdict
  - Supports both FINAL_VERDICT wrapper and direct root-level fields
  - validateAgainstContract(): checks required fields from output contract
  - Defaults u_llm to 0.5 when LLM doesn't report uncertainty

jsonExtractor.test.ts (9 tests):
  - Extracts single/multiple/nested JSON objects from text
  - Handles strings containing braces
  - Skips invalid JSON
  - Extracts from FINAL_VERDICT wrapper and root-level fields
  - Returns last valid verdict (most recent in output)
  - Filters out template patterns
  - Validates against output contract
  - Validates inside FINAL_VERDICT wrapper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rashmiraghunandan rashmiraghunandan force-pushed the playbook-agent/06-json-extractor branch from fc3a5c2 to 9a2e08f Compare May 23, 2026 01:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
server/services/playbookAgentService/jsonExtractor.ts (1)

170-178: 💤 Low value

Array element types are not validated.

The arrays are checked with Array.isArray() but elements are cast to string[] without validation. If the LLM provides non-string elements (e.g., queries_executed: [1, 2]), downstream code expecting strings (e.g., Set lookups in confidenceEngine.ts) may behave unexpectedly due to type coercion.

Proposed fix to filter non-string elements
-    queries_executed: Array.isArray(source['queries_executed'])
-      ? (source['queries_executed'] as string[])
+    queries_executed: Array.isArray(source['queries_executed'])
+      ? (source['queries_executed'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,
-    supporting_evidence: Array.isArray(source['supporting_evidence'])
-      ? (source['supporting_evidence'] as string[])
+    supporting_evidence: Array.isArray(source['supporting_evidence'])
+      ? (source['supporting_evidence'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,
-    contradicting_evidence: Array.isArray(source['contradicting_evidence'])
-      ? (source['contradicting_evidence'] as string[])
+    contradicting_evidence: Array.isArray(source['contradicting_evidence'])
+      ? (source['contradicting_evidence'] as unknown[]).filter((x): x is string => typeof x === 'string')
       : undefined,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/jsonExtractor.ts` around lines 170 -
178, The array fields parsed in jsonExtractor.ts (queries_executed,
supporting_evidence, contradicting_evidence) currently use Array.isArray(...)
then cast to string[] without validating element types; update the parsing to
filter each array to only include elements where typeof item === 'string' (e.g.,
Array.isArray(source['queries_executed']) ? (source['queries_executed'] as
any[]).filter(x => typeof x === 'string') : undefined) so downstream code (like
Set lookups in confidenceEngine.ts) receives a true string[]; apply the same
filtering pattern for supporting_evidence and contradicting_evidence and keep
undefined when the source is not an array or yields no string elements.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/services/playbookAgentService/jsonExtractor.ts`:
- Around line 170-178: The array fields parsed in jsonExtractor.ts
(queries_executed, supporting_evidence, contradicting_evidence) currently use
Array.isArray(...) then cast to string[] without validating element types;
update the parsing to filter each array to only include elements where typeof
item === 'string' (e.g., Array.isArray(source['queries_executed']) ?
(source['queries_executed'] as any[]).filter(x => typeof x === 'string') :
undefined) so downstream code (like Set lookups in confidenceEngine.ts) receives
a true string[]; apply the same filtering pattern for supporting_evidence and
contradicting_evidence and keep undefined when the source is not an array or
yields no string elements.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 47e94301-5b6b-42e9-8af4-bd606777cc9c

📥 Commits

Reviewing files that changed from the base of the PR and between fc3a5c2 and 9a2e08f.

📒 Files selected for processing (2)
  • server/services/playbookAgentService/jsonExtractor.test.ts
  • server/services/playbookAgentService/jsonExtractor.ts

@cassidyjames cassidyjames changed the title [6/8] Playbook Agent: JSON extractor for LLM output parsing Playbook Agent: JSON extractor for LLM output parsing (6/8) May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants