Skip to content

Playbook Agent: playbook loader with validation (2/8)#573

Open
rashmiraghunandan wants to merge 2 commits into
roostorg:mainfrom
rashmiraghunandan:playbook-agent/02-playbook-loader
Open

Playbook Agent: playbook loader with validation (2/8)#573
rashmiraghunandan wants to merge 2 commits into
roostorg:mainfrom
rashmiraghunandan:playbook-agent/02-playbook-loader

Conversation

@rashmiraghunandan
Copy link
Copy Markdown

@rashmiraghunandan rashmiraghunandan commented May 23, 2026

Summary

Playbook loader that converts raw YAML/JSON configs into validated TypeScript Playbook objects with comprehensive cross-reference checking.

Part 2 of 8 — depends on #572 (types and interfaces).

playbookLoader.ts

  • parsePlaybook(): snake_case YAML input to camelCase TypeScript output
  • Cross-reference validation at load time:
    • baseline_catalog_refs must be a subset of allowed_catalog_refs
    • Deep-dive module query refs must be in the allowed set
    • Hard rule outcomes must reference valid label IDs
    • Default label must exist in labels
    • Confidence weights must sum to ~1.0
  • Rejects forbidden floating versions (latest, head, main)

playbookLoader.test.ts — 7 tests

  • Parses valid playbook with correct type conversion
  • Applies default values for optional confidence fields
  • Rejects baseline ref not in allowed set
  • Rejects invalid default label
  • Rejects confidence weights that don't sum to ~1.0
  • Rejects invalid semantic versions
  • Validates hard rule outcomes against labels

Test plan

  • npm test -- --testPathPattern=playbookLoader passes (7 tests)

Depends on #572 Types and interfaces. Merge #572 first.

Summary by CodeRabbit

  • New Features

    • Added a Playbook Agent framework for structured investigation workflows with playbook parsing, validation, and result types.
    • Added semantic versioning and catalog reference validation for playbooks.
    • Added configurable decision logic, confidence computation, hard rules, evidence policies, and output contracts.
  • Tests

    • Added unit tests validating playbook parsing, schema enforcement, cross-reference checks, and error reporting.

Review Change Stack

Copilot AI review requested due to automatic review settings May 23, 2026 00:05
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

Walkthrough

This PR adds core Playbook domain types and utilities, a playbook parser with cross-reference validation and aggregate error reporting plus tests, and service-layer TypeScript interfaces for LLM, evidence store, artifact storage, and playbook orchestration.

Changes

Playbook Agent Framework

Layer / File(s) Summary
Core domain types and utilities
server/services/playbookAgentService/playbookTypes.ts
Defines semantic version parsing/formatting, catalog reference validation/formatting, input context and evidence ontology types, evidence request policy, deep-dive module types, label/hard-rule/decision logic types, confidence computation and phase budget types, output contract, top-level Playbook, and PlaybookResult/verdict/confidence types.
Playbook parsing and validation
server/services/playbookAgentService/playbookLoader.ts, server/services/playbookAgentService/playbookLoader.test.ts
Adds raw snake_case input shapes, conversion helpers (toCatalogReference), PlaybookValidationError, validatePlaybookCrossRefs, and parsePlaybook(raw) which validates fields, parses/normalizes structures, applies defaults, checks cross-references (baseline ⊆ allowed, deep-dive refs in allowed, label IDs in hard rules/default), and enforces confidence weight sums; comprehensive Jest tests cover valid parsing and multiple rejection cases (invalid refs, labels, weights, versions, forbidden IDs, and hard-rule outcomes).
Service layer contracts and interfaces
server/services/playbookAgentService/interfaces.ts
Adds LLMRequest/LLMResponse and ILLMAdapter, EvidenceQuery and IEvidenceStore returning QueryResult, PlaybookArtifact and IArtifactStore for immutable artifact storage, and PlaybookRunInput / IPlaybookRunner returning PlaybookResult.

Sequence Diagram(s)

sequenceDiagram
  participant Loader as parsePlaybook
  participant RawTypes as RawPlaybook
  participant Converter as toCatalogReference
  participant CrossRef as validatePlaybookCrossRefs
  participant Result as Playbook
  
  Loader->>RawTypes: validate raw fields
  RawTypes->>Converter: convert catalog refs
  Converter->>Loader: parse semantic versions
  Loader->>CrossRef: check label IDs, refs, weights
  CrossRef->>CrossRef: baseline subset of allowed
  CrossRef->>CrossRef: validate hard rule outcomes
  CrossRef->>CrossRef: confidence weight sum ~1.0
  CrossRef->>Result: return parsed Playbook
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly and specifically summarizes the main change: implementing a playbook loader with validation (part 2 of an 8-part series).
Description check ✅ Passed The PR description includes all required template sections with adequate detail about changes, but the Tests section lacks specific manual testing or screenshot information.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces a typed, versioned “playbook” schema and a YAML-to-Playbook loader with validation to support deterministic risk investigation workflows.

Changes:

  • Added TypeScript schema/types for playbooks, catalog references, confidence computation, and results.
  • Implemented a playbook loader that converts snake_case YAML objects into typed Playbook objects and validates cross-references.
  • Added unit tests for loader validation behavior and defaults; defined public adapter interfaces for LLM/evidence/artifact stores.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
server/services/playbookAgentService/playbookTypes.ts Adds core playbook/type system, semantic version helpers, and catalog reference validation utilities.
server/services/playbookAgentService/playbookLoader.ts Adds YAML object parsing + structural/cross-ref validation to produce validated Playbook objects.
server/services/playbookAgentService/playbookLoader.test.ts Adds unit coverage for successful parsing, defaults, and key validation failures.
server/services/playbookAgentService/interfaces.ts Adds framework adapter interfaces for LLM, evidence execution, artifact storage, and runner API.

Comment on lines +139 to +147
function toCatalogReference(raw: RawCatalogRef): CatalogReference {
const ref: CatalogReference = {
catalogId: raw.catalog_id,
version: parseSemanticVersion(raw.version),
catalogType: raw.catalog_type as CatalogType,
};
validateCatalogReference(ref);
return ref;
}
Comment on lines +303 to +312
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: 'query_id',
maxRounds: r.evidence_request_policy.max_rounds,
targetConfidence: r.evidence_request_policy.target_confidence,
maxAdditionalQueries: r.evidence_request_policy.max_additional_queries,
stopIfNoNewEvidence: r.evidence_request_policy.stop_if_no_new_evidence,
triggerDeepDiveIf: r.evidence_request_policy.trigger_deep_dive_if,
stopInvestigationIf: r.evidence_request_policy.stop_investigation_if,
guidance: r.evidence_request_policy.guidance,
};
Comment on lines +42 to +46
export type EvidenceQuery = {
readonly catalogId: string;
readonly version: string;
readonly parameters: Record<string, unknown>;
};
Comment on lines +62 to +72
describe('parsePlaybook', () => {
it('parses a valid playbook', () => {
const playbook = parsePlaybook(MINIMAL_PLAYBOOK);

expect(playbook.useCaseId).toBe('test_investigation');
expect(playbook.version).toEqual({ major: 1, minor: 0, patch: 0 });
expect(playbook.baselineCatalogRefs).toHaveLength(1);
expect(playbook.allowedCatalogRefs).toHaveLength(2);
expect(playbook.labels).toHaveLength(2);
expect(playbook.confidenceComputation.llmUncertaintyAlpha).toBe(0.4);
});
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
server/services/playbookAgentService/interfaces.ts (1)

42-45: 🏗️ Heavy lift

Use domain-level catalog/version types in EvidenceQuery to prevent contract drift.

catalogId and version are currently plain string, which weakens guarantees already modeled in the domain layer. Prefer reusing the typed catalog/version entities from playbookTypes.ts in this interface contract.

Proposed direction
-import type { PlaybookResult, QueryResult } from './playbookTypes.js';
+import type {
+  PlaybookResult,
+  QueryResult,
+  CatalogReference,
+  SemanticVersion,
+} from './playbookTypes.js';

 export type EvidenceQuery = {
-  readonly catalogId: string;
-  readonly version: string;
+  readonly catalogId: CatalogReference['catalogId'];
+  readonly version: SemanticVersion;
   readonly parameters: Record<string, unknown>;
 };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/interfaces.ts` around lines 42 - 45, The
EvidenceQuery type uses plain strings for catalogId and version; replace those
with the domain-level types exported from playbookTypes.ts (e.g., CatalogId and
CatalogVersion or the corresponding names used there) by importing them into
server/services/playbookAgentService/interfaces.ts and updating EvidenceQuery's
catalogId and version properties to use those imported types while leaving
parameters as Record<string, unknown>.
server/services/playbookAgentService/playbookLoader.test.ts (1)

62-157: ⚡ Quick win

Add regression tests for catalog type and query-selection validation paths.

Given the parser invariants, add negative tests for: invalid catalog_type, mismatched catalogType between baseline and allowed refs, and unsupported query_selection_mode.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.test.ts` around lines 62
- 157, Add three negative unit tests in the parsePlaybook suite using
MINIMAL_PLAYBOOK: one that sets a baseline_catalog_refs entry with an invalid
catalog_type (e.g., 'invalid_type') and asserts parsePlaybook throws about
invalid catalog_type; one that creates a mismatch between
baseline_catalog_refs[0].catalog_type and an allowed_catalog_refs entry’s
catalog_type (different values) and asserts parsePlaybook throws about
mismatched catalog types; and one that sets query_selection_mode to an
unsupported value (e.g., 'UNKNOWN_MODE') and asserts parsePlaybook throws about
unsupported query_selection_mode. Reference parsePlaybook, MINIMAL_PLAYBOOK,
baseline_catalog_refs, allowed_catalog_refs, and query_selection_mode to locate
where to add each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/services/playbookAgentService/playbookLoader.ts`:
- Around line 303-305: The code creates evidenceRequestPolicy with a hard-coded
querySelectionMode 'query_id' which overwrites the authored value; update the
loader where evidenceRequestPolicy is constructed (evidenceRequestPolicy in
playbookLoader) to read the value from the authored config (e.g.,
r.evidence_request_policy.query_selection_mode) instead of hard-coding, and
validate it against allowed options (reject/throw a descriptive error if the
value is missing or not in the accepted set) so invalid configs are surfaced
rather than silently replaced.
- Around line 139-146: The function toCatalogReference currently type-casts
raw.catalog_type to CatalogType which skips runtime checks; instead perform a
runtime validation/parsing of raw.catalog_type (e.g., implement or call an
isValid/parse function) and set ref.catalogType only if the value is one of the
allowed CatalogType members, otherwise throw a descriptive error; keep
parseSemanticVersion(raw.version) and validateCatalogReference(ref) but remove
the "as CatalogType" cast and replace it with a runtime-validated assignment
(for example add a parseCatalogType(raw.catalog_type) or isCatalogType check
before constructing/returning the CatalogReference).
- Around line 177-199: The subset check builds allowedIds using
formatCatalogReference but only includes catalogId@version, allowing mismatched
catalogType to be accepted; update the matching key to incorporate catalogType
(e.g., catalogType:catalogId@version) and use that consistently when populating
allowedIds and when checking playbook.baselineCatalogRefs and each mod.queryRefs
in playbook.deepDiveModules; either extend formatCatalogReference to include
catalogType or add a new helper (referenced by allowedIds,
formatCatalogReference, playbook.allowedCatalogRefs,
playbook.baselineCatalogRefs, playbook.deepDiveModules, and mod.queryRefs) so
both creation and lookup use the same catalogType-aware key.

In `@server/services/playbookAgentService/playbookTypes.ts`:
- Around line 24-37: The parseSemanticVersion function currently uses
Number(...) which accepts malformed tokens (e.g., empty strings, "+1", " 1"), so
update parseSemanticVersion to first validate each token with a strict regex
(e.g., /^\d+$/) before converting to Number; if any token fails the regex, throw
the existing Invalid semantic version error (include input). Then convert tokens
to integers, check they are non-negative integers as before, and return the
SemanticVersion ({ major, minor, patch }). This change ensures
parseSemanticVersion rejects empty or non-digit tokens and only accepts strictly
numeric components.

---

Nitpick comments:
In `@server/services/playbookAgentService/interfaces.ts`:
- Around line 42-45: The EvidenceQuery type uses plain strings for catalogId and
version; replace those with the domain-level types exported from
playbookTypes.ts (e.g., CatalogId and CatalogVersion or the corresponding names
used there) by importing them into
server/services/playbookAgentService/interfaces.ts and updating EvidenceQuery's
catalogId and version properties to use those imported types while leaving
parameters as Record<string, unknown>.

In `@server/services/playbookAgentService/playbookLoader.test.ts`:
- Around line 62-157: Add three negative unit tests in the parsePlaybook suite
using MINIMAL_PLAYBOOK: one that sets a baseline_catalog_refs entry with an
invalid catalog_type (e.g., 'invalid_type') and asserts parsePlaybook throws
about invalid catalog_type; one that creates a mismatch between
baseline_catalog_refs[0].catalog_type and an allowed_catalog_refs entry’s
catalog_type (different values) and asserts parsePlaybook throws about
mismatched catalog types; and one that sets query_selection_mode to an
unsupported value (e.g., 'UNKNOWN_MODE') and asserts parsePlaybook throws about
unsupported query_selection_mode. Reference parsePlaybook, MINIMAL_PLAYBOOK,
baseline_catalog_refs, allowed_catalog_refs, and query_selection_mode to locate
where to add each test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 53afe152-6371-47da-9b81-a20ec4a16843

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6bf65 and 3a824d9.

📒 Files selected for processing (4)
  • server/services/playbookAgentService/interfaces.ts
  • server/services/playbookAgentService/playbookLoader.test.ts
  • server/services/playbookAgentService/playbookLoader.ts
  • server/services/playbookAgentService/playbookTypes.ts

Comment thread server/services/playbookAgentService/playbookLoader.ts
Comment on lines +177 to +199
const allowedIds = new Set(
playbook.allowedCatalogRefs.map(formatCatalogReference),
);
for (const ref of playbook.baselineCatalogRefs) {
const refStr = formatCatalogReference(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: 'baselineCatalogRefs',
message: `Baseline ref '${refStr}' not found in allowed_catalog_refs`,
});
}
}

// deep-dive query refs must be in allowed set
if (playbook.deepDiveModules) {
for (const mod of playbook.deepDiveModules) {
for (const ref of mod.queryRefs) {
const refStr = formatCatalogReference(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: `deepDiveModules.${mod.moduleId}`,
message: `Deep-dive ref '${refStr}' not found in allowed_catalog_refs`,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cross-reference keys should include catalogType.

Current subset checks compare only catalogId@version, so mismatched catalogType values can be treated as valid matches.

Suggested fix
+const referenceKey = (ref: CatalogReference): string =>
+  `${ref.catalogType}:${formatCatalogReference(ref)}`;
+
 function validatePlaybookCrossRefs(playbook: Playbook): PlaybookValidationError[] {
   const errors: PlaybookValidationError[] = [];
 
   // baseline_catalog_refs must be a subset of allowed_catalog_refs
-  const allowedIds = new Set(
-    playbook.allowedCatalogRefs.map(formatCatalogReference),
-  );
+  const allowedIds = new Set(playbook.allowedCatalogRefs.map(referenceKey));
   for (const ref of playbook.baselineCatalogRefs) {
-    const refStr = formatCatalogReference(ref);
-    if (!allowedIds.has(refStr)) {
+    const refStr = referenceKey(ref);
+    if (!allowedIds.has(refStr)) {
       errors.push({
         field: 'baselineCatalogRefs',
         message: `Baseline ref '${refStr}' not found in allowed_catalog_refs`,
       });
     }
   }
@@
       for (const ref of mod.queryRefs) {
-        const refStr = formatCatalogReference(ref);
+        const refStr = referenceKey(ref);
         if (!allowedIds.has(refStr)) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const allowedIds = new Set(
playbook.allowedCatalogRefs.map(formatCatalogReference),
);
for (const ref of playbook.baselineCatalogRefs) {
const refStr = formatCatalogReference(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: 'baselineCatalogRefs',
message: `Baseline ref '${refStr}' not found in allowed_catalog_refs`,
});
}
}
// deep-dive query refs must be in allowed set
if (playbook.deepDiveModules) {
for (const mod of playbook.deepDiveModules) {
for (const ref of mod.queryRefs) {
const refStr = formatCatalogReference(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: `deepDiveModules.${mod.moduleId}`,
message: `Deep-dive ref '${refStr}' not found in allowed_catalog_refs`,
});
const referenceKey = (ref: CatalogReference): string =>
`${ref.catalogType}:${formatCatalogReference(ref)}`;
const allowedIds = new Set(
playbook.allowedCatalogRefs.map(referenceKey),
);
for (const ref of playbook.baselineCatalogRefs) {
const refStr = referenceKey(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: 'baselineCatalogRefs',
message: `Baseline ref '${refStr}' not found in allowed_catalog_refs`,
});
}
}
// deep-dive query refs must be in allowed set
if (playbook.deepDiveModules) {
for (const mod of playbook.deepDiveModules) {
for (const ref of mod.queryRefs) {
const refStr = referenceKey(ref);
if (!allowedIds.has(refStr)) {
errors.push({
field: `deepDiveModules.${mod.moduleId}`,
message: `Deep-dive ref '${refStr}' not found in allowed_catalog_refs`,
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 177 -
199, The subset check builds allowedIds using formatCatalogReference but only
includes catalogId@version, allowing mismatched catalogType to be accepted;
update the matching key to incorporate catalogType (e.g.,
catalogType:catalogId@version) and use that consistently when populating
allowedIds and when checking playbook.baselineCatalogRefs and each mod.queryRefs
in playbook.deepDiveModules; either extend formatCatalogReference to include
catalogType or add a new helper (referenced by allowedIds,
formatCatalogReference, playbook.allowedCatalogRefs,
playbook.baselineCatalogRefs, playbook.deepDiveModules, and mod.queryRefs) so
both creation and lookup use the same catalogType-aware key.

Comment on lines +303 to +305
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: 'query_id',
maxRounds: r.evidence_request_policy.max_rounds,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently overwrite query_selection_mode.

The loader currently ignores the authored value and always sets 'query_id', which hides invalid configs instead of rejecting them.

Suggested fix
+  if (r.evidence_request_policy.query_selection_mode !== 'query_id') {
+    throw new Error(
+      `Invalid query_selection_mode '${r.evidence_request_policy.query_selection_mode}', expected 'query_id'`,
+    );
+  }
+
   const evidenceRequestPolicy: EvidenceRequestPolicy = {
-    querySelectionMode: 'query_id',
+    querySelectionMode: r.evidence_request_policy.query_selection_mode,
     maxRounds: r.evidence_request_policy.max_rounds,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: 'query_id',
maxRounds: r.evidence_request_policy.max_rounds,
if (r.evidence_request_policy.query_selection_mode !== 'query_id') {
throw new Error(
`Invalid query_selection_mode '${r.evidence_request_policy.query_selection_mode}', expected 'query_id'`,
);
}
const evidenceRequestPolicy: EvidenceRequestPolicy = {
querySelectionMode: r.evidence_request_policy.query_selection_mode,
maxRounds: r.evidence_request_policy.max_rounds,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 303 -
305, The code creates evidenceRequestPolicy with a hard-coded querySelectionMode
'query_id' which overwrites the authored value; update the loader where
evidenceRequestPolicy is constructed (evidenceRequestPolicy in playbookLoader)
to read the value from the authored config (e.g.,
r.evidence_request_policy.query_selection_mode) instead of hard-coding, and
validate it against allowed options (reject/throw a descriptive error if the
value is missing or not in the accepted set) so invalid configs are surfaced
rather than silently replaced.

Comment thread server/services/playbookAgentService/playbookTypes.ts Outdated
rashmiraghunandan and others added 2 commits May 22, 2026 18:00
Introduce the foundational type system for the playbook agent framework:

playbookTypes.ts — TypeScript types for declarative investigation playbooks:
  - SemanticVersion with parsing and formatting
  - CatalogReference for versioned query references (no floating versions)
  - Playbook config: evidence ontology, labels, decision logic, confidence
    computation weights, output contracts
  - PlaybookVerdict and ConfidenceBreakdown result types
  - QueryResult/QueryResults for evidence store responses

interfaces.ts — Abstract contracts that organizations implement:
  - ILLMAdapter: pluggable LLM provider (OpenAI, Anthropic, local, etc.)
  - IEvidenceStore: execute pre-approved queries by catalog ID
  - IArtifactStore: insert-only storage for immutable audit trail
  - IPlaybookRunner: core runner interface

These types enforce the key safety property: playbooks reference queries by
catalog_id@version and never contain SQL, table names, or field mappings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rashmiraghunandan rashmiraghunandan force-pushed the playbook-agent/02-playbook-loader branch from 3a824d9 to f40b387 Compare May 23, 2026 01:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
server/services/playbookAgentService/playbookLoader.test.ts (1)

62-157: ⚡ Quick win

Add regressions for query_selection_mode and catalog_type-aware subset checks.

The suite misses two high-impact cases currently handled incorrectly in the loader: invalid query_selection_mode acceptance and cross-ref matching that ignores catalog_type.

Suggested tests
 describe('parsePlaybook', () => {
+  it('rejects invalid query_selection_mode', () => {
+    const bad = {
+      ...MINIMAL_PLAYBOOK,
+      evidence_request_policy: {
+        ...MINIMAL_PLAYBOOK.evidence_request_policy,
+        query_selection_mode: 'round_robin',
+      },
+    };
+
+    expect(() => parsePlaybook(bad)).toThrow('query_selection_mode');
+  });
+
+  it('treats catalog_type as part of allowed-set identity', () => {
+    const bad = {
+      ...MINIMAL_PLAYBOOK,
+      baseline_catalog_refs: [
+        { catalog_id: 'login_patterns', version: '1.0.0', catalog_type: 'query_bundle' },
+      ],
+      allowed_catalog_refs: [
+        { catalog_id: 'login_patterns', version: '1.0.0', catalog_type: 'query' },
+      ],
+    };
+
+    expect(() => parsePlaybook(bad)).toThrow('not found in allowed_catalog_refs');
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.test.ts` around lines 62
- 157, Add two tests to the parsePlaybook suite: one that sets
query_selection_mode on MINIMAL_PLAYBOOK to an invalid value and asserts
parsePlaybook throws (to cover validation of query_selection_mode), and another
that crafts baseline_catalog_refs and allowed_catalog_refs with the same
catalog_id but different catalog_type (e.g., one 'query' vs 'dataset') and
asserts parsePlaybook rejects the baseline because catalog_type must match (to
cover catalog_type-aware subset checks). Use parsePlaybook, MINIMAL_PLAYBOOK,
baseline_catalog_refs, allowed_catalog_refs, and query_selection_mode in the new
cases so they fail if validation is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/services/playbookAgentService/playbookLoader.ts`:
- Around line 256-257: The function parsePlaybook currently casts raw to
Partial<RawPlaybook> (const r = raw as Partial<RawPlaybook>) and may access
properties when raw is null/undefined; add an explicit guard at the start of
parsePlaybook to verify raw is non-null and is an object (e.g., if (raw == null
|| typeof raw !== 'object') throw a validation/TypeError) before assigning to r
or reading properties; apply the same guard logic for the subsequent property
validations in the block that references r (lines around 259–274) so the
existing validation/error path is used instead of allowing a runtime TypeError
when parsePlaybook(null/undefined) is called.
- Around line 379-395: The optional confidence fields (base_coverage_weight,
additional_coverage_weight, critical_evidence_missing_penalty,
additional_query_bonus_max) are not being validated and can be outside [0,1];
add calls to validateRange for each optional field before building the
ConfidenceComputation object (use validateRange(cc.base_coverage_weight,
'base_coverage_weight') etc.), only invoking validateRange when the field is
defined, then construct ConfidenceComputation with the same defaults (??) as
now.

---

Nitpick comments:
In `@server/services/playbookAgentService/playbookLoader.test.ts`:
- Around line 62-157: Add two tests to the parsePlaybook suite: one that sets
query_selection_mode on MINIMAL_PLAYBOOK to an invalid value and asserts
parsePlaybook throws (to cover validation of query_selection_mode), and another
that crafts baseline_catalog_refs and allowed_catalog_refs with the same
catalog_id but different catalog_type (e.g., one 'query' vs 'dataset') and
asserts parsePlaybook rejects the baseline because catalog_type must match (to
cover catalog_type-aware subset checks). Use parsePlaybook, MINIMAL_PLAYBOOK,
baseline_catalog_refs, allowed_catalog_refs, and query_selection_mode in the new
cases so they fail if validation is missing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 30645a54-1a30-4411-8edb-bb84b3f29ed5

📥 Commits

Reviewing files that changed from the base of the PR and between 3a824d9 and f40b387.

📒 Files selected for processing (4)
  • server/services/playbookAgentService/interfaces.ts
  • server/services/playbookAgentService/playbookLoader.test.ts
  • server/services/playbookAgentService/playbookLoader.ts
  • server/services/playbookAgentService/playbookTypes.ts

Comment on lines +256 to +257
export function parsePlaybook(raw: unknown): Playbook {
const r = raw as Partial<RawPlaybook>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard raw before property access.

parsePlaybook(null)/parsePlaybook(undefined) can throw a runtime TypeError before your validation error path is reached.

Proposed fix
 export function parsePlaybook(raw: unknown): Playbook {
+  if (typeof raw !== 'object' || raw == null) {
+    throw new Error('Playbook must be an object');
+  }
   const r = raw as Partial<RawPlaybook>;

Also applies to: 259-274

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 256 -
257, The function parsePlaybook currently casts raw to Partial<RawPlaybook>
(const r = raw as Partial<RawPlaybook>) and may access properties when raw is
null/undefined; add an explicit guard at the start of parsePlaybook to verify
raw is non-null and is an object (e.g., if (raw == null || typeof raw !==
'object') throw a validation/TypeError) before assigning to r or reading
properties; apply the same guard logic for the subsequent property validations
in the block that references r (lines around 259–274) so the existing
validation/error path is used instead of allowing a runtime TypeError when
parsePlaybook(null/undefined) is called.

Comment on lines +379 to +395
validateRange(cc.llm_uncertainty_alpha, 'llm_uncertainty_alpha');
validateRange(cc.evidence_completeness_weight, 'evidence_completeness_weight');
validateRange(cc.signal_strength_weight, 'signal_strength_weight');
validateRange(cc.signal_agreement_weight, 'signal_agreement_weight');
validateRange(cc.data_quality_weight, 'data_quality_weight');

const confidenceComputation: ConfidenceComputation = {
evidenceCompletenessWeight: cc.evidence_completeness_weight,
signalStrengthWeight: cc.signal_strength_weight,
signalAgreementWeight: cc.signal_agreement_weight,
dataQualityWeight: cc.data_quality_weight,
llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
baseCoverageWeight: cc.base_coverage_weight ?? 0.7,
additionalCoverageWeight: cc.additional_coverage_weight ?? 0.3,
criticalEvidenceMissingPenalty: cc.critical_evidence_missing_penalty ?? 0.3,
additionalQueryBonusMax: cc.additional_query_bonus_max ?? 0.2,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate optional confidence fields when provided.

Only the required confidence fields are range-checked. Optional authored values can currently be out of [0, 1] and still be accepted.

Proposed fix
   validateRange(cc.llm_uncertainty_alpha, 'llm_uncertainty_alpha');
   validateRange(cc.evidence_completeness_weight, 'evidence_completeness_weight');
   validateRange(cc.signal_strength_weight, 'signal_strength_weight');
   validateRange(cc.signal_agreement_weight, 'signal_agreement_weight');
   validateRange(cc.data_quality_weight, 'data_quality_weight');
+
+  const baseCoverageWeight = cc.base_coverage_weight ?? 0.7;
+  const additionalCoverageWeight = cc.additional_coverage_weight ?? 0.3;
+  const criticalEvidenceMissingPenalty =
+    cc.critical_evidence_missing_penalty ?? 0.3;
+  const additionalQueryBonusMax = cc.additional_query_bonus_max ?? 0.2;
+
+  validateRange(baseCoverageWeight, 'base_coverage_weight');
+  validateRange(additionalCoverageWeight, 'additional_coverage_weight');
+  validateRange(
+    criticalEvidenceMissingPenalty,
+    'critical_evidence_missing_penalty',
+  );
+  validateRange(additionalQueryBonusMax, 'additional_query_bonus_max');
 
   const confidenceComputation: ConfidenceComputation = {
     evidenceCompletenessWeight: cc.evidence_completeness_weight,
     signalStrengthWeight: cc.signal_strength_weight,
     signalAgreementWeight: cc.signal_agreement_weight,
     dataQualityWeight: cc.data_quality_weight,
     llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
-    baseCoverageWeight: cc.base_coverage_weight ?? 0.7,
-    additionalCoverageWeight: cc.additional_coverage_weight ?? 0.3,
-    criticalEvidenceMissingPenalty: cc.critical_evidence_missing_penalty ?? 0.3,
-    additionalQueryBonusMax: cc.additional_query_bonus_max ?? 0.2,
+    baseCoverageWeight,
+    additionalCoverageWeight,
+    criticalEvidenceMissingPenalty,
+    additionalQueryBonusMax,
   };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
validateRange(cc.llm_uncertainty_alpha, 'llm_uncertainty_alpha');
validateRange(cc.evidence_completeness_weight, 'evidence_completeness_weight');
validateRange(cc.signal_strength_weight, 'signal_strength_weight');
validateRange(cc.signal_agreement_weight, 'signal_agreement_weight');
validateRange(cc.data_quality_weight, 'data_quality_weight');
const confidenceComputation: ConfidenceComputation = {
evidenceCompletenessWeight: cc.evidence_completeness_weight,
signalStrengthWeight: cc.signal_strength_weight,
signalAgreementWeight: cc.signal_agreement_weight,
dataQualityWeight: cc.data_quality_weight,
llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
baseCoverageWeight: cc.base_coverage_weight ?? 0.7,
additionalCoverageWeight: cc.additional_coverage_weight ?? 0.3,
criticalEvidenceMissingPenalty: cc.critical_evidence_missing_penalty ?? 0.3,
additionalQueryBonusMax: cc.additional_query_bonus_max ?? 0.2,
};
validateRange(cc.llm_uncertainty_alpha, 'llm_uncertainty_alpha');
validateRange(cc.evidence_completeness_weight, 'evidence_completeness_weight');
validateRange(cc.signal_strength_weight, 'signal_strength_weight');
validateRange(cc.signal_agreement_weight, 'signal_agreement_weight');
validateRange(cc.data_quality_weight, 'data_quality_weight');
const baseCoverageWeight = cc.base_coverage_weight ?? 0.7;
const additionalCoverageWeight = cc.additional_coverage_weight ?? 0.3;
const criticalEvidenceMissingPenalty =
cc.critical_evidence_missing_penalty ?? 0.3;
const additionalQueryBonusMax = cc.additional_query_bonus_max ?? 0.2;
validateRange(baseCoverageWeight, 'base_coverage_weight');
validateRange(additionalCoverageWeight, 'additional_coverage_weight');
validateRange(
criticalEvidenceMissingPenalty,
'critical_evidence_missing_penalty',
);
validateRange(additionalQueryBonusMax, 'additional_query_bonus_max');
const confidenceComputation: ConfidenceComputation = {
evidenceCompletenessWeight: cc.evidence_completeness_weight,
signalStrengthWeight: cc.signal_strength_weight,
signalAgreementWeight: cc.signal_agreement_weight,
dataQualityWeight: cc.data_quality_weight,
llmUncertaintyAlpha: cc.llm_uncertainty_alpha,
baseCoverageWeight,
additionalCoverageWeight,
criticalEvidenceMissingPenalty,
additionalQueryBonusMax,
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/playbookAgentService/playbookLoader.ts` around lines 379 -
395, The optional confidence fields (base_coverage_weight,
additional_coverage_weight, critical_evidence_missing_penalty,
additional_query_bonus_max) are not being validated and can be outside [0,1];
add calls to validateRange for each optional field before building the
ConfidenceComputation object (use validateRange(cc.base_coverage_weight,
'base_coverage_weight') etc.), only invoking validateRange when the field is
defined, then construct ConfidenceComputation with the same defaults (??) as
now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants