feat: manifest-first entity recall for all platforms (#224) #226

Open
visahak wants to merge 8 commits into AgentToolkit:main from visahak:feat/codex-recall-manifest-224

visahak commented Apr 28, 2026

Summary

  • Replace full-body entity injection with compact manifest output across all three platforms (Codex, Claude, Bob)
  • The UserPromptSubmit hook (Claude/Codex) and manual recall skill (Bob) now emit only path, type, and trigger per entity — full content is read on demand
  • Shared load_manifest and dedupe_manifest_entries helpers in entity_io.py power all three implementations
  • Claude/Codex output JSON lines; Bob outputs human-readable markdown (visible in Cline UI)
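To illustrate the two output styles described above, here is a minimal sketch. The sample entry, the function names, and the exact markdown layout are assumptions for illustration only; just the three field names (path, type, trigger) come from this PR.

```python
import json

# Hypothetical manifest entry; only the field names are taken from the PR text.
entries = [
    {
        "path": "entities/guideline/use-uv.md",
        "type": "guideline",
        "trigger": "when installing Python packages",
    },
]


def format_json_lines(entries):
    """One compact JSON object per line (the Claude/Codex style)."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in entries)


def format_markdown(entries):
    """Human-readable bullets (the Bob style); this layout is an assumption."""
    return "\n".join(
        f"- `{e['path']}` ({e['type']}): {e['trigger']}" for e in entries
    )


print(format_json_lines(entries))
print(format_markdown(entries))
```

Either way, the entity body never appears in the output; the agent is expected to open the file at `path` only when the trigger looks relevant.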

Changes

  • entity_io.py — Added _parse_frontmatter_only, load_manifest, dedupe_manifest_entries shared helpers
  • Codex retrieve_entities.py — Switched to manifest JSON output
  • Claude retrieve_entities.py — Switched to manifest JSON output
  • Bob retrieve_entities.py — Switched to human-readable manifest output
  • SKILL.md (all platforms) — Updated to document manifest-first two-step flow
  • Tests — New test_claude_retrieve_manifest.py, test_codex_retrieve_manifest.py; updated test_retrieve.py, test_bob_sharing.py, test_codex_sharing.py, test_sync.py, test_subscribe.py
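As a rough illustration of what frontmatter-only loading could look like, the sketch below reads each file up to the closing `---` and never touches the body. It is not the code in entity_io.py: the `_sketch` function names, the simple `key: value` parsing, and the choice of `type` and `trigger` as required fields are assumptions.

```python
from pathlib import Path

REQUIRED_FIELDS = ("type", "trigger")  # assumed required frontmatter keys


def parse_frontmatter_only_sketch(path):
    """Return frontmatter as a dict, or None if the file is unusable.

    Reads line by line and stops at the closing '---', so the body is
    never read or parsed.
    """
    fields = {}
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.readline().strip() != "---":
                return None  # file does not start with a frontmatter block
            for line in fh:
                if line.strip() == "---":
                    return fields  # closing delimiter found
                key, sep, value = line.partition(":")
                if sep:
                    fields[key.strip()] = value.strip().strip("\"'")
    except (OSError, UnicodeDecodeError):
        return None
    return None  # reached end of file without a closing delimiter


def load_manifest_sketch(root_dir):
    """Collect path/type/trigger for every regular .md file under root_dir."""
    root = Path(root_dir)
    entries = []
    for md in sorted(root.rglob("*.md")):
        if md.is_symlink():
            continue  # symlinked entities are filtered out
        fm = parse_frontmatter_only_sketch(md)
        if fm is None or any(not fm.get(k) for k in REQUIRED_FIELDS):
            continue  # skip files without the required fields
        entries.append(
            {
                "path": md.relative_to(root).as_posix(),
                "type": fm["type"],
                "trigger": fm["trigger"],
            }
        )
    return entries
```

Sorting the file list is one simple way to get deterministic ordering; whether the real helper sorts at this stage is not stated in the PR.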

Test plan

  • 294 platform_integrations tests passing
  • Manifest shape: entries contain only path, type, trigger
  • No full entity bodies in output
  • Deterministic ordering and deduplication
  • Symlinked entities filtered out
  • Subscribed and public entities included
  • Invalid stdin handled gracefully (Claude/Codex)
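The manifest-shape check in the test plan could be expressed roughly as follows. This is a sketch, not the test code in the PR: the helper names are hypothetical, and it assumes the hook output may contain non-JSON lines (such as a header) that should be skipped.

```python
import json

ALLOWED_KEYS = {"path", "type", "trigger"}


def parse_json_lines(output):
    """Collect JSON objects from hook output, skipping non-JSON lines."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. a header line or blank line
        entries.append(json.loads(line))
    return entries


def assert_manifest_shape(output):
    """Fail if any entry carries keys other than path, type, and trigger."""
    entries = parse_json_lines(output)
    for entry in entries:
        extra_or_missing = set(entry) ^ ALLOWED_KEYS
        assert not extra_or_missing, f"unexpected keys: {sorted(extra_or_missing)}"
    return entries
```

A check like this covers both "entries contain only path, type, trigger" and "no full entity bodies in output", since a body could only appear as an extra key.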

Addresses issue #224.

Summary by CodeRabbit

Release Notes

  • Refactor

    • Updated entity retrieval workflow across all integrations to use a manifest-first approach, loading only entity metadata (path, type, trigger) initially and expanding full content on-demand for improved efficiency.
    • Changed output format from markdown to JSON-based manifest entries.
  • Tests

    • Added comprehensive test coverage for manifest loading and JSON output validation across all platform integrations.

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@visahak has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 59 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c70d2ad-e9dc-4928-bf96-612afb316e70

📥 Commits

Reviewing files that changed from the base of the PR and between 369f2da and 87d1b15.

📒 Files selected for processing (16)
  • platform-integrations/bob/evolve-lite/skills/evolve-lite:recall/SKILL.md
  • platform-integrations/bob/evolve-lite/skills/evolve-lite:recall/scripts/retrieve_entities.py
  • platform-integrations/claude/plugins/evolve-lite/lib/entity_io.py
  • platform-integrations/claude/plugins/evolve-lite/skills/recall/SKILL.md
  • platform-integrations/claude/plugins/evolve-lite/skills/recall/scripts/retrieve_entities.py
  • platform-integrations/codex/plugins/evolve-lite/skills/recall/SKILL.md
  • platform-integrations/codex/plugins/evolve-lite/skills/recall/scripts/retrieve_entities.py
  • tests/platform_integrations/conftest.py
  • tests/platform_integrations/test_bob_sharing.py
  • tests/platform_integrations/test_claude_retrieve_manifest.py
  • tests/platform_integrations/test_codex_retrieve_manifest.py
  • tests/platform_integrations/test_codex_sharing.py
  • tests/platform_integrations/test_entity_io_core.py
  • tests/platform_integrations/test_retrieve.py
  • tests/platform_integrations/test_subscribe.py
  • tests/platform_integrations/test_sync.py
📝 Walkthrough

This PR implements a manifest-first recall workflow across Claude, Codex, and Bob platform integrations. Instead of eagerly loading and parsing full entity markdown files, the system now first generates a lightweight manifest containing only each entity's path, type, and trigger field, then expands relevant entities on-demand by reading only the necessary files. Core changes include shared entity I/O utilities for manifest loading and deduplication, updated retrieval scripts, documentation, and comprehensive test coverage.
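The two-step flow described in the walkthrough can be sketched as below. In practice the agent itself decides which triggers are relevant; the word-overlap check here is only a placeholder for that judgment, and both function names are hypothetical.

```python
from pathlib import Path


def select_by_trigger(entries, prompt):
    """Step 1 stand-in: keep entries whose trigger shares a word with the prompt."""
    prompt_words = set(prompt.lower().split())
    return [
        e for e in entries
        if prompt_words & set(e["trigger"].lower().split())
    ]


def expand(entries, root_dir):
    """Step 2: read full files only for the entries that were selected."""
    root = Path(root_dir)
    return {
        e["path"]: (root / e["path"]).read_text(encoding="utf-8")
        for e in entries
    }
```

The point of the split is that the cost of step 2 scales with the number of relevant entities, not with the total number of entities on disk.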

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates<br>platform-integrations/bob/evolve-lite/skills/evolve-lite:recall/SKILL.md, platform-integrations/claude/plugins/evolve-lite/skills/recall/SKILL.md, platform-integrations/codex/plugins/evolve-lite/skills/recall/SKILL.md | Updated "How It Works" sections to describe manifest-first retrieval: hook emits minimal manifest (path/type/trigger only), then full entities are expanded on demand. Removed references to eager loading and inline entity content/source annotations. |
| Shared Entity I/O Library<br>platform-integrations/claude/plugins/evolve-lite/lib/entity_io.py | Added two new functions: load_manifest(root_dir) reads YAML frontmatter from markdown files without parsing bodies, and dedupe_manifest_entries(entries) ensures deterministic deduplication by (path, type, trigger) tuples. Both skip symlinks and enforce required frontmatter fields. |
| Retrieval Scripts<br>platform-integrations/bob/evolve-lite/skills/evolve-lite:recall/scripts/retrieve_entities.py, platform-integrations/claude/plugins/evolve-lite/skills/recall/scripts/retrieve_entities.py, platform-integrations/codex/plugins/evolve-lite/skills/recall/scripts/retrieve_entities.py | Converted from markdown-file parsing with full entity content to manifest-based loading via load_manifest and dedupe_manifest_entries. Output changed from formatted bullet lists with entity bodies/rationale/metadata to JSON-serialized manifest entries (path/type/trigger only). Removed _source provenance annotations and body content. |
| Configuration<br>platform-integrations/install.sh | Simplified Codex user-prompt hook filtering logic to use truthiness check instead of explicit length comparison (functionally equivalent). |
| Test Fixtures & Integration Tests<br>tests/platform_integrations/conftest.py, tests/platform_integrations/test_bob_sharing.py, tests/platform_integrations/test_sync.py, tests/platform_integrations/test_subscribe.py | Updated guideline markdown fixtures to include trigger frontmatter field. Modified assertions to validate manifest-style output (trigger values present, entity body text absent) and verify symlink deduplication. One test scenario adjusted to expect graceful stderr warning for audit-write failures instead of fatal error. |
| New Manifest-First Test Suites<br>tests/platform_integrations/test_claude_retrieve_manifest.py, tests/platform_integrations/test_codex_retrieve_manifest.py, tests/platform_integrations/test_entity_io_core.py | Introduced comprehensive test modules validating manifest output: header format, JSON entry structure (path/type/trigger only), absence of body content/extra fields, determinism, deduplication, symlink skipping, and graceful stdin handling. Entity I/O tests cover load_manifest frontmatter parsing, relative path conversion, and dedupe_manifest_entries behavior. |
| Existing Test Suite Updates<br>tests/platform_integrations/test_retrieve.py, tests/platform_integrations/test_codex_sharing.py | Refactored assertions from validating raw entity text output to validating structured JSON manifest entries. Added project_dir subprocess context, JSON-line parsing helper, and updated test fixtures with trigger fields. Removed checks for source annotations ([from: ...]) and entity body content. |
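Deduplication keyed on the (path, type, trigger) tuple, as summarized in the table above, might look like the following. This is a sketch under assumptions: the function name is hypothetical, and sorting before deduplicating is just one way to make the result independent of input order.

```python
def dedupe_manifest_entries_sketch(entries):
    """Drop repeated entries, keyed on (path, type, trigger), in sorted order."""

    def key(entry):
        return (entry["path"], entry["type"], entry["trigger"])

    seen = set()
    unique = []
    for entry in sorted(entries, key=key):
        k = key(entry)
        if k in seen:
            continue  # exact duplicate of an entry already kept
        seen.add(k)
        unique.append(entry)
    return unique
```

Note that two entries with the same path but different triggers are both kept under this key; only exact repeats collapse.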

Sequence Diagram

sequenceDiagram
    autonumber
    participant Hook as Hook/<br/>Caller
    participant Scanner as Manifest<br/>Scanner
    participant FSys as File<br/>System
    participant Parser as Manifest<br/>Loader
    participant Filter as Dedup<br/>Filter
    participant Reader as Entity<br/>Expander
    participant Output as Output<br/>Formatter

    Note over Hook,Output: OLD: Eager Load All Entities
    Hook->>Scanner: Discover entity directories
    Scanner->>FSys: List all .md files recursively
    FSys-->>Scanner: File list
    loop For each markdown file
        Scanner->>Parser: Parse full markdown + frontmatter
        FSys->>Parser: Read complete file
        Parser->>Filter: Extract path, type, content, source
    end
    Filter-->>Output: All entity objects with content
    Output-->>Hook: Formatted text list (content included)

    rect rgba(100, 150, 200, 0.5)
    Note over Hook,Output: NEW: Manifest-First On-Demand
    Hook->>Scanner: Discover entity directories
    Scanner->>FSys: List all .md files recursively
    FSys-->>Scanner: File list
    loop For each markdown file
        Scanner->>Parser: Extract only YAML frontmatter
        Parser->>FSys: Read file header (minimal bytes)
        FSys-->>Parser: Frontmatter (path, type, trigger)
    end
    Parser->>Filter: Deduplicate by (path, type, trigger)
    Filter-->>Output: Minimal manifest list
    Output-->>Hook: JSON entries (no content)
    
    Hook->>Reader: Select relevant entities by trigger
    Reader->>FSys: Read full .md for matching triggers only
    FSys-->>Reader: Complete entity content + rationale
    Reader-->>Hook: Expanded entity details
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Suggested reviewers

  • illeatmyhat
  • gaodan-fang

Poem

🐰 Hoppy hooray, the manifest's here!
No more loading files, the path is clear—
Just path and type and trigger small,
Then expand on-demand, that's all!
Swift as carrots, light as air! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.31% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: manifest-first entity recall for all platforms' clearly and concisely summarizes the main change: implementing a manifest-first approach for entity recall across multiple platforms (Codex, Claude, Bob). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai[bot]

This comment was marked as resolved.

visahak and others added 7 commits April 28, 2026 16:38
Replace full-body entity injection with compact manifest output in
Claude's UserPromptSubmit hook. The hook now emits one JSON line per
entity containing only path, type, and trigger — Claude reads full
files on demand via the Read tool. Reuses the shared load_manifest
and dedupe_manifest_entries helpers already in entity_io.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
subscribe.py catches audit write failures and warns instead of failing.
Update the test to assert returncode 0, the warning message on stderr,
and that the clone is preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace full-body entity injection with human-readable manifest output
in Bob's recall script. Uses shared load_manifest and dedupe helpers
from entity_io.py. Output format is markdown lines with path, type,
and trigger — Bob reads full files on demand via read_file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Catch UnicodeDecodeError in _parse_frontmatter_only
- Reject files missing closing --- delimiter
- Validate stdin JSON is a dict before accessing keys
- Add e2e marker to manifest retrieval tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
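The stdin hardening listed in the commit above (validate that the stdin JSON is a dict before accessing keys) can be sketched as follows. The function name and the choice to fall back to an empty dict are assumptions; the actual hook may handle invalid input differently.

```python
import json
import sys


def read_hook_input(stream=None):
    """Return the hook payload as a dict.

    Falls back to {} when stdin is empty, is not valid JSON, or decodes
    to something other than a JSON object.
    """
    source = stream if stream is not None else sys.stdin
    raw = source.read()
    if not raw.strip():
        return {}
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return payload if isinstance(payload, dict) else {}
```

Returning an empty dict keeps later `.get(...)` calls safe, so the hook can still emit the manifest even when the caller sends malformed input.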
visahak force-pushed the feat/codex-recall-manifest-224 branch from 218edec to 46ab12c on April 28, 2026 20:44
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vatche Isahagian <vatchei@ibm.com>

visahak commented Apr 28, 2026

Haven't tested this in the agent harnesses yet (Claude, Codex, and IBM Bob).
