Skip to content

feat: cache retrieved docs and add PyPI package docs lookup#9

Merged
ayhammouda merged 10 commits intomainfrom
fix/open-issues-cache-pypi-docs
May 9, 2026
Merged

feat: cache retrieved docs and add PyPI package docs lookup#9
ayhammouda merged 10 commits intomainfrom
fix/open-issues-cache-pypi-docs

Conversation

@ayhammouda
Copy link
Copy Markdown
Owner

@ayhammouda ayhammouda commented May 7, 2026

Summary

Fixes #8
Fixes #6

Changes

  • Adds an on-disk SQLite cache for completed get_docs results so retrieved docs survive MCP client/server restarts.
  • Keys cached docs by local index fingerprint, resolved Python docs version, slug, anchor, max_chars, and start_index.
  • Ignores stale cache entries automatically after the local index.db is rebuilt or replaced.
  • Adds lookup_package_docs, a controlled PyPI metadata lookup tool that returns only PyPI/project-declared docs/homepage/source URLs and makes the pypi-declared-metadata trust boundary explicit.
  • Documents cache location, cached fields, invalidation behavior, and the PyPI lookup scope.

Testing

  • uv run ruff check src/ tests/
  • uv run pyright src/
  • uv run pytest --tb=short -q — 247 passed, 3 skipped

Notes

  • lookup_package_docs does not crawl or fetch arbitrary package documentation pages. It queries the official PyPI JSON API and returns declared source URLs only, so it stays inside the controlled trust boundary instead of becoming a generic web search.

Summary by CodeRabbit

  • New Features

    • PyPI package docs lookup tool (returns only package-declared sources; controlled trust boundary)
    • Persistent on-disk cache for doc retrieval with identity scoping and corruption fallback
  • Bug Fixes

    • Restored MCP stdio framing so JSON-RPC appears only on stdout
    • Consistent page/section anchors (page-only hits emit null anchor)
  • Documentation

    • README: PyPI lookup, cache behavior, and when to use generic retrieval
  • Tests

    • New smoke and unit tests for cache persistence, corruption fallback, PyPI lookup, protocol framing, concurrency, and anchor behavior

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ad474f26-c593-4c8e-aeec-33448f90f78a

📥 Commits

Reviewing files that changed from the base of the PR and between a525e7b and 67fba03.

📒 Files selected for processing (5)
  • src/mcp_server_python_docs/models.py
  • src/mcp_server_python_docs/services/package_docs.py
  • src/mcp_server_python_docs/services/persistent_cache.py
  • tests/test_package_docs.py
  • tests/test_persistent_docs_cache.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/test_package_docs.py
  • src/mcp_server_python_docs/models.py
  • tests/test_persistent_docs_cache.py
  • src/mcp_server_python_docs/services/persistent_cache.py
  • src/mcp_server_python_docs/services/package_docs.py

📝 Walkthrough

Walkthrough

Adds persistent SQLite caching for get_docs across restarts, a controlled PyPI package-docs lookup and models, DB-backed symbol slug/anchor resolution, server wiring and tool registration, restores sys.stdout for MCP framing, and adds unit/integration tests plus MCP-level test/fix plans.

Changes

Persistent Docs Cache and PyPI Package Lookup

Layer / File(s) Summary
Data Models
src/mcp_server_python_docs/models.py, tests/fixtures/schema-search_docs-output.json, README.md
SymbolHit.anchor becomes nullable; adds PackageKind, PackageDocsSource, PackageDocsResult and documents lookup_package_docs / trust boundary.
Persistent Cache Implementation
src/mcp_server_python_docs/services/persistent_cache.py
SQLite-backed cache for get_docs results keyed by index fingerprint, version, slug, anchor (sentinel for null), max_chars, and start_index; serialized get/put, stats, graceful disable on errors.
PyPI Service
src/mcp_server_python_docs/services/package_docs.py
PackageDocsService fetches bounded PyPI JSON metadata, normalizes package names, extracts allowlisted declared URLs, deduplicates sources, and returns controlled note messages on failures/oversize.
Retrieval / Ranker
src/mcp_server_python_docs/retrieval/ranker.py
Adds helpers to resolve symbol inventory rows to (slug, anchor) by querying sections/documents; ensures page-only hits emit anchor=None; updates symbol search/lookups to use resolved locations.
Service Integration
src/mcp_server_python_docs/services/content.py, src/mcp_server_python_docs/app_context.py
ContentService accepts optional PersistentDocsCache, attempts cache fast-path using resolved version/slug/anchor/max_chars/start_index, and persists computed GetDocsResult; AppContext extended with PackageDocsService and PersistentDocsCache.
Server Wiring
src/mcp_server_python_docs/server.py
Lifespan initializes PersistentDocsCache and PackageDocsService, injects them into AppContext, closes cache on teardown, and registers async lookup_package_docs tool with openWorldHint.
Transport Fix & Protocol Tests
src/mcp_server_python_docs/__main__.py, tests/test_stdio_smoke.py
Restores sys.stdout to sys.__stdout__ after fd restoration; enhances stdio smoke tests to assert JSON-RPC frames appear only on stdout and drives subprocess stdin line-paced.
Persistent Cache Tests
tests/test_persistent_docs_cache.py
Adds concurrency test for put() serialization; retains coverage for persistence, version isolation, index-change invalidation, anchor null/empty distinction, budget/pagination keying, and corrupt-cache fallback.
PyPI Service Tests
tests/test_package_docs.py
Unit tests cover official PyPI endpoint usage, package-name normalization, trust boundary, allowlisted source filtering, missing-package 404, non-404 HTTP errors, oversized responses, and retrieval/parse errors.
MCP Smoke Tests
tests/test_mcp_get_docs_cache_smoke.py
Subprocess-based integration smoke: verifies full-page and section get_docs, cache row creation, restart reuse, invalid-anchor error, and fallback after cache corruption.
Tool Registration & Retrieval Tests
tests/test_services.py, tests/test_retrieval.py
Tests updated to expect five MCP tools registered; adds symbol-retrieval tests for extensionless slugs and regression asserting anchor=None for page-only hits.
Documentation & Plans
README.md, PR9_MCP_TEST_PLAN.md, PR9_SMOKE_FAILURE_FIX_PLAN.md
README documents five tools, "Retrieved docs cache", and PyPI lookup trust boundary; adds MCP test plan and smoke-failure fix plan.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🐰 A rabbit hops through docs with glee,
Caches pages safe for you and me,
PyPI links kept tidy and true,
Restarts don't lose the things we knew,
JSON-RPC sings clean on stdout too.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: cache retrieved docs and add PyPI package docs lookup' directly and concisely summarizes the two main features implemented: persistent caching of retrieved documentation and the new PyPI package lookup capability.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from #8 (persistent cache keyed by version/slug/anchor/budget params, tied to index.db identity, graceful fallback) and #6 (PyPI-only lookup, controlled source filtering, explicit trust boundary).
Out of Scope Changes check ✅ Passed All changes are directly aligned with the stated objectives: cache infrastructure, PyPI lookup service, documentation updates, model additions, and comprehensive test coverage. No unrelated refactoring or scope creep detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/open-issues-cache-pypi-docs

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
tests/test_mcp_get_docs_cache_smoke.py (1)

10-16: ⚡ Quick win

Extract shared stdio helpers into a dedicated test utility module.

Importing private helpers from another test module tightly couples these tests and makes future refactors noisier. Move these helpers into a shared tests support module and import both smoke suites from there.

Refactor sketch
-from tests.test_stdio_smoke import (
+from tests.support.stdio_helpers import (
     _assert_protocol_on_stdout_only,
     _find_response,
     _isolated_cache_env,
     _make_notification,
     _make_request,
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_mcp_get_docs_cache_smoke.py` around lines 10 - 16, Move the
private stdio helper functions into a shared test helpers module and update
imports: create a new tests support module (e.g., tests.test_helpers or
tests._helpers) and relocate the functions _assert_protocol_on_stdout_only,
_find_response, _isolated_cache_env, _make_notification, and _make_request into
that module; then change imports in this test (and in tests that currently
import from tests.test_stdio_smoke) to import those helpers from the new module
instead of importing them from _stdio_smoke. Ensure the new module exports the
same helper names and update any relative import paths in affected tests to
reference the new module.
PR9_MCP_TEST_PLAN.md (1)

23-26: ⚡ Quick win

Make setup steps environment-agnostic.

The hardcoded workspace path and branch-specific checkout make this plan hard to run outside your local/review environment. Prefer generic “repo root” instructions plus optional branch notes.

Proposed edit
-cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review
-git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9
-git pull --ff-only origin fix/open-issues-cache-pypi-docs
+cd <repo-root>
+# optional: checkout the PR branch under test
+# git checkout fix/open-issues-cache-pypi-docs
+git pull --ff-only
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` around lines 23 - 26, The setup steps in
PR9_MCP_TEST_PLAN.md are environment-specific—replace the hardcoded path "cd
/srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review" with a generic
repo-root instruction (e.g., "cd <repo-root> or open project root"), and change
the branch checkout commands "git checkout fix/open-issues-cache-pypi-docs ||
git checkout review-pr-9" to suggest a generic branch instruction (e.g., "git
fetch && git checkout <branch-or-PR-branch> (or use the PR branch name)") plus
an optional note showing the exact branch names used for this review; keep "git
pull --ff-only origin <branch>" and "uv sync --all-extras" but parametrize the
branch placeholder so the plan is runnable in any environment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@PR9_MCP_TEST_PLAN.md`:
- Line 42: Update the pytest summary line in PR9_MCP_TEST_PLAN.md that currently
reads "254 passed, 3 skipped" to reflect the PR's actual test totals "247
passed, 3 skipped" so the documented expected results match the reported CI/PR
summary and avoid false FAILs.
- Line 71: Change the headings and inline phrases to use hyphenated compound
adjectives for consistency: replace "Full page retrieval" with "Full-page
retrieval" and any occurrences of "page not found style response" or similar
with "page-not-found-style response" (check the heading at "### 2.1 Full page
retrieval" and the repeated occurrence around lines noted as 111-111) so all
compound adjectives in PR9_MCP_TEST_PLAN.md follow the hyphenated style.

In `@src/mcp_server_python_docs/retrieval/ranker.py`:
- Around line 83-99: The current _resolve_symbol_location logic returns empty
string "" for a missing anchor which makes get_docs treat it as a section hit;
change the return value to use None (or normalize blank anchors to None) when
there is no valid section anchor. Specifically, in the branch that checks
section_row and the final return, return doc_row["slug"], None (or convert
section_row["anchor"] or "" into None) instead of "", and apply the same change
in the other occurrences referenced (the other blocks around lines with similar
logic) so doc_row, fallback_anchor, section_row and _resolve_symbol_location
consistently return None for “no anchor.”

In `@src/mcp_server_python_docs/services/persistent_cache.py`:
- Around line 32-35: The shared sqlite connection (self._conn) is used with
check_same_thread=False but not protected against concurrent access; serialize
DB access by adding a threading.Lock (e.g., self._db_lock created where the
connection is initialized) and wrap every database operation (methods like
get(), put(), and any other places that call self._conn.execute/commit) inside a
with self._db_lock: block (or acquire/release) so execute()/commit() calls
cannot run concurrently; alternatively, change the implementation to open a
short-lived sqlite3.connect(...) per operation and close it after the
query—apply the same change for all DB use sites referenced around the
__init__/connection code and the blocks noted (lines ~66-99, 101-123).

In `@tests/test_stdio_smoke.py`:
- Around line 151-162: The helper _assert_protocol_on_stdout_only currently
verifies JSON-RPC frames but doesn't assert the process exit status; update this
function to check result.returncode == 0 (or use result.check_returncode()) and
assert a successful exit with a clear message so smoke tests fail if the server
exited non-zero; locate the function _assert_protocol_on_stdout_only and add a
return-code assertion after reading/parsing stdout/stderr responses.

---

Nitpick comments:
In `@PR9_MCP_TEST_PLAN.md`:
- Around line 23-26: The setup steps in PR9_MCP_TEST_PLAN.md are
environment-specific—replace the hardcoded path "cd
/srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review" with a generic
repo-root instruction (e.g., "cd <repo-root> or open project root"), and change
the branch checkout commands "git checkout fix/open-issues-cache-pypi-docs ||
git checkout review-pr-9" to suggest a generic branch instruction (e.g., "git
fetch && git checkout <branch-or-PR-branch> (or use the PR branch name)") plus
an optional note showing the exact branch names used for this review; keep "git
pull --ff-only origin <branch>" and "uv sync --all-extras" but parametrize the
branch placeholder so the plan is runnable in any environment.

In `@tests/test_mcp_get_docs_cache_smoke.py`:
- Around line 10-16: Move the private stdio helper functions into a shared test
helpers module and update imports: create a new tests support module (e.g.,
tests.test_helpers or tests._helpers) and relocate the functions
_assert_protocol_on_stdout_only, _find_response, _isolated_cache_env,
_make_notification, and _make_request into that module; then change imports in
this test (and in tests that currently import from tests.test_stdio_smoke) to
import those helpers from the new module instead of importing them from
_stdio_smoke. Ensure the new module exports the same helper names and update any
relative import paths in affected tests to reference the new module.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1968fe20-5be0-4725-92d0-2853ce1948b3

📥 Commits

Reviewing files that changed from the base of the PR and between 8f6736c and 694c205.

📒 Files selected for processing (17)
  • PR9_MCP_TEST_PLAN.md
  • PR9_SMOKE_FAILURE_FIX_PLAN.md
  • README.md
  • src/mcp_server_python_docs/__main__.py
  • src/mcp_server_python_docs/app_context.py
  • src/mcp_server_python_docs/models.py
  • src/mcp_server_python_docs/retrieval/ranker.py
  • src/mcp_server_python_docs/server.py
  • src/mcp_server_python_docs/services/content.py
  • src/mcp_server_python_docs/services/package_docs.py
  • src/mcp_server_python_docs/services/persistent_cache.py
  • tests/conftest.py
  • tests/test_mcp_get_docs_cache_smoke.py
  • tests/test_package_docs.py
  • tests/test_persistent_docs_cache.py
  • tests/test_services.py
  • tests/test_stdio_smoke.py

Comment thread PR9_MCP_TEST_PLAN.md

- Ruff passes.
- Pyright passes for `src/`.
- Pytest passes: currently expected `254 passed, 3 skipped`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the expected pytest totals to match this PR context.

Line 42 says 254 passed, 3 skipped, but the PR summary for this change set reports 247 passed, 3 skipped. This can cause false FAIL interpretation during manual validation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` at line 42, Update the pytest summary line in
PR9_MCP_TEST_PLAN.md that currently reads "254 passed, 3 skipped" to reflect the
PR's actual test totals "247 passed, 3 skipped" so the documented expected
results match the reported CI/PR summary and avoid false FAILs.

Comment thread PR9_MCP_TEST_PLAN.md

## Test Area 2 — `get_docs` Functional Retrieval

### 2.1 Full page retrieval
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use hyphenated compound adjectives for consistency.

Consider “Full-page retrieval” and “page-not-found-style response” for clearer technical writing.

Also applies to: 111-111

🧰 Tools
🪛 LanguageTool

[uncategorized] ~71-~71: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...get_docs` Functional Retrieval ### 2.1 Full page retrieval Call: ```text get_docs(slug...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` at line 71, Change the headings and inline phrases to
use hyphenated compound adjectives for consistency: replace "Full page
retrieval" with "Full-page retrieval" and any occurrences of "page not found
style response" or similar with "page-not-found-style response" (check the
heading at "### 2.1 Full page retrieval" and the repeated occurrence around
lines noted as 111-111) so all compound adjectives in PR9_MCP_TEST_PLAN.md
follow the hyphenated style.

Comment thread src/mcp_server_python_docs/retrieval/ranker.py Outdated
Comment thread src/mcp_server_python_docs/services/persistent_cache.py
Comment thread tests/test_stdio_smoke.py
Comment on lines +151 to +162
def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
"""Assert JSON-RPC protocol frames are emitted only on stdout."""
responses = _read_responses(result.stdout)
assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"

for resp in responses:
assert "_raw" not in resp, f"Non-JSON stdout pollution: {resp.get('_raw')}"
assert resp.get("jsonrpc") == "2.0", f"Missing JSON-RPC frame on stdout: {resp}"

stderr_frames = _jsonrpc_frames(result.stderr)
assert stderr_frames == [], f"JSON-RPC frames leaked to stderr: {stderr_frames}"
return responses
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert successful process exit in protocol helper.

_assert_protocol_on_stdout_only validates framing but can still pass if the server emits responses and then exits non-zero. Add a return-code check to keep smoke tests aligned with the “no crash” intent.

Suggested patch
 def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
     """Assert JSON-RPC protocol frames are emitted only on stdout."""
+    assert result.returncode == 0, (
+        f"Server exited with non-zero status {result.returncode}; stderr was: {result.stderr!r}"
+    )
     responses = _read_responses(result.stdout)
     assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_stdio_smoke.py` around lines 151 - 162, The helper
_assert_protocol_on_stdout_only currently verifies JSON-RPC frames but doesn't
assert the process exit status; update this function to check result.returncode
== 0 (or use result.check_returncode()) and assert a successful exit with a
clear message so smoke tests fail if the server exited non-zero; locate the
function _assert_protocol_on_stdout_only and add a return-code assertion after
reading/parsing stdout/stderr responses.

- package_docs: catch UnicodeDecodeError on non-UTF-8 PyPI
  responses so they surface as the controlled empty-result
  path instead of leaking a tool-level exception
- ranker/SymbolHit: return None for page-only symbol anchors
  so get_docs() retrieves the whole page; SymbolHit.anchor is
  now str | None
- persistent_cache: serialize execute()/commit()/stats updates
  with threading.Lock — per the Python sqlite3 docs,
  check_same_thread=False alone does not make a connection
  safe for concurrent writes; without serialization concurrent
  put() raised SystemError under load

Minor follow-ups picked up along the way: _empty_result helper
in package_docs, drop unused PackageDocsInput model, use
storage.db.get_cache_dir/get_index_path helpers from server.py,
import _NO_ANCHOR_KEY constant in smoke test.

Tests: UTF-8 decode path, page-only anchor=None contract,
concurrent put() across 20 threads. 261/261 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/mcp_server_python_docs/models.py (1)

171-190: ⚡ Quick win

Constrain the package-docs contract using Literal types.

kind and trust_boundary are currently free-form strings, which means the generated MCP schema does not enforce the narrow PyPI-declared surface this tool is meant to expose. Using Literal[...] keeps that trust boundary machine-checkable for clients and harder to widen by accident.

Suggested changes
 class PackageDocsSource(BaseModel):
     """A package-declared documentation or project source URL."""

     label: str = Field(description="Label from PyPI metadata or a normalized core metadata field")
     url: str = Field(description="HTTP(S) URL declared by the package on PyPI")
-    kind: str = Field(description="Source category, such as docs, homepage, source, or pypi")
+    kind: Literal["docs", "homepage", "source", "pypi"] = Field(
+        description="Source category, such as docs, homepage, source, or pypi"
+    )
     declared_by: str = Field(description="Where this source declaration came from")
@@
-    trust_boundary: str = Field(
+    trust_boundary: Literal["pypi-declared-metadata"] = Field(
         default="pypi-declared-metadata",
         description="Indicates results are limited to PyPI/project-declared metadata",
     )

Pydantic v2 will emit schema constraints for both fields: multi-value Literal as "enum" and single-value Literal as "const".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp_server_python_docs/models.py` around lines 171 - 190, The contract
currently leaves PackageDocsSource.kind and PackageDocsResult.trust_boundary as
plain str; change their annotations to Literal[...] to constrain allowed values
(e.g., for kind use Literal["docs","homepage","source","pypi"] and for
trust_boundary use a single-value Literal["pypi-declared-metadata"]), import
Literal from typing, and keep Field(...) metadata and default values (for
trust_boundary use the Literal default) so Pydantic v2 emits enum/const schema
constraints; update PackageDocsSource and PackageDocsResult type hints
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/mcp_server_python_docs/services/persistent_cache.py`:
- Around line 28-30: The __init__ currently calls
_fingerprint_index(Path(index_path)) outside the try/except so a missing index
file raises FileNotFoundError and crashes startup; move the _fingerprint_index
call into the existing try block inside __init__, catch FileNotFoundError (and
any other relevant exceptions) and set the instance to the "cache disabled"
fallback (e.g., self._fingerprint = None or whatever the class uses) so
initialization continues; update references to the fingerprint elsewhere to
handle the disabled state.

---

Nitpick comments:
In `@src/mcp_server_python_docs/models.py`:
- Around line 171-190: The contract currently leaves PackageDocsSource.kind and
PackageDocsResult.trust_boundary as plain str; change their annotations to
Literal[...] to constrain allowed values (e.g., for kind use
Literal["docs","homepage","source","pypi"] and for trust_boundary use a
single-value Literal["pypi-declared-metadata"]), import Literal from typing, and
keep Field(...) metadata and default values (for trust_boundary use the Literal
default) so Pydantic v2 emits enum/const schema constraints; update
PackageDocsSource and PackageDocsResult type hints accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6ccc7291-709a-43fa-abe2-4c04d3ed2eff

📥 Commits

Reviewing files that changed from the base of the PR and between 694c205 and a525e7b.

📒 Files selected for processing (11)
  • src/mcp_server_python_docs/models.py
  • src/mcp_server_python_docs/retrieval/ranker.py
  • src/mcp_server_python_docs/server.py
  • src/mcp_server_python_docs/services/package_docs.py
  • src/mcp_server_python_docs/services/persistent_cache.py
  • tests/fixtures/schema-search_docs-output.json
  • tests/test_mcp_get_docs_cache_smoke.py
  • tests/test_package_docs.py
  • tests/test_persistent_docs_cache.py
  • tests/test_retrieval.py
  • tests/test_services.py
✅ Files skipped from review due to trivial changes (1)
  • tests/fixtures/schema-search_docs-output.json
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/test_services.py
  • tests/test_mcp_get_docs_cache_smoke.py
  • tests/test_persistent_docs_cache.py
  • tests/test_package_docs.py
  • src/mcp_server_python_docs/server.py
  • src/mcp_server_python_docs/services/package_docs.py

Comment thread src/mcp_server_python_docs/services/persistent_cache.py Outdated
- persistent_cache: move _fingerprint_index() inside the try-except
  so a missing index.db disables the cache cleanly instead of raising
  FileNotFoundError out of the constructor and crashing server startup
  (CodeRabbit Major)

- models: tighten PackageDocsSource.kind and PackageDocsResult
  .trust_boundary to Literal types so invalid values fail validation
  at construction (CodeRabbit nitpick); add PackageKind alias and
  propagate it through package_docs._source signature, with a cast at
  the dynamic _ALLOWED-derived call site (runtime-safe — the allowlist
  enumeration matches the Literal exactly)

Tests: cache disables gracefully on missing index, PackageDocsSource
rejects unknown kind, PackageDocsResult rejects unknown trust_boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ayhammouda ayhammouda merged commit 6321ff4 into main May 9, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache retrieved docs across AI sessions by Python version add official PyPI package-docs lookup

1 participant