LCORE-1333: Define max chunks to retrieve for RAG#1765
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
16cbaa5 to
6fbe0e9
Compare
Add test_responses_byok_integration.py with integration tests covering inline RAG, tool RAG, combined RAG, score multiplier, chunk capping, and RAG_CONTENT_LIMIT enforcement for the /responses endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add RAG_CONTENT_LIMIT constant (default: 10) that caps the final merged BYOK + OKP output from build_rag_context. Per-source constants remain as fetch hints for the reranking pool. Tool RAG is unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TamiTakamiya
left a comment
There was a problem hiding this comment.
LGTM. Thanks for providing the clear PR description.
| if (t.get("type") if isinstance(t, dict) else getattr(t, "type", None)) | ||
| == "file_search" | ||
| ] | ||
| assert len(file_search_tools) == 1 |
There was a problem hiding this comment.
vs_ids = (
file_search_tools[0].get("vector_store_ids")
if isinstance(file_search_tools[0], dict)
else file_search_tools[0].vector_store_ids
)
assert vs_ids == ["vs-byok-knowledge"]
would check that "- The tool includes the configured vector store ID"
| | `OKP_RAG_MAX_CHUNKS` | 5 | Max chunks retrieved from OKP (Inline RAG) | | ||
| | `BYOK_RAG_MAX_CHUNKS` | 10 | Max chunks retrieved from BYOK stores (Inline RAG) | | ||
| | `TOOL_RAG_MAX_CHUNKS` | 10 | Max chunks retrieved via Tool RAG (`file_search`) | | ||
| | `RAG_CONTENT_LIMIT` | 10 | Hard upper bound on the final merged inline RAG chunks (BYOK + OKP) delivered to the LLM | |
There was a problem hiding this comment.
The word "default" implies there's an override — but there isn't. May be replace Default with Value? The language is a bit misleading for someone reading the docs expecting a config knob.
There was a problem hiding this comment.
May be we should turn these into config knobs.
There was a problem hiding this comment.
The existing naming convention is:
BYOK_RAG_MAX_CHUNKS
OKP_RAG_MAX_CHUNKS
TOOL_RAG_MAX_CHUNKS
All follow the pattern
Something like INLINE_RAG_MAX_CHUNKS or MERGED_RAG_MAX_CHUNKS ?
Description
Add
RAG_CONTENT_LIMITconstant (default: 10) toconstants.pythat caps the final merged BYOK + OKP inline RAG output frombuild_rag_context. Per-source constants (BYOK_RAG_MAX_CHUNKS,OKP_RAG_MAX_CHUNKS) remain as fetch hints for the reranking pool — users can increase them to feed more candidates to the cross-encoder reranker before the final cap is applied.Tool RAG keeps its own
TOOL_RAG_MAX_CHUNKSconstant independently so that users can set different max chunk limits for tool RAG and inline RAG separately.Also adds
test_responses_byok_integration.py— the/responsesendpoint had no BYOK RAG integration tests. The new file covers inline RAG, tool RAG, combined RAG, score multiplier, chunk capping, andRAG_CONTENT_LIMITenforcement, matching the existing test coverage for/queryand/streaming_query.Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run make verifypasses all lintersRAG_CONTENT_LIMITcaps inline RAG chunks when set below per-source constants (patched to 3, feeds 10 chunks, asserts cap at 3)RAG_CONTENT_LIMIT