Problem Statement
When auto-recall is used with reranker enabled, the hierarchical retriever sends all collected candidates to the reranker with no upper bound on document count or document size. In practice:
Recursive search can collect tens to hundreds of candidates
L2 abstracts (full file content) can be 1,000–5,000+ chars each, and even after the recent L2 exclusion fix in #2741, L0/L1 abstract lengths still vary widely
Total rerank input can easily exceed the model's batch size limit (e.g. 512 tokens), causing the entire rerank batch to fail and fall back to vector scores
Even when it doesn't fail, reranking a large batch is slow, which is unacceptable for auto-recall, where latency matters more than perfect ranking
The current all-or-nothing approach means users must choose between:
Rerank off → fast but dramatically lower recall quality
Rerank on → potentially slow or silently falling back to vector scores when batch limits are exceeded
Proposed Solution
Add two optional configuration parameters under the rerank section of ov.conf:
{
"rerank": {
"provider": "vikingdb",
"ak": "...",
"sk": "...",
"model_name": "doubao-seed-rerank",
"threshold": 0.1,
"max_docs": 20, // NEW: max documents sent to reranker per batch
"max_chars_per_doc": 500 // NEW: truncate each document's abstract to N chars before reranking
}
}
rerank.max_docs (default: 0 = unlimited, current behavior)
Caps the number of documents submitted to the reranker in a single rerank_batch call
When the collected candidate count exceeds this limit, only the top-N by vector score are sent to the reranker
The remaining candidates keep their vector fallback scores
rerank.max_chars_per_doc (default: 0 = no truncation, current behavior)
Truncates each document's abstract text to N characters before sending to the reranker
Prevents oversized L0/L1 abstracts from causing batch failures
The truncated text is only used for reranking; the full abstract is still returned in results
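A minimal sketch of how the two proposed keys and their "0 = disabled" defaults might be read from an already-parsed ov.conf. The class name RerankLimits, its helper properties, and the choice to reject negative values are hypothetical illustrations, not existing code.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RerankLimits:
    """Hypothetical holder for the two proposed rerank limits.

    A value of 0 disables the corresponding limit, matching the proposed
    defaults (0 = unlimited / no truncation, i.e. current behavior).
    """

    max_docs: int = 0
    max_chars_per_doc: int = 0

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "RerankLimits":
        # Missing keys (or a missing "rerank" section) fall back to 0.
        rerank = config.get("rerank", {}) or {}
        max_docs = int(rerank.get("max_docs", 0))
        max_chars = int(rerank.get("max_chars_per_doc", 0))
        # Assumption: negative values are treated as configuration errors.
        if max_docs < 0 or max_chars < 0:
            raise ValueError("rerank.max_docs and rerank.max_chars_per_doc must be >= 0")
        return cls(max_docs=max_docs, max_chars_per_doc=max_chars)

    @property
    def caps_docs(self) -> bool:
        # True only when a positive document cap is configured.
        return self.max_docs > 0

    @property
    def truncates(self) -> bool:
        # True only when a positive per-document character limit is configured.
        return self.max_chars_per_doc > 0
```

Keeping 0 as the "off" value means existing ov.conf files without these keys behave exactly as today.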
Alternatives Considered
Relying on score_threshold alone: it only filters output and does not limit reranker input size, so every candidate is still sent; a low threshold filters almost nothing.
Using level=[0,1] on every API call: works, but requires API-level changes, does not cap the total count, and is not configurable in ov.conf.
Modifying GLOBAL_SEARCH_TOPK in source code: not configurable, requires a fork, and only indirectly limits candidates.
Client-side truncation: each integration (OpenClaw plugin, SDK users) would need to implement its own limiting, leading to inconsistent behavior.
Feature Area
Retrieval/Search
Use Case
Auto-recall in AI agents (e.g. OpenClaw): when an agent calls memory_recall or search() with the reranker enabled, the response must be fast (sub-second) to maintain an interactive feel. Perfect rerank accuracy is not needed: rerank-scoring the top 10–20 candidates is far better than using pure vector scores, but sending 100+ documents to the reranker adds unacceptable latency. With max_docs: 20 and max_chars_per_doc: 500, the reranker processes a bounded, predictable payload every time, trading marginal accuracy for consistently low latency.
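The bound is simple arithmetic: with both limits set, document text per rerank call can never exceed max_docs × max_chars_per_doc (20 × 500 = 10,000 chars). The helper below is a hypothetical illustration; it counts document characters only (not the query) and characters, not tokens, so the actual token count still depends on the reranker's tokenizer.

```python
from typing import Optional


def worst_case_rerank_chars(max_docs: int, max_chars_per_doc: int) -> Optional[int]:
    """Upper bound on document characters sent per rerank call.

    Returns None when either limit is disabled (0), because then the
    document count or the per-document size is unbounded.
    """
    if max_docs <= 0 or max_chars_per_doc <= 0:
        return None
    return max_docs * max_chars_per_doc


print(worst_case_rerank_chars(20, 500))  # 10000
```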
Example API (Optional)
# Config-driven: no code changes needed
config = {
"rerank": {
"provider": "vikingdb",
"ak": "...",
"sk": "...",
"max_docs": 20,
"max_chars_per_doc": 500
}
}
# In hierarchical_retriever.py, _rerank_scores would:
# 1. Truncate each document to max_chars_per_doc chars
# 2. If len(documents) > max_docs, take only top-N by vector score
# 3. Rerank the bounded batch
# 4. Scatter scores back to original indices (unranked docs keep vector scores)
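The four steps above can be sketched as a standalone function. This is not the real _rerank_scores in hierarchical_retriever.py: the name bounded_rerank_scores, its signature, and the rerank_batch callable (query plus list of texts in, one score per text out) are assumptions for illustration. On a reranker error it keeps the vector scores, mirroring the fallback described in the problem statement.

```python
from typing import Callable, List, Sequence

# Hypothetical reranker interface: one relevance score per text, same order.
RerankFn = Callable[[str, List[str]], List[float]]


def bounded_rerank_scores(
    query: str,
    documents: Sequence[str],
    vector_scores: Sequence[float],
    rerank_batch: RerankFn,
    max_docs: int = 0,
    max_chars_per_doc: int = 0,
) -> List[float]:
    """Return one score per document, reranking at most max_docs of them.

    With both limits at 0 every document is reranked untruncated
    (current behavior). Documents that are not reranked keep their
    vector scores.
    """
    if len(documents) != len(vector_scores):
        raise ValueError("documents and vector_scores must have the same length")

    # Start from vector scores; reranked positions are overwritten below.
    scores = list(vector_scores)
    if not documents:
        return scores

    # 1. Truncate rerank input only; the original documents are untouched.
    texts = [doc[:max_chars_per_doc] if max_chars_per_doc > 0 else doc for doc in documents]

    # 2. Over the cap: keep indices of the top-N documents by vector score.
    indices = list(range(len(documents)))
    if 0 < max_docs < len(documents):
        indices = sorted(indices, key=lambda i: vector_scores[i], reverse=True)[:max_docs]

    # 3. Rerank only the bounded batch; on failure keep the vector scores.
    try:
        reranked = rerank_batch(query, [texts[i] for i in indices])
    except Exception:
        return scores
    if len(reranked) != len(indices):
        return scores

    # 4. Scatter rerank scores back to their original positions.
    for i, score in zip(indices, reranked):
        scores[i] = score
    return scores


# Usage with a stand-in reranker that scores a text by its length.
def fake_rerank(query: str, texts: List[str]) -> List[float]:
    return [float(len(t)) for t in texts]


docs = ["a" * 800, "bb", "ccc", "dddd"]
vec = [0.9, 0.1, 0.8, 0.5]
scores = bounded_rerank_scores("q", docs, vec, fake_rerank, max_docs=2, max_chars_per_doc=500)
print(scores)  # [500.0, 0.1, 3.0, 0.5]
```

Because unranked candidates keep their vector scores, the merged list mixes two score scales that may not be directly comparable; an implementation may prefer to order all reranked candidates ahead of the rest.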
Additional Context
This is related to the existing issue #2739 (L2 abstracts exceeding batch size) and its fix #2741. The proposed solution is more general — it addresses the root cause (unbounded rerank input) rather than just one specific case (L2 content). It also gives users a predictable latency budget for reranking in latency-sensitive paths like auto-recall.
Contribution