CHANGELOG.md
Lines changed: 22 additions & 0 deletions
@@ -81,6 +81,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
+
- **Pluggable text chunking strategies for `TxtParser`** (Issue #1348, alongside PR #1239): Introduced `opencontractserver/pipeline/parsers/text_chunkers.py` — a small registry-backed abstraction (`BaseTextChunker` + `TextChunk` + `get_chunker`) with three built-in strategies: `SentenceChunker` (spaCy `doc.sents`, preserves pre-#1348 behaviour and emits the existing `SENTENCE` label), `ParagraphChunker` (blank-line split with optional `min_chars` filter and `max_chars` oversize-paragraph fallback, emits `PARAGRAPH`), and `SlidingWindowChunker` (fixed-character window with configurable `overlap` and optional `respect_word_boundaries` snap, emits `WINDOW`). `TxtParser` now declares a `Settings` dataclass with a `chunkers: list[ChunkerSpec]` field (default `[{"name": "sentence"}]`) that can be overridden via `PipelineSettings` *or* per-call via a `chunkers=[...]` kwarg on `parse_document`; the parser iterates the configured strategies and emits one structural SPAN_LABEL annotation per chunk under each strategy's label, so stacked configurations (e.g. sentence + paragraph) index multiple retrieval granularities simultaneously. Motivates the benchmark work in #1239: the LegalBench-RAG `probe_recall_at_10` gap on `privacy_qa` (0.22 observed vs 0.5–0.8 paper floor) is the thesis for needing paragraph-granularity retrieval units, but this PR is strategy-neutral — which chunker wins for which subset is a follow-up optimisation to be driven by the benchmark harness itself. Regression coverage in `opencontractserver/tests/test_text_chunkers.py` (pure-Python, no Django DB) exercises offset/whitespace invariants, overlap arithmetic, word-boundary snapping, argument validation and registry lookup; `test_txt_ingestor_pipeline.py` gains two integration tests that parse the live fixture with a paragraph-only and a stacked paragraph+sliding_window recipe. Existing sentence-only ingestion path is unchanged.
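To make the `SlidingWindowChunker` description above concrete, here is a minimal sketch of fixed-character windowing with overlap and an optional word-boundary snap. The function name `sliding_window_spans` and its `window` parameter are illustrative assumptions; this is not the code in `text_chunkers.py`, and the real chunker may snap or validate differently.

```python
def sliding_window_spans(
    text: str,
    window: int,
    overlap: int = 0,
    respect_word_boundaries: bool = False,
) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of overlapping windows.

    Hypothetical helper for illustration only; not the project's API.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    spans: list[tuple[int, int]] = []
    start = 0
    while start < len(text):
        end = min(start + window, len(text))
        if respect_word_boundaries and end < len(text):
            # Snap the end back to the last space inside the window so a
            # word is not cut in half; keep the hard cut if there is none.
            cut = text.rfind(" ", start + 1, end)
            if cut != -1:
                end = cut
        spans.append((start, end))
        if end == len(text):
            break
        # The next window re-reads the last `overlap` characters. The max()
        # guarantees forward progress even after an aggressive snap.
        start = max(end - overlap, start + 1)
    return spans


print(sliding_window_spans("abcdefghij", window=4, overlap=2))
# -> [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Deriving the next start from the snapped `end` rather than from a fixed stride keeps the windows gap-free, so every character of the input falls inside at least one span.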
+
- **Global post-retrieval reranker for vector search** (Issue #1349): Adds an optional cross-encoder reranking stage that runs after first-stage vector / hybrid retrieval, so OpenContracts can close the gap between vanilla HNSW recall and the accuracy achievable with a cross-encoder scoring pass.
+
- New abstract base class `opencontractserver.pipeline.base.reranker.BaseReranker` wired into the existing `PipelineComponentBase` settings machinery: concrete subclasses declare a `Settings` dataclass (loaded from `PipelineSettings` at runtime) and implement `_rerank_impl(query, passages, **kwargs)`. A default `_arerank_impl` wraps the sync implementation via `sync_to_async` so every backend has a working async path without duplicating logic.
+
- Fault-tolerant helpers `safe_rerank` / `safe_arerank` swallow reranker failures and return `None` so retrieval degrades gracefully to the first-stage ordering — critical because a misconfigured reranker must never take down semantic search.
+
- Four shipped backends in `opencontractserver/pipeline/rerankers/`:
+
- `NoopReranker`: identity pass-through for tests and benchmark control conditions.
+
- `CrossEncoderReranker`: in-process `sentence_transformers.CrossEncoder` (default `BAAI/bge-reranker-v2-m3` per the issue). The model is loaded lazily and cached by `(model_name, device)`, so workers pay the ~300 MB cost once and reuse it on every query. `sentence-transformers` / `torch` are treated as optional dependencies; a missing install surfaces a clear `ImportError` only when this backend is actually selected.
+
- `MicroserviceReranker`: HTTP client that mirrors the shape of `MicroserviceEmbedder` (URL, optional API key, Cloud-Run IAM auth, retry-friendly timeouts). Operators can run any reranker model behind a `/rerank` endpoint and point OpenContracts at it via `RERANKER_MICROSERVICE_URL` (+ secret `RERANKER_MICROSERVICE_API_KEY`).
+
- `CohereReranker`: hosted Rerank API (`rerank-v3.5` by default), called directly via the REST endpoint (no hard dependency on the `cohere` SDK). The API key is stored in the encrypted `PipelineSettings.encrypted_secrets` bag under `cohere_api_key` (read from the env var `COHERE_API_KEY` at migration time).
+
- New `ComponentType.RERANKER` enum value and `rerankers/` auto-discovery in `opencontractserver.pipeline.registry`; `PipelineComponentRegistry` now exposes `.rerankers` / `get_all_rerankers_cached()` alongside parsers, embedders, thumbnailers, and post-processors.
+
- `PipelineSettings.default_reranker` (CharField, max_length=512, `documents/models.py:852-980`): an empty string disables reranking; any other value is a full class path resolved at runtime. Seeded from the `DEFAULT_RERANKER` Django setting at migration time (`documents/migrations/0037_add_default_reranker_to_pipeline_settings.py`). Helpers `get_default_reranker_path()` / `get_default_reranker_class()` / `get_default_reranker_instance()` live in `opencontractserver.pipeline.utils`, with a process-local instance cache (cross-encoder model weights are expensive to load) that is invalidated via `invalidate_reranker_cache()` on every settings update.
+
- `CoreAnnotationVectorStore` (`opencontractserver/llms/vector_stores/core_vector_stores.py:120-1041`) now accepts an optional `reranker` override + `rerank_oversample_factor` kwarg. When a reranker is active, every search path (`search`, `async_search`, `hybrid_search`, `async_hybrid_search`, `global_search`, `async_global_search`) oversamples candidates by `RERANK_OVERSAMPLE_FACTOR` (default 3× the requested `top_k`, hard-capped by `RERANK_MAX_CANDIDATES = 128`) and re-orders results through `_apply_rerank` / `_aapply_rerank` before returning the final `top_k`. All new plumbing is a no-op when `default_reranker` is empty, so existing deployments see no behavior change.
+
- GraphQL surface: `PipelineComponentsType.rerankers`, `PipelineSettingsType.default_reranker`, and `UpdatePipelineSettingsMutation.default_reranker` (validated against the registry, invalidates the reranker instance cache on change).
+
- New constants in `opencontractserver.constants.search`: `RERANK_OVERSAMPLE_FACTOR`, `RERANK_MAX_CANDIDATES`, `RERANK_DEFAULT_TOP_K`. New `RERANKER_REQUEST_TIMEOUT_SECONDS` in `opencontractserver.constants.document_processing`.
+
- Tests in `opencontractserver/tests/test_reranker.py` cover the base-class contract (sorting, top_k trim, out-of-range indices, max-candidates, async fallback), `safe_rerank` / `safe_arerank` fault-tolerance, all three HTTP backends with mocked `requests.post`, pipeline utility resolution + instance caching, registry auto-discovery, and vector-store integration (oversample factor, reranker failure fallback, re-ordering effects).
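As a rough illustration of how the oversampling cap and the fault-tolerant fallback described in this entry could fit together, consider the sketch below. The `Reranker` callable shape, `candidate_count`, and `apply_rerank` are assumptions made for the example; the project's `BaseReranker`, `safe_rerank`, and `_apply_rerank` may have different signatures. Only the constant names and their values (3 and 128) come from the entry itself.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)

# Values quoted in the changelog entry; the real constants live in
# opencontractserver.constants.search.
RERANK_OVERSAMPLE_FACTOR = 3
RERANK_MAX_CANDIDATES = 128

# Assumed shape: a reranker maps (query, passages) to passage indices,
# best match first. The project's BaseReranker API may differ.
Reranker = Callable[[str, Sequence[str]], list[int]]


def candidate_count(top_k: int, reranker_active: bool) -> int:
    """How many first-stage hits to fetch before reranking."""
    if not reranker_active:
        return top_k
    capped = min(top_k * RERANK_OVERSAMPLE_FACTOR, RERANK_MAX_CANDIDATES)
    # Assumption: never fetch fewer than the caller asked for.
    return max(top_k, capped)


def safe_rerank(
    reranker: Reranker, query: str, passages: Sequence[str]
) -> Optional[list[int]]:
    """Return the reranked order, or None if the reranker raises."""
    try:
        return reranker(query, passages)
    except Exception:
        logger.exception("Reranker failed; keeping first-stage order")
        return None


def apply_rerank(
    reranker: Optional[Reranker],
    query: str,
    passages: Sequence[str],
    top_k: int,
) -> list[str]:
    """Re-order passages if possible, else fall back to the input order."""
    order = safe_rerank(reranker, query, passages) if reranker else None
    if order is None:
        return list(passages[:top_k])
    # Drop out-of-range indices defensively, then trim to top_k.
    valid = [i for i in order if 0 <= i < len(passages)]
    return [passages[i] for i in valid[:top_k]]
```

Returning `None` from the safe wrapper, rather than an empty list, lets the caller distinguish "the reranker failed" from "the reranker ranked nothing", which is what makes the fall-back to first-stage ordering unambiguous.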
- **Mypy graduation: typed GraphQL resolvers, mutations, and filters** (Issue #1332): Raised return-annotation coverage in `config/graphql/` from ~4.8% at the start of #1331 to **91.5%** (421/460 function defs) and removed 22 modules from the `mypy.ini` baseline allow-list.
- **Root-cause annotation fixes in `opencontractserver/utils/permissioning.py`**: `set_permissions_for_obj_to_user`, `user_has_permission_for_obj`, `get_users_permissions_for_obj`, and `get_permission_id_to_name_map_for_model` were previously annotated with `instance: type[django.db.models.Model]` (a class) even though every call site passes an instance, and with `user: type[User]` instead of a `User` instance. These were annotation bugs (the code was correct; the annotations were inverted), and they compounded: every mutation calling `set_permissions_for_obj_to_user(user, obj, ...)` produced its own `[arg-type]` error. Corrected to `instance: django.db.models.Model` / `user: UserModel` (forward-referenced via a `TYPE_CHECKING` import of `opencontractserver.users.models.User`). Also added the missing `dict[int, str]` annotation on `this_model_permission_id_map` and removed the `user_instance=User` (class) default on `get_users_group_ids`, which would have failed at runtime if any caller had ever omitted the argument. Module graduated out of the baseline.
- **Graduated from `mypy.ini` baseline** (22 modules): `config.graphql.{action_queries, agent_mutations, badge_mutations, base_types, conversation_mutations, conversation_types, corpus_types, document_queries, filters, ingestion_source_mutations, moderation_mutations, og_metadata_queries, pipeline_queries, security, serializers, slug_queries, smart_label_mutations, social_types, user_queries, user_types, voting_mutations}` and `opencontractserver.utils.permissioning`. Each had the underlying mypy errors fixed first (root-cause in `permissioning.py` cleared the `set_permissions_for_obj_to_user` cluster across every mutation file above).
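The annotation inversion fixed in `permissioning.py` can be illustrated without Django. In this sketch, `Model`, `Document`, `describe_before`, and `describe_after` are hypothetical stand-ins, not names from the codebase; the point is only the difference between `type[Model]` (the class object) and `Model` (an instance).

```python
class Model:
    """Stand-in for django.db.models.Model so the sketch runs without Django."""

    pk = 1


class Document(Model):
    pass


# Before: `type[Model]` says "pass the class itself", so mypy reports
# [arg-type] at every call site that passes an instance.
def describe_before(instance: type[Model]) -> str:
    return f"{type(instance).__name__}:{instance.pk}"


# After: `Model` says "pass an instance", which matches how callers use it.
def describe_after(instance: Model) -> str:
    return f"{type(instance).__name__}:{instance.pk}"


doc = Document()
describe_before(doc)  # runs fine, but mypy flags the argument type
describe_after(doc)   # runs fine and type-checks
```

Both functions behave identically at runtime, which is why an inverted annotation like this can go unnoticed until a type checker is enforced on the calling modules.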
@@ -351,6 +366,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
+
- **Benchmark harness for external RAG datasets** (new app `opencontractserver/benchmarks/`): Generate an OpenContracts corpus from a third-party benchmark (LegalBench-RAG today, pluggable for CUAD/MAUD/etc. via a small adapter interface), run the production extract-grid pipeline against the benchmark's queries with a configurable LLM, probe retrieval independently via `CoreAnnotationVectorStore`, and compute standard metrics (SQuAD-style exact match / token F1 for answers; character-span recall@k / precision@k / IoU for retrieval). Results are written as `report.json` / `report.csv` / `config.json` / `gold.json` under a run directory.
+
- Adapter interface and `LegalBenchRAGAdapter` at `opencontractserver/benchmarks/adapters/` (reads the authoritative ZeroEntropy schema — `{"tests": [{"query", "snippets": [{"file_path", "span": [start, end]}], "tags"}]}`)
+
- Loader, runner, evaluator, and report modules under `opencontractserver/benchmarks/`
- Micro fixture under `fixtures/benchmarks/legalbench_rag_micro/` for end-to-end tests without downloading the full dataset
+
- Test coverage: `opencontractserver/tests/test_benchmarks.py` (metric unit tests, adapter unit tests, loader materialization test, runner end-to-end test with mocked structured-response agent)
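For readers unfamiliar with the metrics named in this entry, the sketch below shows one plausible way to compute token F1 and character-span recall / precision / IoU. The function names, the half-open `(start, end)` span convention, and the single-document simplification are assumptions; the evaluator under `opencontractserver/benchmarks/` may normalise text or aggregate spans differently. Full SQuAD normalisation also strips punctuation and articles, which this sketch omits.

```python
from collections import Counter

Span = tuple[int, int]  # assumed half-open [start, end) character offsets


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 on lowercased, whitespace-split tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def _covered(spans: list[Span]) -> set[int]:
    """Set of character positions covered by any of the spans."""
    return {i for start, end in spans for i in range(start, end)}


def span_metrics(retrieved: list[Span], gold: list[Span]) -> dict[str, float]:
    """Character-level recall, precision, and IoU for spans in one document."""
    got, want = _covered(retrieved), _covered(gold)
    hit = len(got & want)
    union = len(got | want)
    return {
        "recall": hit / len(want) if want else 0.0,
        "precision": hit / len(got) if got else 0.0,
        "iou": hit / union if union else 0.0,
    }


print(span_metrics([(0, 10)], [(5, 15)]))
```

Working on sets of character positions keeps overlapping retrieved chunks from being double-counted, which matters when stacked chunkers return nested spans; recall@k and precision@k would then apply `span_metrics` to the first k retrieved spans.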
+
- **`model_override` kwarg on `doc_extract_query_task`** (`opencontractserver/tasks/data_extract_tasks.py`): Optional, backward-compatible kwarg that lets callers override the hardcoded `openai:gpt-4o-mini` default for a single invocation. Consumed by the benchmark runner to sweep models without affecting production defaults; still defaults to `openai:gpt-4o-mini` when not supplied.
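The backward-compatible override pattern described above might look roughly like this. `resolve_model` and `DEFAULT_MODEL` are hypothetical names used for illustration, and `example:other-model` is a placeholder; the actual task signature in `data_extract_tasks.py` is not shown in this diff.

```python
from typing import Optional

DEFAULT_MODEL = "openai:gpt-4o-mini"  # default quoted in the entry above


def resolve_model(model_override: Optional[str] = None) -> str:
    """Pick the per-call model, falling back to the default.

    Hypothetical helper; existing callers that pass nothing are unaffected.
    """
    return model_override or DEFAULT_MODEL


print(resolve_model())                       # -> openai:gpt-4o-mini
print(resolve_model("example:other-model"))  # -> example:other-model
```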
- **Frontend unit tests for utils and hooks** (Issue #1267): Added 14 new `*.test.ts(x)` files covering previously-untested utilities and hooks to raise `frontend-unit` coverage on high-ROI pure functions: