You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+14Lines changed: 14 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -23,6 +23,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
23
23
24
24
### Fixed
25
25
26
+
- **`CreateCorpusActionModal` opened with the wrong default agent instructions for document triggers** (Issue #1385, `frontend/src/components/corpuses/CreateCorpusActionModal.tsx:136-144,168-171`): the `inlineAgentInstructions` state was initialised with `DEFAULT_MODERATOR_INSTRUCTIONS` even though the default trigger is `add_document` (a document trigger). The trigger-change handler at line 611 swaps to `DEFAULT_DOCUMENT_AGENT_INSTRUCTIONS`, but a user who created an inline agent on the default-selected trigger without first re-selecting the trigger would submit the moderator copy as the new agent's system instructions. Initialised both the `useState` default and `resetForm()` to `DEFAULT_DOCUMENT_AGENT_INSTRUCTIONS` so the pre-interaction value matches the default trigger. Updated `frontend/tests/CreateCorpusActionModal.ct.tsx` "inline-agent create: full happy path" mutation mock to expect `DEFAULT_DOCUMENT_AGENT_INSTRUCTIONS` — the previous mock variable masked this bug because `MockedProvider` was matching the stale moderator default rather than the trigger-appropriate one.
27
+
28
+
### Changed
29
+
30
+
-**Test/type cleanup follow-ups from the PR #1383 review** (Issue #1385):
31
+
- Pinned the `isProcessing` contract for SYNC_CONTENT in `frontend/tests/CorpusChat.ct.tsx` "SYNC_CONTENT renders a complete message immediately": added an `expect(input).toBeEnabled()` assertion after the reply renders, locking the documented invariant that `setIsProcessing(true)` is owned solely by `ASYNC_START` and that a SYNC_CONTENT-only reply must never disable the input.
32
+
- Consolidated the duplicated `::: oc-component` fence dispatcher: extracted `OcComponentBlock` interface and a new `buildOcComponentCustomBlocks(renderMarkdown)` helper into `frontend/src/utils/camlComponents.ts`. Both `frontend/src/hooks/useCamlComponentRenderer.tsx` and `frontend/src/components/corpuses/caml/CamlDirectiveRenderer.tsx` now share the same helper instead of each casting `block` independently.
33
+
- Replaced `route: any` and `page: any` escape hatches with the proper `Route` and `Page` types from `@playwright/test` in `frontend/tests/CorpusDescriptionEditor.ct.tsx` (`setupMdRoute` and the abort-route test).
34
+
- Migrated `.version-number` CSS-class locators in `frontend/tests/CorpusDescriptionEditor.ct.tsx` to a semantic `data-testid="version-number"` matcher (`page.getByTestId("version-number")`); added the test id to the rendered version-row in `frontend/src/components/corpuses/CorpusDescriptionEditor.tsx`.
35
+
26
36
-**`test_superuser_sees_all_queryset` miscounts personal corpuses by 1** (Issue #1394, `opencontractserver/tests/test_visibility_managers.py`, `opencontractserver/tests/test_resolvers.py`): Two `VisibleToUserTests.test_superuser_sees_all_queryset` cases asserted that `Corpus.objects.visible_to_user(superuser).count() == 4` (public + private + 2 personal), but the actual count is 5 because the test DB starts with a pre-existing personal corpus owned by django-guardian's `AnonymousUser` (created during fixture setup before/around the username-based skip in `opencontractserver/users/signals.py::user_created_signal`). The assertion is now scoped to corpuses created by the test's two users (`creator__in=[self.user, self.superuser]`), making it resilient to any fixture-level corpuses that exist at test DB init time. Production code is unchanged.
27
37
- **Merged `frontend` Codecov flag drops to ~33% on every commit where Frontend CI's CT job fails** (`frontend/package.json` `test:coverage:ct`): the script chained `playwright test ... && mkdir -p ... && nyc report ...`, so a failing CT run short-circuited before `nyc report` could turn the per-test JSON files in `.nyc_output` into an `lcov.info`. The downstream `Upload CT Coverage to Codecov` step (`if: success() || failure()`) then errored with "No coverage reports found" and `frontend-component` did not upload for that SHA. Codecov's server-side aggregation of the `frontend` flag was left with only `frontend-unit` (~23%) and `frontend-e2e` (~24%), pulling the merged number down to ~33% even though the previous commit was at ~67% — observed on six consecutive main commits 2026-04-26T01:02..02:58Z (`2d7033f8`..`be5bcfc8`) before recovering on `30298391`. Mirrored the existing `test:e2e:coverage` pattern (`; CT_EXIT=$?; nyc report ... || echo "No coverage data to report"; exit $CT_EXIT`) so `nyc report` runs regardless of test outcome and the lcov ships even on red CT runs. `frontend-component` will still report a slightly lower number when tests fail (failed tests register fewer hits), but it will report — keeping the merged `frontend` flag's denominator stable.
28
38
-**`User.__init__` shared-state mutation re-introduced by branch merge** (`opencontractserver/users/models.py:172-180` removed): PR #1374 (commit `50ed6740`) deleted the `User.__init__` override that mutated `Field.validators[0]` on every instantiation, but a subsequent merge (`b68c1cb4 → 6d2cddbf`) resurrected the override along with its mypy-narrowing changes. The current main on commit `6d2cddbf` therefore reproduced the original `#1358` bug: `User(...)` rebound `username_field.validators[0]` and clobbered any third-party validator prepended to the list. Removed the `__init__` override entirely; the class-body declaration `validators=[UserUnicodeUsernameValidator()]` on the `username` field (still present from PR #1374) is the canonical and only declaration. Also dropped the now-unused `Field` import. Regression coverage from PR #1374 (`opencontractserver/tests/test_user_username_validator.py`) was already on main and is what surfaced the regression in CI.
@@ -37,6 +47,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
37
47
- Regression coverage: `opencontractserver/tests/test_corpus_isolation_vector_store.py` — six tests covering cross-corpus leak, deletion-aware drop, orphan-set leak, document-scoped retrieval still returns structural rows, viewer-without-doc-permission excluded, creator still sees own row.
38
48
-**Test-only**: `opencontractserver/tests/test_pydantic_ai_agents.py`, `opencontractserver/tests/test_structural_annotation_portability.py` — `Document.objects.create(...)` calls in `TransactionTestCase` setUp now pass `processing_started=timezone.now()` to short-circuit `process_doc_on_create_atomic`, which would otherwise eagerly chain a Celery PDF-ingest task that fails on the (file-less) test document and aborts the whole test class. Pre-existing failure, exposed cleanly when the regression suite was added.
39
49
50
+
### Fixed
51
+
52
+
- **`Embedding.embedder_path` could be NULL but was typed `str`** (Issue #1357, `opencontractserver/annotations/models.py:461-465`, `opencontractserver/annotations/models.py:584-585`, `opencontractserver/annotations/migrations/0068_enforce_embedder_path_not_null.py`): The Django field was declared `null=True, blank=True` while the Python annotation claimed `str`, causing a long-standing mypy `assignment` error and — more importantly — silently gutting the partial unique constraints added in migration 0059. Each `unique_embedding_per_{document,annotation,note,conversation,message}_embedder` constraint is conditioned on `<parent>__isnull=False` and keys on `(embedder_path, <parent>)`, so any row with `embedder_path IS NULL` bypassed duplicate prevention for its parent. Every production code path that creates an `Embedding` (`Embedding.objects.store_embedding()`, `HasEmbeddingMixin.add_embedding()`, `worker_uploads._store_embeddings()`) already supplies a concrete `embedder_path` or skips creation when empty, so enforcing non-null at the DB level matches actual behaviour rather than constraining it. New migration 0068 backfills any legacy NULL rows with `settings.DEFAULT_EMBEDDER` (deleting rows that would collide with an existing `(default_embedder_path, parent)` row under the partial unique constraint — they were previously unreachable via any query path since all call sites filter on a concrete embedder path), then `AlterField`s the column to `NOT NULL`. Removed the now-unreachable `or 'Unknown Model'` fallback in `Embedding.__str__`. Migration runs with `atomic = False` so the RunPython backfill commits before `AlterField` takes the `ACCESS EXCLUSIVE` lock to set `NOT NULL`, matching the pattern established by migration 0059.
53
+
40
54
### Added
41
55
42
56
- **Pluggable text chunking strategies for `TxtParser`** (Issue #1348, alongside PR #1239): Introduced `opencontractserver/pipeline/parsers/text_chunkers.py` — a small registry-backed abstraction (`BaseTextChunker` + `TextChunk` + `get_chunker`) with three built-in strategies: `SentenceChunker` (spaCy `doc.sents`, preserves pre-#1348 behaviour and emits the existing `SENTENCE` label), `ParagraphChunker` (blank-line split with optional `min_chars` filter and `max_chars` oversize-paragraph fallback, emits `PARAGRAPH`), and `SlidingWindowChunker` (fixed-character window with configurable `overlap` and optional `respect_word_boundaries` snap, emits `WINDOW`). `TxtParser` now declares a `Settings` dataclass with a `chunkers: list[ChunkerSpec]` field (default `[{"name": "sentence"}]`) that can be overridden via `PipelineSettings` *or* per-call via a `chunkers=[...]` kwarg on `parse_document`; the parser iterates the configured strategies and emits one structural SPAN_LABEL annotation per chunk under each strategy's label, so stacked configurations (e.g. sentence + paragraph) index multiple retrieval granularities simultaneously. Motivates the benchmark work in #1239: the LegalBench-RAG `probe_recall_at_10` gap on `privacy_qa` (0.22 observed vs 0.5–0.8 paper floor) is the thesis for needing paragraph-granularity retrieval units, but this PR is strategy-neutral — which chunker wins for which subset is a follow-up optimisation to be driven by the benchmark harness itself. Regression coverage in `opencontractserver/tests/test_text_chunkers.py` (pure-Python, no Django DB) exercises offset/whitespace invariants, overlap arithmetic, word-boundary snapping, argument validation and registry lookup; `test_txt_ingestor_pipeline.py` gains two integration tests that parse the live fixture with a paragraph-only and a stacked paragraph+sliding_window recipe. Existing sentence-only ingestion path is unchanged.
0 commit comments