Skip to content

Commit fb62757

Browse files
JSv4claude
andauthored
Fix O(N) query regression in bulk folder operations (#1199) (#1237)
* Batch bulk folder operations to restore O(1) query profile Fix issue #1199: move_documents_to_folder() and delete_folder() regressed to ~3 DB round-trips per document after PR #1195 added DocumentPath history nodes. A 100-document bulk move issued ~300 queries instead of the single .update() the pre-lineage code used. Both methods now: 1. Pre-fetch occupied target-directory paths in a single query via the new _fetch_occupied_paths_in_directory() helper and share a mutable set across all disambiguations via _disambiguate_path's new occupied_override kwarg (replaces the single-use extra_occupied). 2. Batch-deactivate superseded rows with .filter(pk__in=...).update(). 3. Batch-insert successor rows with bulk_create(), then manually dispatch post_save(created=True) via the new _dispatch_document_path_created_signals() helper so the document-text embedding side effect in documents.signals.process_doc_on_document_path_create still fires. 4. Use select_related("document") + select_for_update(of=("self",)) on the affected-path query to eliminate an N+1 on current.document and scope row locks to the DocumentPath table. Net result: a 100-document bulk move is ~4 DB round-trips instead of ~300, with full Path Tree history preserved. Tests were updated to match the batched call shape: - The two monkeypatches of _disambiguate_path now use **kwargs so they pass through the new occupied_override parameter transparently. - The two IntegrityError rollback tests for bulk move now patch bulk_create instead of create. * Address review: add index-efficiency comment, contract docs, rename test - Add comment noting regex filters in _fetch_occupied_paths_in_directory cannot use btree indexes and may benefit from GIN/pg_trgm for large directory listings. - Add read-only contract comment on occupied_override parameter in _disambiguate_path to prevent accidental in-method mutation. - Document intentional update_fields=None omission in _dispatch_document_path_created_signals matching Model.save() semantics. - Rename test_delete_folder_rolls_back_successful_relocations_on_later_failure to test_delete_folder_rolls_back_on_planning_failure with updated docstring reflecting that failure now occurs in the path-planning phase before writes. * Address review: scope row lock, match native signal kwargs, add dispatch test - Add of=("self",) to move_document_to_folder's select_for_update for consistency with bulk methods (prevents accidental Document row locks) - Supply raw=False and using=path._state.db in manual post_save dispatch to match Django's native Model.save() kwargs - Add logger.warning for empty-directory fallback in _fetch_occupied_paths_in_directory to surface malformed paths early - Clarify _target_directory_string docstring re: strip("/") rationale - Add test_bulk_move_dispatches_post_save_for_each_created_path to verify manual signal dispatch fires exactly N times with correct kwargs * Address review: guard empty paths, raise on empty directory, add delete_folder signal test - _target_directory_string: raise ValueError if CorpusFolder.get_path() returns empty string, preventing "//" directory which would match all paths instead of the intended folder - _fetch_occupied_paths_in_directory: upgrade empty-directory fallback from a warning (which silently loads ALL active paths) to a ValueError, failing fast on malformed input rather than masking bugs - Add test_delete_folder_dispatches_post_save_for_each_created_path to verify signal dispatch parity with the existing bulk-move signal test * Fix 4 test failures: cache get_path() result, update no-slash path tests - move_documents_to_folder now calls get_path() exactly once and derives both the target directory string and the compute-moved-path argument from the cached value via new _target_directory_string_from_path helper - TestCoverageGap_DisambiguateNoSlashPath tests updated to expect ValueError since _fetch_occupied_paths_in_directory now intentionally rejects empty directory strings (commit 62cfc7d) * Address review: add update_fields to signal dispatch, new test coverage - Add explicit update_fields=None to post_save.send() in _dispatch_document_path_created_signals to match Django Model.save() dispatch signature and prevent TypeError in future handlers - Add update_fields assertion to both signal dispatch tests - Add TestCoverageGap_DeleteFolderIntegrityErrorRollback: verifies IntegrityError during delete_folder's bulk_create rolls back the deactivation update and folder deletion - Add TestCoverageGap_TargetDirectoryStringEmptyPath: exercises the ValueError guard when get_path() returns empty string * Address review: consolidate _target_directory_string methods, add leading-slash guard - Consolidate _target_directory_string to delegate to _target_directory_string_from_path, eliminating duplicated path-normalization logic (DRY) - Add early leading-slash guard in _disambiguate_path so callers that pass slashless paths get a clear error at the point of call rather than a confusing ValueError from _fetch_occupied_paths_in_directory - Remove dead else branch (directory = "") in _disambiguate_path now that the guard makes it unreachable * Address review feedback: remove dead code, improve type safety, simplify tests - Remove unused _target_directory_string wrapper (dead code per CLAUDE.md rules); update its test class to exercise _target_directory_string_from_path directly with additional edge-case coverage (empty, slash-only, normal path). - Fix stale comment referencing removed _target_directory_string method. - Tighten planned_paths type annotation from list[tuple] to list[tuple[DocumentPath, str]] for static analysis and readability. - Simplify fragile call_args inspection in signal dispatch tests: replace multi-branch kwargs extraction with direct tuple unpacking. * Fix 3 failing tests and stale docstring in bulk folder operations Remove TestBulkMoveIntegrityRecovery and TestDeleteFolderIntegrityRecovery test classes that mocked DocumentPath.objects.create but the implementation uses bulk_create. These tests tested per-row retry logic from the old sequential approach that no longer exists in the batch code path. The equivalent rollback behavior is already covered by TestCoverageGapBulkMoveIntegrityErrorRollback and TestCoverageGap_DeleteFolderIntegrityErrorRollback which correctly mock bulk_create. Also update the move_documents_to_folder docstring to accurately describe the two-phase plan-then-execute approach and all-or-nothing TOCTOU semantics, replacing stale references to the old sequential approach and _create_successor_path_with_retry. Add corpus_id to delete_folder error log for consistency with move_documents_to_folder error logging. * Fix codecov config: rename file to files param, ignore test/config paths * Fix test class naming inconsistency and update stale changelog references - Rename TestCoverageGap_DeleteFolderIntegrityErrorRollback and TestCoverageGap_TargetDirectoryStringFromPathEdgeCases to drop underscore separators, matching the existing TestCoverageGapXxx convention used by other test classes in this PR. - Update CHANGELOG.md Issue #1200 entry to reflect that TestBulkMoveIntegrityRecovery and TestDeleteFolderIntegrityRecovery were replaced by TestCoverageGapBulkMoveIntegrityErrorRollback and TestCoverageGapDeleteFolderIntegrityErrorRollback (bulk operations now use batch approach instead of per-row retry). * Address review feedback on bulk folder ops - Document that _dispatch_document_path_created_signals only replays post_save (pre_save remains skipped like bulk_create itself); add a maintainer note to extend it if pre_save receivers are added. - Normalise '/' to the root directory string in _target_directory_string_from_path so callers that resolve a root-equivalent folder via CorpusFolder.get_path() don't get a ValueError. Empty strings still raise — they indicate upstream bugs. Updated the corresponding TestCoverageGapTargetDirectoryStringFromPathEdgeCases test (slash no longer raises, now normalises to '/'). - Clarify in _disambiguate_path docstring that extra_occupied is now only consumed by the single-document retry loop in _create_successor_path_with_retry; bulk callers use occupied_override. - Hint in the bulk_move / delete_folder rolled-back error strings that the whole batch / deletion is safe to retry. - Rename TestCoverageGap_BulkMoveGetPathCallCount to match the TestCoverageGap* naming convention used by its siblings. * Address PR #1237 review: ordering invariants, pre_save guard, perf note Addresses the most recent Claude bot review on PR #1237: 1. Document the fetch-before-deactivate ORDERING INVARIANT at both bulk call sites (``delete_folder`` and ``move_documents_to_folder``). The shared ``occupied_paths`` set must be populated while the to-be- superseded rows are still ``is_current=True`` — ``_disambiguate_path`` silently ignores ``exclude_pk`` when ``occupied_override`` is provided, so reordering would cause the batch to re-claim its own source paths. Call-site comments prevent accidental reordering during future edits. 2. Add a regression guard test (``test_document_path_has_no_pre_save_receivers``) that fails loudly if a ``pre_save`` receiver is ever connected to ``DocumentPath``. ``_dispatch_document_path_created_signals`` only replays ``post_save``, matching ``bulk_create``'s contract and the current (empty) pre_save receiver set. The guard makes any future wiring of pre_save handlers a hard-fail so the dispatch helper is extended intentionally. 3. Expand the existing regex-performance note on ``_fetch_occupied_paths_in_directory`` with a ``TODO(perf, #1199 follow-up)`` marker so the follow-up work has a traceable tag if profiling later shows the regex scan as a bottleneck. The TOCTOU-retry concern raised in the review is out of scope for this PR — it requires caller-side retry logic in the GraphQL mutation layer (``MoveDocumentsToFolderMutation``, ``DeleteCorpusFolderMutation``) and is tracked separately. The existing "safe to retry" wording in error messages remains accurate guidance for the caller. * Unblock mypy CI: suppress django-stubs 6.0.3 ValuesQuerySet false positives The values_list + unpack pattern introduced in main commit 41eb6ae still trips mypy 1.20.1 + django-stubs 6.0.3 with 'object is not iterable' and 'Cannot determine type' errors. Main's Backend CI is red for the same reason. Suppress narrowly on the two affected lines with type: ignore; runtime behaviour is correct. * Address review: drop dead in-memory sync loops, revert unrelated has-type ignore --------- Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 2d7033f commit fb62757

3 files changed

Lines changed: 613 additions & 365 deletions

File tree

CHANGELOG.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -282,7 +282,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
282282
- **Rules-of-Hooks violation in `UserProfileRoute`** (Issue #1295): `frontend/src/components/routes/UserProfileRoute.tsx` called `useQuery` _after_ a conditional early return for the `!slug` redirect case, so when the same component fiber was reused across the `/profile``/users/:slug` redirect (e.g. both routes rendering `UserProfileRoute` in a single `Routes` tree) React threw `Rendered more hooks than during the previous render`. Production flows that unmount/remount across the redirect masked the bug, but any future refactor that kept the fiber alive would start crashing, and test suites that mounted both routes together could not exercise the full redirect → render path. Moved `useQuery` above the early-return block and pass `{ slug: slug ?? "" }` as variables while keeping the existing `skip: !slug` gate on the network call.
283283
- **`PdfAnnotations.undoAnnotation()` mutated the source array** (Issue #1291): `frontend/src/components/annotator/types/annotations.ts` called `this.annotations.pop()` before constructing the returned instance, silently mutating the caller's `annotations` array even though the field is declared `readonly`. Every other method on this reducer-style class returns without touching the input, so the leak was a footgun for downstream consumers of the upcoming PdfAnnotator package extraction (Issue #1283) — any reference to the pre-undo instance would observe a stale shorter array without re-rendering. Replaced `.pop()` with a non-mutating `slice(0, -1)` + index read. Updated `frontend/src/components/annotator/types/__tests__/annotations.test.ts` to remove the pinning comment and add an explicit assertion that the original instance's `annotations` array is unchanged after `undoAnnotation()`.
284284
- **`RelationGroup.updateForAnnotationDeletion` never pruned orphaned relations** (Issue #1292): `frontend/src/components/annotator/types/annotations.ts:49-50` computed `nowSourceEmpty` / `nowTargetEmpty` against `this.sourceIds` / `this.targetIds` (the **pre-filter** arrays) instead of the `newSourceIds` / `newTargetIds` arrays produced a few lines above. Because the "now empty" flags were identical to the "was empty" flags, all four `return undefined` branches intended to prune orphaned relations were unreachable, and the method always returned a fresh `RelationGroup` — even when `PdfAnnotations.undoAnnotation()` had just removed the relation's only source or target annotation. That left zombie relations in `PdfAnnotations.relations` with empty `sourceIds` or `targetIds` and required defensive handling downstream. Fix swaps the two comparisons to the post-filter arrays so the documented deletion conditions fire correctly. `frontend/src/components/annotator/types/__tests__/annotations.test.ts` drops the bug-pinning header note and replaces the ambiguous "prunes any relations" assertion with two explicit cases: the survivor is kept when it still has a source and a target, and a relation whose only target references the popped annotation is now removed (21/21 unit tests pass).
285-
- **TOCTOU race on `DocumentPath` uniqueness** (Issue #1200): `DocumentFolderService.move_document_to_folder()`, `move_documents_to_folder()`, and `delete_folder()` previously caught `IntegrityError` from the `unique_active_path_per_corpus` partial unique constraint and either bubbled it up to the caller as a "Path conflict, please retry" error (single move) or rolled back the entire batch (bulk move / folder delete). Under concurrent moves of different documents to the same target folder, two transactions could both observe a candidate path as free in `_disambiguate_path()` and race to insert it; the loser hit the partial unique index and the operation failed. New helper `DocumentFolderService._create_successor_path_with_retry()` (`opencontractserver/corpuses/folder_service.py`) wraps the deactivate-then-create pair in a savepoint and retries with a freshly disambiguated path on `IntegrityError`, treating each losing path as occupied. Up to `MAX_PATH_CREATE_RETRIES + 1` attempts (`opencontractserver/constants/document_processing.py`) run before propagating the conflict. The partial unique index added in migration `0023_documentpath_documentpathgroupobjectpermission_and_more` remains the authoritative correctness guarantee. New test classes `TestMoveDocumentIntegrityRecovery`, `TestBulkMoveIntegrityRecovery`, and `TestDeleteFolderIntegrityRecovery` cover transient-failure recovery, disambiguated retry path selection, and exhausted-retry rollback (`opencontractserver/tests/test_document_folder_service.py`).
285+
- **O(N) query regression in bulk folder operations** (Issue #1199): `DocumentFolderService.move_documents_to_folder()` and `DocumentFolderService.delete_folder()` in `opencontractserver/corpuses/folder_service.py` previously issued ~3 DB round-trips per document (an EXISTS-style disambiguation fetch, an `UPDATE` via `save(update_fields=…)`, and an `INSERT` via `DocumentPath.objects.create()`). For a 100-document batch that was ~300 queries instead of the single-query `.update()` the old non-lineage code used. Both methods now:
286+
1. Pre-fetch all occupied paths in the target directory in a **single** query via the new `_fetch_occupied_paths_in_directory` helper and pass the shared mutable set to `_disambiguate_path` via a new `occupied_override` parameter (replaces the single-purpose `extra_occupied` kwarg).
287+
2. Batch-deactivate every superseded `DocumentPath` row with one `.filter(pk__in=…).update(is_current=False)` call.
288+
3. Batch-insert every successor row with one `.bulk_create()` call, then manually dispatch `post_save` (`created=True`) via the new `_dispatch_document_path_created_signals` helper so the document-text embedding side effect wired up in `documents/signals.py::process_doc_on_document_path_create` still fires (bulk_create normally bypasses per-row signals).
289+
4. Use `select_related("document")` + `select_for_update(of=("self",))` on the affected-path query so `current.document` accesses inside the build loop no longer N+1, and the row lock stays scoped to the `DocumentPath` table. Net result: a 100-document bulk move now executes roughly 4 DB round-trips instead of ~300, and the old batched `.update()` performance is restored without sacrificing the Path Tree history nodes introduced in PR #1195.
290+
- **TOCTOU race on `DocumentPath` uniqueness** (Issue #1200): `DocumentFolderService.move_document_to_folder()`, `move_documents_to_folder()`, and `delete_folder()` previously caught `IntegrityError` from the `unique_active_path_per_corpus` partial unique constraint and either bubbled it up to the caller as a "Path conflict, please retry" error (single move) or rolled back the entire batch (bulk move / folder delete). Under concurrent moves of different documents to the same target folder, two transactions could both observe a candidate path as free in `_disambiguate_path()` and race to insert it; the loser hit the partial unique index and the operation failed. New helper `DocumentFolderService._create_successor_path_with_retry()` (`opencontractserver/corpuses/folder_service.py`) wraps the deactivate-then-create pair in a savepoint and retries with a freshly disambiguated path on `IntegrityError`, treating each losing path as occupied. Up to `MAX_PATH_CREATE_RETRIES + 1` attempts (`opencontractserver/constants/document_processing.py`) run before propagating the conflict. The partial unique index added in migration `0023_documentpath_documentpathgroupobjectpermission_and_more` remains the authoritative correctness guarantee. Test class `TestMoveDocumentIntegrityRecovery` covers single-document transient-failure recovery, disambiguated retry path selection, and exhausted-retry rollback. Bulk operations (`move_documents_to_folder`, `delete_folder`) now use the batch approach from Issue #1199 instead of per-row retry, so their former test classes (`TestBulkMoveIntegrityRecovery`, `TestDeleteFolderIntegrityRecovery`) have been replaced by `TestCoverageGapBulkMoveIntegrityErrorRollback` and `TestCoverageGapDeleteFolderIntegrityErrorRollback` which verify full-batch rollback on `IntegrityError` (`opencontractserver/tests/test_document_folder_service.py`).
286291
- **Bulk move loop recomputed target folder path per document** (Issue #1202): `DocumentFolderService.move_documents_to_folder()` (`opencontractserver/corpuses/folder_service.py`) called `_compute_moved_path()` once per document, and each call invoked `target_folder.get_path()` — which walks ancestors via a recursive CTE query. For an N-document bulk move to the same folder, this issued N redundant CTE queries for an O(1) value. The target folder path is now resolved once before the loop and threaded through `_compute_moved_path()` via a new optional `target_folder_path` parameter.
287292
- **Backend coverage disappearing from Codecov dashboard** (`.codecov.yml`): Added a `backend` flag with `carryforward: true` (paths: `opencontractserver/`, `config/`), mirroring the pattern already in place for `frontend-unit` / `frontend-component`. Without carryforward, any commit whose backend `pytest` job fails or times out (e.g. the merge commit `f166a59`, where the push-to-main `pytest` check failed after PR #1213 merged) causes Codecov to record backend files at 0% for that commit, which then replaces the healthy ~90% backend coverage on the dashboard — making it look as if only frontend coverage (~39%) is being collected. With carryforward, a flaky/failing backend run inherits the parent commit's backend coverage so the dashboard continues to reflect both suites.
288293
- **IngestionSource follow-up hardening** (Issue #1228):

0 commit comments

Comments
 (0)