Commit fb62757
* Batch bulk folder operations to restore O(1) query profile
Fix issue #1199: move_documents_to_folder() and delete_folder() regressed
to ~3 DB round-trips per document after PR #1195 added DocumentPath
history nodes. A 100-document bulk move issued ~300 queries instead of
the single .update() the pre-lineage code used.
Both methods now:
1. Pre-fetch occupied target-directory paths in a single query via the
new _fetch_occupied_paths_in_directory() helper and share a mutable
set across all disambiguations via _disambiguate_path's new
occupied_override kwarg (replaces the single-use extra_occupied).
2. Batch-deactivate superseded rows with .filter(pk__in=...).update().
3. Batch-insert successor rows with bulk_create(), then manually
dispatch post_save(created=True) via the new
_dispatch_document_path_created_signals() helper so the
document-text embedding side effect in
documents.signals.process_doc_on_document_path_create still fires.
4. Use select_related("document") + select_for_update(of=("self",)) on
the affected-path query to eliminate an N+1 on current.document and
scope row locks to the DocumentPath table.
Net result: a 100-document bulk move is ~4 DB round-trips instead of
~300, with full Path Tree history preserved.
Tests were updated to match the batched call shape:
- The two monkeypatches of _disambiguate_path now use **kwargs so they
pass through the new occupied_override parameter transparently.
- The two IntegrityError rollback tests for bulk move now patch
bulk_create instead of create.
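
The shared-set idea in step 1 can be sketched without a database. This is a minimal illustration, not the project's code: the `stem_N.ext` suffix scheme and the names `disambiguate_path` and `plan_bulk_move` are assumptions made for the example. The point is that one pre-fetched set serves every disambiguation in the batch, and the caller (not the disambiguation function) records each claimed path.

```python
from pathlib import PurePosixPath


def disambiguate_path(desired: str, occupied: set[str]) -> str:
    """Return `desired`, or the first free `stem_N.ext` variant.

    Hypothetical helper: the suffix scheme is assumed. It treats
    `occupied` as read-only; the caller records the claim.
    """
    if desired not in occupied:
        return desired
    p = PurePosixPath(desired)
    n = 1
    while True:
        candidate = str(p.with_name(f"{p.stem}_{n}{p.suffix}"))
        if candidate not in occupied:
            return candidate
        n += 1


def plan_bulk_move(filenames: list[str], directory: str, occupied: set[str]) -> list[str]:
    """Plan target paths for a batch using one shared occupied set."""
    planned = []
    for name in filenames:
        path = disambiguate_path(f"{directory}{name}", occupied)
        occupied.add(path)  # later documents in the same batch see this claim
        planned.append(path)
    return planned


# One "query" worth of occupied paths, then pure in-memory planning.
occupied_paths = {"/reports/a.pdf"}
planned_paths = plan_bulk_move(["a.pdf", "a.pdf", "b.pdf"], "/reports/", occupied_paths)
```

With this scheme the two colliding `a.pdf` files receive distinct suffixes, and no additional lookup is needed per document.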
* Address review: add index-efficiency comment, contract docs, rename test
- Add comment noting regex filters in _fetch_occupied_paths_in_directory
cannot use btree indexes and may benefit from GIN/pg_trgm for large
directory listings.
- Add read-only contract comment on occupied_override parameter in
_disambiguate_path to prevent accidental in-method mutation.
- Document intentional update_fields=None omission in
_dispatch_document_path_created_signals matching Model.save() semantics.
- Rename test_delete_folder_rolls_back_successful_relocations_on_later_failure
to test_delete_folder_rolls_back_on_planning_failure with updated docstring
reflecting that failure now occurs in the path-planning phase before writes.
* Address review: scope row lock, match native signal kwargs, add dispatch test
- Add of=("self",) to move_document_to_folder's select_for_update for
consistency with bulk methods (prevents accidental Document row locks)
- Supply raw=False and using=path._state.db in manual post_save dispatch
to match Django's native Model.save() kwargs
- Add logger.warning for empty-directory fallback in
_fetch_occupied_paths_in_directory to surface malformed paths early
- Clarify _target_directory_string docstring re: strip("/") rationale
- Add test_bulk_move_dispatches_post_save_for_each_created_path to verify
manual signal dispatch fires exactly N times with correct kwargs
* Address review: guard empty paths, raise on empty directory, add delete_folder signal test
- _target_directory_string: raise ValueError if CorpusFolder.get_path()
returns empty string, preventing "//" directory which would match all
paths instead of the intended folder
- _fetch_occupied_paths_in_directory: upgrade empty-directory fallback
from a warning (which silently loads ALL active paths) to a ValueError,
failing fast on malformed input rather than masking bugs
- Add test_delete_folder_dispatches_post_save_for_each_created_path to
verify signal dispatch parity with the existing bulk-move signal test
* Fix 4 test failures: cache get_path() result, update no-slash path tests
- move_documents_to_folder now calls get_path() exactly once and derives
both the target directory string and the compute-moved-path argument
from the cached value via new _target_directory_string_from_path helper
- TestCoverageGap_DisambiguateNoSlashPath tests updated to expect
ValueError since _fetch_occupied_paths_in_directory now intentionally
rejects empty directory strings (commit 62cfc7d)
* Address review: add update_fields to signal dispatch, new test coverage
- Add explicit update_fields=None to post_save.send() in
_dispatch_document_path_created_signals to match Django Model.save()
dispatch signature and prevent TypeError in future handlers
- Add update_fields assertion to both signal dispatch tests
- Add TestCoverageGap_DeleteFolderIntegrityErrorRollback: verifies
IntegrityError during delete_folder's bulk_create rolls back the
deactivation update and folder deletion
- Add TestCoverageGap_TargetDirectoryStringEmptyPath: exercises the
ValueError guard when get_path() returns empty string
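
Why the full keyword set matters can be shown with a framework-free stand-in. The `Signal` class, `Row`, `bulk_insert`, and `dispatch_created_signals` below are hypothetical and are not Django's API; only the keyword names (`created`, `raw`, `using`, `update_fields`) come from the notes above. The sketch assumes a receiver that declares every keyword explicitly, so an omitted keyword would raise TypeError.

```python
from typing import Any, Callable

Receiver = Callable[..., None]


class Signal:
    """Tiny stand-in for a framework signal (not Django's class)."""

    def __init__(self) -> None:
        self._receivers: list[Receiver] = []

    def connect(self, receiver: Receiver) -> None:
        self._receivers.append(receiver)

    def send(self, sender: Any, **named: Any) -> None:
        for receiver in self._receivers:
            receiver(sender=sender, **named)


post_save = Signal()


class Row:
    """Hypothetical record; `db` plays the role of the database alias."""

    def __init__(self, path: str, db: str = "default") -> None:
        self.path = path
        self.db = db


def bulk_insert(table: list[Row], rows: list[Row]) -> list[Row]:
    """One batched write that fires no per-row signals."""
    table.extend(rows)
    return rows


def dispatch_created_signals(rows: list[Row]) -> None:
    """Replay post_save once per inserted row with the full keyword set."""
    for row in rows:
        post_save.send(
            sender=Row,
            instance=row,
            created=True,
            raw=False,
            using=row.db,
            update_fields=None,
        )


seen: list[dict[str, Any]] = []


def strict_receiver(sender, instance, created, raw, using, update_fields):
    # No **kwargs: a missing keyword would raise TypeError here.
    seen.append({"path": instance.path, "created": created,
                 "raw": raw, "using": using, "update_fields": update_fields})


post_save.connect(strict_receiver)
table: list[Row] = []
created_rows = bulk_insert(table, [Row("/a/x.pdf"), Row("/a/y.pdf")])
dispatch_created_signals(created_rows)
```

The receiver runs exactly once per inserted row, which is the property the two signal dispatch tests assert.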
* Address review: consolidate _target_directory_string methods, add leading-slash guard
- Consolidate _target_directory_string to delegate to _target_directory_string_from_path,
eliminating duplicated path-normalization logic (DRY)
- Add early leading-slash guard in _disambiguate_path so callers that pass slashless
paths get a clear error at the point of call rather than a confusing ValueError
from _fetch_occupied_paths_in_directory
- Remove dead else branch (directory = "") in _disambiguate_path now that the guard
makes it unreachable
* Address review feedback: remove dead code, improve type safety, simplify tests
- Remove unused _target_directory_string wrapper (dead code per CLAUDE.md rules);
update its test class to exercise _target_directory_string_from_path directly
with additional edge-case coverage (empty, slash-only, normal path).
- Fix stale comment referencing removed _target_directory_string method.
- Tighten planned_paths type annotation from list[tuple] to
list[tuple[DocumentPath, str]] for static analysis and readability.
- Simplify fragile call_args inspection in signal dispatch tests: replace
multi-branch kwargs extraction with direct tuple unpacking.
* Fix 3 failing tests and stale docstring in bulk folder operations
Remove the TestBulkMoveIntegrityRecovery and
TestDeleteFolderIntegrityRecovery test classes, which mocked
DocumentPath.objects.create although the implementation now uses
bulk_create. They exercised per-row retry logic from the old sequential
approach, which no longer exists in the batch code path. The equivalent
rollback behavior is already covered by
TestCoverageGapBulkMoveIntegrityErrorRollback and
TestCoverageGap_DeleteFolderIntegrityErrorRollback, which correctly mock
bulk_create.
Also update the move_documents_to_folder docstring to accurately describe
the two-phase plan-then-execute approach and all-or-nothing TOCTOU
semantics, replacing stale references to the old sequential approach
and _create_successor_path_with_retry.
Add corpus_id to delete_folder error log for consistency with
move_documents_to_folder error logging.
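
The all-or-nothing execute phase can be illustrated with the standard-library sqlite3 module. This is a sketch under stated assumptions: the `document_path` schema, the partial unique index on current paths, and the `execute_plan` function are hypothetical stand-ins, not the project's models. It shows a batched deactivation plus a batched insert inside one transaction, where an IntegrityError on the insert also undoes the deactivation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_path ("
    " id INTEGER PRIMARY KEY, document_id INTEGER NOT NULL,"
    " path TEXT NOT NULL, is_current INTEGER NOT NULL)"
)
# Assumed constraint: at most one current row per path.
conn.execute(
    "CREATE UNIQUE INDEX one_current_row_per_path"
    " ON document_path(path) WHERE is_current = 1"
)
conn.executemany(
    "INSERT INTO document_path (id, document_id, path, is_current)"
    " VALUES (?, ?, ?, 1)",
    [(1, 10, "/inbox/a.pdf"), (2, 11, "/inbox/b.pdf"), (3, 12, "/archive/a.pdf")],
)
conn.commit()


def execute_plan(conn: sqlite3.Connection, planned: list[tuple[int, int, str]]) -> bool:
    """Apply (superseded_row_id, document_id, new_path) tuples atomically."""
    ids = [row_id for row_id, _, _ in planned]
    placeholders = ",".join("?" * len(ids))
    try:
        with conn:  # commits on success, rolls back on any exception
            # Batch-deactivate the superseded rows in one statement.
            conn.execute(
                f"UPDATE document_path SET is_current = 0 WHERE id IN ({placeholders})",
                ids,
            )
            # Batch-insert the successor rows.
            conn.executemany(
                "INSERT INTO document_path (document_id, path, is_current)"
                " VALUES (?, ?, 1)",
                [(doc_id, new_path) for _, doc_id, new_path in planned],
            )
    except sqlite3.IntegrityError:
        return False
    return True


def current_paths(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT path FROM document_path WHERE is_current = 1")
    return sorted(r[0] for r in rows)


# A plan that collides with an existing current path is rolled back whole.
failed_ok = execute_plan(conn, [(1, 10, "/archive/a.pdf"), (2, 11, "/archive/b.pdf")])
after_failure = current_paths(conn)

# A disambiguated plan succeeds and keeps the old rows as history.
retry_ok = execute_plan(conn, [(1, 10, "/archive/a_1.pdf"), (2, 11, "/archive/b.pdf")])
after_success = current_paths(conn)
```

After the failed attempt the table is unchanged, which is why the caller can safely retry the whole batch.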
* Fix codecov config: rename file to files param, ignore test/config paths
* Fix test class naming inconsistency and update stale changelog references
- Rename TestCoverageGap_DeleteFolderIntegrityErrorRollback and
TestCoverageGap_TargetDirectoryStringFromPathEdgeCases to drop
underscore separators, matching the existing TestCoverageGapXxx
convention used by other test classes in this PR.
- Update CHANGELOG.md Issue #1200 entry to reflect that
TestBulkMoveIntegrityRecovery and TestDeleteFolderIntegrityRecovery
were replaced by TestCoverageGapBulkMoveIntegrityErrorRollback and
TestCoverageGapDeleteFolderIntegrityErrorRollback (bulk operations
now use batch approach instead of per-row retry).
* Address review feedback on bulk folder ops
- Document that _dispatch_document_path_created_signals only replays
post_save (pre_save remains skipped like bulk_create itself); add a
maintainer note to extend it if pre_save receivers are added.
- Normalise '/' to the root directory string in
_target_directory_string_from_path so callers that resolve a
root-equivalent folder via CorpusFolder.get_path() don't get a
ValueError. Empty strings still raise — they indicate upstream bugs.
Updated the corresponding TestCoverageGapTargetDirectoryStringFromPathEdgeCases
test (slash no longer raises, now normalises to '/').
- Clarify in _disambiguate_path docstring that extra_occupied is now
only consumed by the single-document retry loop in
_create_successor_path_with_retry; bulk callers use occupied_override.
- Hint in the bulk_move / delete_folder rolled-back error strings that
the whole batch / deletion is safe to retry.
- Rename TestCoverageGap_BulkMoveGetPathCallCount to match the
TestCoverageGap* naming convention used by its siblings.
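
The normalisation rules described for `_target_directory_string_from_path` can be sketched as follows. The function below is a hypothetical reconstruction from the notes above, and the `/<stripped>/` output shape is an assumption inferred from the remark that an empty path would otherwise yield `"//"`.

```python
def target_directory_string_from_path(folder_path: str) -> str:
    """Turn a folder path into a directory prefix with slashes on both ends.

    Assumed behaviour, reconstructed from the commit notes:
    - an empty string raises ValueError (it signals an upstream bug);
    - a slash-only path normalises to the root directory "/";
    - anything else is stripped of surrounding slashes and re-wrapped.
    """
    if folder_path == "":
        raise ValueError("folder path is empty; refusing to build '//'")
    stripped = folder_path.strip("/")
    if stripped == "":
        return "/"  # root-equivalent folder
    return f"/{stripped}/"


root = target_directory_string_from_path("/")
nested = target_directory_string_from_path("contracts/2024")
already_wrapped = target_directory_string_from_path("/contracts/")
```

Stripping before re-wrapping means callers get the same prefix whether or not the stored folder path carries leading or trailing slashes.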
* Address PR #1237 review: ordering invariants, pre_save guard, perf note
Addresses the most recent Claude bot review on PR #1237:
1. Document the fetch-before-deactivate ORDERING INVARIANT at both bulk
call sites (``delete_folder`` and ``move_documents_to_folder``). The
shared ``occupied_paths`` set must be populated while the to-be-
superseded rows are still ``is_current=True`` — ``_disambiguate_path``
silently ignores ``exclude_pk`` when ``occupied_override`` is provided,
so reordering would cause the batch to re-claim its own source paths.
Call-site comments prevent accidental reordering during future edits.
2. Add a regression guard test
(``test_document_path_has_no_pre_save_receivers``) that fails loudly
if a ``pre_save`` receiver is ever connected to ``DocumentPath``.
``_dispatch_document_path_created_signals`` only replays ``post_save``,
matching ``bulk_create``'s contract and the current (empty) pre_save
receiver set. The guard makes any future wiring of pre_save
handlers a hard-fail so the dispatch helper is extended intentionally.
3. Expand the existing regex-performance note on
``_fetch_occupied_paths_in_directory`` with a ``TODO(perf, #1199
follow-up)`` marker so the follow-up work has a traceable tag if
profiling later shows the regex scan as a bottleneck.
The TOCTOU-retry concern raised in the review is out of scope for this
PR — it requires caller-side retry logic in the GraphQL mutation layer
(``MoveDocumentsToFolderMutation``, ``DeleteCorpusFolderMutation``) and
is tracked separately. The existing "safe to retry" wording in error
messages remains accurate guidance for the caller.
* Unblock mypy CI: suppress django-stubs 6.0.3 ValuesQuerySet false positives
The values_list + unpack pattern introduced in main commit 41eb6ae
still trips mypy 1.20.1 + django-stubs 6.0.3 with 'object is not
iterable' and 'Cannot determine type' errors. Main's Backend CI is
red for the same reason. Suppress narrowly on the two affected lines
with type: ignore; runtime behaviour is correct.
* Address review: drop dead in-memory sync loops, revert unrelated has-type ignore
---------
Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 2d7033f commit fb62757
3 files changed
Lines changed: 613 additions & 365 deletions
File tree
- opencontractserver
- corpuses
- tests