Commit 255f84a
Scope structural annotations to the queried corpus in vector store
CoreAnnotationVectorStore._build_base_queryset() had two interacting issues
on the corpus-wide path (corpus_id set, document_id None):
1. The corpus-only branch used Q(structural=True) with no corpus join, so a
   parser-produced structural annotation (document_id = corpus_id = NULL,
   set via structural_set FK) from any corpus could match. The structural
   class was effectively unscoped.
2. The check_corpus_deletion block ANDed Q(document_id__in=active_doc_ids).
   __in cannot match NULL, so the same parser-produced rows were silently
   dropped on the production-default path. With the deletion guard on,
   structural results were missing entirely; with it off, results were
   pulled from every corpus.
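
The two failure modes above can be reproduced at the SQL level. The following is a minimal sketch using sqlite3 rather than the project's Django models; the table layout, the ids, and the exact shape of the WHERE clauses are assumptions made for illustration, not the real queryset.

```python
import sqlite3

# Hypothetical, simplified stand-in for the annotation table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation ("
    " id INTEGER PRIMARY KEY,"
    " document_id INTEGER,"        # NULL for parser-produced structural rows
    " corpus_id INTEGER,"          # NULL for parser-produced structural rows
    " structural INTEGER NOT NULL,"
    " structural_set_id INTEGER)"
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 1, 0, None),      # ordinary annotation on document 10, corpus 1
        (2, None, None, 1, 100),  # structural row whose set belongs to corpus 1
        (3, None, None, 1, 200),  # structural row whose set belongs to corpus 2
    ],
)


def ids(sql, params=()):
    """Run a query and return the matching annotation ids in order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY id", params)]


# Issue 1: the structural predicate carries no corpus constraint, so the
# structural row from the other corpus (id 3) matches as well.
unscoped = ids(
    "SELECT id FROM annotation WHERE corpus_id = ? OR structural = 1", (1,)
)

# Issue 2: ANDing an IN filter on document_id drops every row where
# document_id is NULL, because NULL IN (...) never evaluates to true.
guarded = ids(
    "SELECT id FROM annotation"
    " WHERE (corpus_id = ? OR structural = 1) AND document_id IN (?)",
    (1, 10),
)

print(unscoped)  # [1, 2, 3]
print(guarded)   # [1]
```

With the guard off, row 3 leaks in from another corpus; with it on, row 2 (which legitimately belongs to the queried corpus) disappears along with it.
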
Net effect was a corpus-isolation gap for the structural class — observable
as cross-corpus contamination in corpus-wide retrieval and as a search
quality regression in single-tenant deployments with multiple corpora.
Fix walks structural_set → Document.structural_annotation_set (reverse FK)
→ DocumentPath to derive the set of structural_set ids reachable from
documents in the queried corpus that are visible to the requesting user,
then constrains the structural branch to those ids. The deletion-aware
filter additionally accepts structural rows whose set links to one of the
active documents, so the default path no longer drops them.
Visibility is now enforced for the structural branch the same way it is for
non-structural rows: via Document.objects.visible_to_user(user). The upfront
IDOR check on corpus_id is unchanged.
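
The scoping logic described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this commit: the dataclasses, the `visible` and `active` flags, and the function name `scoped_annotation_ids` are hypothetical stand-ins for Document, DocumentPath, Document.objects.visible_to_user(user), and the deletion-aware filter, and the non-structural branch is reduced to a bare corpus match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    id: int
    structural_set_id: Optional[int]  # models Document.structural_annotation_set
    visible: bool                     # stand-in for visible_to_user(user)


@dataclass(frozen=True)
class Path:  # stand-in for DocumentPath
    document_id: int
    corpus_id: int
    active: bool  # hypothetical flag: False once the document left the corpus


@dataclass(frozen=True)
class Ann:
    id: int
    document_id: Optional[int]
    corpus_id: Optional[int]
    structural: bool
    structural_set_id: Optional[int]


def scoped_annotation_ids(anns, docs, paths, corpus_id, check_corpus_deletion=True):
    """Return ids of annotations that belong to corpus_id for this user."""
    visible = {d.id: d for d in docs if d.visible}
    # Visible documents linked to the queried corpus through a path.
    in_corpus = {
        p.document_id for p in paths
        if p.corpus_id == corpus_id and p.document_id in visible
    }
    # Structural sets reachable from those documents.
    set_ids = {visible[i].structural_set_id for i in in_corpus} - {None}
    # The same walk restricted to documents that are still active.
    active_doc_ids = {
        p.document_id for p in paths
        if p.corpus_id == corpus_id and p.active and p.document_id in visible
    }
    active_set_ids = {visible[i].structural_set_id for i in active_doc_ids} - {None}

    kept = []
    for a in anns:
        if a.structural:
            keep = a.structural_set_id in set_ids
        else:
            keep = a.corpus_id == corpus_id
        if keep and check_corpus_deletion:
            # Accept rows tied to an active document directly, or structural
            # rows whose set links to an active document.
            keep = a.document_id in active_doc_ids or (
                a.structural and a.structural_set_id in active_set_ids
            )
        if keep:
            kept.append(a.id)
    return sorted(kept)


docs = [Doc(10, 100, True), Doc(20, 200, True), Doc(30, 300, False), Doc(40, 400, True)]
paths = [Path(10, 1, True), Path(20, 2, True), Path(30, 1, True), Path(40, 1, False)]
anns = [
    Ann(1, 10, 1, False, None),
    Ann(2, None, None, True, 100),  # set of a visible, active corpus-1 document
    Ann(3, None, None, True, 200),  # set of a corpus-2 document
    Ann(4, None, None, True, 300),  # set of a document the user cannot see
    Ann(5, None, None, True, 400),  # set of a document no longer active in corpus 1
]

print(scoped_annotation_ids(anns, docs, paths, corpus_id=1))  # [1, 2]
print(scoped_annotation_ids(anns, docs, paths, corpus_id=1, check_corpus_deletion=False))  # [1, 2, 5]
print(scoped_annotation_ids(anns, docs, paths, corpus_id=2))  # [3]
```

In this toy model the structural branch is constrained by set membership rather than by document_id, so rows with a NULL document_id survive the deletion guard only when their set links to an active, visible document in the queried corpus.
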
Adds opencontractserver/tests/test_corpus_isolation_vector_store.py
(six regression tests). CoreAnnotationVectorStore.global_search() was
already correct and is not modified.
Two unrelated TransactionTestCase setUp blocks
(test_pydantic_ai_agents.py, test_structural_annotation_portability.py) now
pass processing_started=timezone.now() to short-circuit
process_doc_on_create_atomic, which would otherwise eagerly chain a Celery
PDF-ingest task that fails on a file-less test document and aborts the
test class. These were already broken on main; cleaned up so the new
regression suite has a green baseline alongside.
1 parent ff410db commit 255f84a
5 files changed
Lines changed: 486 additions & 8 deletions
File tree
- opencontractserver
  - llms/vector_stores
  - tests
Lines changed: 55 additions & 7 deletions