Skip to content

Commit eeab1bc

Browse files
committed
Drop dead per-annotation guardian writes; add is_new fast path
Two production-relevant ingestion speedups, both behaviour-preserving. (A) import_annotations no longer writes per-annotation guardian rows. The annotation visibility/permission model is derived from doc + corpus (+ structural flag, creator, analysis/extract privacy) — none of: * AnnotationQuerySet.visible_to_user (shared/QuerySets.py) * AnnotationQueryOptimizer._compute_effective_permissions * user_has_permission_for_obj for annotations consult AnnotationUserObjectPermission rows. Writing them in the bulk_create loop was ~14 DB ops per annotation of dead work (1 user lookup + 7 remove_perm + 7 assign_perm pieces). For ingests that produce thousands of paragraph annotations this dominated non-embedding wall-clock. Locked in by ``test_no_per_annotation_guardian_rows_are_required`` in test_import_utils.py: deletes any guardian rows after import and re-asserts visibility outcomes are unchanged for creator, collaborator (doc+corpus read), outsider, and superuser. (B) set_permissions_for_obj_to_user gains an is_new=True fast path. Freshly-created objects have no prior guardian rows to clear, so the upfront remove_perm sweep (7 DB ops) is dead work. is_new=True skips it; default of False preserves full-replace semantics for sharing flows / permission downgrades (CRUD → READ-only must clear UPDATE/DELETE). import_annotations passes is_new=True for freshly-created labels, relationships, doc-level annotations, and the doc itself. Other call sites (sharing flows in conversation/badge/extract mutations) left at the default — they may operate on existing perms and need the replace semantics. remove_perm import hoisted to module scope so tests can patch it to assert the skip-sweep behaviour. Local timings (LegalBench-RAG cuad, 29 docs, ~2200 annotations, microservice + MiniLM, all earlier perf-tuning landed): Pre Option A+B (microservice hardened) 9m 24s ingest 7m 59s Post Option A+B 4m 37s ingest 3m 6s probe_char_recall identical (0.5593) — quality preserved. That's a 61% ingest reduction with zero behaviour change. The annotation table sizes match between runs; only the dead guardian table growth is gone. Tests: test_import_utils.TestImportAnnotationsPermissionInvariants — 6 tests covering creator / collaborator / outsider / anonymous / structural-on-public-doc / explicit no-perm-rows-required. test_visibility_managers.StructuralAnnotationOptimizerTests — 6 tests pinning the optimizer's corpus_id=None branch (structural annotations derive from doc only). test_permissioning.SetPermissionsIsNewTests — 4 tests covering skip-sweep / grant correctness / default-path-still-clears / int-user-id resolution. Cross-cutting regression check vs baseline (test_resolvers, test_corpus_annotations_query, test_annotation_permission_mutations, test_security_hardening): zero new failures introduced — the two pre-existing failures are filed separately as github.com//issues/1394.
1 parent d8e8aa7 commit eeab1bc

5 files changed

Lines changed: 543 additions & 28 deletions

File tree

opencontractserver/tests/permissioning/test_permissioning.py

Lines changed: 115 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -979,3 +979,118 @@ def test_annotation_permission_types(self):
979979
include_group_permissions=True,
980980
)
981981
)
982+
983+
984+
class SetPermissionsIsNewTests(TestCase):
985+
"""Tests for the ``is_new=True`` fast path on ``set_permissions_for_obj_to_user``.
986+
987+
Freshly-created objects have no prior guardian permissions to clear,
988+
so the upfront ``remove_perm`` sweep (7 DB ops per call) is dead
989+
work. ``is_new=True`` lets ingest-style call sites skip it. The
990+
default behaviour (full replace) must remain unchanged for
991+
non-creation paths like sharing flows or permission updates.
992+
"""
993+
994+
@classmethod
995+
def setUpTestData(cls):
996+
cls.user = User.objects.create_user(
997+
username="is_new_test_user", password="pw"
998+
)
999+
1000+
def test_is_new_skips_remove_perm_calls(self):
1001+
"""When called with ``is_new=True`` on a fresh object, the function
1002+
must NOT issue any ``remove_perm`` calls.
1003+
"""
1004+
from unittest.mock import patch
1005+
1006+
new_corpus = Corpus.objects.create(title="brand new", creator=self.user)
1007+
1008+
with patch(
1009+
"opencontractserver.utils.permissioning.remove_perm"
1010+
) as mock_remove:
1011+
set_permissions_for_obj_to_user(
1012+
self.user, new_corpus, [PermissionTypes.ALL], is_new=True
1013+
)
1014+
1015+
self.assertEqual(
1016+
mock_remove.call_count,
1017+
0,
1018+
"is_new=True must skip the remove_perm sweep entirely",
1019+
)
1020+
1021+
def test_is_new_still_grants_requested_perms(self):
1022+
"""``is_new=True`` must grant the requested perms identically to
1023+
the default path — only the upfront removal is skipped.
1024+
"""
1025+
new_corpus = Corpus.objects.create(title="brand new 2", creator=self.user)
1026+
set_permissions_for_obj_to_user(
1027+
self.user, new_corpus, [PermissionTypes.ALL], is_new=True
1028+
)
1029+
1030+
for perm in [
1031+
PermissionTypes.READ,
1032+
PermissionTypes.UPDATE,
1033+
PermissionTypes.DELETE,
1034+
PermissionTypes.PERMISSION,
1035+
]:
1036+
self.assertTrue(
1037+
user_has_permission_for_obj(self.user, new_corpus, perm),
1038+
f"User must have {perm} after is_new grant",
1039+
)
1040+
1041+
def test_default_path_still_removes_existing_perms(self):
1042+
"""Without ``is_new``, the function must continue to do the full
1043+
replace — clearing prior perms before assigning new ones.
1044+
1045+
Load-bearing semantic for sharing flows where a user is being
1046+
downgraded from CRUD to READ-only: old UPDATE/DELETE perms
1047+
must be removed.
1048+
"""
1049+
existing_corpus = Corpus.objects.create(
1050+
title="will be downgraded", creator=self.user
1051+
)
1052+
# Grant ALL first.
1053+
set_permissions_for_obj_to_user(
1054+
self.user, existing_corpus, [PermissionTypes.ALL]
1055+
)
1056+
self.assertTrue(
1057+
user_has_permission_for_obj(
1058+
self.user, existing_corpus, PermissionTypes.UPDATE
1059+
)
1060+
)
1061+
1062+
# Downgrade to READ-only via the default (replace) path.
1063+
set_permissions_for_obj_to_user(
1064+
self.user, existing_corpus, [PermissionTypes.READ]
1065+
)
1066+
1067+
self.assertTrue(
1068+
user_has_permission_for_obj(
1069+
self.user, existing_corpus, PermissionTypes.READ
1070+
)
1071+
)
1072+
self.assertFalse(
1073+
user_has_permission_for_obj(
1074+
self.user, existing_corpus, PermissionTypes.UPDATE
1075+
),
1076+
"Default path must clear UPDATE when downgrading to READ-only",
1077+
)
1078+
self.assertFalse(
1079+
user_has_permission_for_obj(
1080+
self.user, existing_corpus, PermissionTypes.DELETE
1081+
)
1082+
)
1083+
1084+
def test_is_new_resolves_int_user_id(self):
1085+
"""``is_new`` must still accept an int user_id (for symmetry with
1086+
the default call signature used by ``import_annotations``).
1087+
"""
1088+
new_corpus = Corpus.objects.create(title="int id test", creator=self.user)
1089+
set_permissions_for_obj_to_user(
1090+
self.user.id, new_corpus, [PermissionTypes.ALL], is_new=True
1091+
)
1092+
self.assertTrue(
1093+
user_has_permission_for_obj(
1094+
self.user, new_corpus, PermissionTypes.READ
1095+
)
1096+
)

opencontractserver/tests/test_import_utils.py

Lines changed: 256 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,3 +209,259 @@ def test_import_relationships(self):
209209

210210
self.assertIn(annotation_id_map["old-a2"], ann_ids_rel2_source)
211211
self.assertIn(annotation_id_map["old-a3"], ann_ids_rel2_targets)
212+
213+
214+
class TestImportAnnotationsPermissionInvariants(TestCase):
215+
"""Lock in the invariants that allow ``import_annotations`` to skip
216+
per-annotation guardian writes.
217+
218+
``AnnotationUserObjectPermission`` rows are NOT consulted by:
219+
* ``AnnotationQuerySet.visible_to_user`` (uses doc/corpus + structural + creator)
220+
* ``AnnotationQueryOptimizer._compute_effective_permissions`` (doc/corpus only)
221+
* ``user_has_permission_for_obj`` for annotations (delegates to optimizer)
222+
223+
These tests exist so any future regression that re-introduces a
224+
consumer of those rows gets caught immediately — without these
225+
tests, ``import_annotations`` could silently start producing wrong
226+
visibility for non-creator readers.
227+
"""
228+
229+
@classmethod
230+
def setUpTestData(cls):
231+
from django.contrib.auth.models import AnonymousUser
232+
233+
from opencontractserver.corpuses.models import Corpus
234+
from opencontractserver.documents.models import Document
235+
236+
User = get_user_model()
237+
cls.creator = User.objects.create_user(
238+
username="creator", password="pw"
239+
)
240+
cls.outsider = User.objects.create_user(
241+
username="outsider", password="pw"
242+
)
243+
cls.collaborator = User.objects.create_user(
244+
username="collaborator", password="pw"
245+
)
246+
cls.anonymous = AnonymousUser()
247+
cls.superuser = User.objects.create_superuser(
248+
username="root", password="pw", email="r@example.com"
249+
)
250+
251+
# Public-doc-in-public-corpus: anyone can see; creator owns.
252+
cls.public_corpus = Corpus.objects.create(
253+
title="Public", creator=cls.creator, is_public=True
254+
)
255+
cls.public_doc = Document.objects.create(
256+
title="Public Doc", creator=cls.creator, is_public=True
257+
)
258+
cls.public_doc, _, _ = cls.public_corpus.add_document(
259+
document=cls.public_doc, user=cls.creator
260+
)
261+
262+
# Private-doc-in-private-corpus: only creator + collaborator (via guardian).
263+
cls.private_corpus = Corpus.objects.create(
264+
title="Private", creator=cls.creator, is_public=False
265+
)
266+
cls.private_doc = Document.objects.create(
267+
title="Private Doc", creator=cls.creator, is_public=False
268+
)
269+
cls.private_doc, _, _ = cls.private_corpus.add_document(
270+
document=cls.private_doc, user=cls.creator
271+
)
272+
273+
# Grant creator full perms on the docs/corpuses they "own". Real
274+
# production flows (corpus.import_content, folder_service) call
275+
# set_permissions_for_obj_to_user after creation; bare
276+
# ``Document.objects.create(creator=...)`` only sets the FK.
277+
# The optimizer reads doc+corpus guardian rows, not the creator
278+
# FK, so we replicate the production grant here.
279+
from opencontractserver.types.enums import PermissionTypes
280+
from opencontractserver.utils.permissioning import (
281+
set_permissions_for_obj_to_user,
282+
)
283+
284+
for obj in (cls.public_doc, cls.private_doc):
285+
set_permissions_for_obj_to_user(
286+
cls.creator, obj, [PermissionTypes.ALL]
287+
)
288+
for obj in (cls.public_corpus, cls.private_corpus):
289+
set_permissions_for_obj_to_user(
290+
cls.creator, obj, [PermissionTypes.ALL]
291+
)
292+
293+
# Grant collaborator doc+corpus read so they can see annotations
294+
# via the doc/corpus path (this is the legitimate sharing flow,
295+
# NOT per-annotation perms).
296+
from guardian.shortcuts import assign_perm
297+
298+
assign_perm("documents.read_document", cls.collaborator, cls.private_doc)
299+
assign_perm("corpuses.read_corpus", cls.collaborator, cls.private_corpus)
300+
301+
cls.label = AnnotationLabel.objects.create(
302+
text="ParaLabel", creator=cls.creator, label_type="SPAN_LABEL"
303+
)
304+
305+
def _ingest(self, doc, corpus, *, structural):
306+
"""Run import_annotations on a small fixture and return the resulting Annotations."""
307+
annotation_data: list[OpenContractsAnnotationPythonType] = [
308+
{
309+
"id": f"a-{i}",
310+
"annotationLabel": "ParaLabel",
311+
"rawText": f"text {i}",
312+
"page": 1,
313+
"annotation_json": {"bounds": [i, 0, 10, 10]},
314+
"parent_id": None,
315+
"annotation_type": None,
316+
"structural": structural,
317+
}
318+
for i in range(3)
319+
]
320+
old_to_new = import_annotations(
321+
user_id=self.creator.id,
322+
doc_obj=doc,
323+
corpus_obj=corpus,
324+
annotations_data=annotation_data,
325+
label_lookup={"ParaLabel": self.label},
326+
)
327+
return Annotation.objects.filter(pk__in=old_to_new.values())
328+
329+
def test_creator_sees_own_imported_annotations(self):
330+
"""The creator can always see annotations they imported."""
331+
annots = self._ingest(self.public_doc, self.public_corpus, structural=False)
332+
creator_visible = Annotation.objects.visible_to_user(self.creator)
333+
for a in annots:
334+
self.assertIn(a, creator_visible)
335+
336+
def test_collaborator_sees_via_doc_and_corpus_perms_not_annotation_perms(self):
337+
"""Non-creator visibility flows from doc+corpus permissions only.
338+
339+
The collaborator was granted ``read_document`` and ``read_corpus``
340+
but explicitly NOT any annotation-level guardian rows. They must
341+
still see the annotations (because the optimizer derives perms
342+
from doc+corpus, not from ``AnnotationUserObjectPermission``).
343+
"""
344+
annots = self._ingest(
345+
self.private_doc, self.private_corpus, structural=False
346+
)
347+
visible = Annotation.objects.visible_to_user(self.collaborator)
348+
for a in annots:
349+
self.assertIn(
350+
a,
351+
visible,
352+
"Collaborator should see annotations on a doc/corpus they "
353+
"have read perm on, regardless of per-annotation guardian rows.",
354+
)
355+
356+
def test_outsider_blocked_by_doc_and_corpus_perms(self):
357+
"""Outsider with no doc/corpus perms cannot see private annotations."""
358+
annots = self._ingest(
359+
self.private_doc, self.private_corpus, structural=False
360+
)
361+
visible = Annotation.objects.visible_to_user(self.outsider)
362+
for a in annots:
363+
self.assertNotIn(
364+
a,
365+
visible,
366+
"Outsider with no doc/corpus perms must not see private annotations.",
367+
)
368+
369+
def test_anonymous_sees_only_structural_on_public_doc(self):
370+
"""Anonymous users see structural annotations on public docs only."""
371+
# Public doc, structural annotations: visible to anonymous.
372+
struct_annots = self._ingest(
373+
self.public_doc, self.public_corpus, structural=True
374+
)
375+
visible = Annotation.objects.visible_to_user(self.anonymous)
376+
for a in struct_annots:
377+
self.assertIn(a, visible)
378+
379+
# Private doc, structural: NOT visible to anonymous.
380+
struct_private = self._ingest(
381+
self.private_doc, self.private_corpus, structural=True
382+
)
383+
visible_private = Annotation.objects.visible_to_user(self.anonymous)
384+
for a in struct_private:
385+
self.assertNotIn(a, visible_private)
386+
387+
def test_user_has_permission_for_obj_uses_doc_corpus_for_annotations(self):
388+
"""``user_has_permission_for_obj`` for annotations consults the
389+
optimizer (doc+corpus) — not ``AnnotationUserObjectPermission``.
390+
"""
391+
from opencontractserver.types.enums import PermissionTypes
392+
from opencontractserver.utils.permissioning import user_has_permission_for_obj
393+
394+
annots = self._ingest(
395+
self.private_doc, self.private_corpus, structural=False
396+
)
397+
for a in annots:
398+
# Creator: doc creator → has all perms via doc.
399+
self.assertTrue(
400+
user_has_permission_for_obj(self.creator, a, PermissionTypes.READ)
401+
)
402+
# Collaborator: granted doc+corpus read → can read annotation.
403+
self.assertTrue(
404+
user_has_permission_for_obj(
405+
self.collaborator, a, PermissionTypes.READ
406+
)
407+
)
408+
# Outsider: no doc/corpus perms → cannot read.
409+
self.assertFalse(
410+
user_has_permission_for_obj(self.outsider, a, PermissionTypes.READ)
411+
)
412+
# Superuser: always.
413+
self.assertTrue(
414+
user_has_permission_for_obj(self.superuser, a, PermissionTypes.READ)
415+
)
416+
417+
def test_no_per_annotation_guardian_rows_are_required(self):
418+
"""The annotation-level guardian table can be empty without
419+
affecting visibility outcomes.
420+
421+
After this test passes, ``import_annotations`` is free to skip
422+
the per-annotation ``set_permissions_for_obj_to_user`` call —
423+
the annotations remain visible to readers who have doc+corpus
424+
access. We assert this by clearing any guardian rows that the
425+
ingest loop wrote and re-running visibility checks.
426+
"""
427+
from opencontractserver.annotations.models import (
428+
AnnotationUserObjectPermission,
429+
)
430+
from opencontractserver.types.enums import PermissionTypes
431+
from opencontractserver.utils.permissioning import user_has_permission_for_obj
432+
433+
annots = list(
434+
self._ingest(self.private_doc, self.private_corpus, structural=False)
435+
)
436+
# Clear any per-annotation guardian rows that the import wrote.
437+
AnnotationUserObjectPermission.objects.filter(
438+
content_object__in=annots
439+
).delete()
440+
441+
# Visibility and perm checks must still resolve correctly via doc+corpus.
442+
for a in annots:
443+
self.assertIn(
444+
a,
445+
Annotation.objects.visible_to_user(self.creator),
446+
"Creator must still see their annotation without per-row perms.",
447+
)
448+
self.assertIn(
449+
a,
450+
Annotation.objects.visible_to_user(self.collaborator),
451+
"Collaborator must still see annotation via doc/corpus perms.",
452+
)
453+
self.assertNotIn(
454+
a,
455+
Annotation.objects.visible_to_user(self.outsider),
456+
"Outsider must still be blocked via doc/corpus perms.",
457+
)
458+
self.assertTrue(
459+
user_has_permission_for_obj(
460+
self.collaborator, a, PermissionTypes.READ
461+
)
462+
)
463+
self.assertFalse(
464+
user_has_permission_for_obj(
465+
self.outsider, a, PermissionTypes.READ
466+
)
467+
)

0 commit comments

Comments
 (0)