diff --git a/CHANGELOG.md b/CHANGELOG.md
index d352ed735..2ed9b0108 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
+- **Extraction grounding follow-up #2** (Issue #1407, follow-up to PR #1397):
+  - **DB-level idempotency for the grounding `get_or_create`** (`opencontractserver/annotations/migrations/0069_grounding_annotation_unique_constraints.py`, `opencontractserver/annotations/models.py`): added two partial `UniqueConstraint`s on `Annotation` that exactly match the application-level `get_or_create` lookup keys in `opencontractserver/utils/extraction_grounding.py` — `annotation_unique_token_label_grounding_key` keyed on `(document, corpus, annotation_label, page, raw_text, creator)` and `annotation_unique_span_label_grounding_key` keyed on `(document, corpus, annotation_label, raw_text, json, creator)`, each scoped via `condition=Q(structural=False, is_grounding_source=True)` plus its `annotation_type` (`TOKEN_LABEL` / `SPAN_LABEL` respectively). Without these, two concurrent Celery workers retrying the same datacell could both miss on the SELECT and both succeed on the CREATE, defeating PR #1397's `get_or_create`. The new `Annotation.is_grounding_source` boolean (default `False`, set `True` only inside the grounding pipeline) scopes the constraints so legitimate non-grounding flows producing the same key tuple — `add_annotations_from_exact_strings` finding multiple occurrences of the same word on a page, hierarchical annotation trees with duplicate `raw_text` at different parents — are not blocked. The migration adds the field, backfills `is_grounding_source=True` for any pre-existing `OC_EXTRACT_SOURCE` rows, deduplicates the now-flagged duplicates (keeping the lowest-pk row in each group and re-pointing `datacell.sources` and the other cross-referencing FK / M2M rows) before adding the constraints — same pattern as `0068_enforce_embedder_path_not_null.py`. 
`creator` is in the key so two distinct users manually creating identical grounding rows are not blocked; only literal duplicates (the realistic Celery-race target) are. + - **`creator_id` is now in the grounding `get_or_create` lookup keys** (not just `defaults`) so the new constraints back the application-level lookup — required for `IntegrityError` to fire on the loser of a race and for `get_or_create` to fall back to the `SELECT`. + - **Documented PostgreSQL-specific `json` lookup semantics** (`opencontractserver/utils/extraction_grounding.py` `_create_span_annotation` docstring): the `json={"start", "end"}` lookup key relies on `JSONField` mapping to PostgreSQL `jsonb` (structural equality, key-order independent). On SQLite `JSONField` is plain text and the comparison would be lexical. The project's runtime and test database are both PostgreSQL, so this is documented rather than guarded. + - **Documented dedup rationale** (`_create_grounding_annotations`): the end-of-function dedup-by-pk loop is needed because distinct extraction values can legitimately resolve to the same source span (e.g. `["Acme Holdings", "Acme Holdings Inc."]` both anchoring on the same token range, or an idempotent re-run returning the existing row twice). The comment now spells out that this is intentional rather than masking a bug in `align_text_to_document`. + - **Test cleanup** (`opencontractserver/tests/test_extraction_grounding.py`): replaced the redundant `assertIsNotNone(...) + assert ... is not None` pair with a single plain `assert` (the only one of the two that actually narrows for mypy / pyright), with an inline comment explaining the choice. 
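The race the new constraints close can be sketched without a database. The following is a pure-Python simulation of the `get_or_create` contract (SELECT, then INSERT, then fall back to SELECT on conflict), not Django or project code; `FakeTable`, `UniqueViolation`, and the key tuple are all illustrative stand-ins:

```python
import threading


class UniqueViolation(Exception):
    """Stands in for the database's IntegrityError."""


class FakeTable:
    """In-memory stand-in for a table guarded by a partial UniqueConstraint."""

    def __init__(self) -> None:
        self._rows: dict[tuple, int] = {}
        self._lock = threading.Lock()
        self._next_pk = 1

    def select(self, key: tuple):
        return self._rows.get(key)

    def insert(self, key: tuple) -> int:
        with self._lock:
            if key in self._rows:  # the unique constraint firing
                raise UniqueViolation(key)
            pk, self._next_pk = self._next_pk, self._next_pk + 1
            self._rows[key] = pk
            return pk


def get_or_create(table: FakeTable, key: tuple):
    """Mirrors Django's contract: SELECT, INSERT, re-SELECT on conflict."""
    pk = table.select(key)
    if pk is not None:
        return pk, False
    try:
        return table.insert(key), True
    except UniqueViolation:
        return table.select(key), False


table = FakeTable()
key = ("doc-1", "corpus-1", "OC_EXTRACT_SOURCE", 0, "Acme Holdings", "user-7")

# The race: both workers SELECT (and miss) before either INSERTs.
assert table.select(key) is None and table.select(key) is None
pk_a = table.insert(key)  # worker A's INSERT wins
try:
    table.insert(key)  # worker B's INSERT hits the constraint...
    pk_b = None
except UniqueViolation:
    pk_b = table.select(key)  # ...and its fallback SELECT recovers the row
assert pk_a == pk_b  # both workers converge on a single row
```

Without the constraint (delete the `if key in self._rows` check), both inserts succeed and the workers end up holding two different pks — exactly the duplicate-source failure the migration guards against.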
- **Shared-protocol contract drift surfaced by PR #1400 follow-up review** (Issue #1408): - **`PermissionedQueryManagerProtocol.visible_to_user` signature mismatch** (`opencontractserver/types/protocols.py:147`): the protocol declared `visible_to_user(self, user: Any = None)` but `PermissionManager.visible_to_user` and `UserFeedbackManager.visible_to_user` have no default. A caller holding a protocol-typed reference could call `.visible_to_user()` with no args and trigger a runtime `TypeError`, and mypy would also reject the concrete classes as structural matches. Dropped the `= None` default so the protocol pins the strictest contract; the docstring now explains that callers must pass an `AnonymousUser` when no authenticated principal is available. Lenient managers (e.g. `BaseVisibilityManager` with `user=None`) still satisfy the protocol — verified with both `isinstance` and `issubclass` checks. - **`ToolProtocol` `@property` descriptors against a `@dataclass`** (`opencontractserver/types/protocols.py:97`): `CoreTool` is a `@dataclass` exposing `name` / `description` / `parameters` / `requires_approval` as plain instance attributes. Mypy accepted dataclass fields against the property-shaped protocol, but the asymmetry was a footgun for future implementors who would reach for `@property` based on the protocol surface. Converted to plain attribute declarations matching the concrete class. diff --git a/opencontractserver/annotations/migrations/0069_grounding_annotation_unique_constraints.py b/opencontractserver/annotations/migrations/0069_grounding_annotation_unique_constraints.py new file mode 100644 index 000000000..9815bb009 --- /dev/null +++ b/opencontractserver/annotations/migrations/0069_grounding_annotation_unique_constraints.py @@ -0,0 +1,349 @@ +""" +Add database-level unique constraints backing the application-level +``get_or_create`` calls in ``opencontractserver/utils/extraction_grounding.py``. 
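The `visible_to_user` default-argument hazard can be reduced to a minimal sketch. These classes are illustrative stand-ins, not the real `opencontractserver` managers, and note the caveat in the comments: `runtime_checkable` `isinstance()` only verifies method presence, while the signature contract itself is enforced statically by mypy / pyright:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class StrictProto(Protocol):
    """Illustrative stand-in for PermissionedQueryManagerProtocol."""

    def visible_to_user(self, user: Any) -> list: ...  # no "= None" default


class StrictManager:
    """Like PermissionManager / UserFeedbackManager: no default."""

    def visible_to_user(self, user: Any) -> list:
        return [user]


class LenientManager:
    """Like BaseVisibilityManager: a default is still protocol-compatible."""

    def visible_to_user(self, user: Any = None) -> list:
        return [user]


# runtime_checkable isinstance() only checks that the method exists; the
# argument-default contract is a static (mypy / pyright) concern.
assert isinstance(StrictManager(), StrictProto)
assert isinstance(LenientManager(), StrictProto)

# The hazard the fix removes: under the old protocol signature
# "user: Any = None", a protocol-typed caller could legally write
# .visible_to_user() and crash at runtime against the stricter class.
try:
    StrictManager().visible_to_user()  # type: ignore[call-arg]
    raised = False
except TypeError:
    raised = True
assert raised
```

This is why dropping the protocol's default pins the strictest contract: lenient implementations keep satisfying it, while the zero-argument call site becomes a static error instead of a latent runtime `TypeError`.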
+ +Context (issue #1407, follow-up to PR #1397): +The grounding pipeline already uses ``Annotation.objects.get_or_create()`` +to make Celery retries idempotent. Without a backing ``UniqueConstraint``, +two workers racing on the same datacell can both miss on the SELECT and +both succeed on the CREATE — duplicating the source annotation that the +``get_or_create`` was meant to prevent. In practice this is rare for a +single datacell (Celery typically retries sequentially), but the +constraints below promote idempotency from a best-effort behaviour to a +correctness invariant. + +The constraints are scoped to ``is_grounding_source=True`` so that +legitimate non-grounding flows producing the same key tuple — e.g. +``add_annotations_from_exact_strings`` finding multiple occurrences of +the same word on the same page, or hierarchical annotation trees with +duplicate ``raw_text`` at different parents — are not blocked. + +Two partial unique constraints (one per annotation_type) match the +respective ``get_or_create`` lookup keys exactly: + +* TOKEN_LABEL (PDF grounding) — keyed on + ``(document, corpus, annotation_label, page, raw_text, creator)`` since + the bounding-box ``json`` is deterministic for a fixed input and we + want retries to reuse the existing row even if PlasmaPDF reformats the + bounding-box layout. +* SPAN_LABEL (text/DOCX grounding) — keyed on + ``(document, corpus, annotation_label, raw_text, json, creator)`` + because the character offsets in ``json={"start", "end"}`` ARE the + identity for a span annotation. ``page`` is omitted (always 1 in this + path). + +``creator`` is included so two users manually creating identical +grounding rows are not blocked — only a single user creating the literal +same row is, which matches the realistic race-condition target (the +grounding pipeline always uses the datacell owner's ID). + +Migration steps (in order): +1. Add the ``is_grounding_source`` boolean field (default False). +2. 
Backfill: set ``is_grounding_source=True`` for any pre-existing + annotation tied to the ``OC_EXTRACT_SOURCE`` label and not structural, + so the constraints meaningfully cover historical grounding rows. +3. Dedupe surviving duplicates among the backfilled (now-flagged) rows, + re-pointing ``datacell.sources`` M2M FKs to the lowest-pk keeper — + same pattern as ``0068_enforce_embedder_path_not_null.py``. +4. Add the two partial UniqueConstraints. +""" + +import logging + +from django.db import migrations, models, transaction +from django.db.models import Count, Q + +from opencontractserver.constants.annotations import OC_EXTRACT_SOURCE_LABEL + +logger = logging.getLogger(__name__) + + +def backfill_is_grounding_source(apps, schema_editor): + """Flag pre-existing grounding annotations. + + Without this backfill, historical OC_EXTRACT_SOURCE rows would have + ``is_grounding_source=False`` and slip past the new constraints. + """ + Annotation = apps.get_model("annotations", "Annotation") + AnnotationLabel = apps.get_model("annotations", "AnnotationLabel") + + grounding_label_ids = list( + AnnotationLabel.objects.filter(text=OC_EXTRACT_SOURCE_LABEL).values_list( + "id", flat=True + ) + ) + if not grounding_label_ids: + return + + updated = Annotation.objects.filter( + annotation_label_id__in=grounding_label_ids, + structural=False, + ).update(is_grounding_source=True) + + if updated: + logger.info( + "Backfilled is_grounding_source=True on %s pre-existing " + "OC_EXTRACT_SOURCE annotation rows.", + updated, + ) + + +def reverse_backfill(apps, schema_editor): + """Reverse: clear is_grounding_source on the backfilled rows.""" + Annotation = apps.get_model("annotations", "Annotation") + Annotation.objects.filter(is_grounding_source=True).update( + is_grounding_source=False + ) + + +def _repoint_cross_references(apps, redundant_ids: list[int], keeper_id: int) -> None: + """Repoint every realistic cross-reference from ``redundant_ids`` to ``keeper_id``. 
+ + Survey of FK / M2M references to ``Annotation`` outside the grounding + pipeline (run ``grep -nE 'ForeignKey|ManyToManyField' ...`` to refresh): + + * ``extracts.Datacell.sources`` (M2M) — primary, always + * ``feedback.UserFeedback.commented_annotation`` (FK, SET_NULL) + * ``conversations.ChatMessage.source_annotations`` (M2M) + * ``conversations.ChatMessage.created_annotations`` (M2M) + * ``users.AssignmentTask.resulting_annotations`` (M2M) + * ``annotations.Relationship.source_annotations`` (M2M) + * ``annotations.Relationship.target_annotations`` (M2M) + * ``annotations.Embedding.annotation`` (FK, CASCADE) + — handled implicitly: deleting redundant annotations cascade-deletes + their embeddings, the keeper retains its own. + * ``annotations.Annotation.parent`` (FK, CASCADE) + — grounding annotations are leaf rows; in the unlikely event a child + FK exists, CASCADE preserves migration integrity at the cost of + data on a duplicate row that should not have had children. + + Every reference except the cascading ones is repointed to the keeper so + no cross-domain data is silently dropped when the redundant row is + deleted. 
+    """
+    Datacell = apps.get_model("extracts", "Datacell")
+    UserFeedback = apps.get_model("feedback", "UserFeedback")
+    ChatMessage = apps.get_model("conversations", "ChatMessage")
+    AssignmentTask = apps.get_model("users", "AssignmentTask")
+    Relationship = apps.get_model("annotations", "Relationship")
+
+    def _repoint_m2m(through, other_fk: str) -> None:
+        # Auto-created M2M through tables carry a unique constraint on
+        # (other_fk, annotation_id). When the other side already pairs
+        # with the keeper (the common case: racing workers attach both
+        # duplicates to the same datacell), a blind .update() would raise
+        # IntegrityError, so colliding rows are deleted instead of
+        # repointed.
+        paired = through.objects.filter(annotation_id=keeper_id).values_list(
+            other_fk, flat=True
+        )
+        through.objects.filter(
+            annotation_id__in=redundant_ids, **{f"{other_fk}__in": list(paired)}
+        ).delete()
+        through.objects.filter(annotation_id__in=redundant_ids).update(
+            annotation_id=keeper_id
+        )
+
+    _repoint_m2m(Datacell.sources.through, "datacell_id")
+    UserFeedback.objects.filter(commented_annotation_id__in=redundant_ids).update(
+        commented_annotation_id=keeper_id
+    )
+    _repoint_m2m(ChatMessage.source_annotations.through, "chatmessage_id")
+    _repoint_m2m(ChatMessage.created_annotations.through, "chatmessage_id")
+    _repoint_m2m(AssignmentTask.resulting_annotations.through, "assignmenttask_id")
+    _repoint_m2m(Relationship.source_annotations.through, "relationship_id")
+    _repoint_m2m(Relationship.target_annotations.through, "relationship_id")
+
+
+def _dedupe_token_label(apps):
+    """Collapse duplicate flagged TOKEN_LABEL grounding annotations.
+
+    Groups by ``(document, corpus, annotation_label, page, raw_text,
+    creator)`` and, for each group of >1, keeps the lowest-pk row and
+    repoints all realistic cross-references (see
+    ``_repoint_cross_references``) to it before deleting the rest. 
+ """ + Annotation = apps.get_model("annotations", "Annotation") + + duplicates = ( + Annotation.objects.filter( + structural=False, + annotation_type="TOKEN_LABEL", + is_grounding_source=True, + ) + .values( + "document_id", + "corpus_id", + "annotation_label_id", + "page", + "raw_text", + "creator_id", + ) + .annotate(n=Count("id")) + .filter(n__gt=1) + ) + + total_groups = 0 + total_collapsed = 0 + for group in duplicates: + ids = list( + Annotation.objects.filter( + structural=False, + annotation_type="TOKEN_LABEL", + is_grounding_source=True, + document_id=group["document_id"], + corpus_id=group["corpus_id"], + annotation_label_id=group["annotation_label_id"], + page=group["page"], + raw_text=group["raw_text"], + creator_id=group["creator_id"], + ) + .order_by("id") + .values_list("id", flat=True) + ) + if len(ids) <= 1: + continue + keeper, *redundant = ids + with transaction.atomic(): + _repoint_cross_references(apps, redundant, keeper) + Annotation.objects.filter(id__in=redundant).delete() + total_groups += 1 + total_collapsed += len(redundant) + + if total_groups: + logger.info( + "Collapsed %s duplicate grounding TOKEN_LABEL annotation rows " + "across %s groups before adding unique constraint.", + total_collapsed, + total_groups, + ) + + +def _dedupe_span_label(apps): + """Collapse duplicate flagged SPAN_LABEL grounding annotations. + + Groups by ``(document, corpus, annotation_label, raw_text, json, + creator)``. ``json`` equality on PostgreSQL JSONB is structural + (key-order independent), which matches the application-level + ``get_or_create`` lookup behaviour. 
+ """ + Annotation = apps.get_model("annotations", "Annotation") + + duplicates = ( + Annotation.objects.filter( + structural=False, + annotation_type="SPAN_LABEL", + is_grounding_source=True, + ) + .values( + "document_id", + "corpus_id", + "annotation_label_id", + "raw_text", + "json", + "creator_id", + ) + .annotate(n=Count("id")) + .filter(n__gt=1) + ) + + total_groups = 0 + total_collapsed = 0 + for group in duplicates: + ids = list( + Annotation.objects.filter( + structural=False, + annotation_type="SPAN_LABEL", + is_grounding_source=True, + document_id=group["document_id"], + corpus_id=group["corpus_id"], + annotation_label_id=group["annotation_label_id"], + raw_text=group["raw_text"], + json=group["json"], + creator_id=group["creator_id"], + ) + .order_by("id") + .values_list("id", flat=True) + ) + if len(ids) <= 1: + continue + keeper, *redundant = ids + with transaction.atomic(): + _repoint_cross_references(apps, redundant, keeper) + Annotation.objects.filter(id__in=redundant).delete() + total_groups += 1 + total_collapsed += len(redundant) + + if total_groups: + logger.info( + "Collapsed %s duplicate grounding SPAN_LABEL annotation rows " + "across %s groups before adding unique constraint.", + total_collapsed, + total_groups, + ) + + +def dedupe_grounding_targets(apps, schema_editor): + _dedupe_token_label(apps) + _dedupe_span_label(apps) + + +def reverse_dedupe(apps, schema_editor): + """No-op: dropping the constraint cannot resurrect deleted duplicates.""" + + +class Migration(migrations.Migration): + atomic = False + + dependencies = [ + ("annotations", "0068_enforce_embedder_path_not_null"), + # Dedup helper repoints cross-references on these apps' models. Pin + # to the latest migration of each so they exist when this runs. 
+ ("conversations", "0018_conversation_memory_curated"), + ("extracts", "0028_rename_placeholder_indexes"), + ("feedback", "0006_alter_userfeedback_backend_lock"), + ("users", "0026_alter_user_username_validator"), + ] + + operations = [ + migrations.AddField( + model_name="annotation", + name="is_grounding_source", + field=models.BooleanField(default=False), + ), + migrations.RunPython(backfill_is_grounding_source, reverse_backfill), + migrations.RunPython(dedupe_grounding_targets, reverse_dedupe), + migrations.AddConstraint( + model_name="annotation", + constraint=models.UniqueConstraint( + fields=[ + "document", + "corpus", + "annotation_label", + "page", + "raw_text", + "creator", + ], + condition=Q( + structural=False, + annotation_type="TOKEN_LABEL", + is_grounding_source=True, + ), + name="annotation_unique_token_label_grounding_key", + ), + ), + migrations.AddConstraint( + model_name="annotation", + constraint=models.UniqueConstraint( + fields=[ + "document", + "corpus", + "annotation_label", + "raw_text", + "json", + "creator", + ], + condition=Q( + structural=False, + annotation_type="SPAN_LABEL", + is_grounding_source=True, + ), + name="annotation_unique_span_label_grounding_key", + ), + ), + ] diff --git a/opencontractserver/annotations/models.py b/opencontractserver/annotations/models.py index 7f3876f81..4f0cdec1d 100644 --- a/opencontractserver/annotations/models.py +++ b/opencontractserver/annotations/models.py @@ -963,6 +963,15 @@ class Annotation(BaseOCModel, HasEmbeddingMixin): # Mark structural / layout annotations explicitly. structural = django.db.models.BooleanField(default=False) + # True only for annotations created by the extraction-grounding pipeline + # (``opencontractserver/utils/extraction_grounding.py``). Backs the + # partial UniqueConstraints below — the constraints scope to this flag + # so legitimate non-grounding flows that happen to produce the same + # ``(document, corpus, label, page, raw_text, creator)`` tuple (e.g. 
+ # multi-occurrence exact-string annotation tools, hierarchical + # annotation trees) are not blocked. + is_grounding_source = django.db.models.BooleanField(default=False) + # Content modalities present in this annotation (TEXT, IMAGE, etc.) content_modalities = ArrayField( django.db.models.CharField(max_length=20), @@ -1242,6 +1251,55 @@ class Meta: "Annotations in a structural_set must have structural=True" ), ), + # Backs the application-level ``get_or_create`` in the + # extraction grounding pipeline (see + # ``opencontractserver/utils/extraction_grounding.py``). Without + # this, two concurrent Celery workers retrying the same datacell + # could both miss on the SELECT and both succeed on the CREATE, + # producing duplicate source annotations. ``creator`` is in the + # key so two distinct users manually creating identical rows + # are not blocked. Scoped to ``is_grounding_source=True`` so + # legitimate non-grounding flows that happen to produce the + # same key tuple (e.g. ``add_annotations_from_exact_strings`` + # finding multiple occurrences of the same word on a page, + # hierarchical annotation trees) are not blocked. + django.db.models.UniqueConstraint( + fields=[ + "document", + "corpus", + "annotation_label", + "page", + "raw_text", + "creator", + ], + condition=django.db.models.Q( + structural=False, + annotation_type=TOKEN_LABEL, + is_grounding_source=True, + ), + name="annotation_unique_token_label_grounding_key", + ), + # Span counterpart: ``json={"start", "end"}`` is the identity + # of a span annotation, so it replaces ``page`` in the key. + # JSONB equality on PostgreSQL is structural (key-order + # independent), which matches the application-level lookup; + # SQLite stores JSON as text and would compare lexically. 
+ django.db.models.UniqueConstraint( + fields=[ + "document", + "corpus", + "annotation_label", + "raw_text", + "json", + "creator", + ], + condition=django.db.models.Q( + structural=False, + annotation_type=SPAN_LABEL, + is_grounding_source=True, + ), + name="annotation_unique_span_label_grounding_key", + ), ] diff --git a/opencontractserver/tests/test_extraction_grounding.py b/opencontractserver/tests/test_extraction_grounding.py index 9438bc0be..a2d389917 100644 --- a/opencontractserver/tests/test_extraction_grounding.py +++ b/opencontractserver/tests/test_extraction_grounding.py @@ -286,8 +286,7 @@ def test_ground_text_document(self): self.assertEqual(annot.document, self.document) self.assertEqual(annot.corpus, self.corpus) self.assertFalse(annot.structural) - self.assertIsNotNone(annot.annotation_label) - assert annot.annotation_label is not None # narrow for mypy + assert annot.annotation_label is not None # narrows for mypy self.assertEqual(annot.annotation_label.text, OC_EXTRACT_SOURCE_LABEL) # Verify span data @@ -511,8 +510,7 @@ def test_ground_pdf_creates_token_label_annotations(self): self.assertEqual(annot.document, self.document) self.assertEqual(annot.corpus, self.corpus) self.assertFalse(annot.structural) - self.assertIsNotNone(annot.annotation_label) - assert annot.annotation_label is not None # narrow for mypy + assert annot.annotation_label is not None # narrows for mypy self.assertEqual(annot.annotation_label.text, OC_EXTRACT_SOURCE_LABEL) # PlasmaPDF returns 0-indexed pages; valid range is # [0, len(pages) - 1]. 
The bug we're guarding against is the @@ -690,3 +688,73 @@ def test_ground_pdf_separate_corpora_create_separate_annotations(self): self.assertEqual(annot.corpus_id, self.corpus.id) for annot in second_corpus_annotations: self.assertEqual(annot.corpus_id, other_corpus.id) + + def test_db_constraint_blocks_concurrent_token_label_grounding_duplicates(self): + """The DB-level UniqueConstraint must reject a literal duplicate + of a grounding TOKEN_LABEL row, and ``get_or_create`` must + recover via SELECT — not propagate the IntegrityError. + + Simulates the celery race the constraint exists to defeat: one + worker has already inserted the grounding row, a second worker's + SELECT misses, and its INSERT must fail-and-fallback rather than + creating a sibling row. + """ + from django.db import IntegrityError, transaction + + from opencontractserver.annotations.models import ( + TOKEN_LABEL, + Annotation, + ) + from opencontractserver.constants.annotations import OC_EXTRACT_SOURCE_LABEL + from opencontractserver.utils.extraction_grounding import ( + ground_extraction_to_annotations, + ) + + # Run grounding once to seed an OC_EXTRACT_SOURCE row with the + # right (label, page, raw_text, creator, is_grounding_source=True) + # tuple — exactly what the constraint guards. + seeded = async_to_sync(ground_extraction_to_annotations)( + datacell=self.datacell, + document=self.document, + corpus=self.corpus, + user_id=self.user.id, + enable_fuzzy=False, + ) + self.assertGreater(len(seeded), 0) + seed = seeded[0] + assert seed.annotation_label is not None # narrows for mypy + self.assertEqual(seed.annotation_label.text, OC_EXTRACT_SOURCE_LABEL) + self.assertEqual(seed.annotation_type, TOKEN_LABEL) + self.assertTrue(seed.is_grounding_source) + + # A direct duplicate INSERT must fail at the database level. 
+ with self.assertRaises(IntegrityError): + with transaction.atomic(): + Annotation.objects.create( + document=seed.document, + corpus=seed.corpus, + annotation_label=seed.annotation_label, + annotation_type=TOKEN_LABEL, + page=seed.page, + raw_text=seed.raw_text, + creator_id=self.user.id, + is_grounding_source=True, + structural=False, + json={}, + ) + + # And re-running the grounding pipeline must NOT create a + # duplicate — get_or_create's fallback SELECT (after IntegrityError + # on a racing INSERT in production, or after the up-front SELECT + # hit here) returns the seed row. + count_before = Annotation.objects.filter(document=self.document).count() + retry = async_to_sync(ground_extraction_to_annotations)( + datacell=self.datacell, + document=self.document, + corpus=self.corpus, + user_id=self.user.id, + enable_fuzzy=False, + ) + count_after = Annotation.objects.filter(document=self.document).count() + self.assertEqual(count_after, count_before) + self.assertIn(seed.id, {a.id for a in retry}) diff --git a/opencontractserver/utils/extraction_grounding.py b/opencontractserver/utils/extraction_grounding.py index b1ff2a553..7b6e3bb98 100644 --- a/opencontractserver/utils/extraction_grounding.py +++ b/opencontractserver/utils/extraction_grounding.py @@ -251,12 +251,21 @@ def _create_grounding_annotations( exc_info=True, ) - # Deduplicate by primary key, preserving first-seen order: two - # alignment results for the same phrase on the same page produce a - # single ``get_or_create`` row, but each iteration appends it. - # Returning duplicates makes ``len(annotations)`` diverge from - # ``datacell.sources.count()`` and breaks idempotency invariants - # for downstream consumers. + # Deduplicate by primary key, preserving first-seen order. 
+ # + # Why this is needed (not a bug masked): ``align_text_to_document`` + # returns at most one ``AlignmentResult`` per query string, but + # multiple distinct extraction values can legitimately resolve to the + # same source span — e.g. a column extracted as both + # ``["Acme Holdings", "Acme Holdings Inc."]`` where each substring + # gets its own alignment but both anchor on the same token range, or + # an idempotent re-run where ``get_or_create`` returns the existing + # row twice for the same phrase. In both cases the underlying + # source annotation is the same row, and returning duplicates would + # make ``len(annotations)`` diverge from ``datacell.sources.count()`` + # (since ``M2M.add()`` is itself idempotent) and break the idempotency + # invariants that downstream consumers — including the idempotency + # tests in ``test_extraction_grounding.py`` — rely on. seen_pks: set[int] = set() deduped: list[Annotation] = [] for annot in annotations: @@ -325,6 +334,13 @@ def _create_pdf_annotation( # must point to an annotation whose ``corpus`` matches the extract's # corpus, otherwise ``MIN(document_permission, corpus_permission)`` # falls back to the wrong corpus's permissions. + # + # ``creator`` is in the lookup AND in the backing partial UniqueConstraint + # ``annotation_unique_token_label_grounding_key`` (see Annotation.Meta). + # The constraint promotes idempotency from a best-effort + # ``get_or_create`` to a correctness invariant: if two Celery workers + # race on the same datacell, the loser's CREATE raises IntegrityError + # and ``get_or_create`` falls back to a SELECT. 
annot, _ = Annotation.objects.get_or_create( document=document, corpus=corpus, @@ -332,9 +348,10 @@ def _create_pdf_annotation( page=page, annotation_type=TOKEN_LABEL, raw_text=oc_ann["rawText"], + creator_id=creator_id, + is_grounding_source=True, defaults={ "json": oc_ann["annotation_json"], - "creator_id": creator_id, "structural": False, }, ) @@ -360,15 +377,27 @@ def _create_span_annotation( serves as a placeholder and the actual location is encoded by the character offsets in ``json``. - Identity key uses ``json={"start": ..., "end": ...}``. The ``json`` - column is a Django ``JSONField``, which maps to PostgreSQL ``jsonb`` - — equality compares structurally, so key order does not affect - ``get_or_create``'s lookup. We still construct the dict in a stable - order to avoid surprises if the column type ever changes. + Identity key uses ``json={"start": ..., "end": ...}``. + + .. important:: + The ``json`` lookup is **PostgreSQL-specific**. Django maps + ``JSONField`` to PostgreSQL ``jsonb`` (structural equality — + ``{"start": 1, "end": 2}`` matches ``{"end": 2, "start": 1}``) + but to plain ``TEXT`` on SQLite, where comparison is lexical. + A future contributor running this pipeline on SQLite would see + silent ``get_or_create`` misses on dict-key reordering. The + project's runtime and test database are both PostgreSQL, so + this is documented rather than guarded. We still construct + the dict in a stable insertion order to keep behaviour + predictable across backends. ``corpus`` IS in the lookup so a multi-corpus document doesn't share a single annotation between unrelated corpora — see the parallel docstring on ``_create_pdf_annotation`` for the permission rationale. + + ``creator`` is in the lookup AND in the backing partial + UniqueConstraint ``annotation_unique_span_label_grounding_key`` so + racing Celery retries collapse to a single row. 
""" from opencontractserver.annotations.models import SPAN_LABEL, Annotation @@ -379,9 +408,10 @@ def _create_span_annotation( annotation_type=SPAN_LABEL, raw_text=result.matched_text, json={"start": result.char_start, "end": result.char_end}, + creator_id=creator_id, + is_grounding_source=True, defaults={ "page": 1, - "creator_id": creator_id, "structural": False, }, )
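The PostgreSQL-vs-SQLite `json` comparison semantics documented in `_create_span_annotation` can be illustrated without a database. This is a plain-`json` sketch, not project code: structural equality is what `jsonb` gives you, lexical equality is what a TEXT-backed column would do:

```python
import json

# Two representations of the same span identity, differing only in key order.
a = {"start": 10, "end": 42}
b = {"end": 42, "start": 10}

# Structural comparison (PostgreSQL jsonb behaviour): key order is ignored,
# so a get_or_create lookup on either dict finds the same row.
assert a == b

# Lexical comparison (a TEXT-backed JSONField, e.g. on SQLite): the
# serialized strings differ, so the lookup would silently miss.
assert json.dumps(a) != json.dumps(b)

# The mitigation noted in the docstring: a canonical serialization makes
# even a text comparison stable across dict-key reordering.
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```

This is why the docstring documents the `jsonb` dependency rather than guarding it: on the project's PostgreSQL runtime the structural comparison holds, and the stable insertion order is only a belt-and-braces measure.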