Skip to content

Commit 36921ac

Browse files
JSv4claude
andauthored
Add return-type annotations to core models and import/export pipeline (#1355)
* typing: annotate core models + import/export tasks (Closes #1334) Adds return-type annotations and TypedDict-based parameter annotations to the core domain models (corpuses, documents, annotations, extracts) and the bulk import/export pipeline, without changing runtime behavior. Coverage (models.py + sibling files in each package): - corpuses/: 61.5% -> 88.4% (target >=80%) - annotations/: 48.1% -> 93.8% (target >=80%) - documents/: 51.9% -> 84.9% (target >=80%) - extracts/: 47.1% -> 94.1% (target >=75%) - tasks/{import,export}{,_v2}.py: 100% function-signature coverage Each touched file now uses `from __future__ import annotations` so forward refs work unquoted, with `TYPE_CHECKING` blocks to dodge circular imports across corpuses / documents / annotations / extracts. Import/export tasks now consume existing TypedDicts from opencontractserver/types/dicts.py in their signatures instead of bare dict -- OpenContractsExportDataJsonV2Type, OpenContractDocExport, OpenContractsAnnotationPythonType, OpenContractsRelationshipPythonType, IngestionSourceExport, DocumentPathExport, StructuralAnnotationSetExport, AgentConfigExport, DescriptionRevisionExport, ConversationExport, ChatMessageExport, MessageVoteExport, OpenContractsAnnotatedDocumentImportType. No new TypedDicts were introduced; all were already defined. Signal handlers (annotations/signals.py, corpuses/signals.py, documents/signals.py, extracts/signals.py) now type sender / instance / created / **kwargs using TYPE_CHECKING imports of sender model classes. Manager methods on CorpusActionExecutionManager tighten visible_to_user's user param to Optional[AbstractBaseUser] and tighten summary_by_status / summary_by_action returns. Graduation from the mypy baseline is deferred: the affected modules still have pre-existing Django-plugin-specific errors (CharField assignment mismatches, base-class variable overrides on embedder_path / creator, lambdas in field defaults) that are not caused by missing annotations. Per the issue's "annotations only, no bugfixes" constraint, those are left for a follow-up PR. * Fix linting and address PR review comments - Drop unused Optional / Union / AbstractBaseUser imports (flake8 F401) after pyupgrade rewrote Optional[X] to X | None - Pass Annotation / Note class instead of None for sender in signal tests so mypy accepts the typed signal signatures - Tighten save() -> None on Corpus, CorpusAction, and Note (Django's Model.save always returns None); drop dangling return super().save() - Tighten Document.get_embedding_reference_kwargs -> dict[str, Any] - Type structural_sets_seen as set[StructuralAnnotationSet] - Use AbstractBaseUser (TYPE_CHECKING) for Corpus.update_description author and get_or_create_personal_corpus user parameters instead of Any - Remove unused AbstractBaseUser TYPE_CHECKING imports from export_tasks and export_tasks_v2 * Address review feedback and unblock mypy CI Review feedback (PR #1355 second Claude review): - PipelineSettings.delete() now annotated -> NoReturn. The method always raises, so the previous tuple[int, dict[str, int]] annotation (copied from Django's Model.delete signature) would let mypy accept call sites that try to unpack the return value. - CorpusFolder.get_descendant_folders() now returns QuerySet[CorpusFolder] with a narrow return-value type: ignore. The CTE-annotated queryset subclass is not easily importable, but callers iterate over CorpusFolder instances -- Any was strictly less useful. - _import_v2_relationships' label_lookup parameter gains an inline comment documenting the (label_text, label_type) tuple-key invariant. Other reviewer items skipped with reasoning: - dict[str | None, OpenContractDocExport | None] in export_tasks.py -- tightening requires tracing upstream None-producers; reviewer marked it non-blocking. - author: AbstractBaseUser | int unions -- reviewer explicitly flagged as follow-up work outside this annotation-only PR. - corpus_id: int | None local annotation -- reviewer called it fine to leave. mypy unblock: - validate_v3_migration.py was fixed in main (commit 41eb6ae) for the [:5]-indexing error that 6.0.3 surfaced, but the values_list + unpack pattern still trips mypy 1.20.1 + django-stubs 6.0.3 with "object is not iterable" and "Cannot determine type" errors. Suppress narrowly with type: ignore on the two affected lines; runtime behaviour is correct and the alternative refactors obscured the loop body. * Address review: use AbstractBaseUser, drop unneeded has-type ignore * Address PR #1355 review: tighten Any types + drop redundant comment - CorpusActionTemplate.to_action_kwargs / clone_to_corpus: replace 'creator: Any | None' with 'creator: AbstractBaseUser | None' for consistency with the rest of the file (the import is already in the TYPE_CHECKING block). - Corpus.get_descendant_folders: drop the two-line block comment about tree-queries' CTE-annotated subclass — the inline 'type: ignore' is already self-documenting and the comment violated CLAUDE.md's no-redundant-comments guideline. --------- Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 50ed674 commit 36921ac

16 files changed

Lines changed: 429 additions & 224 deletions

File tree

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1919

2020
### Added
2121

22+
- **Return-type annotations across core models and import/export pipeline** (Issue #1334, follow-up to #1331): The mypy gate wired in by #1331 recorded a 7208-error baseline frozen across 357 files. This PR pays down the annotation deficit on the core domain models and the bulk import/export tasks without touching runtime behavior or adding validators. Coverage jumped from the pre-issue numbers to:
23+
- `opencontractserver/corpuses/` 61.5% → 88.4% (target ≥80%)
24+
- `opencontractserver/annotations/` 48.1% → 93.8% (target ≥80%)
25+
- `opencontractserver/documents/` 51.9% → 84.9% (target ≥80%)
26+
- `opencontractserver/extracts/` 47.1% → 94.1% (target ≥75%)
27+
- `tasks/import_tasks.py`, `tasks/import_tasks_v2.py`, `tasks/export_tasks.py`, `tasks/export_tasks_v2.py` all at 100% function-signature coverage.
28+
- **Files touched** (annotations only — zero behavior changes, zero new comments, zero renames): `annotations/{models,signals,admin}.py`, `corpuses/{models,signals,managers}.py`, `documents/{models,signals}.py`, `extracts/{models,signals}.py`, `tasks/{import_tasks,import_tasks_v2,export_tasks,export_tasks_v2}.py`.
29+
- Each file adopts `from __future__ import annotations` so forward references work without string quoting, and uses a `TYPE_CHECKING` block for the handful of cross-app imports (`AbstractBaseUser`, `Corpus`, `Document`, `Annotation`, etc.) that would otherwise be circular at runtime.
30+
- **TypedDict adoption** — the import/export task signatures now consume existing TypedDicts from `opencontractserver/types/dicts.py` directly instead of bare `dict`: `OpenContractsExportDataJsonPythonType` / `OpenContractsExportDataJsonV2Type` for the data.json payloads, `OpenContractDocExport` for per-document payloads, `OpenContractsAnnotationPythonType` / `OpenContractsRelationshipPythonType` for annotation/relationship lists, `IngestionSourceExport` / `DocumentPathExport` / `StructuralAnnotationSetExport` / `CorpusFolderExport` / `AgentConfigExport` / `DescriptionRevisionExport` / `ConversationExport` / `ChatMessageExport` / `MessageVoteExport` for V2-specific shapes, and `OpenContractsAnnotatedDocumentImportType` for single-document imports. No new TypedDicts were introduced — all were already defined and previously unused in callers.
31+
- Signal handlers (`annotations/signals.py`, `corpuses/signals.py`, `documents/signals.py`, `extracts/signals.py`) now have typed `sender` / `instance` / `created` / `**kwargs` parameters with `TYPE_CHECKING` imports of the sender model classes.
32+
- **Manager methods**: `CorpusActionExecutionManager.visible_to_user` now types the optional `user` param as `Optional[AbstractBaseUser]` (previously bare default), and `summary_by_status` / `summary_by_action` tightened from bare `dict` / `QuerySet` to `dict[str, int]` / `QuerySet[Any]`.
33+
- **Model method signatures**: `save()`/`clean()`/`delete()` overrides are now `-> None` (Django's `Model.save` returns None) or `-> tuple[int, dict[str, int]]` (Django's `Model.delete` signature); `__str__` returns `str`; `@property` accessors have explicit scalar returns; classmethod factories return `"ClassName"`; tree/versioning helpers on `CorpusFolder` type their descendant queries as `QuerySet[CorpusFolder]` / `list[CorpusFolder]` where applicable (falling back to `Any` where the CTE queryset class isn't easily importable).
34+
- **Graduation from the mypy baseline is deferred.** The `mypy.ini` `[mypy-…] ignore_errors = True` sections for these modules still contain pre-existing Django-plugin-specific errors (field descriptor `assignment` mismatches, `misc` lambdas in model field declarations, base-class variable override warnings on `embedder_path`/`creator`) that are not caused by missing annotations and therefore cannot be fixed under this issue's "annotations only, no bugfixes" constraint. The annotations landed here make those modules ready for a follow-up PR that either `# type: ignore`s or refactors the remaining Django-model type issues and then removes the baseline entries.
2235
- **Mypy type-checking wired into pre-commit and CI** (Issue #1331): The existing `[mypy]` block in `setup.cfg` and the `mypy==1.20.1` / `django-stubs==6.0.2` / `djangorestframework-stubs==3.16.9` pins in `requirements/local.txt` were never actually enforced, so the investment was drifting (48 pre-existing `# type: ignore` markers, many modules at 0% annotation coverage). This PR turns on the gate without requiring the 7208 existing errors across 357 files to be fixed first — `mypy.ini` lists each of those files under its own `[mypy-<module>] ignore_errors = True` section, so new modules added outside the baseline **are** type-checked and CI / the hook fails on their errors.
2336
- `mypy.ini` (split out of `setup.cfg` because the per-module baseline is ~1000 lines): `python_version` bumped `3.9``3.11` to match the runtime, plugins kept, `django_settings_module` pointed at the new `config/settings/mypy.py` (dummy `DATABASE_URL` default so contributors don't need env vars).
2437
- `.pre-commit-config.yaml`: new `mirrors-mypy@v1.20.1` hook with `pass_filenames: false` and fully pinned stubs + Django runtime in `additional_dependencies` (pre-commit autoupdate only bumps `rev`, so unpinned stubs would drift).

opencontractserver/annotations/admin.py

Lines changed: 39 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
1+
from __future__ import annotations
2+
3+
from typing import TYPE_CHECKING, Any
4+
15
from django.contrib import admin
2-
from django.db.models import Count
6+
from django.db.models import Count, QuerySet
37
from guardian.admin import GuardedModelAdmin
48

59
from opencontractserver.annotations.models import (
@@ -11,6 +15,9 @@
1115
Relationship,
1216
)
1317

18+
if TYPE_CHECKING:
19+
from django.http import HttpRequest
20+
1421

1522
class ContentModalityFilter(admin.SimpleListFilter):
1623
"""
@@ -21,7 +28,11 @@ class ContentModalityFilter(admin.SimpleListFilter):
2128
title = "content modality"
2229
parameter_name = "modality"
2330

24-
def lookups(self, request, model_admin):
31+
def lookups(
32+
self,
33+
request: HttpRequest,
34+
model_admin: admin.ModelAdmin[Any],
35+
) -> tuple[tuple[str, str], ...]:
2536
return (
2637
("text", "TEXT only"),
2738
("image", "IMAGE only"),
@@ -31,7 +42,11 @@ def lookups(self, request, model_admin):
3142
("empty", "No modalities"),
3243
)
3344

34-
def queryset(self, request, queryset):
45+
def queryset(
46+
self,
47+
request: HttpRequest,
48+
queryset: QuerySet[Annotation],
49+
) -> QuerySet[Annotation]:
3550
if self.value() == "text":
3651
return queryset.filter(content_modalities=["TEXT"])
3752
if self.value() == "image":
@@ -61,7 +76,7 @@ class AnnotationEmbeddingInline(admin.TabularInline):
6176
readonly_fields = ("id", "embedder_path", "dimension", "created", "modified")
6277
extra = 0
6378

64-
def dimension(self, obj):
79+
def dimension(self, obj: Embedding) -> str:
6580
"""Display which vector dimension is populated."""
6681
if obj.vector_384 is not None:
6782
return "384"
@@ -73,7 +88,7 @@ def dimension(self, obj):
7388
return "3072"
7489
return "Unknown"
7590

76-
dimension.short_description = "Dimension"
91+
dimension.short_description = "Dimension" # type: ignore[attr-defined]
7792

7893

7994
class NoteEmbeddingInline(admin.TabularInline):
@@ -87,7 +102,7 @@ class NoteEmbeddingInline(admin.TabularInline):
87102
readonly_fields = ("id", "embedder_path", "dimension", "created", "modified")
88103
extra = 0
89104

90-
def dimension(self, obj):
105+
def dimension(self, obj: Embedding) -> str:
91106
"""Display which vector dimension is populated."""
92107
if obj.vector_384 is not None:
93108
return "384"
@@ -99,7 +114,7 @@ def dimension(self, obj):
99114
return "3072"
100115
return "Unknown"
101116

102-
dimension.short_description = "Dimension"
117+
dimension.short_description = "Dimension" # type: ignore[attr-defined]
103118

104119

105120
@admin.register(Annotation)
@@ -125,31 +140,31 @@ class AnnotationAdmin(GuardedModelAdmin):
125140
raw_id_fields = ("annotation_label", "document", "corpus", "analysis", "creator")
126141
inlines = [AnnotationEmbeddingInline]
127142

128-
def get_queryset(self, request):
143+
def get_queryset(self, request: HttpRequest) -> QuerySet[Any]:
129144
"""
130145
Override queryset to annotate with embedding count.
131146
"""
132147
qs = super().get_queryset(request)
133148
return qs.annotate(total_embeddings=Count("embedding_set", distinct=True))
134149

135-
def total_embeddings(self, obj):
150+
def total_embeddings(self, obj: Annotation) -> int:
136151
"""
137152
Display the total number of embeddings for this annotation.
138153
"""
139-
return obj.total_embeddings
154+
return obj.total_embeddings # type: ignore[attr-defined]
140155

141-
total_embeddings.admin_order_field = "total_embeddings"
142-
total_embeddings.short_description = "Embeddings"
156+
total_embeddings.admin_order_field = "total_embeddings" # type: ignore[attr-defined]
157+
total_embeddings.short_description = "Embeddings" # type: ignore[attr-defined]
143158

144-
def modality_display(self, obj):
159+
def modality_display(self, obj: Annotation) -> str:
145160
"""
146161
Display content modalities as a formatted string.
147162
"""
148163
if obj.content_modalities:
149164
return ", ".join(obj.content_modalities)
150165
return "-"
151166

152-
modality_display.short_description = "Modality"
167+
modality_display.short_description = "Modality" # type: ignore[attr-defined]
153168

154169

155170
@admin.register(Relationship)
@@ -194,21 +209,21 @@ class NoteAdmin(GuardedModelAdmin):
194209
raw_id_fields = ("parent", "document", "annotation", "creator")
195210
inlines = [NoteEmbeddingInline]
196211

197-
def get_queryset(self, request):
212+
def get_queryset(self, request: HttpRequest) -> QuerySet[Any]:
198213
"""
199214
Override queryset to annotate with embedding count.
200215
"""
201216
qs = super().get_queryset(request)
202217
return qs.annotate(total_embeddings=Count("embedding_set", distinct=True))
203218

204-
def total_embeddings(self, obj):
219+
def total_embeddings(self, obj: Note) -> int:
205220
"""
206221
Display the total number of embeddings for this note.
207222
"""
208-
return obj.total_embeddings
223+
return obj.total_embeddings # type: ignore[attr-defined]
209224

210-
total_embeddings.admin_order_field = "total_embeddings"
211-
total_embeddings.short_description = "Embeddings"
225+
total_embeddings.admin_order_field = "total_embeddings" # type: ignore[attr-defined]
226+
total_embeddings.short_description = "Embeddings" # type: ignore[attr-defined]
212227

213228

214229
@admin.register(Embedding)
@@ -227,7 +242,7 @@ class EmbeddingAdmin(GuardedModelAdmin):
227242
search_fields = ["embedder_path", "id"]
228243
raw_id_fields = ("document", "annotation", "note", "creator")
229244

230-
def reference_type(self, obj):
245+
def reference_type(self, obj: Embedding) -> str:
231246
"""Display what type of object this embedding belongs to."""
232247
if obj.document_id:
233248
return f"Document #{obj.document_id}"
@@ -237,11 +252,11 @@ def reference_type(self, obj):
237252
return f"Note #{obj.note_id}"
238253
return "Unknown"
239254

240-
reference_type.short_description = "Referenced Object"
255+
reference_type.short_description = "Referenced Object" # type: ignore[attr-defined]
241256

242-
def dimension_info(self, obj):
257+
def dimension_info(self, obj: Embedding) -> str:
243258
"""Display which vector dimension is populated."""
244-
dimensions = []
259+
dimensions: list[str] = []
245260
if obj.vector_384 is not None:
246261
dimensions.append("384")
247262
if obj.vector_768 is not None:
@@ -252,4 +267,4 @@ def dimension_info(self, obj):
252267
dimensions.append("3072")
253268
return ", ".join(dimensions) if dimensions else "None"
254269

255-
dimension_info.short_description = "Dimensions"
270+
dimension_info.short_description = "Dimensions" # type: ignore[attr-defined]

opencontractserver/annotations/models.py

Lines changed: 28 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,13 @@
1+
from __future__ import annotations
2+
13
import difflib
24
import functools
35
import hashlib
46
import logging
57
import uuid
68

79
# Typed representations for the `json` payload
8-
from typing import Optional, Union, cast
10+
from typing import TYPE_CHECKING, Any, Union, cast
911

1012
import django
1113
from django.contrib.auth import get_user_model
@@ -50,6 +52,9 @@
5052
)
5153
from .json_types import MultipageAnnotationJson, SpanAnnotationJson
5254

55+
if TYPE_CHECKING:
56+
from django.contrib.auth.models import AbstractBaseUser
57+
5358
User = get_user_model()
5459
logger = logging.getLogger(__name__)
5560

@@ -157,7 +162,7 @@ class Meta:
157162
]
158163

159164
# Override save to update modified on save
160-
def save(self, *args, **kwargs):
165+
def save(self, *args: Any, **kwargs: Any) -> None:
161166
"""On save, update timestamps"""
162167
if not self.pk:
163168
self.created = timezone.now()
@@ -376,7 +381,7 @@ def clean(self) -> None:
376381
)
377382

378383
# Override save to update modified on save
379-
def save(self, *args, **kwargs):
384+
def save(self, *args: Any, **kwargs: Any) -> None:
380385
"""On save, update timestamps and validate constraints"""
381386
# Always validate on save (consistent with Annotation model)
382387
self.clean()
@@ -479,7 +484,7 @@ class Embedding(BaseOCModel):
479484
created = django.db.models.DateTimeField(default=timezone.now, blank=True)
480485
modified = django.db.models.DateTimeField(default=timezone.now, blank=True)
481486

482-
def save(self, *args, **kwargs) -> None:
487+
def save(self, *args: Any, **kwargs: Any) -> None:
483488
"""
484489
Overridden save method:
485490
- ensures modified timestamp is updated
@@ -583,7 +588,7 @@ class Meta:
583588
verbose_name = "Embedding"
584589
verbose_name_plural = "Embeddings"
585590

586-
def __str__(self):
591+
def __str__(self) -> str:
587592
return f"Embedding (ID={self.pk}) [{self.embedder_path or 'Unknown Model'}]"
588593

589594

@@ -660,20 +665,20 @@ class Meta:
660665
django.db.models.Index(fields=["created"]),
661666
]
662667

663-
def __str__(self):
668+
def __str__(self) -> str:
664669
return f"StructuralAnnotationSet({self.content_hash[:12]}...)"
665670

666671
@property
667-
def annotation_count(self):
672+
def annotation_count(self) -> int:
668673
"""Get the count of structural annotations in this set."""
669674
return self.structural_annotations.count()
670675

671676
@property
672-
def relationship_count(self):
677+
def relationship_count(self) -> int:
673678
"""Get the count of structural relationships in this set."""
674679
return self.structural_relationships.count()
675680

676-
def duplicate(self, corpus_id: Optional[int] = None) -> "StructuralAnnotationSet":
681+
def duplicate(self, corpus_id: int | None = None) -> StructuralAnnotationSet:
677682
"""
678683
Create a copy of this set with all its annotations for corpus isolation.
679684
@@ -992,15 +997,15 @@ class Annotation(BaseOCModel, HasEmbeddingMixin):
992997
)
993998
modified = django.db.models.DateTimeField(default=timezone.now, blank=True)
994999

995-
def get_embedding_reference_kwargs(self) -> dict:
1000+
def get_embedding_reference_kwargs(self) -> dict[str, Any]:
9961001
return {"annotation_id": self.pk}
9971002

9981003
# ---------------------------------------------------------------------
9991004
# Convenience helpers
10001005
# ---------------------------------------------------------------------
10011006

10021007
@property
1003-
def typed_json(self) -> "Union[MultipageAnnotationJson, SpanAnnotationJson]":
1008+
def typed_json(self) -> MultipageAnnotationJson | SpanAnnotationJson:
10041009
"""Return `self.json` with a precise static type.
10051010
10061011
This helper exists purely for IDE / static-analysis benefit. It performs
@@ -1128,7 +1133,7 @@ def clean(self) -> None: # noqa: C901 (complexity – kept minimal)
11281133
# Persistence
11291134
# ------------------------------------------------------------------
11301135

1131-
def save(self, *args, **kwargs):
1136+
def save(self, *args: Any, **kwargs: Any) -> None:
11321137
"""Override save to optionally validate `json` integrity and
11331138
auto-compact the annotation JSON to v2 format.
11341139
@@ -1258,7 +1263,7 @@ class AnnotationGroupObjectPermission(GroupObjectPermissionBase):
12581263
# enabled = False
12591264

12601265

1261-
def calculate_labelset_icon_path(instance, filename):
1266+
def calculate_labelset_icon_path(instance: LabelSet, filename: str) -> str:
12621267
return calc_oc_file_path(
12631268
instance, filename, f"user_{instance.creator.id}/labelsets/icons/{uuid.uuid4()}"
12641269
)
@@ -1410,7 +1415,7 @@ class Note(BaseOCModel, HasEmbeddingMixin):
14101415
10 # store full snapshot every N revisions for fast reconstruction
14111416
)
14121417

1413-
def save(self, *args, **kwargs):
1418+
def save(self, *args: Any, **kwargs: Any) -> None:
14141419
"""Override save to automatically create a NoteRevision when `content` changes.
14151420
Set `skip_revision=True` in kwargs to bypass automatic revision creation
14161421
(used internally by `version_up`).
@@ -1438,7 +1443,7 @@ def save(self, *args, **kwargs):
14381443
self.modified = timezone.now()
14391444

14401445
with transaction.atomic():
1441-
super_result = super().save(*args, **kwargs)
1446+
super().save(*args, **kwargs)
14421447

14431448
# Auto-create a NoteRevision unless explicitly skipped
14441449
if not skip_revision and (
@@ -1480,9 +1485,12 @@ def save(self, *args, **kwargs):
14801485
).hexdigest(),
14811486
)
14821487

1483-
return super_result
1484-
1485-
def version_up(self, *, new_content: str, author):
1488+
def version_up(
1489+
self,
1490+
*,
1491+
new_content: str,
1492+
author: AbstractBaseUser | int,
1493+
) -> NoteRevision | None:
14861494
"""Utility to bump the note to a new version.
14871495
14881496
Args:
@@ -1546,7 +1554,7 @@ def version_up(self, *, new_content: str, author):
15461554

15471555
return revision
15481556

1549-
def get_embedding_reference_kwargs(self) -> dict:
1557+
def get_embedding_reference_kwargs(self) -> dict[str, Any]:
15501558
return {"note_id": self.pk}
15511559

15521560
class Meta:
@@ -1630,5 +1638,5 @@ class Meta:
16301638
unique_together = ("note", "version")
16311639
ordering = ("note_id", "version")
16321640

1633-
def __str__(self):
1641+
def __str__(self) -> str:
16341642
return f"NoteRevision(note_id={self.note_id}, v={self.version})"

0 commit comments

Comments
 (0)