Conversation
Hardens the extraction-related plumbing introduced by #1381 / #1380 / #1399 against three medium-impact failure modes and four documentation / observability gaps surfaced during code review. Medium: * embeddings_task: transient HTTP errors now drop queued sub-batches and shutdown the executor with cancel_futures=True before re-raising, so Celery autoretry isn't blocked waiting on in-flight peer round trips. * benchmarks/loader: force_celery_eager refuses to mutate the global Celery config unless MODE=TEST or OC_BENCHMARK_CLI=1; refuses when task_always_eager is already True. run_benchmark management command sets OC_BENCHMARK_CLI=1 automatically. * config/settings/test: DEFAULT_EMBEDDER env override is now gated behind an explicit BENCHMARK_MODE=1 to prevent stray CI env from silently pushing the test suite onto a real embedder. Minor: * CorpusAgent prompt now says "most legal corpora" instead of "most legal documents" — corpus-scoped wording. * doc_extract_query_task: log when model_override is accepted in the unrestricted (operator-only) path so operators can grep production logs. * get_default_reranker_instance: docstring warns that bulk_update / data-migration writes to PipelineSettings.default_reranker bypass the auto_now `modified` field used as part of the cache key. Tests: * PydanticAiSchemaCanaryTests pin ModelResponse.kind == "response", ToolCallPart.part_kind == "tool-call", TextPart.part_kind == "text" so a future pydantic-ai version bump surfaces immediately. * test_transient_error_does_not_block_on_in_flight_peers asserts the executor fast-fail path completes well inside the 10s peer block. * ForceCeleryEagerSafetyGuardsTestCase pins the three new safety refusals. Closes #1410
| # ValueError indicates a caller contract violation (e.g., batch size | ||
| # exceeds embedder maximum). Re-raise rather than silently recording | ||
| # as an annotation failure so the programming error surfaces loudly. | ||
| transient_exc = None # programming error path |
Code Review — PR #1417: Extract pipeline cleanup follow-upsOverall this is a well-scoped and well-motivated cleanup PR. Each fix is clearly tied to the issue it resolves, tests are purposeful, and the CHANGELOG is properly updated. A few items worth discussing below. ThreadPoolExecutor fast-fail path (
|
| Area | Status |
|---|---|
| Core fix (executor fast-fail) | ✅ Correct and well-tested |
force_celery_eager guards |
✅ Correct; minor edge cases documented above |
BENCHMARK_MODE guard |
✅ Clean |
| Logging addition | ✅ Good; consider WARN level |
| Canary tests | ✅ Good design |
| Timing test | ✅ Should pass; two minor nits |
| CHANGELOG | ✅ Properly updated |
The redundant transient_exc = None and the unused call_started variable are the only items I'd ask to clean up before merge. The force_celery_eager already-eager / pre-existing test settings edge case is worth at least documenting in a comment even if not acted on now.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Summary
Resolves the seven items raised in #1410 — three medium-impact failure modes and four documentation / observability gaps surfaced during code review of the extract-pipeline work in #1381 / #1380 / #1399.
Closes #1410
Changes
Medium
opencontractserver/tasks/embeddings_task.py:_batch_embed_text_annotations— Transient HTTP errors (requests.Timeout,requests.ConnectionError,EmbeddingServerError) now drop queued sub-batches and shut down theThreadPoolExecutorwithwait=False, cancel_futures=Truebefore re-raising. Previously the exception propagated out of thewithblock, which callsshutdown(wait=True, cancel_futures=False)by default and blocked Celery autoretry by up to ~max_workers× the per-sub-batch round-trip latency. In-flight HTTP calls cannot be torn down from Python, but at least we no longer wait on them.opencontractserver/benchmarks/loader.py:force_celery_eager— Refuses to mutate the global Celery config unlesssettings.MODE == "TEST"or the newOC_BENCHMARK_CLIenv var is set; also refuses whentask_always_eageris alreadyTrue(concurrent benchmark runs would race on the global save / restore). Therun_benchmarkmanagement command now setsOC_BENCHMARK_CLI=1automatically. Prevents the benchmark helper from silently routing every task in a live web / worker process through the in-process executor.opencontractserver/tasks/data_extract_tasks.py:_classify_none_result— Verified the implementation already usesisinstance(msg, ModelResponse)/isinstance(part, ToolCallPart)(no string-matching on.kind). AddedPydanticAiSchemaCanaryTestsso a future pydantic-ai version that renames the discriminators surfaces immediately rather than silently flipping everyNoneextraction into theempty_historymis-classification mode.Minor
opencontractserver/llms/agents/pydantic_ai_agents.py—PydanticAICorpusAgent._build_structured_system_promptnow says "most legal corpora need multiple targeted queries" instead of "most legal documents …" — corpus-scoped wording for the corpus agent. The document-agent prompt is unchanged.config/settings/test.py— TheDEFAULT_EMBEDDERenv-var override is now gated behind an explicitBENCHMARK_MODE=1env var. Without that opt-in, a strayDEFAULT_EMBEDDERvalue in the CI environment would silently push the regular test suite onto a real embedder and start making live network calls. The defaultTestEmbedderkeeps regularpytestruns hermetic.opencontractserver/tasks/data_extract_tasks.py— Whenmodel_overrideis accepted in the unrestricted (operator-only) path becauseBENCHMARK_ALLOWED_MODEL_OVERRIDESis unset, anINFO-level log line now records the override + cell ID. Lets operatorsgrepproduction logs to confirm the open mode is fired only by the benchmark tooling and never by an unexpected web/GraphQL caller.opencontractserver/pipeline/utils.py:get_default_reranker_instance— Docstring now warns thatbulk_update/QuerySet.update/ data-migration writes toPipelineSettings.default_rerankerbypass theauto_nowmodifiedfield used as part of the cache key, so callers who mutate the singleton via those paths must also callinvalidate_reranker_cache()(or touchmodifiedexplicitly).Tests
opencontractserver/tests/test_data_extract_failure_classification.py::PydanticAiSchemaCanaryTests— pin the pydantic-ai message-schema discriminators.opencontractserver/tests/test_batch_embedding.py::TestBatchEmbedTextAnnotations::test_transient_error_does_not_block_on_in_flight_peers— asserts the executor fast-fail path completes well inside the 10 s peer-block window. A regression that waits on in-flight peers (the previous default-shutdown behaviour) would take ~10 s and fail this test.opencontractserver/tests/test_benchmarks.py::ForceCeleryEagerSafetyGuardsTestCase— pins the three newforce_celery_eagersafety refusals.Test plan
pytest opencontractserver/tests/test_data_extract_failure_classification.py opencontractserver/tests/test_batch_embedding.py opencontractserver/tests/test_benchmarks.pypasses.force_celery_eageror the embeddings batch helper).python manage.py run_benchmark --benchmark legalbench-rag …still works end-to-end and now logs theOC_BENCHMARK_CLIopt-in implicitly.Generated by Claude Code