Skip to content

Commit 815d509

Browse files
committed
Drop data-title attr; rely on filter({ hasText }) instead
Codecov was flagging data-title={x ?? ""} as a partial-coverage branch on every card-view variant (Documents.tsx had 3, ModernDocumentItem.tsx had 1). The fallback was only there because the JSX value can be null or undefined; both forms (|| and ??) trigger codecov's branch tracker even with no semantic difference at runtime. Switching the helpers to the standard Playwright filter-by-text idiom removes the need for the attribute entirely, which means we can drop the JSX expression and its branch. data-testid=document-card and data-processing={String(Boolean(...))} remain — those are the load-bearing affordances and neither has a branch codecov tracks.
1 parent 6ad522b commit 815d509

4 files changed

Lines changed: 23 additions & 29 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1111

1212
- **VCR.py wrapper for LLM calls in `doc_extract_query_task`**`opencontractserver/utils/vcr_replay.py` exposes a `maybe_vcr_cassette()` context manager that, when `OC_LLM_VCR_MODE` and `OC_LLM_VCR_CASSETTE` are set on the celery worker, records or replays every HTTP call to LLM provider hosts (currently `api.openai.com` / `api.anthropic.com`). A custom request-body matcher strips volatile values (millisecond timestamps, Django document PKs, OpenAI tool-call IDs, UUIDs) so a cassette recorded against one DB replays cleanly against another. With the env vars unset the wrapper is a no-op — production behavior is unchanged. Pre-recorded cassette for the E2E extract spec lives at `opencontractserver/tests/fixtures/cassettes/e2e_extract_pdf_workflow/extract.yaml`. Replay was verified end-to-end against a deliberately-fake `OPENAI_API_KEY` to confirm no real network call is made. See `docs/development/e2e_vcr.md` for record / replay / debug instructions.
1313
- **`.github/workflows/frontend-e2e-extract.yml`** — CI workflow (currently `workflow_dispatch` only) that runs the new E2E extract spec against the full `local.yml` stack with `OC_LLM_VCR_MODE=replay` and a fake `OPENAI_API_KEY`. Manual-only because LlamaParse is not yet covered by the cassette and would otherwise be called for real on each run; activating on every PR is a follow-up that needs to extend `_LLM_HOSTS` and re-record.
14-
- **`frontend/tests/e2e/extract-pdf-workflow.spec.ts`** — full-stack Playwright E2E spec for the extract pipeline: login → create corpus → upload two PDFs (`frontend/tests/fixtures/{usc-title-1,eton-agreement}.pdf`) → wait for parse + embedding → create extract with one column → run with a real OpenAI call → CSV export → assert non-empty cells. Adds new helpers to `frontend/tests/e2e/helpers.ts` (`uploadPdfViaUI`, `waitForDocumentReady`, `createExtractViaUI`, `openExtractByName`, `addColumnViaUI`, `addDocumentsToExtractViaUI`, `runExtractAndWaitForFinish`). Gated on `E2E_RUN_LLM_TESTS=true`; skipped in CI until LLM responses can be mocked over the wire. Runs on the live `local.yml` stack; required tweaks to disable Auth0 (`.envs/.local/.django USE_AUTH0=false`) and to widen the celeryworker `watchfiles --ignore-paths` (in `compose/local/django/celery/worker/start` and the `local.yml` command pointer) so editor / Playwright artifact writes don't hot-reload the worker mid-task. Also adds `data-testid="document-card"` + `data-title` (+ `data-processing` on the `/documents`-view variants) to `frontend/src/views/Documents.tsx` and `data-testid="document-card"` + `data-title` to `frontend/src/components/documents/ModernDocumentItem.tsx`, so tests can poll for the `backendLock` UI signal without depending on hover-only action menus.
14+
- **`frontend/tests/e2e/extract-pdf-workflow.spec.ts`** — full-stack Playwright E2E spec for the extract pipeline: login → create corpus → upload two PDFs (`frontend/tests/fixtures/{usc-title-1,eton-agreement}.pdf`) → wait for parse + embedding → create extract with one column → run with a real OpenAI call → CSV export → assert non-empty cells. Adds new helpers to `frontend/tests/e2e/helpers.ts` (`uploadPdfViaUI`, `waitForDocumentReady`, `createExtractViaUI`, `openExtractByName`, `addColumnViaUI`, `addDocumentsToExtractViaUI`, `runExtractAndWaitForFinish`). Gated on `E2E_RUN_LLM_TESTS=true`; skipped in CI until LLM responses can be mocked over the wire. Runs on the live `local.yml` stack; required tweaks to disable Auth0 (`.envs/.local/.django USE_AUTH0=false`) and to widen the celeryworker `watchfiles --ignore-paths` (in `compose/local/django/celery/worker/start` and the `local.yml` command pointer) so editor / Playwright artifact writes don't hot-reload the worker mid-task. Also adds `data-testid="document-card"` (+ `data-processing` on the `/documents`-view variants) to `frontend/src/views/Documents.tsx` and `data-testid="document-card"` to `frontend/src/components/documents/ModernDocumentItem.tsx`, so tests can poll for the `backendLock` UI signal without depending on hover-only action menus. Cards are matched by `[data-testid="document-card"]` filtered with the visible title text — the standard Playwright pattern.
1515
- **Mypy: type analyzer, shared, agents, badges, worker_uploads; introduce shared protocols** (Issue #1335): Brought the five smaller, interface-rich target packages over the ≥70% return-annotation bar called for by the issue and seeded `opencontractserver/types/protocols.py` with the four protocols requested in the scope:
1616
- `VectorStoreProtocol` — minimum surface (`search` / `async_search`) implemented by `CoreAnnotationVectorStore` (`opencontractserver/llms/vector_stores/core_vector_stores.py`); imported and re-exported from that module so consumers can annotate against the protocol rather than the concrete dataclass.
1717
- `PipelineComponentProtocol``title` / `description` / `author` / `dependencies` surface that the pipeline registry duck-types against; imported from `opencontractserver/pipeline/base/base_component.py` so any future parser/embedder/thumbnailer registered outside the inheritance hierarchy still type-checks against the same contract.

frontend/src/components/documents/ModernDocumentItem.tsx

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1287,7 +1287,6 @@ export const ModernDocumentItem: React.FC<ModernDocumentItemProps> = ({
12871287
<CardContainer
12881288
ref={setNodeRef}
12891289
data-testid="document-card"
1290-
data-title={title ?? ""}
12911290
className={`${is_selected ? "is-selected" : ""} ${
12921291
isProcessing ? "backend-locked" : ""
12931292
} ${isFailed ? "failed" : ""} ${
@@ -1500,7 +1499,6 @@ export const ModernDocumentItem: React.FC<ModernDocumentItemProps> = ({
15001499
<ListContainer
15011500
ref={setNodeRef}
15021501
data-testid="document-card"
1503-
data-title={title ?? ""}
15041502
className={`${is_selected ? "is-selected" : ""} ${
15051503
isProcessing ? "backend-locked" : ""
15061504
} ${isFailed ? "failed" : ""} ${isLongPressing ? "long-pressing" : ""}`}

frontend/src/views/Documents.tsx

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1381,7 +1381,6 @@ export const Documents = () => {
13811381
role="button"
13821382
tabIndex={0}
13831383
data-testid="document-card"
1384-
data-title={doc.title ?? ""}
13851384
data-processing={String(Boolean(doc.backendLock))}
13861385
aria-label={`Open document ${doc.title || "Untitled"}`}
13871386
$selected={selected_document_ids.includes(doc.id)}
@@ -1506,7 +1505,6 @@ export const Documents = () => {
15061505
role="row"
15071506
tabIndex={0}
15081507
data-testid="document-card"
1509-
data-title={doc.title ?? ""}
15101508
data-processing={String(Boolean(doc.backendLock))}
15111509
aria-label={`Open document ${doc.title || "Untitled"}`}
15121510
$selected={selected_document_ids.includes(doc.id)}
@@ -1576,7 +1574,6 @@ export const Documents = () => {
15761574
role="listitem"
15771575
tabIndex={0}
15781576
data-testid="document-card"
1579-
data-title={doc.title ?? ""}
15801577
data-processing={String(Boolean(doc.backendLock))}
15811578
aria-label={`Open document ${doc.title || "Untitled"}`}
15821579
$selected={selected_document_ids.includes(doc.id)}

frontend/tests/e2e/helpers.ts

Lines changed: 22 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -648,12 +648,13 @@ export async function addColumnViaUI(
648648
* tab's "Add documents" floating button. Opens SelectDocumentsModal,
649649
* waits for all document cards to load (graphene returns up to 100 in
650650
* one page, so all documents appear immediately), clicks each target
651-
* card by its `data-title` attribute, then confirms with "Add Documents".
651+
* card by filtering on its visible title text, then confirms with
652+
* "Add Documents".
652653
*
653654
* NOTE: The modal's search bar performs full-text content search (not
654-
* title search). We do NOT use it — we rely on `data-title` selectors
655-
* directly. All cards are in the DOM and Playwright's `click()` scrolls
656-
* them into view automatically.
655+
* title search). We do NOT use it — we filter cards by visible title
656+
* text directly. All cards are in the DOM and Playwright's `click()`
657+
* scrolls them into view automatically.
657658
*
658659
* If a document is already in the extract, `filterDocIds` removes it
659660
* from the modal, so it won't appear. The step is skipped gracefully
@@ -691,8 +692,8 @@ export async function addDocumentsToExtractViaUI(
691692

692693
// Wait for at least one document card to load. The query fetches up to
693694
// RELAY_CONNECTION_MAX_LIMIT (100) in one page, so all cards are in the
694-
// DOM immediately. Cards use data-testid="document-card" and
695-
// data-title={title} (ModernDocumentItem.tsx).
695+
// DOM immediately. Cards are tagged with `data-testid="document-card"`
696+
// (ModernDocumentItem.tsx); we then filter by the visible title text.
696697
await expect(
697698
page.locator("[data-testid='document-card']").first()
698699
).toBeVisible({ timeout: 15_000 });
@@ -701,9 +702,9 @@ export async function addDocumentsToExtractViaUI(
701702
// the extract and filtered out of the modal by filterDocIds).
702703
const titlesToAdd: string[] = [];
703704
for (const title of documentTitles) {
704-
const card = page.locator(
705-
`[data-testid='document-card'][data-title='${title}']`
706-
);
705+
const card = page
706+
.locator("[data-testid='document-card']")
707+
.filter({ hasText: title });
707708
const count = await card.count();
708709
if (count > 0) {
709710
titlesToAdd.push(title);
@@ -718,9 +719,10 @@ export async function addDocumentsToExtractViaUI(
718719
}
719720

720721
for (const title of titlesToAdd) {
721-
const card = page.locator(
722-
`[data-testid='document-card'][data-title='${title}']`
723-
);
722+
const card = page
723+
.locator("[data-testid='document-card']")
724+
.filter({ hasText: title })
725+
.first();
724726
// click() scrolls the element into view automatically.
725727
await card.click();
726728
await page.waitForTimeout(200);
@@ -835,29 +837,26 @@ export async function runExtractAndWaitForFinish(
835837
* grid view live inside an `opacity: 0` hover-overlay and are never
836838
* reliably testable without triggering hover interactions.
837839
*
838-
* The `data-testid="document-card"` + `data-title` attributes are set
839-
* on all three card wrapper variants in Documents.tsx (GRID, LIST,
840-
* COMPACT) and on ModernDocumentItem.tsx for corpus-scoped views.
841-
*
842-
* Real-world ingestion can take "up to a few minutes" per PDF; the
843-
* default 8-minute ceiling leaves headroom for cold workers.
840+
* Cards are matched by `data-testid="document-card"` (set in both
841+
* Documents.tsx and ModernDocumentItem.tsx) and the visible title
842+
* text. Real-world ingestion can take "up to a few minutes" per PDF;
843+
* the default 8-minute ceiling leaves headroom for cold workers.
844844
*/
845845
export async function waitForDocumentReady(
846846
page: Page,
847847
documentTitle: string,
848848
timeoutMs: number = 8 * 60 * 1000
849849
): Promise<void> {
850-
// Use attribute selector to match the exact title (avoids partial-text
851-
// collisions when one title is a prefix of another).
852-
const cardSelector = `[data-testid="document-card"][data-title="${documentTitle}"]`;
853-
854850
await expect(async () => {
855851
// Re-navigate on every attempt to force a fresh GraphQL fetch and
856852
// bypass any in-flight polling gaps.
857853
await spaNavigate(page, "/documents");
858854
await expectViewVisible(page, { kind: "text", text: /Your\s+documents/i });
859855

860-
const card = page.locator(cardSelector);
856+
const card = page
857+
.locator("[data-testid='document-card']")
858+
.filter({ hasText: documentTitle })
859+
.first();
861860
await expect(card).toBeVisible({ timeout: 10_000 });
862861

863862
// `data-processing="true"` while backendLock === true.

0 commit comments

Comments
 (0)