docs: docs-tests harness for Console UI drift detection #2669
marcel-rbro wants to merge 12 commits into
Docs-as-tests for the Apify Console: extract UI claims (routes, tabs, buttons, headings) from platform docs with an LLM, store them as a reviewed baseline under assertions/, and verify them against Console staging with Playwright. Failures point back to source_file:line.

- pages.json: adjustable list of docs pages to cover
- scripts/extract*.sh: LLM extraction (run locally, commit the result)
- tests/from-doc.spec.ts: evaluate the stored assertions (CI-friendly)
- reporters/issues-reporter.ts: machine-readable drift report

No secrets committed; auth.json and .env are gitignored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
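A minimal sketch of what one baseline entry and its failure pointer could look like. The field names (`kind`, `route`, `expected`, `source_file`, `line`, `needs_auth`) are assumptions for illustration, not the actual schema under assertions/; only the four assertion kinds and the `source_file:line` convention come from this PR.

```typescript
// Hypothetical shape of one stored assertion; the real schema may differ.
type AssertionKind = "route" | "text" | "tab" | "button";

interface DocAssertion {
  kind: AssertionKind;
  route: string; // Console path the claim applies to, e.g. "/settings"
  expected: string; // label, heading text, or route the docs claim exists
  source_file: string; // docs file the claim was extracted from
  line: number; // line in source_file, so a failure can point back to it
  needs_auth: boolean; // false => evaluate in a logged-out context
}

// The "source_file:line" pointer a failing assertion reports.
function sourcePointer(a: DocAssertion): string {
  return `${a.source_file}:${a.line}`;
}

// One-line failure description, leading with the pointer back into the docs.
function describeFailure(a: DocAssertion): string {
  return `${sourcePointer(a)}: expected ${a.kind} "${a.expected}" on ${a.route}`;
}
```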
- portable read loop in extract-all.sh (macOS bash 3.2 has no mapfile)
- detach claude stdin so it doesn't drain the page list when looped
- slice the <output> block with perl so single-line tag+JSON also parses

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
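The extraction script does this slicing with perl. For illustration only, here is the same rule as a TypeScript sketch (the name `sliceOutputBlock` is hypothetical): a lazy match that also spans newlines, so a reply with `<output>` and the JSON on one line parses the same way as the multi-line form.

```typescript
// Return the text between the first <output> and the following </output>,
// whether the tags sit on their own lines or share a line with the JSON.
// [\s\S] also matches newlines; the lazy *? stops at the first closing tag.
function sliceOutputBlock(modelReply: string): string | null {
  const match = /<output>([\s\S]*?)<\/output>/.exec(modelReply);
  return match?.[1]?.trim() ?? null;
}
```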
Narrow the manifest to four fixture-free landing pages (console index, settings, billing, store) and commit the AI-extracted baseline: 53 assertions (21 route, 15 text, 11 tab, 6 button).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
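To double-check a breakdown like this one against the committed file, a small tally is enough. A sketch, assuming each baseline entry carries a `kind` field with one of the four listed values; `countByKind` and `total` are hypothetical helpers.

```typescript
type Kind = "route" | "text" | "tab" | "button";

// Count baseline assertions per kind.
function countByKind(assertions: ReadonlyArray<{ kind: Kind }>): Record<Kind, number> {
  const counts: Record<Kind, number> = { route: 0, text: 0, tab: 0, button: 0 };
  for (const a of assertions) counts[a.kind] += 1;
  return counts;
}

// Sum of all per-kind counts.
function total(counts: Record<Kind, number>): number {
  return (Object.keys(counts) as Kind[]).reduce((sum, k) => sum + counts[k], 0);
}
```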
Replace the interactive auth.setup.ts + auth.json storageState handoff with a worker-scoped fixture that logs in fresh each run from CONSOLE_STAGING_USER_EMAIL/_PASSWORD (.env locally, GitHub Secrets in CI) and keeps the session in memory. Nothing is written to or read from disk, so no auth file has to pre-exist in the GitHub Action.

The seeded user has no 2FA, so login is a plain email+password submit; drop the setup project and pnpm auth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
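Reading the credentials is the part of the worker fixture that differs between local runs (.env) and CI (GitHub Secrets). A sketch of a fail-fast reader, assuming the password variable is `CONSOLE_STAGING_USER_PASSWORD` (as the `_PASSWORD` shorthand suggests) and that the fixture passes `process.env` in; the helper name is hypothetical. The in-memory handoff itself can rely on Playwright's `BrowserContext.storageState()`, which returns the cookies and origins as an object when called without a `path`, so nothing touches disk.

```typescript
// Hypothetical helper: read the seeded user's credentials from an environment map
// and fail fast with a clear message instead of timing out on an empty login form.
interface StagingCredentials {
  email: string;
  password: string;
}

const REQUIRED_VARS = ["CONSOLE_STAGING_USER_EMAIL", "CONSOLE_STAGING_USER_PASSWORD"] as const;

function readStagingCredentials(env: Record<string, string | undefined>): StagingCredentials {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing staging credentials: ${missing.join(", ")}`);
  }
  // Both values were checked above, so the non-null assertions are safe.
  return {
    email: env.CONSOLE_STAGING_USER_EMAIL!,
    password: env.CONSOLE_STAGING_USER_PASSWORD!,
  };
}
```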
The worker auth fixture assumed a single combined sign-in form; Apify's sign-in is two-step (email -> Next -> password -> Log in), both steps on /sign-in. Pin the real selectors, avoid the SSO buttons, and wait on domcontentloaded (the Console SPA never reaches networkidle).

Add timestamped step logging to the login fixture so a slow or stuck login is visible instead of a silent hang before any test reports.

Also adds the pnpm workspace + lockfile so docs-tests installs in isolation, and documents the staging-user vars in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
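A sketch of timestamped step logging, assuming a plain `write` sink such as `console.log`; `createStepLogger` is a hypothetical name, and the clock is injectable only so the output can be checked. Each line shows both wall-clock time and elapsed time since login started, which is what makes a stuck step (for example, the wait after Next) stand out.

```typescript
// Milliseconds since the epoch, like Date.now.
type Clock = () => number;

// Returns a logger that prefixes each login step with an ISO timestamp
// and the elapsed time since the logger was created.
function createStepLogger(write: (line: string) => void, now: Clock = Date.now) {
  const start = now();
  return (step: string): void => {
    const t = now();
    write(`[${new Date(t).toISOString()}] login +${t - start}ms: ${step}`);
  };
}
```

In the fixture this would be called once per step: before filling the email, after clicking Next, after Log in, and once the post-login page reaches domcontentloaded.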
Runs the docs-tests Playwright harness against Console staging weekly and on manual dispatch; files a drift issue for maintainers on failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
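How the drift issue body is built isn't described in this commit. A sketch of one possible rendering from the drift report, assuming hypothetical entry fields `title`, `source` (the `source_file:line` pointer), and `message`, grouped by docs file so a maintainer can fix one page at a time:

```typescript
// Hypothetical drift-report entry; the real issues-reporter.ts output may use other fields.
interface DriftEntry {
  title: string; // test title for the assertion being checked
  source: string; // "source_file:line" pointer back into the docs
  message: string; // short failure message
}

// Render entries as a Markdown issue body, one section per docs file.
function renderDriftIssue(entries: DriftEntry[]): string {
  if (entries.length === 0) return "No drift detected.";
  // Group by the file part of "source_file:line" (Map keeps insertion order).
  const byFile = new Map<string, DriftEntry[]>();
  for (const entry of entries) {
    const colon = entry.source.lastIndexOf(":");
    const file = colon >= 0 ? entry.source.slice(0, colon) : entry.source;
    const group = byFile.get(file) ?? [];
    group.push(entry);
    byFile.set(file, group);
  }
  const lines: string[] = [`${entries.length} docs assertion(s) no longer match Console staging.`];
  byFile.forEach((group, file) => {
    lines.push("", `### ${file}`);
    for (const e of group) lines.push(`- \`${e.source}\` ${e.title}: ${e.message}`);
  });
  return lines.join("\n");
}
```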
Remove store filter and sign-in button assertions that can't be tested against Console staging, handle needs_auth:false assertions in a logged-out context, add the bold-text extraction rule, and update the Session / Add connector labels to match the current Console.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
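A sketch of routing assertions between the two contexts, assuming assertions carry an optional `needs_auth` flag and that a missing flag means the logged-in session (an assumption; only `needs_auth:false` is named here). `partitionByAuth` is a hypothetical helper.

```typescript
interface AuthTagged {
  needs_auth?: boolean;
}

// Split assertions: needs_auth:false runs in a fresh logged-out context,
// everything else reuses the logged-in worker session.
function partitionByAuth<T extends AuthTagged>(assertions: readonly T[]): { loggedIn: T[]; loggedOut: T[] } {
  const loggedIn: T[] = [];
  const loggedOut: T[] = [];
  for (const a of assertions) {
    // Only an explicit false opts out; a missing flag defaults to logged in.
    if (a.needs_auth === false) loggedOut.push(a);
    else loggedIn.push(a);
  }
  return { loggedIn, loggedOut };
}
```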
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
How to test this PR

The evaluation harness is just branch files, so most testing needs no CI. Note the CI constraint up front:

1. Local run (fastest; tests the harness + baseline directly)

cd docs-tests
pnpm install
pnpm exec playwright install chromium
cp .env.example .env # fill in CONSOLE_STAGING_URL + seeded-user email/password
pnpm test # evaluate the committed baseline against Console staging
Expected on this branch: 41 passed · 5 skipped · 0 failed. The 5 skipped are detail-page assertions with no landing route (a documented gap, not a failure).

2. See a failure / the drift report

Point one assertion at something that doesn't exist (edit an

3. Exercise the actual GitHub Action on a branch (pre-merge)

Because dispatch/schedule only fire from the default branch, use a branch that carries a
Neither the push trigger nor the canary is part of this PR; they exist only on the test branch, so

4. After merge (production)

The workflow becomes available on

On failure it uploads the report +
Exclude docs-tests/ from markdownlint + Vale (internal tooling, like .claude/ and .agents/standards), and fix two oxlint findings in the harness code (no-control-regex on the ANSI strip, prefer-includes over a regex test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
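One way to satisfy both oxlint rules, not necessarily the change made here: build the ESC character at run time instead of writing `\x1b` inside a regex, and use `String.prototype.includes` for a fixed substring. `stripAnsi` and `mentionsTimeout` are hypothetical names.

```typescript
// ESC (0x1b) built at run time, so no control character appears in a regex
// literal or a string literal passed to RegExp (what no-control-regex flags).
const ESC = String.fromCharCode(27);
// CSI sequences such as ESC[31m (color) or ESC[2K (erase line).
const ANSI_SEQUENCE = new RegExp(`${ESC}\\[[0-9;]*[A-Za-z]`, "g");

// Remove terminal color/cursor codes from Playwright error text before reporting.
function stripAnsi(text: string): string {
  return text.replace(ANSI_SEQUENCE, "");
}

// prefer-includes: a plain substring check instead of /Timeout/.test(message).
function mentionsTimeout(message: string): boolean {
  return message.includes("Timeout");
}
```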
oxfmt --check flagged the 5 docs-tests TS files (formatting); reformatted them. Vale's Microsoft.Dashes is an explicit toggle that survives an empty BasedOnStyles, so disable it explicitly for docs-tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Retry failed assertions once in CI so a transient staging/network hiccup does not surface as drift and auto-file an issue. Reuse the validated baseURL from playwright.config in the logged-out context, and extract the duplicated failed/timedOut predicate in the reporter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
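A sketch of the extracted predicate and how it combines with a single CI retry, assuming the reporter sees one status per attempt (Playwright calls a reporter's `onTestEnd` once per attempt). The status union mirrors Playwright's `TestResult.status` values so the sketch has no dependencies; `isFailedStatus` and `isDrift` are hypothetical names. The retry itself would typically be set in playwright.config, for example `retries: process.env.CI ? 1 : 0`, though the exact setting in this PR isn't shown.

```typescript
// Playwright's TestResult.status values, redeclared to keep the sketch standalone.
type ResultStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";

// Shared predicate replacing repeated `status === "failed" || status === "timedOut"`.
function isFailedStatus(status: ResultStatus): boolean {
  return status === "failed" || status === "timedOut";
}

// Count a test as drift only if its final attempt still failed,
// so a pass on retry does not file an issue.
function isDrift(attempts: ResultStatus[]): boolean {
  const last = attempts[attempts.length - 1];
  return last !== undefined && isFailedStatus(last);
}
```

Playwright's `TestCase.outcome()` also reports a pass-on-retry as `'flaky'`, which is another way to express the same rule.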
Part of #2671
Adds `docs-tests/`, a docs-as-tests harness that catches when the platform docs claim something the Console UI no longer does (a renamed tab, moved route, relabeled button).

Model: an adjustable list of docs pages → AI-extracted assertions (run locally, human-reviewed) → a committed baseline → evaluated against Console staging with Playwright. Failures point back to `source_file:line`.

Includes the scheduled GitHub Action (`docs-ui-tests.yaml`, weekly + manual dispatch) that runs the evaluation against staging and files a drift issue on failure. Extraction never runs in CI, so no API key is needed there; the staging target and login come from repo secrets at run time, and nothing is committed.

Coverage starts with the Console section (routes + landing-page elements); widening is a follow-up. See `docs-tests/README.md` for the model and known gaps.

🤖 Generated with Claude Code