docs: docs-tests harness for Console UI drift detection#2669

Open
marcel-rbro wants to merge 12 commits into master from docs/ui-tests

marcel-rbro (Contributor) commented Jun 22, 2026

Part of #2671

Adds docs-tests/ — a docs-as-tests harness that catches when the platform docs claim something the Console UI no longer does (a renamed tab, moved route, relabeled button).

Model: an adjustable list of docs pages → AI-extracted assertions (run locally, human-reviewed) → a committed baseline → evaluated against Console staging with Playwright. Failures point back to source_file:line.
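
To make the model above concrete, here is a minimal sketch of what one stored assertion could look like. The names `source_file`, `source_quote`, `needs_auth`, and `target` appear in this PR; the `DocAssertion` type, the `kind` and `line` fields, and the example values are assumptions for illustration, not the harness's actual schema.

```typescript
// Hypothetical shape of one stored assertion; the real schema in assertions/*.json may differ.
type AssertionKind = "route" | "text" | "tab" | "button";

interface DocAssertion {
  kind: AssertionKind;
  target: string; // e.g. a route path or a visible label
  source_file: string; // docs page the claim was extracted from
  line: number; // line of the claim within source_file
  source_quote: string; // the sentence the claim came from
  needs_auth: boolean; // false means it can be checked while logged out
}

// Format the pointer that a failure reports back to the docs author.
function sourceLocation(a: DocAssertion): string {
  return `${a.source_file}:${a.line}`;
}

const example: DocAssertion = {
  kind: "tab",
  target: "Billing",
  source_file: "docs/console/billing.md", // illustrative path, not a real file reference
  line: 12,
  source_quote: "Open the **Billing** tab.",
  needs_auth: true,
};
```

Keeping the quote and location on every assertion is what would let a failure be traced to a sentence in the docs rather than only to a selector.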

Includes the scheduled GitHub Action (docs-ui-tests.yaml, weekly + manual dispatch) that runs the evaluation against staging and files a drift issue on failure. Extraction never runs in CI, so no API key is needed there; the staging target and login come from repo secrets at run time — nothing is committed.

Coverage starts with the Console section (routes + landing-page elements); widening is a follow-up. See docs-tests/README.md for the model and known gaps.

🤖 Generated with Claude Code

Docs-as-tests for the Apify Console: extract UI claims (routes, tabs,
buttons, headings) from platform docs with an LLM, store them as a
reviewed baseline under assertions/, and verify them against Console
staging with Playwright. Failures point back to source_file:line.

- pages.json: adjustable list of docs pages to cover
- scripts/extract*.sh: LLM extraction (run locally, commit the result)
- tests/from-doc.spec.ts: evaluate the stored assertions (CI-friendly)
- reporters/issues-reporter.ts: machine-readable drift report

No secrets committed; auth.json and .env are gitignored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions (bot) added the labels t-docs (Issues owned by technical writing team.) and tested (Temporary label used only programmatically for some analytics.) Jun 22, 2026
marcel-rbro added the label documentation (Improvements or additions to documentation.) Jun 22, 2026
apify-service-account (Contributor) commented Jun 22, 2026

✅ Preview for this PR (commit b2caab7c) is ready at https://pr-2669.preview.docs.apify.com (see action run).

marcel-rbro and others added 2 commits June 22, 2026 14:14
- portable read loop in extract-all.sh (macOS bash 3.2 has no mapfile)
- detach claude stdin so it doesn't drain the page list when looped
- slice the <output> block with perl so single-line tag+JSON also parses

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
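
The commit above slices the `<output>` block with perl inside the shell script. As a rough TypeScript analogue of the same idea (not the script's code), a non-greedy match across newlines handles both the single-line and multi-line forms. The function name `extractOutputJson` is hypothetical.

```typescript
// Pull the JSON payload out of an <output>...</output> block in model output.
// [\s\S] also matches newlines, so single-line and multi-line blocks both parse.
function extractOutputJson(raw: string): unknown {
  const match = /<output>([\s\S]*?)<\/output>/.exec(raw);
  if (match === null) {
    throw new Error("no <output> block found");
  }
  const body = match[1] ?? "";
  return JSON.parse(body.trim());
}

// Tag and JSON on one line.
const singleLine = extractOutputJson('noise <output>{"kind":"tab"}</output> noise');

// Tag and JSON on separate lines.
const multiLine = extractOutputJson('<output>\n{"kind":"route"}\n</output>');
```
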
Narrow the manifest to four fixture-free landing pages (console index,
settings, billing, store) and commit the AI-extracted baseline: 53
assertions (21 route, 15 text, 11 tab, 6 button).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
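
The breakdown in the commit above (21 route, 15 text, 11 tab, 6 button) can be sanity-checked against the total of 53 with a small tally. This is only an illustrative sketch; `countByKind` and the synthetic `baseline` array are assumptions, not part of the harness.

```typescript
type AssertionKind = "route" | "text" | "tab" | "button";

// Tally how many assertions exist per kind.
function countByKind(kinds: AssertionKind[]): Record<AssertionKind, number> {
  const counts: Record<AssertionKind, number> = { route: 0, text: 0, tab: 0, button: 0 };
  for (const kind of kinds) {
    counts[kind] += 1;
  }
  return counts;
}

// Synthetic list that mirrors the reported breakdown.
const baseline: AssertionKind[] = [
  ...new Array<AssertionKind>(21).fill("route"),
  ...new Array<AssertionKind>(15).fill("text"),
  ...new Array<AssertionKind>(11).fill("tab"),
  ...new Array<AssertionKind>(6).fill("button"),
];

const counts = countByKind(baseline);
const total = counts.route + counts.text + counts.tab + counts.button;
```
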
marcel-rbro and others added 5 commits June 22, 2026 17:01
Replace the interactive auth.setup.ts + auth.json storageState handoff with
a worker-scoped fixture that logs in fresh each run from
CONSOLE_STAGING_USER_EMAIL/_PASSWORD (.env locally, GitHub Secrets in CI) and
keeps the session in memory. Nothing is written to or read from disk, so no
auth file has to pre-exist in the GitHub Action. Seeded user has no 2FA, so
it's a plain email+password submit; drop the setup project and pnpm auth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
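
The fixture itself is not shown in this conversation, so the following sketches only the credential lookup it would need. It assumes that `CONSOLE_STAGING_USER_EMAIL/_PASSWORD` expands to `CONSOLE_STAGING_USER_EMAIL` and `CONSOLE_STAGING_USER_PASSWORD`; the `StagingCredentials` type and `readStagingCredentials` function are hypothetical names.

```typescript
interface StagingCredentials {
  email: string;
  password: string;
}

// env is passed in (for example process.env) so the helper has no Node-specific dependency.
function readStagingCredentials(env: Record<string, string | undefined>): StagingCredentials {
  const email = env["CONSOLE_STAGING_USER_EMAIL"];
  const password = env["CONSOLE_STAGING_USER_PASSWORD"];
  if (!email || !password) {
    // Fail before any browser work starts instead of timing out on the sign-in form.
    throw new Error("CONSOLE_STAGING_USER_EMAIL and CONSOLE_STAGING_USER_PASSWORD must be set");
  }
  return { email, password };
}

const creds = readStagingCredentials({
  CONSOLE_STAGING_USER_EMAIL: "seeded-user@example.com", // placeholder value
  CONSOLE_STAGING_USER_PASSWORD: "not-a-real-password", // placeholder value
});
```

Because the values are read at run time and held in memory, the same lookup would work with `.env` locally and with GitHub Secrets in CI.
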
The worker auth fixture assumed a single combined sign-in form; Apify's
sign-in is two-step (email -> Next -> password -> Log in), both steps on
/sign-in. Pin the real selectors, avoid the SSO buttons, and wait on
domcontentloaded (the Console SPA never reaches networkidle).

Add timestamped step logging to the login fixture so a slow or stuck
login is visible instead of a silent hang before any test reports.

Also adds the pnpm workspace + lockfile so docs-tests installs in
isolation, and documents the staging-user vars in .env.example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
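
A minimal sketch of the timestamped step logging described above, assuming a simple line format. `createStepLogger`, its parameters, and the `login:` prefix are hypothetical; the clock and output sink are injectable so the helper can be exercised without a browser.

```typescript
// Build a logger that prefixes each step with an ISO timestamp and the elapsed milliseconds.
function createStepLogger(
  now: () => number = () => Date.now(),
  sink: (line: string) => void = (line) => console.log(line),
): (step: string) => void {
  const start = now();
  return (step: string): void => {
    const t = now();
    sink(`[${new Date(t).toISOString()}] +${t - start}ms login: ${step}`);
  };
}

// Exercise it with a fake clock: the logger starts at 1000 ms, the first step happens at 1500 ms.
const ticks = [1000, 1500];
const lines: string[] = [];
const logStep = createStepLogger(
  () => ticks.shift() ?? 0,
  (line) => lines.push(line),
);
logStep("email submitted");
```

Calling the logger after each of the four sign-in steps (email, Next, password, Log in) would show which step a slow or stuck login stopped at.
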
Runs the docs-tests Playwright harness against Console staging weekly and on
manual dispatch; files a drift issue for maintainers on failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove store filter and sign-in button assertions that can't be tested against
Console staging, handle needs_auth:false assertions in a logged-out context,
add the bold-text extraction rule, and update the Session / Add connector
labels to match the current Console.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
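
The `needs_auth:false` handling mentioned in the commits above amounts to choosing a browser context per assertion. A hedged sketch, assuming that a missing `needs_auth` value means the assertion requires login; `contextFor` and `ContextKind` are hypothetical names.

```typescript
type ContextKind = "logged-in" | "logged-out";

interface AuthScoped {
  needs_auth?: boolean;
}

// Only an explicit needs_auth: false routes the assertion to a logged-out context.
function contextFor(assertion: AuthScoped): ContextKind {
  return assertion.needs_auth === false ? "logged-out" : "logged-in";
}

const publicPage = contextFor({ needs_auth: false });
const consolePage = contextFor({ needs_auth: true });
const unspecified = contextFor({});
```
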
marcel-rbro (Contributor, Author) commented Jul 2, 2026

How to test this PR

The evaluation harness is just branch files, so most testing needs no CI. Note the CI constraint up front: workflow_dispatch and schedule only fire from the default branch, so the workflow can't be launched from this PR branch via the Actions UI — use a local run or a push-trigger branch to exercise it pre-merge.

1. Local run — fastest, tests the harness + baseline directly

cd docs-tests
pnpm install
pnpm exec playwright install chromium
cp .env.example .env      # fill in CONSOLE_STAGING_URL + seeded-user email/password
pnpm test                 # evaluate the committed baseline against Console staging
  • pnpm report — HTML report with screenshots/video/trace for any failure
  • pnpm issues — machine-readable output/issues.json (one entry per failing assertion, with source_file:line)

Expected on this branch: 41 passed · 5 skipped · 0 failed. The 5 skipped are detail-page assertions with no landing route (a documented gap, not a failure).

2. See a failure / the drift report

Point one assertion at something that doesn't exist (edit an assertions/*.json target) and re-run pnpm test. It lands in output/issues.json with the offending source_quote, source_file:line, and the live page's observed candidates. In CI, that same failure files (or comments on) a docs-ui-drift issue.
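
As an illustration of the report described above, a failing assertion could be mapped to an `issues.json` entry roughly like this. Only `source_quote` and `source_file:line` are named in this PR; `FailedAssertion`, `IssueEntry`, `location`, `expected`, and `observed_candidates` are assumed names, and the real reporter output may be shaped differently.

```typescript
// Hypothetical input: one assertion that failed against the live page.
interface FailedAssertion {
  target: string;
  source_file: string;
  line: number;
  source_quote: string;
  observed: string[]; // labels or routes actually found on the live page
}

// Hypothetical output entry, one per failing assertion.
interface IssueEntry {
  location: string; // "source_file:line"
  source_quote: string;
  expected: string;
  observed_candidates: string[];
}

function toIssueEntry(f: FailedAssertion): IssueEntry {
  return {
    location: `${f.source_file}:${f.line}`,
    source_quote: f.source_quote,
    expected: f.target,
    observed_candidates: [...f.observed],
  };
}

// Serialize all failures as the machine-readable report body.
function buildReport(failures: FailedAssertion[]): string {
  return JSON.stringify(failures.map(toIssueEntry), null, 2);
}

const report = buildReport([
  {
    target: "Invoices",
    source_file: "docs/console/billing.md", // illustrative path
    line: 30,
    source_quote: "Select the **Invoices** tab.",
    observed: ["Current period", "Historical usage"],
  },
]);
```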

3. Exercise the actual GitHub Action on a branch (pre-merge)

Because dispatch/schedule only fire from the default branch, use a branch that carries a push: trigger:

  • Dedicated canary / CI-test branch — docs/ui-tests-ci-check. A throwaway branch that mirrors this PR's content plus two things deliberately kept off master: a temporary push: trigger scoped to the branch, and a canary assertion (docs-tests/assertions/_ci-canary.json) that always fails. Push any commit to it and the workflow runs on the real runner end to end — logs in via the CONSOLE_STAGING_* repo secrets, the canary fails on purpose, the Playwright report + issues.json upload as artifacts, and a docs-ui-drift issue is filed. This is the sandbox for exercising the full failure path (drift detection + auto-issue) without touching master or this PR branch. Delete it after merge.
  • Temporary push trigger on your own branch. Same technique for testing other content: add a push: trigger scoped to your branch in docs-ui-tests.yaml, push, then remove it before merge.

Neither the push trigger nor the canary is part of this PR — they exist only on the test branch, so master keeps just workflow_dispatch + the weekly schedule.

4. After merge (production)

The workflow becomes available on master:

  • On demand: Actions → Docs UI drift tests → Run workflow (workflow_dispatch)
  • Automatically: weekly schedule (Mon 06:00 UTC)

On failure it uploads the report + issues.json and files/updates a docs-ui-drift issue for maintainers.

@marcel-rbro marcel-rbro marked this pull request as ready for review July 2, 2026 14:52
marcel-rbro and others added 4 commits July 3, 2026 13:36
Exclude docs-tests/ from markdownlint + Vale (internal tooling, like .claude/
.agents/standards), and fix two oxlint findings in the harness code
(no-control-regex on the ANSI strip, prefer-includes over a regex test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
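
For context on the two lint findings above, this sketch shows one common way to satisfy each rule. It is not necessarily the fix applied in this PR; `stripAnsi` and `mentionsTimeout` are hypothetical helpers, and the pattern covers only colour and style (SGR) escape sequences.

```typescript
// no-control-regex: build the escape character at run time instead of
// writing a control character inside a regex literal.
const ESC = String.fromCharCode(27);
const ANSI_SGR = new RegExp(`${ESC}\\[[0-9;]*m`, "g");

// Remove colour and style escape sequences from terminal output.
function stripAnsi(text: string): string {
  return text.replace(ANSI_SGR, "");
}

// prefer-includes: a plain substring check instead of a regex test on a fixed string.
function mentionsTimeout(text: string): boolean {
  return text.includes("Timeout");
}

const cleaned = stripAnsi(`${ESC}[31mfail${ESC}[0m`);
const hasTimeout = mentionsTimeout("Timeout 30000ms exceeded");
```
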
oxfmt --check flagged the 5 docs-tests TS files (formatting); reformatted them.
Vale's Microsoft.Dashes is an explicit toggle that survives an empty
BasedOnStyles, so disable it explicitly for docs-tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Retry failed assertions once in CI so a transient staging/network hiccup does
not surface as drift and auto-file an issue. Reuse the validated baseURL from
playwright.config in the logged-out context, and extract the duplicated
failed/timedOut predicate in the reporter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
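
A minimal sketch of the two ideas in the commit above: one retry in CI only, and a single shared predicate for "this result counts as a failure". The status values match Playwright's test result statuses; `isFailure` and `retriesFor` are hypothetical names, and the return value of `retriesFor` is the kind of number one would pass to the `retries` option in the Playwright config.

```typescript
type TestStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";

// Shared predicate so the reporter does not repeat the failed/timedOut check.
function isFailure(status: TestStatus): boolean {
  return status === "failed" || status === "timedOut";
}

// Retry once when a CI variable is set, never locally.
function retriesFor(env: Record<string, string | undefined>): number {
  return env["CI"] ? 1 : 0;
}

const statuses: TestStatus[] = ["passed", "failed", "timedOut", "skipped"];
const failing = statuses.filter(isFailure);
const ciRetries = retriesFor({ CI: "true" });
const localRetries = retriesFor({});
```

With a single retry, an assertion would be reported as drift only if it fails on both attempts, which filters out one-off staging or network hiccups.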