Adds a Flask mirror of https://www.gov.uk/ as the 16th WebHarbor site,
running on port 40015.
## What's mirrored
- 16 top-level topics (Money and tax, Visas and immigration, Driving, ...)
- 44 subtopics
- 15 government departments (HMRC, DfE, Home Office, DVLA, NHS England, ...)
with real ministers / permanent secretaries / employee counts
- 73 guidance articles (Self Assessment, Income Tax, Universal Credit,
Skilled Worker visa, passport applications, vehicle tax, ...)
- 20 announcements (press releases, news stories, speeches)
- Search across articles / announcements / departments
## Visual fidelity
Uses the official MIT-licensed govuk-frontend v6.1.0 CSS + JS + GDS
Transport font + crown SVG. Templates use the canonical Design System
component DOM (govuk-header, govuk-breadcrumbs, govuk-summary-list,
govuk-pagination, govuk-grid-row, etc.) so an agent's selectors match
the real GOV.UK.
Content licensed under the Open Government Licence v3.0 (synthesized
in the spirit of GOV.UK guidance; no upstream copy embedded).
## Folder layout
Matches the canonical site layout (compare wolfram_alpha, google_search):
sites/gov_uk/
|-- _health.py
|-- app.py
|-- seed_data.py
|-- tasks.jsonl
|-- instance_seed/ (HF-managed)
|-- static/{css,js,fonts,icons,images,external_cache}/
`-- templates/
## Wiring
- websyn_start.sh: gov_uk appended to SITES, 15->16 counts
- control_server.py: gov_uk added to SITES
- Dockerfile: EXPOSE 40000-40015
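
The port arithmetic behind these three changes can be sketched as follows. This assumes ports are assigned contiguously from 40000 in SITES order, which is consistent with the 16th site landing on 40015 but is not spelled out in this PR; `port_for` and `expose_range` are hypothetical helper names, not functions from the repo.

```python
BASE_PORT = 40000  # first port in the Dockerfile EXPOSE range


def port_for(position: int) -> int:
    """Return the port for the site at 1-based `position` in SITES.

    Assumption: sites get contiguous ports in list order.
    """
    if position < 1:
        raise ValueError("position is 1-based and must be >= 1")
    return BASE_PORT + position - 1


def expose_range(site_count: int) -> str:
    """Return the EXPOSE range string covering `site_count` sites."""
    return f"{BASE_PORT}-{port_for(site_count)}"
```

Under that assumption, `port_for(16)` gives 40015 and `expose_range(16)` gives `40000-40015`, matching the values in this PR.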
## Pre-PR verification (passed)
- docker build webharbor:dev clean (5.92 GB)
- 16/16 sites bind in 2s
- All gov_uk routes (/, /browse, /browse/<topic>, /browse/<t>/<s>,
/guidance/<slug>, /government/organisations[/<dept>],
/government/announcements, /search, /_health) return 200
- /reset/gov_uk -> {ready: true}, md5 byte-identical pre/post
- Byte-identical after docker restart
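
A minimal sketch of how the route and md5 checks above could be scripted, using only the standard library. The helper names are hypothetical, the route list is a subset with the parameterized routes left out, and which file the md5 is taken over is not stated in this PR, so the path passed to `md5_of` is up to the caller.

```python
import hashlib
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

# Fixed (non-parameterized) routes taken from the list above.
ROUTES = [
    "/",
    "/browse",
    "/government/organisations",
    "/government/announcements",
    "/search",
    "/_health",
]


def http_status(url: str) -> int:
    """Fetch `url` and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def failing_routes(
    base_url: str,
    routes: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the routes whose status is not 200."""
    return [r for r in routes if get_status(base_url + r) != 200]


def md5_of(path: str) -> str:
    """Return the hex md5 of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `failing_routes("http://localhost:40015", ROUTES)` should return an empty list when the site is healthy, and comparing `md5_of(path)` before and after a reset reproduces the byte-identical check.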
## Asset PR
Seed DB (gov_uk.tar.gz, 32 KB) uploaded as HF PR:
https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/22
.assets-revision will be bumped to the HF merge SHA once that PR lands.
## TL;DR
Adds a Flask mirror of gov.uk as the 16th WebHarbor site (port 40015), with topic browse, guidance article detail, department directory, announcements, and search. Uses the official MIT-licensed govuk-frontend v6.1.0 for canonical Design System DOM.
Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/22
## What's in this PR
- sites/gov_uk/: app.py, seed_data.py, templates/*.html, static/{css,js,fonts,icons}/, tasks.jsonl
- Registration (sync per AGENTS.md): gov_uk added to websyn_start.sh and control_server.py; Dockerfile EXPOSE bumped to 40000-40015.
## Verification
All checks in AGENTS.md § Pre-PR checks pass: image builds clean, 16/16 sites alive, every gov_uk route returns 200, POST /reset/gov_uk byte-identical pre/post (md5 f6931b6c…), and identical after docker restart.
## Notes
- govuk-frontend.min.css is patched only with one sed that rewrites url(/assets/...) → relative paths so they resolve through Flask's /static/.
- .assets-revision still points at main; will bump to the HF merge SHA after that PR is reviewed.
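
For reviewers who want to reproduce the CSS patch without sed, a rough Python equivalent is sketched below. The actual sed expression is not included in this PR, so both the `../` prefix (which assumes the stylesheet lives in static/css/ next to static/fonts/ and static/images/) and the function name are assumptions.

```python
import re

# Matches url(/assets/..., with an optional opening quote after the parenthesis.
ASSET_URL = re.compile(r"url\((['\"]?)/assets/")


def relativize_asset_urls(css: str, prefix: str = "../") -> str:
    """Rewrite absolute /assets/ URLs in `css` to paths under `prefix`.

    Assumption: `prefix` is the relative path from the stylesheet's
    directory to the directory that holds the asset folders.
    """
    return ASSET_URL.sub(lambda m: f"url({m.group(1)}{prefix}", css)
```

URLs that do not start with /assets/ (data URIs, already-relative paths) are left untouched.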