Add Compass real-estate mirror (port 40015) #25
Adds a Flask mirror of compass.com as the 16th WebHarbor site, with browse / search / filter, listing detail, agent directory, account flows (save, tour, inquiry, saved search, collection), and 18 WebVoyager-format benchmark tasks.

sites/compass/:
- app.py (1011 lines): 10 SQLAlchemy models, 35+ routes, token-overlap scored search with city/state/neighborhood boosts. User.check_password accepts both pbkdf2 and bcrypt prefixes so seed-time PBKDF2 hashes (deterministic) coexist with runtime Flask-Bcrypt writes.
- seed_data.py (659 lines): idempotent function-level gates; PBKDF2 with fixed per-email salt to preserve byte-identical reset; Co-op pool backfilled to keep filter-based tasks at >=5 candidates.
- 33 Jinja templates + 327-line hand-rolled CSS (white/black/serif to match the real Compass palette).
- tasks.jsonl: 18 WebVoyager tasks (3 hard multi-step).
- listings_clean.json: 524 normalized listings consumed by seed_data at build time (committed alongside the mirror, per the convention used by booking/, arxiv/, etc.).

Registration (3 files, must stay in sync per AGENTS.md):
- websyn_start.sh: compass appended to SITES, two ready-count 15s -> 16.
- control_server.py: 'compass' appended to SITES.
- Dockerfile: EXPOSE 8101 40000-40015.

Heavy assets (instance_seed/compass.db, static/images/, ~129 MB packed) ship via the companion HuggingFace PR ChilleD/WebHarbor#3. .assets-revision already pins main, so once that merges this Just Works.

Byte-identical reset verified:
md5sum instance/compass.db instance_seed/compass.db
-> 2a7458e3b6c3e3d0b39c32cca5d0f519 (both files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review
Tested on a fresh worktree of this branch. HF asset (
What works ✓
Mechanical
Determinism trick: well-executed

    if (self.password_hash or "").startswith("pbkdf2:"):
        return wz_check(self.password_hash, pw)
    return bcrypt.check_password_hash(self.password_hash, pw)

Verified end-to-end: registered Taylor Reed with
Functional depth
Task quality
Should-fix (non-blocking)
1.
In practice this isn't task-blocking: the homepage search box uses
2.
Either get the HF PR merged first then bump
3. PR base is
A rebase before merge would be wise, both for cleanliness and to expose any silent conflict with sites added since (
4. The Dockerfile pip-installs explicitly with locked versions (correct per AGENTS.md).
5.
Bottom line
The most polished site visually of the four PRs I've reviewed: real photographs throughout, clean Compass-style design system, deep listing detail. Seed determinism is handled with the cleanest hand-rolled PBKDF2 trick I've seen, and the tarball is the only one out of three so far without macOS junk. Two tiny pieces of friction (unused |
TL;DR
Adds a Flask mirror of compass.com as the 16th
WebHarbor site, with browse / search / filter, listing detail, agent
directory, account flows (save, tour, inquiry, saved search, collection),
and 18 WebVoyager-format benchmark tasks.
Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/3
What's in this PR
Site code (`sites/compass/`)
- `app.py`
- `seed_data.py`
- `templates/*.html`
- `static/css/compass.css`
- `listings_clean.json` (consumed by `seed_data.py` at build time)
- `tasks.jsonl`
- `_health.py`
- `requirements.txt`

Registration (3 files modified, must stay in sync per `AGENTS.md`)
- `websyn_start.sh`: `compass` appended to `SITES=( … )`, the two `15`s in ready-count log lines bumped to `16`.
- `control_server.py`: `'compass'` appended to `SITES`.
- `Dockerfile`: `EXPOSE 8101 40000-40015`.

Verification
All checks in `AGENTS.md` § Pre-PR checks pass.
- `python3 -m py_compile sites/compass/{app.py,seed_data.py}`: clean.
- `./scripts/build.sh webharbor:dev`: image builds.
- `docker run` on alt ports `8201` / `41000-41015`: `/health` reports all 16 sites alive with PIDs.
- `200`.
- `tasks.jsonl` walk end-to-end against the running mirror.

Design notes
- Seed password hashes use PBKDF2 with a fixed per-email salt (`sha1("salt-" + email)[:8]`), not bcrypt, because bcrypt's random salt breaks byte-identical reset. `User.check_password` accepts both prefixes so future writes from the running app (which uses Flask-Bcrypt) still authenticate.
- Search is token-overlap scored rather than strict `LIKE %q% AND %q%`, matching the booking-site pattern in `sites/booking/app.py`.
- Ordering by `Listing.id` rather than `price.desc()` keeps the answers to Tasks 11 / 17 from surfacing for free in the hero grid. Co-op pool was backfilled to keep filter-based tasks at >=5 candidates.
- `compass.com/m/0//600x400.webp` images, resolved via Playwright and downloaded with httpx. No placeholders, no AI stock photos.
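
For reviewers who want to see why the fixed-salt approach keeps the seed database reproducible, here is a minimal standard-library sketch. It is not the code in `seed_data.py` or `app.py`: the function names, the `ITERATIONS` value, and the exact `pbkdf2:sha256:<iterations>$<salt>$<hex>` layout are assumptions for illustration, and the bcrypt branch is represented by a caller-supplied callable rather than Flask-Bcrypt itself.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # assumed value for this sketch, not taken from the PR


def fixed_salt(email: str) -> str:
    # Salt derived from the email as in the design note: sha1("salt-" + email)[:8]
    return hashlib.sha1(("salt-" + email).encode()).hexdigest()[:8]


def seed_hash(email: str, password: str) -> str:
    # Same email + password always yields the same string, so re-seeding
    # produces identical bytes in the password column.
    salt = fixed_salt(email)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), ITERATIONS
    ).hex()
    return f"pbkdf2:sha256:{ITERATIONS}${salt}${digest}"


def check_pbkdf2(stored: str, password: str) -> bool:
    # Parse "pbkdf2:sha256:<iterations>$<salt>$<hex digest>" and recompute.
    method, salt, digest = stored.split("$", 2)
    iterations = int(method.rsplit(":", 1)[1])
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), iterations
    ).hex()
    return hmac.compare_digest(candidate, digest)


def check_password(stored, password, bcrypt_check=None):
    # Dispatch on the hash prefix: seed-time PBKDF2 hashes go one way,
    # anything else is handed to the (hypothetical) bcrypt verifier.
    if (stored or "").startswith("pbkdf2:"):
        return check_pbkdf2(stored, password)
    if bcrypt_check is None:
        return False
    return bcrypt_check(stored, password)
```

Because the salt depends only on the email, calling `seed_hash` twice with the same inputs returns the same string, which is the property a byte-identical reset needs. The trade-off is that the salt is predictable, which is usually acceptable only for seeded demo accounts.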
Assets
Heavy assets (`instance_seed/compass.db`, `static/images/`, ~129 MB
packed) ship via the companion HuggingFace PR linked above.
`.assets-revision` already pins `main`, so once the HF PR merges this
code PR Just Works.
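
Once the assets are in place, the byte-identical reset claim from the commit message (`md5sum instance/compass.db instance_seed/compass.db`) can also be checked from Python. The sketch below is only an illustration; the helper names `md5_of` and `is_byte_identical` are hypothetical and do not appear in the repository.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    # Stream the file in chunks so a large SQLite database is never
    # loaded into memory all at once.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(live_path, seed_path):
    # True when the live database and the seed snapshot hash to the same value.
    return md5_of(live_path) == md5_of(seed_path)
```

A call such as `is_byte_identical("instance/compass.db", "instance_seed/compass.db")`, run from `sites/compass/` after a reset, would be expected to return `True` if the reset is deterministic; the working directory is an assumption here.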