Add WebHarbor site registry audit script #46

Open: Lxr-max wants to merge 1 commit into aiming-lab:main from Lxr-max:add-site-registry-audit

Lxr-max commented Jun 2, 2026

Summary

  • adds scripts/audit_site_registry.py to audit WebHarbor site registration consistency, port mappings, reset integration, task URL wiring, and lightweight asset metadata
  • adds scripts/test_audit_site_registry.py with standard-library tests for duplicate ports, missing registrations, missing directories, strict mode, and JSON output
  • adds a short README usage section for the new audit tool

Why this helps WebHarbor review

  • catches registry drift between websyn_start.sh and control_server.py
  • surfaces port collisions and Docker exposure gaps before contributors open site PRs
  • checks that task files point at the registered localhost port, without duplicating the full task validator
  • gives reviewers a fast repo-level health check for site registration changes
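
To illustrate what "registry drift" means here, the following is a minimal sketch, not the code in scripts/audit_site_registry.py. It assumes the site names have already been extracted from websyn_start.sh and control_server.py into two lists; the function name `registry_drift` and every site name other than `amazon` are hypothetical.

```python
def registry_drift(start_script_sites, control_server_sites):
    """Return sites that appear in one registry list but not the other."""
    start = set(start_script_sites)
    control = set(control_server_sites)
    return {
        # Registered in websyn_start.sh but unknown to control_server.py
        "missing_from_control_server": sorted(start - control),
        # Registered in control_server.py but unknown to websyn_start.sh
        "missing_from_start_script": sorted(control - start),
    }


# Hypothetical inputs: only "amazon" is a site name taken from this PR.
drift = registry_drift(
    ["amazon", "site_b", "site_c"],
    ["amazon", "site_c", "site_d"],
)
print(drift)
```

An empty list under both keys would mean the two registries agree on the set of sites.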

CLI usage

python scripts/audit_site_registry.py
python scripts/audit_site_registry.py --site amazon
python scripts/audit_site_registry.py --strict
python scripts/audit_site_registry.py --json
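
For readers who want to see how flags like these are commonly wired up, here is a hedged argparse sketch. Only the flag names `--site`, `--strict`, and `--json` come from this PR; the help text, the `as_json` attribute, the `exit_code` helper, and the assumption that `--strict` turns warnings into a failing exit code are illustrative guesses, not a description of the actual script.

```python
import argparse


def build_parser():
    """Build a parser exposing the three flags listed in the CLI usage."""
    parser = argparse.ArgumentParser(prog="audit_site_registry.py")
    parser.add_argument("--site", help="audit a single site (assumed meaning)")
    parser.add_argument(
        "--strict",
        action="store_true",
        help="treat warnings as failures (assumed meaning)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="as_json",  # avoids shadowing the json module name
        help="emit machine-readable output (assumed meaning)",
    )
    return parser


def exit_code(errors, warnings, strict):
    """Hypothetical policy: errors always fail, warnings fail only in strict mode."""
    if errors or (strict and warnings):
        return 1
    return 0


args = build_parser().parse_args(["--site", "amazon", "--strict"])
print(args.site, args.strict, args.as_json)
```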

Checks included

  • site discovery across sites/, websyn_start.sh, and control_server.py
  • registry list mismatches and duplicate site entries
  • port mapping and Docker EXPOSE coverage checks
  • consistency of the reset examples in the README
  • task JSONL parsing, plus checks that task URLs use the registered localhost port
  • runtime file hygiene warnings for instance/, scraped_data/, logs, caches, and screenshots
  • lightweight .assetpaths coverage checks without fetching or editing assets
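
Two of the checks above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the PR's implementation: the helper names, the `start_url` field name, and the port numbers are all hypothetical, since the description does not show the task schema or the real port map.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def duplicate_ports(port_map):
    """Return {port: [sites]} for every port claimed by more than one site."""
    by_port = defaultdict(list)
    for site, port in port_map.items():
        by_port[port].append(site)
    return {p: sorted(s) for p, s in by_port.items() if len(s) > 1}


def task_port_mismatches(jsonl_text, expected_port, url_field="start_url"):
    """Return (line_number, message) pairs for unparsable or mis-wired tasks.

    url_field is a hypothetical field name chosen for this sketch.
    """
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        raw_url = task.get(url_field) if isinstance(task, dict) else None
        if not isinstance(raw_url, str):
            problems.append((lineno, f"missing {url_field}"))
            continue
        try:
            parsed = urlparse(raw_url)
            host, port = parsed.hostname, parsed.port
        except ValueError:  # raised for a malformed port such as ":abc"
            host, port = None, None
        if host != "localhost" or port != expected_port:
            problems.append((lineno, f"expected localhost:{expected_port}, got {raw_url}"))
    return problems


# Hypothetical sample data.
collisions = duplicate_ports({"amazon": 8001, "site_b": 8002, "site_c": 8001})
sample = (
    '{"start_url": "http://localhost:8001/cart"}\n'
    '{"start_url": "http://localhost:9000/"}\n'
    "not json\n"
)
problems = task_port_mismatches(sample, expected_port=8001)
print(collisions, problems)
```

Reporting line numbers alongside each message keeps the output actionable for JSONL files, where a single bad line should not hide problems on later lines.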

Test commands and results

  • py -m py_compile scripts/audit_site_registry.py
  • py -m py_compile scripts/test_audit_site_registry.py
  • py scripts/test_audit_site_registry.py ✅ (7 tests passed)
  • py scripts/audit_site_registry.py
  • py scripts/audit_site_registry.py --json
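
As a rough picture of what a standard-library test for the "missing directories" case could look like, the sketch below builds a throwaway sites/ tree with `tempfile` and runs it under `unittest`. The helper `missing_site_dirs` and the site name `site_b` are hypothetical; the tests in scripts/test_audit_site_registry.py may be structured quite differently.

```python
import tempfile
import unittest
from pathlib import Path


def missing_site_dirs(sites_root, registered_sites):
    """Return registered sites that have no directory under sites_root."""
    root = Path(sites_root)
    return sorted(s for s in registered_sites if not (root / s).is_dir())


class MissingDirectoryTests(unittest.TestCase):
    def test_registered_site_without_directory_is_reported(self):
        with tempfile.TemporaryDirectory() as tmp:
            (Path(tmp) / "amazon").mkdir()
            self.assertEqual(
                missing_site_dirs(tmp, ["amazon", "site_b"]), ["site_b"]
            )

    def test_all_directories_present(self):
        with tempfile.TemporaryDirectory() as tmp:
            (Path(tmp) / "amazon").mkdir()
            self.assertEqual(missing_site_dirs(tmp, ["amazon"]), [])


# Run the suite programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingDirectoryTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```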

Audit run result on current repo

  • 15 site directories found
  • 15 registered sites found
  • 15 registered ports found
  • 15 task files checked
  • 643 tasks scanned for registry/task integration
  • 0 errors, 0 warnings on current upstream/main baseline

Notes

  • this is not a website contribution
  • no HF assets are involved
  • .assets-revision was not modified
  • no existing site implementation was changed

Known limitations

  • asset checks are metadata-only and intentionally do not require local HF assets to be fetched
  • task checks focus on registry and port integration, not the full task-content validation covered by the separate task-validator tooling
