8 changes: 5 additions & 3 deletions .gitignore
@@ -9,8 +9,10 @@ sites/*/static/external_cache/
# =============================================================
# Intermediate / volatile — never committed anywhere.
# =============================================================
-sites/*/scraped_data/ # scrape pipeline intermediate; runtime data lives in instance_seed/*.db
-sites/*/instance/ # rebuilt at every container boot from instance_seed/
+# scrape pipeline intermediate; runtime data lives in instance_seed/*.db
+sites/*/scraped_data/
+# rebuilt at every container boot from instance_seed/
+sites/*/instance/
sites/*/venv/

# HF download metadata produced by `hf download`.
@@ -92,4 +94,4 @@ secrets.json
# ============================================================
# Agent demo results
# =============================================================
-agent_demo/runs/
+agent_demo/runs/
6 changes: 3 additions & 3 deletions AGENTS.md
@@ -48,13 +48,13 @@ Inside the image, sites live at `/opt/WebSyn/<site>/`. The path predates the ren
# fresh clone
./scripts/fetch_assets.sh # pulls assets from HF
./scripts/build.sh # docker build -t webharbor:dev .
-docker run -d -p 8101:8101 -p 40000-40014:40000-40014 webharbor:dev
+docker run -d -p 8101:8101 -p 40000-40015:40000-40015 webharbor:dev
```

Or use the published image directly:

```bash
-docker run -d -p 8101:8101 -p 40000-40014:40000-40014 \
+docker run -d -p 8101:8101 -p 40000-40015:40000-40015 \
battalion7244/webharbor:latest
```

@@ -129,7 +129,7 @@ python3 -m py_compile sites/<site>/app.py

# 3. run on alt ports (don't collide with anything you already have running)
docker run -d --rm --name wh-test \
-  -p 8201:8101 -p 41000-41014:40000-40014 webharbor:dev
+  -p 8201:8101 -p 41000-41015:40000-40015 webharbor:dev

# 4. control plane healthy, all sites alive
curl -s http://localhost:8201/health | python3 -m json.tool | head
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,5 +1,5 @@
# WebHarbor — slim, self-contained image.
-# 15 Flask mirror sites + control plane on :8101.
+# 16 Flask mirror sites + control plane on :8101.

FROM python:3.12-slim-bookworm

@@ -33,6 +33,6 @@ COPY control_server.py /opt/control_server.py
COPY site_runner.py /opt/site_runner.py
RUN chmod +x /opt/websyn_start.sh

-EXPOSE 8101 40000-40014
+EXPOSE 8101 40000-40015

CMD ["/opt/websyn_start.sh"]
8 changes: 4 additions & 4 deletions README.md
@@ -36,17 +36,17 @@ WebHarbor takes a different approach. We leverage coding agent (e.g., Claude Cod
- **Deep features unlocked** — carts, checkouts, accounts, all fully testable
- **Evolving** — harder tasks drive richer mirrors; the environment grows with agents
- **RL-ready** — sub-second database resets between rollouts
-- **Community-driven** — 15 sites today, scaling to 100+ together
+- **Community-driven** — 16 sites today, scaling to 100+ together

## 🚀 Quickstart

One command to run all web environments:

```bash
-docker run -p 8101:8101 -p 40000-40014:40000-40014 battalion7244/webharbor:latest
+docker run -p 8101:8101 -p 40000-40015:40000-40015 battalion7244/webharbor:latest
```

-Then point your agent at `http://localhost:40000` through `http://localhost:40014` to explore 15 local mirrors of webvoyager sites: `Allrecipes, Amazon, Apple, ArXiv, BBC News, Booking, GitHub, Google Flights, Google Maps, Google Search, Hugging Face, Wolfram Alpha, Cambridge Dictionary, Coursera, and ESPN`.
+Then point your agent at `http://localhost:40000` through `http://localhost:40015` to explore 16 local mirrors of webvoyager sites: `Allrecipes, Amazon, Apple, ArXiv, BBC News, Booking, GitHub, Google Flights, Google Maps, Google Search, Hugging Face, Wolfram Alpha, Cambridge Dictionary, Coursera, ESPN, and IKEA`.

For sub-second reset between rollouts, expose the control plane and call `/reset/<site>`:

@@ -111,4 +111,4 @@ WebHarbor is initiated by UNC-Chapel Hill and Microsoft, with contributions from
url = {https://aiming-lab.github.io/webharbor.github.io},
note = {Project website.}
}
-```
+```
2 changes: 1 addition & 1 deletion control_server.py
@@ -26,7 +26,7 @@
'allrecipes', 'amazon', 'apple', 'arxiv', 'bbc_news', 'booking',
'github', 'google_flights', 'google_map', 'google_search',
'huggingface', 'wolfram_alpha', 'cambridge_dictionary',
-'coursera', 'espn',
+'coursera', 'espn', 'ikea',
]
BASE_PORT = 40000
WEBSYN_DIR = '/opt/WebSyn'
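
Reviewer note: the port range grows from `40000-40014` to `40000-40015` in the same change that appends `'ikea'` to this list, which suggests each site listens on `BASE_PORT` plus its list index. The sketch below illustrates that reading only. The names `SITES` and `site_ports` are hypothetical and do not appear in the diff, and the index-based mapping is an assumption rather than something the hunk confirms.

```python
# Hypothetical sketch: assumes each site is served on BASE_PORT + its list index.
# SITES and site_ports are illustrative names, not taken from control_server.py.
SITES = [
    'allrecipes', 'amazon', 'apple', 'arxiv', 'bbc_news', 'booking',
    'github', 'google_flights', 'google_map', 'google_search',
    'huggingface', 'wolfram_alpha', 'cambridge_dictionary',
    'coursera', 'espn', 'ikea',
]
BASE_PORT = 40000


def site_ports(sites=SITES, base_port=BASE_PORT):
    """Map each site name to the port it would occupy under the assumed scheme."""
    return {name: base_port + index for index, name in enumerate(sites)}


if __name__ == "__main__":
    ports = site_ports()
    # Under this assumption the last port must match the upper bound
    # published by `docker run -p` and `EXPOSE`.
    print(ports['ikea'], max(ports.values()))
```

If the mapping holds, any future site appended to the list needs the same four places updated together: the `docker run` examples, `EXPOSE`, the README range, and this list.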
5 changes: 5 additions & 0 deletions sites/ikea/_health.py
@@ -0,0 +1,5 @@
+"""Per-site health probe (optional, called by control_server)."""
+
+
+def health():
+    return {"ok": True, "site": "ikea", "note": "Local retail demo ready"}
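
Reviewer note: the docstring says this probe is optional and is called by `control_server`, but the calling side is not part of this diff. One way such a call could work is sketched below, purely as an illustration: `load_site_health` and its fallback payloads are hypothetical, and the real control plane may discover or invoke probes differently.

```python
# Hypothetical sketch of how a control plane might call an optional _health.py.
# load_site_health and the fallback dictionaries are assumptions, not project code.
import importlib.util
from pathlib import Path


def load_site_health(site_dir):
    """Run <site_dir>/_health.py:health() if present, else report a default."""
    site_dir = Path(site_dir)
    probe_path = site_dir / "_health.py"
    if not probe_path.is_file():
        # The probe is optional, so a missing file is not treated as a failure.
        return {"ok": True, "site": site_dir.name, "note": "no per-site probe"}
    try:
        # Load the file under a unique module name so probes from different
        # sites do not shadow one another.
        spec = importlib.util.spec_from_file_location(
            f"_health_{site_dir.name}", probe_path
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.health()
    except Exception as exc:
        # A broken probe should surface as an unhealthy site, not crash the caller.
        return {"ok": False, "site": site_dir.name, "error": str(exc)}
```

Loading by file path avoids requiring `sites/ikea` to be an importable package, at the cost of re-executing the probe module on every call.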