24 commits
7b26c9a
feat(sportsbooks): dockerized Airflow + Postgres pipeline
omkar055 Nov 12, 2025
1c986b7
chore(sportsbooks): delete old requirements.txt
omkar055 Nov 12, 2025
c5847cf
chore(sportsbooks): update pyproject.toml with requirements
omkar055 Nov 13, 2025
ccd71ea
chore(sportsbooks): modify requests to >=2.32.3 for nba-api
omkar055 Nov 13, 2025
ff50ec5
chore(sportsbooks): modify pandas to pandas>=2.2.0 for nba-api
omkar055 Nov 13, 2025
03c807f
chore(sportsbooks): syntax error :(
omkar055 Nov 13, 2025
5c7f96b
chore(sportsbook): autofixed issues using just fix-python
omkar055 Nov 13, 2025
f92bc19
fix: remove old test_scripts to fix build
rhtruong Nov 13, 2025
0cd7c9d
fix(sportsbooks): removed unused variables
rhtruong Nov 14, 2025
9b0b617
fix(sportsbooks): moved import to top of file
rhtruong Nov 14, 2025
38c7919
fix(sportsbook): fix lint errors in DAG + prizepicks script
omkar055 Nov 14, 2025
c2a416a
chore(sportsbook): resolved requested changes in pr
omkar055 Nov 19, 2025
4435edc
chore(sportsbook): more requested changes resolved + env changes
omkar055 Nov 19, 2025
1c0f17f
fix(sportsbook): forgot to actually load the env variable (brain fog)
omkar055 Nov 19, 2025
c4d27ea
chore(sportsbook): branch cleanup
omkar055 Nov 19, 2025
1288745
feat(sportsbook): sql schema files
omkar055 Jan 12, 2026
ea0dea9
feat(sportsbook): sql schema files migration
omkar055 Jan 12, 2026
f41ba0b
Merge branch 'main' into feat/sportsbooks_webscraper
JonathanPLev Jan 21, 2026
24d8af9
refactor(sportsbook): moved hard-coded numbers to yml
rhtruong Jan 25, 2026
efbac56
feat(sportsbook): added db verifier
rhtruong Jan 25, 2026
b2788e0
fix(sportsbook): fixing urllib3 trivy error
rhtruong Feb 2, 2026
df34559
Merge branch 'main' into feat/sportsbooks_webscraper
rhtruong Feb 2, 2026
8cebc92
Merge branch 'main' into feat/sportsbooks_webscraper
rhtruong Feb 4, 2026
98abe27
fix(sportsbook): airflow creds hidden
omkar055 Mar 5, 2026
13 changes: 13 additions & 0 deletions .env.example
@@ -0,0 +1,13 @@
# Postgres
POSTGRES_USER=
POSTGRES_PASSWORD=
POSTGRES_DB=
POSTGRES_EXTERNAL_PORT=5433

# Airflow
AIRFLOW_ADMIN_USERNAME=
AIRFLOW_ADMIN_PASSWORD=
AIRFLOW_PORT=8080

# API links
BETTINGPROS_API_URL=
Binary file added model_state.pt
Binary file not shown.
Binary file added preproc.joblib
Binary file not shown.
8 changes: 7 additions & 1 deletion pyproject.toml
@@ -7,11 +7,17 @@ requires-python = ">=3.12"
dependencies = [
"nba-api>=1.10.2",
"scikit-learn>=1.7.2",
"requests>=2.32.3",
"pandas>=2.2.0",
"psycopg2-binary==2.9.9",
"SQLAlchemy==1.4.52",
"python-dotenv==1.0.1",
"python-dateutil==2.8.2",
"gdown",
"torch>=2.9.0",
"urllib3>=2.6.0",
]


[tool.poe.tasks]
# Python linting and formatting
lint = "ruff check ."
19 changes: 19 additions & 0 deletions src/sportsbook_webscraper_pipeline/Dockerfile
@@ -0,0 +1,19 @@
FROM apache/airflow:3.0.0-python3.10

USER root
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*

USER airflow
WORKDIR /opt/airflow

COPY pyproject.toml .
COPY uv.lock .
RUN pip install uv && uv pip install --system .

COPY airflow/dags /opt/airflow/dags
COPY airflow_pipeline /opt/airflow/airflow_pipeline

ENV PYTHONPATH="/opt/airflow/airflow_pipeline:${PYTHONPATH}"
ENV AIRFLOW_HOME=/opt/airflow

CMD ["airflow", "standalone"]
73 changes: 73 additions & 0 deletions src/sportsbook_webscraper_pipeline/README.md
@@ -0,0 +1,73 @@
# webscrape
Dockerized Airflow pipeline that scrapes NBA player prop lines using the scripts under `airflow_pipeline/api_scripts`, then normalizes the results and appends them to a Postgres database that also runs in Docker. Airflow is scheduled to run the `nba_props_dag` daily.

---

## Prerequisites
- Docker Desktop (or Docker Engine) with Compose v2
- Optional: Python 3.10+ if you want to run the scraper scripts locally

---

## Quick start (Docker)

1. **Configure**
```bash
git clone <repo>
cd webscrape
```
The Airflow container reads DB credentials from the environment (`DB_USERNAME`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`). Defaults target the Postgres service defined in `docker-compose.yml` (`line_dancer / sportsbook_data @ postgres:5432 / nba_deeplearning`). Update either `.env` or the compose file if you need different values.

2. **Build and launch**
```bash
docker compose down # don't run `docker compose down -v`: it wipes the containers and volumes,
# including the Postgres database
docker compose up -d # no need to build

docker ps # check container status

```
This starts:
- `postgres`: stores the scraped data (volume `postgres_data` keeps rows between runs, exposed on `localhost:5433`)
- `airflow`: runs Airflow 3 with SequentialExecutor + SQLite metadata, but connects to the Postgres service for ETL output

3. **Access Airflow**
- UI: http://localhost:8080
- Credentials: `admin` and {password} (set in `docker-compose.yml`; rotate before sharing externally)

4. **Trigger the DAG**
- In the UI, unpause `nba_sportsbook_pipeline` and click “Trigger DAG”
- Or via CLI:
```bash
docker compose exec airflow airflow dags test nba_sportsbook_pipeline $(date +%Y-%m-%d)
```
The DAG imports `migrate_to_postgres.py`, which runs the BettingPros, PrizePicks, and DraftEdge scrapers, validates the schema, and appends rows into the `player_lines` table inside Postgres.

5. **Inspect the database**
```bash
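# From the host (requires a local psql client):
psql -h localhost -p 5433 -U line_dancer -d nba_deeplearning
# Then, at the psql prompt:
\dt
SELECT COUNT(*) FROM player_lines;

# Or open psql inside the Postgres container:
docker exec -it webscrape-postgres-1 psql -U line_dancer -d nba_deeplearning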
```

6. **Stop / clean up**
```bash
docker compose down # keep scraped data
docker compose down -v # removes containers AND the Postgres volume (deletes all scraped data); avoid unless you want a full reset
```
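
As a rough illustration of the validate-then-append behavior described in step 4, the sketch below checks scraped rows against a required column set and appends only the valid ones to a `player_lines` table. The column names, the `validate_rows` and `append_rows` helpers, and the in-memory SQLite database standing in for Postgres are all assumptions made for this example; the actual logic lives in `migrate_to_postgres.py` and may differ.

```python
import sqlite3

# Hypothetical column set; the real player_lines schema may differ.
REQUIRED_COLUMNS = ("player_name", "stat_type", "line", "sportsbook")


def validate_rows(rows):
    """Keep only rows that have a non-None value for every required column."""
    return [
        row for row in rows
        if all(row.get(col) is not None for col in REQUIRED_COLUMNS)
    ]


def append_rows(conn, rows):
    """Append rows to player_lines, leaving existing rows untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player_lines "
        "(player_name TEXT, stat_type TEXT, line REAL, sportsbook TEXT)"
    )
    conn.executemany(
        "INSERT INTO player_lines (player_name, stat_type, line, sportsbook) "
        "VALUES (:player_name, :stat_type, :line, :sportsbook)",
        rows,
    )
    conn.commit()
    return len(rows)


# Placeholder rows: the second one is missing its line and is dropped.
scraped = [
    {"player_name": "Player A", "stat_type": "points", "line": 24.5, "sportsbook": "book_x"},
    {"player_name": "Player B", "stat_type": "rebounds", "line": None, "sportsbook": "book_y"},
]
conn = sqlite3.connect(":memory:")  # stand-in for the Postgres service
inserted = append_rows(conn, validate_rows(scraped))
total = conn.execute("SELECT COUNT(*) FROM player_lines").fetchone()[0]
```

Because the insert is append-only, running the same scrape twice would store duplicate rows unless the real table enforces a uniqueness constraint.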

---

## Local development (optional)
- Install dependencies: `pip install -r requirements.txt`
- Export the same `DB_*` vars as the container and run `python airflow/dags/migrate_to_postgres.py` to append data without Airflow.
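
For local runs, one way the `DB_*` variables might be combined into a connection URL is sketched below. `build_db_url` is a hypothetical helper, not a function from this repository, and the `localhost` / `5433` fallbacks are assumptions based on the host port mentioned in the quick start.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=None):
    """Assemble a postgresql:// URL from DB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])
    # Assumed fallbacks: the Postgres port exposed on the host.
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5433")
    name = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Example with placeholder values instead of real credentials.
example_url = build_db_url(
    {"DB_USERNAME": "user", "DB_PASSWORD": "p@ss", "DB_NAME": "example_db"}
)
```

Passing the mapping in explicitly keeps the helper easy to test; called with no argument, it reads from the process environment and raises `KeyError` if a required variable is missing.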

---

## Notes
- Airflow metadata stays on SQLite (default) so Postgres holds only the scraper output.
- Update secrets (`DB_PASSWORD`, Airflow admin password) before committing or sharing the project.