feat(sportsbooks): dockerized Airflow + Postgres sportsbook webscraper pipeline #4
Open
omkar055 wants to merge 24 commits into main from feat/sportsbooks_webscraper
24 commits
7b26c9a feat(sportsbooks): dockerized Airflow + Postgres pipeline (omkar055)
1c986b7 chore(sportsbooks): delete old requirements.txt (omkar055)
c5847cf chore(sportsbooks): update pyproject.toml with requirements (omkar055)
ccd71ea chore(sportsbooks): modify requests to >=2.32.3 for nba-api (omkar055)
ff50ec5 chore(sportsbooks): modify pandas to pandas>=2.2.0 for nba-api (omkar055)
03c807f chore(sportsbooks): syntax error :( (omkar055)
5c7f96b chore(sportsbook): autofixed issues using just fix-python (omkar055)
f92bc19 fix: remove old test_scripts to fix build (rhtruong)
0cd7c9d fix(sportsbooks): removed unused variables (rhtruong)
9b0b617 fix(sportsbooks): moved import to top of file (rhtruong)
38c7919 fix(sportsbook): fix lint errors in DAG + prizepicks script (omkar055)
c2a416a chore(sportsbook): resolved requested changes in pr (omkar055)
4435edc chore(sportsbook): more requested changes resolved + env changes (omkar055)
1c0f17f fix(sportsbook): forgot to actually load the env variable (brain fog) (omkar055)
c4d27ea chore(sportsbook): branch cleanup (omkar055)
1288745 feat(sportsbook): sql schema files (omkar055)
ea0dea9 feat(sportsbook): sql schema files migration (omkar055)
f41ba0b Merge branch 'main' into feat/sportsbooks_webscraper (JonathanPLev)
24d8af9 refactor(sportsbook): moved hard-coded numbers to yml (rhtruong)
efbac56 feat(sportsbook): added db verifier (rhtruong)
b2788e0 fix(sportsbook): fixing urllib3 trivy error (rhtruong)
df34559 Merge branch 'main' into feat/sportsbooks_webscraper (rhtruong)
8cebc92 Merge branch 'main' into feat/sportsbooks_webscraper (rhtruong)
98abe27 fix(sportsbook): airflow creds hidden (omkar055)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| # Postgres | ||
| POSTGRES_USER= | ||
| POSTGRES_PASSWORD= | ||
| POSTGRES_DB= | ||
| POSTGRES_EXTERNAL_PORT=5433 | ||
|
|
||
| # Airflow | ||
| AIRFLOW_ADMIN_USERNAME= | ||
| AIRFLOW_ADMIN_PASSWORD= | ||
| AIRFLOW_PORT=8080 | ||
|
|
||
| # API links | ||
| BETTINGPROS_API_URL= |
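
For reviewers who want to check the template end to end, here is a minimal sketch of how a script on the host could turn these variables into a Postgres connection URL. The function name `build_postgres_url` and the `host` parameter are hypothetical and not part of this PR. The sketch reads the `POSTGRES_*` names from the template above, whereas the README in this PR refers to `DB_*` names, so adjust to whichever set the code actually loads. `POSTGRES_EXTERNAL_PORT` is assumed to be the host-side port; containers on the compose network would use `postgres:5432` instead.

```python
import os
from typing import Mapping
from urllib.parse import quote

# Variables that have no default in the template and must be filled in.
REQUIRED_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def build_postgres_url(env: Mapping[str, str] = os.environ, host: str = "localhost") -> str:
    """Build a postgresql:// URL from the POSTGRES_* settings (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")

    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")

    # Fall back to the template's default host-side port when none is set.
    port = int(env.get("POSTGRES_EXTERNAL_PORT", "5433"))
    return f"postgresql://{user}:{password}@{host}:{port}/{env['POSTGRES_DB']}"
```

Failing early on blank values keeps an unfilled template from producing a confusing authentication error later in the pipeline.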
Binary file not shown.
Binary file not shown.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| FROM apache/airflow:3.0.0-python3.10 | ||
|
|
||
| USER root | ||
| RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| USER airflow | ||
| WORKDIR /opt/airflow | ||
|
|
||
| COPY pyproject.toml . | ||
| COPY uv.lock . | ||
| RUN pip install uv && uv pip install --system . | ||
|
|
||
| COPY airflow/dags /opt/airflow/dags | ||
| COPY airflow_pipeline /opt/airflow/airflow_pipeline | ||
|
|
||
| ENV PYTHONPATH="/opt/airflow/airflow_pipeline:${PYTHONPATH}" | ||
| ENV AIRFLOW_HOME=/opt/airflow | ||
|
|
||
| CMD ["airflow", "standalone"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # webscrape | ||
| Dockerized Airflow pipeline that scrapes NBA player prop lines using the scripts under `airflow_pipeline/api_scripts`, normalizes the results, and appends them to a Postgres database that also runs in Docker. | ||
| Airflow is scheduled to run the `nba_props_dag` daily. | ||
|
|
||
| --- | ||
|
|
||
| ## Prerequisites | ||
| - Docker Desktop (or Docker Engine) with Compose v2 | ||
| - Optional: Python 3.10+ if you want to run the scraper scripts locally | ||
|
|
||
| --- | ||
|
|
||
| ## Quick start (Docker) | ||
|
|
||
| 1. **Configure** | ||
| ```bash | ||
| git clone <repo> | ||
| cd webscrape | ||
| ``` | ||
| The Airflow container reads DB credentials from the environment (`DB_USERNAME`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`). Defaults target the Postgres service defined in `docker-compose.yml` (`line_dancer / sportsbook_data @ postgres:5432 / nba_deeplearning`). Update either `.env` or the compose file if you need different values. | ||
|
|
||
| 2. **Build and launch** | ||
| ```bash | ||
| docker compose down # do not add -v: it also removes the volumes, | ||
| # including the Postgres database | ||
| docker compose up -d # no need to build | ||
|
|
||
| docker ps # check container status | ||
|
|
||
| ``` | ||
| This starts: | ||
| - `postgres`: stores the scraped data (volume `postgres_data` keeps rows between runs, exposed on `localhost:5433`) | ||
| - `airflow`: runs Airflow 3 with SequentialExecutor + SQLite metadata, but connects to the Postgres service for ETL output | ||
|
|
||
| 3. **Access Airflow** | ||
| - UI: http://localhost:8080 | ||
| - Credentials: `admin` and {password} (set in `docker-compose.yml`; rotate before sharing externally) | ||
|
|
||
| 4. **Trigger the DAG** | ||
| - In the UI, unpause `nba_sportsbook_pipeline` and click “Trigger DAG” | ||
| - Or via CLI: | ||
| ```bash | ||
| docker compose exec airflow airflow dags test nba_sportsbook_pipeline $(date +%Y-%m-%d) | ||
| ``` | ||
| The DAG imports `migrate_to_postgres.py`, which runs the BettingPros, PrizePicks, and DraftEdge scrapers, validates the schema, and appends rows into the `player_lines` table inside Postgres. | ||
|
|
||
| 5. **Inspect the database** | ||
| ```bash | ||
| psql -h localhost -p 5433 -U line_dancer -d nba_deeplearning | ||
| \dt | ||
| SELECT COUNT(*) FROM player_lines; | ||
|
|
||
|
|
||
| # Alternatively, open psql from inside the Postgres container: | ||
| docker exec -it webscrape-postgres-1 psql -U line_dancer -d nba_deeplearning | ||
| ``` | ||
|
|
||
| 6. **Stop / clean up** | ||
| ```bash | ||
| docker compose down # keep scraped data | ||
| docker compose down -v # removes containers AND the Postgres volume (deletes scraped data); avoid unless you want a full reset | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Local development (optional) | ||
| - Install dependencies from `pyproject.toml`: `pip install .` (this PR deletes the old `requirements.txt`) | ||
| - Export the same `DB_*` vars as the container and run `python airflow/dags/migrate_to_postgres.py` to append data without Airflow. | ||
|
|
||
| --- | ||
|
|
||
| ## Notes | ||
| - Airflow metadata stays on SQLite (default) so Postgres holds only the scraper output. | ||
| - Update secrets (`DB_PASSWORD`, Airflow admin password) before committing or sharing the project. | ||
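
The README describes the load step as validating the schema and then appending rows to `player_lines`. A minimal sketch of that validate-then-append pattern follows, under stated assumptions: the column names in `REQUIRED_COLUMNS` and both function names are hypothetical (the real schema lives in this PR's SQL schema files, which are not shown here), and the standard-library `sqlite3` module stands in for Postgres so the sketch runs without a database server. With a Postgres driver the placeholder style would differ (for example `%s` instead of `?`).

```python
import sqlite3
from typing import Iterable, List, Mapping, Tuple

# Hypothetical column set for illustration; not taken from the PR's schema files.
REQUIRED_COLUMNS = ("player_name", "sportsbook", "stat_type", "line_value")


def validate_rows(rows: Iterable[Mapping[str, object]]) -> List[Tuple[object, ...]]:
    """Return rows as tuples in column order, rejecting any row with a missing field."""
    records = []
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if row.get(col) is None]
        if missing:
            raise ValueError(f"row {index} is missing {missing}")
        records.append(tuple(row[col] for col in REQUIRED_COLUMNS))
    return records


def append_rows(conn: sqlite3.Connection, rows: Iterable[Mapping[str, object]]) -> int:
    """Validate every row first, then append them all; returns the number inserted."""
    records = validate_rows(rows)  # raises before anything is written
    columns = ", ".join(REQUIRED_COLUMNS)
    placeholders = ", ".join("?" for _ in REQUIRED_COLUMNS)
    conn.executemany(
        f"INSERT INTO player_lines ({columns}) VALUES ({placeholders})",
        records,
    )
    conn.commit()
    return len(records)
```

Validating the whole batch before the first insert means a malformed scrape leaves the table untouched, which suits an append-only table that is never wiped between runs.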