
feat(sportsbooks): dockerized Airflow + Postgres sportsbook webscraper pipeline#4

Open
omkar055 wants to merge 24 commits into
mainfrom
feat/sportsbooks_webscraper
Open

Conversation

@omkar055

Collaborator

Initial implementation of the Airflow ETL pipeline that scrapes NBA player line data from several sportsbooks.
Includes the Airflow + Postgres Docker setup and the NBA webscraper DAG.
With this setup, everyone should be able to run the Airflow pipeline and query Postgres to collect and access NBA player line data.

This PR migrates the standalone sportsbook scraping pipeline from the Data team's webscrape repository into the TransformerPredictionModel repository.

For review and testing — not for merge yet.

How it was tested

  • Built and ran the Docker containers locally (docker compose up --build)
  • Verified that the Airflow web UI runs on http://localhost:8080
  • Confirmed successful data scraping from the APIs (BettingPros, PrizePicks, DraftEdge)
  • Checked the Postgres database for inserted data (SELECT COUNT(*) FROM player_lines;)
  • Triggered the Airflow DAG manually and validated that 4,653 records were inserted

Comment thread src/sportsbook_webscraper_pipeline/requirements.txt Fixed
Comment thread src/sportsbook_webscraper_pipeline/requirements.txt Fixed
@omkar055 omkar055 marked this pull request as ready for review November 12, 2025 01:26
@omkar055 omkar055 marked this pull request as draft November 12, 2025 01:27
@omkar055 omkar055 changed the title feat: Dockerized Airflow + Postgres sportsbook webscraper pipeline feat: dockerized Airflow + Postgres sportsbook webscraper pipeline Nov 13, 2025
@omkar055 omkar055 force-pushed the feat/sportsbooks_webscraper branch from 5a96c73 to c5847cf Compare November 13, 2025 05:21
@omkar055 omkar055 marked this pull request as ready for review November 14, 2025 09:22
@omkar055 omkar055 changed the title feat: dockerized Airflow + Postgres sportsbook webscraper pipeline feat(sportsbooks): dockerized Airflow + Postgres sportsbook webscraper pipeline Nov 15, 2025
Comment thread src/sportsbook_webscraper_pipeline/requirements.txt Outdated
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/migrate_to_postgres.py Outdated
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/migrate_to_postgres.py Outdated
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/migrate_to_postgres.py Outdated
metadata = MetaData()
table = Table(table_name, metadata, autoload_with=db_eng)

with db_eng.begin() as conn: # transaction automatically commits

Owner

May want to include a verification step to make sure the commit was made correctly.

Thoughts on adding something to roll back a partially applied commit if the whole thing fails halfway through?
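
A minimal sketch of both ideas raised in this comment, assuming the standard library's sqlite3 as a stand-in for Postgres so it runs anywhere. The `insert_lines` helper and the `player` and `line` columns are hypothetical; only the `player_lines` table name comes from this PR. SQLAlchemy's `Engine.begin()` context manager, used in the snippet under review, follows the same commit-on-success, rollback-on-exception pattern, so a batch that fails halfway should leave no partial rows behind.

```python
import sqlite3


def insert_lines(conn, rows):
    """Insert all rows in one transaction, then verify the committed count.

    Hypothetical helper: the column names are illustrative only.
    """
    before = conn.execute("SELECT COUNT(*) FROM player_lines").fetchone()[0]

    # The connection context manager commits if the block succeeds and
    # rolls back if any statement inside it raises.
    with conn:
        conn.executemany(
            "INSERT INTO player_lines (player, line) VALUES (?, ?)", rows
        )

    # Post-commit verification: the table must have grown by exactly len(rows).
    after = conn.execute("SELECT COUNT(*) FROM player_lines").fetchone()[0]
    inserted = after - before
    if inserted != len(rows):
        raise RuntimeError(f"expected {len(rows)} new rows, found {inserted}")
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_lines (player TEXT NOT NULL, line REAL NOT NULL)"
)

# A clean batch is committed and verified.
good = insert_lines(conn, [("Player A", 24.5), ("Player B", 7.5)])

# The second row violates NOT NULL, so the whole batch is rolled back,
# including the first row that had already been inserted.
try:
    insert_lines(conn, [("Player C", 11.5), ("Player D", None)])
except sqlite3.IntegrityError:
    pass

total = conn.execute("SELECT COUNT(*) FROM player_lines").fetchone()[0]
```

After the failed batch, `total` still reflects only the first, successful batch, which is the behaviour the rollback question is asking for.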

Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/nba_props_dag.py Outdated
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/nba_props_dag.py Outdated
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/nba_props_dag.py
Comment thread src/sportsbook_webscraper_pipeline/airflow/dags/nba_props_dag.py Outdated

@JonathanPLev JonathanPLev left a comment

Owner

Overall good work, just a few changes. Logic looks great. Could use fewer comments that explain basic functions (less "what does it do" and more "why does it do it", if it needs a comment at all). Basically, ask yourself: do I know why this is doing X, and can I easily understand it? If the answer is no, leave a comment.

Comment thread src/sportsbook_webscraper_pipeline/docker-compose.yml Outdated
Comment thread src/sportsbook_webscraper_pipeline/README.md
@JonathanPLev

Owner

Please add a schema for the database and include it in your Docker Compose file so the tables are created when the database is created. Then provide commands in the README to roll back the table creation (SQL down) or to create the tables again if something happens (SQL up).
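
One way to picture the up/down pair requested here is the sketch below. It is an assumption-laden illustration, not the PR's schema: the columns are hypothetical, and sqlite3 stands in for Postgres so the example is self-contained. In a real setup the same `UP_SQL` would typically live in a `.sql` file, since the official Postgres Docker image runs scripts mounted into `/docker-entrypoint-initdb.d` when the data directory is first initialized.

```python
import sqlite3

# "SQL up": create the table. Column names and types are hypothetical.
UP_SQL = """
CREATE TABLE IF NOT EXISTS player_lines (
    id         INTEGER PRIMARY KEY,
    player     TEXT NOT NULL,
    sportsbook TEXT NOT NULL,
    line       REAL NOT NULL
);
"""

# "SQL down": roll the table creation back.
DOWN_SQL = "DROP TABLE IF EXISTS player_lines;"


def table_exists(conn, name):
    """Return True if a table with this name exists (SQLite catalog lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")

conn.executescript(UP_SQL)      # up: table is created
created = table_exists(conn, "player_lines")

conn.executescript(DOWN_SQL)    # down: table is removed
dropped = not table_exists(conn, "player_lines")

conn.executescript(UP_SQL)      # up again: table is recreated from scratch
recreated = table_exists(conn, "player_lines")
```

Using `IF NOT EXISTS` and `IF EXISTS` keeps both scripts idempotent, so running either one twice does not fail.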

Comment thread src/sportsbook_webscraper_pipeline/README.md Outdated
@omkar055 omkar055 requested a review from JonathanPLev March 5, 2026 22:04

4 participants