Fix CDN 403 by downloading PDFs via curl_cffi before tabula parsing by btg94 · Pull Request #10 · mxufc29/nbainjuries

btg94 · 2026-03-10T14:56:32Z

Summary

The NBA CDN (Akamai) now rejects programmatic HTTP requests with 403 Forbidden: both requests.get() and tabula.read_pdf() called with a URL fail
This PR fetches the PDFs with curl_cffi and impersonate='chrome' to get past Akamai's TLS fingerprinting
PDFs are downloaded to temp files, parsed locally by tabula, and the temp files are then deleted
The download falls back to requests/aiohttp when curl_cffi is not installed
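
The download-with-fallback idea in the summary could look roughly like the sketch below. This is not the PR's actual `_download_pdf_bytes()`; the function name `download_pdf_bytes`, its parameters, and the timeout default are assumptions made for illustration. It relies only on curl_cffi's requests-like `get()` with the `impersonate` argument, and on plain `requests` as the fallback.

```python
from typing import Optional


def download_pdf_bytes(url: str, headers: Optional[dict] = None,
                       timeout: float = 30.0) -> bytes:
    """Fetch a PDF and return its raw bytes (hypothetical helper, not the PR's code)."""
    try:
        # curl_cffi exposes a requests-like API; impersonate="chrome" makes the
        # TLS handshake resemble the one a Chrome browser would send.
        from curl_cffi import requests as cffi_requests
    except ImportError:
        cffi_requests = None

    if cffi_requests is not None:
        resp = cffi_requests.get(url, headers=headers,
                                 impersonate="chrome", timeout=timeout)
    else:
        # Fallback path: plain requests, which the CDN may still answer with 403.
        import requests
        resp = requests.get(url, headers=headers, timeout=timeout)

    resp.raise_for_status()  # surface 403 and other HTTP errors to the caller
    return resp.content
```

Importing lazily inside the function keeps the module importable when curl_cffi is absent, which is what makes the fallback reachable at all.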

Changes

_parser.py: New _download_pdf_bytes() helper using curl_cffi. Updated validate_injrepurl() and extract_injrepurl() to download-then-parse-locally via temp files
_parser_asy.py: New _download_pdf_bytes_sync() for use with asyncio.to_thread(). Updated validate_irurl_async() and extract_irurl_async() with the same temp file pattern
pyproject.toml: Added curl_cffi>=0.7,<0.14 as a dependency
tests/test_cdn_bypass.py: 30 new tests (26 pass, 4 skip pending async helper rename)
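
For the async path, the pattern of wrapping a blocking downloader in `asyncio.to_thread()` can be sketched as follows. The names `_blocking_download` and `fetch_many` are hypothetical stand-ins, and the "download" here only sleeps and returns fake bytes so the example runs without network access; it is not the PR's `_download_pdf_bytes_sync()`.

```python
import asyncio
import time


def _blocking_download(url: str) -> bytes:
    """Stand-in for a synchronous, blocking downloader (hypothetical)."""
    time.sleep(0.05)  # simulate network latency
    return b"%PDF-1.7 " + url.encode()


async def fetch_many(urls, download=_blocking_download):
    # asyncio.to_thread runs each blocking call in a worker thread, so the
    # event loop stays free while the downloads are in flight.
    tasks = [asyncio.to_thread(download, u) for u in urls]
    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*tasks)
```

A caller would run it with something like `asyncio.run(fetch_many(urls))`. Note that `asyncio.to_thread()` requires Python 3.9 or later.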

Design decisions

curl_cffi is a core dependency (not optional) since every remote fetch currently fails without it
All function signatures are backwards-compatible
The **kwargs pattern for passing custom headers still works
Temp files are cleaned up in finally blocks to prevent leaks
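
The temp-file cleanup decision can be illustrated with a minimal sketch. The helper name `parse_pdf_bytes` and the injected `parse` callable are assumptions for illustration; in the PR the parser is tabula, which is left out here so the example runs without Java.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_pdf_bytes(pdf_bytes: bytes, parse: Callable[[str], T]) -> T:
    """Write bytes to a temp file, pass its path to `parse`, always delete the file."""
    # delete=False so the file can be reopened by path after we close it
    # (reopening an open NamedTemporaryFile by name fails on Windows).
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(pdf_bytes)
        tmp.close()  # flush to disk before the parser opens the path
        return parse(tmp.name)
    finally:
        tmp.close()  # closing an already-closed file is a no-op
        os.unlink(tmp.name)  # runs on both the success and the error path
```

With tabula-py installed, `parse` could be something like `lambda path: tabula.read_pdf(path, pages="all")`, so that tabula receives a local path rather than the CDN URL.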

Fixes #6

Test plan

_download_pdf_bytes uses curl_cffi with impersonate='chrome'
Falls back to requests when curl_cffi unavailable
extract_injrepurl passes local temp path to tabula (not URL)
Temp files cleaned up on both success and error paths
Async path uses asyncio.to_thread for blocking curl_cffi calls
Async falls back to aiohttp when curl_cffi unavailable
Verified end-to-end: successfully fetched 16,274 injury report entries for 2025-26 season
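
One way to exercise the "curl_cffi unavailable" items in this plan without uninstalling the package is to place a `None` entry in `sys.modules`, which makes the import raise `ImportError`. The sketch below is not taken from tests/test_cdn_bypass.py; `pick_backend` and the test name are hypothetical.

```python
import sys
from unittest import mock


def pick_backend() -> str:
    """Return which HTTP backend would be used (hypothetical helper)."""
    try:
        import curl_cffi  # noqa: F401
        return "curl_cffi"
    except ImportError:
        return "requests"


def test_falls_back_when_curl_cffi_missing():
    # A None entry in sys.modules makes `import curl_cffi` raise ImportError,
    # even when the package is actually installed.
    with mock.patch.dict(sys.modules, {"curl_cffi": None}):
        assert pick_backend() == "requests"
```

`mock.patch.dict` restores `sys.modules` when the block exits, so the simulated absence does not leak into other tests.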

🤖 Generated with Claude Code

…parsing

The NBA CDN (Akamai) now blocks all programmatic HTTP requests: both requests.get() and tabula.read_pdf() from URL return 403 Forbidden.

This uses curl_cffi with impersonate='chrome' to bypass Akamai's TLS fingerprinting. PDFs are downloaded to temp files and parsed locally by tabula, then cleaned up. Falls back to requests/aiohttp when curl_cffi is not installed.

Fixes mxufc29#6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

alexanderjulianmartinez · 2026-04-01T19:40:16Z

Wondering if or when this will be shipped? 👀 cc: @mxufc29

Copilot AI mentioned this pull request Apr 10, 2026

Fix NBA CDN 403: use curl_cffi browser impersonation in nba_injury_pdf fetcher pbmconsulting-hub/SmartAI-NBA#250

Merged

5 tasks

btg94 wants to merge 1 commit into mxufc29:main from btg94:fix/curl-cffi-cdn-bypass
