ForexFactoryScrapper is a small Flask-based API that exposes scraping logic for several economic-calendar sources (ForexFactory, CryptoCraft, EnergyExch, MetalsMine).
What this repository provides:
- Flask HTTP API endpoints returning JSON (or HTML for the root page)
- Per-site scrapers under src/scrapper/ (site-specific logic)
- A simple test suite using pytest under tests/
- A minimal OpenAPI spec (served at /openapi.json) and a Swagger UI at /swagger
- Create and activate a virtual environment:
    python -m venv .venv
    source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Run the app locally:
    python main.py
    # or
    python src/app.py
By default the app listens on 0.0.0.0:5000. You can configure HOST, PORT and DEBUG via environment variables or a .env file (the app uses python-dotenv if present).
Open the welcome page in your browser: http://localhost:5000/
Open API docs: http://localhost:5000/swagger
Open raw OpenAPI JSON: http://localhost:5000/openapi.json
- GET /: welcome HTML page (quick links)
- GET /api/hello: simple hello response
- GET /api/health: quick health check
- GET /api/forex/daily: ForexFactory daily events (query params: day, month, year; optional limit, offset)
- GET /api/forex/sitemaps: ForexFactory sitemap URLs (optional start_date, end_date, limit, offset, max_pages)
- GET /api/cryptocraft/daily: CryptoCraft daily events (same parameters as /api/forex/daily)
- GET /api/energyexch/daily: EnergyExch daily events (same parameters as /api/forex/daily)
- GET /api/metalsmine/daily: MetalsMine daily events (same parameters as /api/forex/daily)
- GET /api/bundle: combined economic events from multiple sources within a date range (see below)
All /.../daily endpoints follow the same validation and paging semantics:
- Required query parameters: day, month, year (integers)
- Optional limit and offset (integers, >= 0)
- On success, list results are wrapped in a pagination object: { total, offset, limit, results }.
- On parameter validation error, endpoints return HTTP 400 with JSON: { "error": "..." }.
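The validation and paging rules above can be sketched in plain Python. This is an illustration only, not the code in src/app.py: the helper names (read_int, paginate, daily_response) and the error messages are hypothetical, and echoing limit as null when it is omitted is an assumption.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple


def read_int(args: Mapping[str, str], name: str, required: bool = False) -> Optional[int]:
    """Parse one query parameter as an integer (hypothetical helper)."""
    raw = args.get(name)
    if raw is None:
        if required:
            raise ValueError(f"missing required parameter: {name}")
        return None
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer") from None


def paginate(records: List[Any], offset: Optional[int], limit: Optional[int]) -> Dict[str, Any]:
    """Wrap a list in the { total, offset, limit, results } shape."""
    start = offset or 0
    end = None if limit is None else start + limit
    return {
        "total": len(records),  # count before paging is applied (assumption)
        "offset": start,
        "limit": limit,  # None when no limit was passed (assumption)
        "results": records[start:end],
    }


def daily_response(args: Mapping[str, str], records: List[Any]) -> Tuple[Dict[str, Any], int]:
    """Return (json_body, http_status) for a daily-style request."""
    try:
        for name in ("day", "month", "year"):
            read_int(args, name, required=True)
        offset = read_int(args, "offset")
        limit = read_int(args, "limit")
        for name, value in (("offset", offset), ("limit", limit)):
            if value is not None and value < 0:
                raise ValueError(f"{name} must be >= 0")
    except ValueError as exc:
        # Validation failures map to HTTP 400 with an "error" key.
        return {"error": str(exc)}, 400
    return paginate(records, offset, limit), 200
```

In a Flask route, args would typically be request.args and the returned pair could be passed through jsonify; the records list stands in for whatever the scraper returned.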
GET /api/forex/sitemaps fetches the ForexFactory sitemap index and its child sitemaps and returns a paginated list of URLs:
- Optional start_date and end_date (ISO format: YYYY-MM-DD): filter results by sitemap lastmod date
- Optional limit and offset (integers, >= 0): standard paging
- Optional max_pages (integer, >= 1, default 10): limits the number of child sitemaps to scan
- Returns: { total, offset, limit, results }, where each result is { url, lastmod: date_or_null }
- Example: GET /api/forex/sitemaps?start_date=2026-05-15&max_pages=5
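As a rough sketch of the lastmod filtering described above, the function below keeps entries whose lastmod falls inside an optional date window. The function name filter_by_lastmod is hypothetical, and two behaviours are assumptions rather than documented facts: both bounds are treated as inclusive, and entries with a null lastmod are dropped whenever a date filter is active.

```python
from datetime import date
from typing import Dict, List, Optional


def filter_by_lastmod(
    entries: List[Dict[str, Optional[str]]],
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> List[Dict[str, Optional[str]]]:
    """Keep sitemap entries whose lastmod lies in [start_date, end_date]."""
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None

    kept = []
    for entry in entries:
        raw = entry.get("lastmod")
        if raw is None:
            # Assumption: undated entries survive only when no filter is set.
            if start is None and end is None:
                kept.append(entry)
            continue
        # lastmod may be a full timestamp; the first 10 chars are YYYY-MM-DD.
        modified = date.fromisoformat(raw[:10])
        if start is not None and modified < start:
            continue
        if end is not None and modified > end:
            continue
        kept.append(entry)
    return kept
```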
GET /api/bundle fetches combined economic events from multiple sources within a date range:
- Required query parameters:
  - start_date (ISO format: YYYY-MM-DD): start date (inclusive)
  - end_date (ISO format: YYYY-MM-DD): end date (inclusive)
- Optional query parameters:
  - sources (comma-separated string, default: forex): sources to include, any of forex, crypto, metal, energy
  - limit (integer, >= 0): maximum number of results to return
  - offset (integer, >= 0): number of records to skip
- Returns: { total, offset, limit, start_date, end_date, sources, source_breakdown, results }
  - Each result includes _source (which source it came from) and _date (which date it was fetched for)
  - source_breakdown shows the count of records per source
- Examples:
  - GET /api/bundle?start_date=2026-05-20&end_date=2026-05-25: forex events for May 20-25
  - GET /api/bundle?sources=forex,crypto&start_date=2026-05-20&end_date=2026-05-21&limit=50: forex and crypto events, max 50 results
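The bundle response shape can be illustrated with a small aggregation sketch. It is not the repository's implementation: build_bundle, iter_days and the fetchers mapping are hypothetical names, each fetcher is assumed to take a date and return a list of dicts, _date is assumed to be an ISO string, and source_breakdown is assumed to count records before paging.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical fetcher type: one callable per source, called once per day.
Fetcher = Callable[[date], List[Dict[str, Any]]]


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def build_bundle(
    start_date: str,
    end_date: str,
    fetchers: Dict[str, Fetcher],
    offset: int = 0,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")

    results: List[Dict[str, Any]] = []
    breakdown = {source: 0 for source in fetchers}
    for day in iter_days(start, end):
        for source, fetch in fetchers.items():
            for record in fetch(day):
                # Tag each record with where and for which day it was fetched.
                results.append({**record, "_source": source, "_date": day.isoformat()})
                breakdown[source] += 1

    stop = None if limit is None else offset + limit
    return {
        "total": len(results),
        "offset": offset,
        "limit": limit,
        "start_date": start_date,
        "end_date": end_date,
        "sources": list(fetchers),
        "source_breakdown": breakdown,
        "results": results[offset:stop],
    }
```

Passing the fetchers in as a mapping keeps the sketch free of network calls, which is also convenient when testing with stub functions.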
- The OpenAPI document is available at /openapi.json and is generated from src/openapi_spec.py.
- The interactive Swagger UI is served at /swagger and uses the OpenAPI JSON. If your environment blocks external CDN assets, the UI falls back to an inline minimal page.
If you update endpoints or schemas, please update src/openapi_spec.py accordingly so the docs stay accurate.
- HOST: host to bind (default 0.0.0.0)
- PORT: port to bind (default 5000)
- DEBUG: debug mode (default True)
- DOTENV_PATH: optional path to a .env file
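For illustration, the defaults listed above could be resolved from the environment roughly as follows. The function name load_settings is hypothetical, and the set of strings treated as true for DEBUG is an assumption; the app's own parsing may differ.

```python
import os
from typing import Dict, Mapping, Union


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int, bool]]:
    """Resolve HOST, PORT and DEBUG with the documented defaults."""
    debug_raw = env.get("DEBUG", "True")
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "5000")),
        # Assumption: these spellings count as "on"; anything else is off.
        "debug": debug_raw.strip().lower() in {"1", "true", "yes", "on"},
    }
```

With python-dotenv installed, calling load_dotenv() before load_settings() would populate os.environ from a .env file.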
Run tests with:
    python -m pytest -q
Tests are under tests/ and use pytest and the Flask test client. Many tests monkeypatch src.app and main to avoid network calls.
A Dockerfile is provided for convenience; if you prefer to run inside Docker, build and run the image as usual (adjust ports as needed).
Contributions welcome. Suggested workflow:
- Create a branch for your change
- Add tests for any behavior you modify
- Run the full test suite
- Open a pull request describing the change
If you modify or add a new scraper under src/scrapper/, try to keep the get_records(url) and get_url(day, month, year, timeline) function signatures so the route helpers can call them interchangeably.
Code of Conduct: Please read CODE_OF_CONDUCT.md before contributing — it describes expected behaviour and reporting contacts.
Maintainer: Ata Can — atacanymc@gmail.com