AutomationGeoGrid is the first-stage orchestrator of the PTI+ Clima data pipeline. It uses Apache Airflow to run quality control, gap-filling, homogenisation, and gridding of historical AEMET meteorological station observations, producing daily NetCDF grids consumed by AutomationIndices.
All computation is delegated to the lcsc-r-dataflow:latest Docker image, which packages the data_flow R pipeline. Airflow acts purely as a scheduler and dependency manager.
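To make the delegation described above concrete, the following minimal sketch assembles the kind of `docker run` command line that a containerised task amounts to. It is an illustration only: the DAGs in this repository go through Airflow's DockerOperator, and the function name `build_docker_command`, the volume mount, the placeholder variable name `tmax`, and passing the variable as a container argument are all assumptions rather than the project's actual interface.

```python
import shlex

IMAGE = "lcsc-r-dataflow:latest"  # image name taken from this README


def build_docker_command(variable, media_files="/media/data", env_file=None):
    """Return the argv list for a hypothetical one-variable container run."""
    cmd = ["docker", "run", "--rm"]
    if env_file:
        # Extra environment variables, e.g. dags/docker_operator.env
        cmd += ["--env-file", env_file]
    # Assumption: the shared data directory is mounted at the same path
    # inside the container.
    cmd += ["-v", f"{media_files}:{media_files}"]
    # Assumption: the variable is passed as a plain positional argument.
    cmd += [IMAGE, variable]
    return cmd


if __name__ == "__main__":
    # "tmax" is a placeholder; this README does not name the 7 variables.
    print(shlex.join(build_docker_command("tmax", env_file="dags/docker_operator.env")))
```

Keeping the command construction in a pure function makes it easy to inspect what would run without starting a container.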
| DAG | Schedule | Purpose |
|---|---|---|
| `tr_full_dag` | 15:00 UTC daily | Production real-time run: QC + data_flow for all 7 variables |
| `data_flow` | manual | Interactive run of the full pipeline for one variable |
| `data_flow_all` | manual | Triggers data_flow for all variables in parallel |
| `quality_control` | manual | Run quality control for one variable group |
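
A daily 15:00 UTC schedule would correspond to the cron expression `0 15 * * *` evaluated in UTC; whether `DailyDag.py` uses exactly this expression is an assumption. The sketch below uses only the standard library to compute when the next production run would fall. The helper name `next_run` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

RUN_TIME_UTC = time(15, 0)  # 15:00 UTC, as listed for tr_full_dag


def next_run(now):
    """Return the next 15:00 UTC instant strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), RUN_TIME_UTC, tzinfo=timezone.utc)
    if candidate <= now_utc:
        # Today's slot has already passed (or is right now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime.now(timezone.utc)).isoformat())
```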
AutomationGeoGrid/
├── compose/
│ ├── docker-compose.yaml # Airflow stack (CeleryExecutor, Redis, PostgreSQL, nginx)
│ ├── airflow.sh # Airflow CLI wrapper
│ ├── default.conf # nginx config
│ └── landpage/ # Static web portal
├── dags/
│ ├── DailyDag.py # tr_full_dag — scheduled production run
│ ├── DataFlow.py # data_flow — interactive pipeline
│ ├── DataFlowAll.py # data_flow_all — parallel trigger
│ ├── QualityControl.py # quality_control — interactive QC
│ ├── lcsc_common/ # Shared helpers (df_helper, qc_helper, yaml)
│ └── docker_operator.env # Extra env vars passed to DockerOperator containers
├── data-skel/
│ ├── geogrid/ # Directory scaffold for /media/data/geogrid/
│ └── qualitycontrol/ # Directory scaffold for /media/data/qualitycontrol/
└── scripts/
    ├── create_env.sh # Generate .env file
    ├── create_dirs.sh # Create required data directories
    ├── af_compose.sh # docker compose wrapper
    ├── af_add_cred.sh # Register Airflow connections (email, git)
    ├── af_add_bbdd_con.sh # Register BBDD connection
    ├── af_add_s3_con.sh # Register S3 connection
    └── af_add_user.sh # Create Airflow UI user
- Docker and Docker Compose
- Data directories at `/media/data/` (or override via `MEDIA_FILES` in `.env`)
- `lcsc-r-dataflow:latest` Docker image
- Airflow connections: `bbdd-aemet`, `s3-aemet`, `email_cfg`, `git_auth`
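
Before bringing the stack up, it can help to confirm that the data directories exist. The sketch below is a hypothetical pre-flight check, not part of the repository: it looks only for the two subdirectories named in this README (`geogrid` and `qualitycontrol`), and reading `MEDIA_FILES` from the process environment is an assumption, since the project sets it in `compose/.env`.

```python
import os
from pathlib import Path

# Subdirectories named in this README; the real layout may contain more.
REQUIRED_SUBDIRS = ("geogrid", "qualitycontrol")


def missing_data_dirs(media_root=None):
    """Return the required data directories that do not exist yet."""
    # Fall back to MEDIA_FILES, then to the documented default /media/data.
    root = Path(media_root or os.environ.get("MEDIA_FILES", "/media/data"))
    return [str(root / name) for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories (run scripts/create_dirs.sh?):")
        for path in missing:
            print("  " + path)
    else:
        print("All required data directories are present.")
```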
sh scripts/create_env.sh
# Edit compose/.env to set AIRFLOW_UID, MEDIA_FILES, REPO_FILES, etc.
sh scripts/create_dirs.sh
sh scripts/af_compose.sh up -d
# Wait until healthy (http://localhost:8080)
sh scripts/af_add_cred.sh email email_cfg smtpin.csic.es lcsc@csic.es <password>
sh scripts/af_add_cred.sh generic git_auth github.com <user> <token>
sh scripts/af_add_user.sh aemet <password> aemet@aemet.es Aemet User Viewer
sh scripts/af_add_bbdd_con.sh bbdd-aemet <host> 5432 <user> postgres <password>
sh scripts/af_add_s3_con.sh s3-aemet <host> 9000 <bucket> <access_key> <secret>

AEMET station data
│
quality_control ──── /media/data/qualitycontrol/
│
data_flow ──── /media/data/geogrid/
(pp → gf → hg → gr)
│
copy to indices ──── /media/data/indices/data_raw/
+ status.yaml
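
The stage order inside `data_flow` shown in the diagram above can be expressed as a simple ordered sequence. In the sketch below, `gf`, `hg`, and `gr` are read as gap-filling, homogenisation, and gridding (matching the introduction), and `pp` is presumed to be a pre-processing step; these expansions are assumptions. The helper `remaining_stages` is hypothetical and only illustrates how the strict ordering determines what is left to run after a given stage.

```python
STAGES = ("pp", "gf", "hg", "gr")  # order taken from the diagram above


def remaining_stages(last_completed=None):
    """Return the stages still to run after `last_completed`.

    `None` means no stage has completed yet. An unknown stage name
    raises ValueError (from tuple.index).
    """
    if last_completed is None:
        return list(STAGES)
    idx = STAGES.index(last_completed)
    return list(STAGES[idx + 1:])


if __name__ == "__main__":
    print(remaining_stages("gf"))
```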
See docs/full_documentation.md for complete DAG descriptions, task-level flow diagrams, environment variable reference, and shared filesystem layout.