PTI-Clima/AutomationGeoGrid

AutomationGeoGrid

AutomationGeoGrid is the first-stage orchestrator of the PTI+ Clima data pipeline. It uses Apache Airflow to run quality control, gap-filling, homogenisation, and gridding of historical AEMET meteorological station observations, producing daily NetCDF grids consumed by AutomationIndices.

All computation is delegated to the lcsc-r-dataflow:latest Docker image, which packages the data_flow R pipeline. Airflow acts purely as a scheduler and dependency manager.
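As a rough illustration of this delegation, the sketch below assembles a one-off `docker run` invocation of the image outside Airflow. The image name, env file path, and data directory come from this README; mounting the data directory at the same path inside the container, and the arguments passed to the container, are assumptions made for illustration only. The DAGs themselves use Airflow's DockerOperator, whose exact configuration is not shown here.

```python
import shlex

IMAGE = "lcsc-r-dataflow:latest"        # image name from this README
ENV_FILE = "dags/docker_operator.env"   # env file listed in the repository tree
DATA_ROOT = "/media/data"               # default data directory (see Prerequisites)


def build_docker_command(container_args, data_root=DATA_ROOT, env_file=ENV_FILE):
    """Return the argv list for a one-off `docker run` of the pipeline image.

    `container_args` is whatever the image expects as its command. The real
    arguments are not documented in this README, so the caller supplies them.
    Mounting `data_root` at the same path in the container is an assumption.
    """
    return [
        "docker", "run", "--rm",
        "--env-file", env_file,
        "-v", f"{data_root}:{data_root}",
        IMAGE,
        *container_args,
    ]


if __name__ == "__main__":
    # "<variable>" is a placeholder, not a real argument of the image.
    print(shlex.join(build_docker_command(["<variable>"])))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.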

DAGs

DAG              Schedule         Purpose
tr_full_dag      15:00 UTC daily  Production real-time run: QC + data_flow for all 7 variables
data_flow        manual           Interactive run of the full pipeline for one variable
data_flow_all    manual           Triggers data_flow for all variables in parallel
quality_control  manual           Run quality control for one variable group
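The daily 15:00 UTC schedule in the table above corresponds to the cron expression `0 15 * * *`. Whether DailyDag.py declares it in exactly this form is an assumption. The sketch below uses only the Python standard library to work out when the next production run is due, which can be handy when checking whether a run has been missed.

```python
from datetime import datetime, time, timedelta, timezone

CRON_EXPRESSION = "0 15 * * *"  # minute 0, hour 15, every day (assumed form)
RUN_TIME_UTC = time(15, 0, tzinfo=timezone.utc)


def next_run(now):
    """Return the first 15:00 UTC instant strictly after `now`.

    `now` must be a timezone-aware datetime; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), RUN_TIME_UTC)
    if candidate <= now_utc:
        # Today's slot has already passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_run(datetime.now(timezone.utc))` gives the upcoming slot for tr_full_dag. The other three DAGs are manual and have no schedule to compute.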

Repository structure

AutomationGeoGrid/
├── compose/
│   ├── docker-compose.yaml   # Airflow stack (CeleryExecutor, Redis, PostgreSQL, nginx)
│   ├── airflow.sh            # Airflow CLI wrapper
│   ├── default.conf          # nginx config
│   └── landpage/             # Static web portal
├── dags/
│   ├── DailyDag.py           # tr_full_dag — scheduled production run
│   ├── DataFlow.py           # data_flow — interactive pipeline
│   ├── DataFlowAll.py        # data_flow_all — parallel trigger
│   ├── QualityControl.py     # quality_control — interactive QC
│   ├── lcsc_common/          # Shared helpers (df_helper, qc_helper, yaml)
│   └── docker_operator.env   # Extra env vars passed to DockerOperator containers
├── data-skel/
│   ├── geogrid/              # Directory scaffold for /media/data/geogrid/
│   └── qualitycontrol/       # Directory scaffold for /media/data/qualitycontrol/
└── scripts/
    ├── create_env.sh         # Generate .env file
    ├── create_dirs.sh        # Create required data directories
    ├── af_compose.sh         # docker compose wrapper
    ├── af_add_cred.sh        # Register Airflow connections (email, git)
    ├── af_add_bbdd_con.sh    # Register database (BBDD) connection
    ├── af_add_s3_con.sh      # Register S3 connection
    └── af_add_user.sh        # Create Airflow UI user

Prerequisites

  • Docker and Docker Compose
  • Data directories at /media/data/ (or override via MEDIA_FILES in .env)
  • lcsc-r-dataflow:latest Docker image
  • Airflow connections: bbdd-aemet, s3-aemet, email_cfg, git_auth

Setup

sh scripts/create_env.sh
# Edit compose/.env to set AIRFLOW_UID, MEDIA_FILES, REPO_FILES, etc.
sh scripts/create_dirs.sh

sh scripts/af_compose.sh up -d
# Wait until healthy (http://localhost:8080)

sh scripts/af_add_cred.sh email email_cfg smtpin.csic.es lcsc@csic.es <password>
sh scripts/af_add_cred.sh generic git_auth github.com <user> <token>
sh scripts/af_add_user.sh aemet <password> aemet@aemet.es Aemet User Viewer
sh scripts/af_add_bbdd_con.sh bbdd-aemet <host> 5432 <user> postgres <password>
sh scripts/af_add_s3_con.sh s3-aemet <host> 9000 <bucket> <access_key> <secret>
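The "wait until healthy" step above can be scripted instead of checked by hand. The sketch below polls the webserver until the metadatabase and scheduler both report healthy. It assumes the Airflow 2.x `/health` endpoint and its JSON layout; other Airflow versions may expose the health check at a different path, so treat `HEALTH_URL` as something to adjust. The function names are illustrative and not part of this repository.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed endpoint (Airflow 2.x webserver); adjust for your deployment.
HEALTH_URL = "http://localhost:8080/health"


def fetch_health(url=HEALTH_URL, timeout=5):
    """Return the parsed JSON health report, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None


def is_healthy(report):
    """True when the metadatabase and scheduler both report 'healthy'."""
    if not isinstance(report, dict):
        return False
    return all(
        isinstance(report.get(component), dict)
        and report[component].get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def wait_until_healthy(fetch=fetch_health, attempts=30, delay=10, sleep=time.sleep):
    """Poll `fetch` up to `attempts` times; return True once it is healthy."""
    for attempt in range(attempts):
        if is_healthy(fetch()):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` after `sh scripts/af_compose.sh up -d` and before the `af_add_*` scripts avoids registering connections against a stack that is still starting. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running stack.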

Data flow

AEMET station data
       │
  quality_control  ──── /media/data/qualitycontrol/
       │
   data_flow        ──── /media/data/geogrid/
   (pp → gf → hg → gr)
       │
  copy to indices ──── /media/data/indices/data_raw/
  + status.yaml
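The stage chain in the diagram above is strictly linear, so the dependencies can be modelled as an ordered tuple. The stage codes and output directories below are taken from the diagram. The codes gf, hg, and gr presumably stand for the gap-filling, homogenisation, and gridding steps named in the introduction, and pp is presumably a preprocessing step; those expansions are assumptions. So are the helper names and the idea of resuming partway through the chain.

```python
# Stage codes in the order shown in the diagram.
STAGES = ("pp", "gf", "hg", "gr")

# Output locations from the diagram.
OUTPUT_DIRS = {
    "quality_control": "/media/data/qualitycontrol/",
    "data_flow": "/media/data/geogrid/",
    "copy_to_indices": "/media/data/indices/data_raw/",
}


def _check(stage):
    """Raise ValueError for a stage code that is not in the chain."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")


def stages_from(start):
    """Return the stages to run, in order, when starting at `start`."""
    _check(start)
    return STAGES[STAGES.index(start):]


def upstream_of(stage):
    """Return the stages that must have completed before `stage` can run."""
    _check(stage)
    return STAGES[:STAGES.index(stage)]
```

For instance, `stages_from("hg")` yields `("hg", "gr")`, while `upstream_of("hg")` yields `("pp", "gf")`.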

Further documentation

See docs/full_documentation.md for complete DAG descriptions, task-level flow diagrams, environment variable reference, and shared filesystem layout.
