This repository is a curated portfolio of completed, mentor-reviewed Data Engineering projects.
The original project submissions were cleaned and adapted for public review: course pages, archives, credentials, local paths, and platform-specific instructions were removed, while the actual engineering logic was preserved.
The projects cover SQL data marts, Airflow orchestration, PostgreSQL and Vertica warehouses, MongoDB as a source system, Spark batch and streaming jobs, Kafka-based services, Docker, Kubernetes, and ingestion from S3-compatible object storage.
| # | Project | Domain / Use Case | Tech Stack | Key Topics | Link |
|---|---|---|---|---|---|
| 01 | Craft Market DWH and Customer Analytics Data Mart | Handmade marketplace analytics | PostgreSQL, SQL | DWH loading, incremental data mart, indexes | Open |
| 02 | Sales Mart ETL Orchestration with Airflow | Retail sales and retention analytics | Airflow, Python, pandas, PostgreSQL, SQL | ETL orchestration, API ingestion, dimensional marts | Open |
| 03 | Courier Settlement DWH | Delivery platform courier payouts | Airflow, Python, PostgreSQL, MongoDB, SQL | Multi-source DWH, STG/DDS/CDM layers, SCD2, settlement mart | Open |
| 04 | Vertica Group Conversion Analytics | Social group conversion analytics | Airflow, Python, Vertica, SQL, S3-compatible storage | Analytical DB design, Data Vault links/satellites, export workflow | Open |
| 05 | Geo Data Lake User Marts | Location-based social analytics | PySpark, Airflow, HDFS, Parquet | Data Lake layers, geospatial joins, user/zone/friend marts | Open |
| 06 | Restaurant Promotion Streaming Service | Real-time restaurant campaign targeting | Spark Structured Streaming, Kafka, PostgreSQL, Python | Stream processing, joins with reference data, Kafka output | Open |
| 07 | Kafka Order DDS/CDM Microservices | Event-driven order analytics | Python, Flask, Kafka, PostgreSQL, Docker, Kubernetes | Microservices, manual Kafka commits, DDS/CDM marts | Open |
| 08 | Fintech Global Metrics DWH | Financial transaction metrics | Airflow, Python, pandas, boto3, Vertica, SQL | S3 ingestion, columnar DWH, daily global metrics mart | Open |
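
Project 03 lists SCD2 (slowly changing dimensions, Type 2) among its key topics. The sketch below is a generic illustration and is not code taken from that project. When a tracked attribute changes, the current version is closed and a new version is appended. The `CourierVersion` record, its field names, and the far-future `active_to` sentinel are hypothetical choices made for this example:

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical convention: rows that are still current carry this far-future end date.
OPEN_END = date(2099, 12, 31)


@dataclass(frozen=True)
class CourierVersion:
    # Hypothetical dimension record; field names are illustrative only.
    courier_id: str
    courier_name: str
    active_from: date
    active_to: date = OPEN_END


def apply_scd2(history: list[CourierVersion], courier_id: str,
               name: str, load_date: date) -> list[CourierVersion]:
    """Return a new version list after applying one source record as SCD Type 2."""
    current = next(
        (v for v in history if v.courier_id == courier_id and v.active_to == OPEN_END),
        None,
    )
    if current is not None and current.courier_name == name:
        return list(history)  # tracked attribute unchanged: history stays as-is
    result = []
    for v in history:
        # Close the previously current version on the load date; keep others untouched.
        result.append(replace(v, active_to=load_date) if v is current else v)
    # Open a new current version starting on the load date.
    result.append(CourierVersion(courier_id, name, active_from=load_date))
    return result


h = apply_scd2([], "c1", "Ann", date(2024, 1, 1))
h = apply_scd2(h, "c1", "Ann", date(2024, 2, 1))   # no change, so no new version
h = apply_scd2(h, "c1", "Anna", date(2024, 3, 1))  # name change: close old, open new
```

In a warehouse, the same close-and-insert step would normally run as SQL against the dimension table. The Python form just makes the versioning rule easy to read.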
| Skill | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 |
|---|---|---|---|---|---|---|---|---|
| SQL data modeling | X | X | X | X | X | X | ||
| Incremental loading | X | X | X | X | X | X | X | |
| Data marts | X | X | X | X | X | X | X | |
| Airflow orchestration | | X | X | X | X | | | X |
| PostgreSQL | X | X | X | | | X | X | |
| Vertica | | | | X | | | | X |
| Spark / PySpark | | | | | X | X | | |
| Kafka | | | | | | X | X | |
| Docker / Kubernetes | | | | | | | X | |
| Data quality / validation | X | X | X |
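
The *Incremental loading* row refers to loading only new or changed records on each run. A common way to do this is to store a watermark, meaning the highest source timestamp already loaded. The following is a minimal sketch of that idea. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL or Vertica. The table names (`src_orders`, `dwh_orders`, `load_settings`) and the integer timestamp column are hypothetical and do not come from any of the projects. The upsert syntax needs SQLite 3.24 or newer:

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Copy source rows newer than the stored watermark into the target table."""
    cur = conn.cursor()
    # Read the last loaded timestamp; 0 means nothing has been loaded yet.
    (watermark,) = cur.execute(
        "SELECT COALESCE(MAX(loaded_up_to), 0) FROM load_settings"
    ).fetchone()
    new_rows = cur.execute(
        "SELECT id, amount, updated_ts FROM src_orders "
        "WHERE updated_ts > ? ORDER BY updated_ts",
        (watermark,),
    ).fetchall()
    # Upsert so that changed rows overwrite their earlier copy instead of duplicating.
    cur.executemany(
        "INSERT INTO dwh_orders (id, amount, updated_ts) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, "
        "updated_ts = excluded.updated_ts",
        new_rows,
    )
    if new_rows:
        # Advance the watermark to the newest timestamp just loaded.
        cur.execute("INSERT INTO load_settings (loaded_up_to) VALUES (?)", (new_rows[-1][2],))
    conn.commit()
    return len(new_rows)


# Hypothetical in-memory tables for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount REAL, updated_ts INTEGER);
CREATE TABLE dwh_orders (id INTEGER PRIMARY KEY, amount REAL, updated_ts INTEGER);
CREATE TABLE load_settings (loaded_up_to INTEGER);
""")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 200)])
first = incremental_load(conn)   # both source rows are new
conn.execute("UPDATE src_orders SET amount = 25.0, updated_ts = 300 WHERE id = 2")
second = incremental_load(conn)  # only the changed row passes the watermark
```

The strict `>` comparison assumes that source timestamps only move forward. A late-arriving row stamped at or below the stored watermark would be skipped, so production loaders often re-read a small overlap window and rely on the upsert to absorb duplicates.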

    data-engineering-portfolio/
        README.md
        PROJECTS.md
        projects/
            01-craft-market-dwh-customer-datamart/
            ...
            08-fintech-vertica-global-metrics/

Each project folder contains its own README, the cleaned implementation files, and any notes needed to understand excluded data or environment-specific dependencies.
Start with PROJECTS.md for a short project-by-project index.
Then open individual project README files for architecture notes, file structure, and realistic run instructions.
Data files, credentials, original archives, and saved course pages are intentionally not included.
- GitHub: TrupologDS
- Email: sorocawrk@outlook.com