
TrupologDS/data-engineering-projects


Data Engineering Portfolio

This repository is a curated portfolio of completed, mentor-reviewed Data Engineering projects.

The original project submissions were cleaned and adapted for public review: course pages, archives, credentials, local paths, and platform-specific instructions were removed, while the actual engineering logic was preserved.

The projects cover SQL data marts, Airflow orchestration, PostgreSQL and Vertica warehouses, Spark batch and streaming jobs, Kafka services, Docker, Kubernetes, and cloud-style object storage ingestion.

Projects

| # | Project | Domain / Use Case | Tech Stack | Key Topics | Link |
|---|---------|-------------------|------------|------------|------|
| 01 | Craft Market DWH and Customer Analytics Data Mart | Handmade marketplace analytics | PostgreSQL, SQL | DWH loading, incremental data mart, indexes | Open |
| 02 | Sales Mart ETL Orchestration with Airflow | Retail sales and retention analytics | Airflow, Python, pandas, PostgreSQL, SQL | ETL orchestration, API ingestion, dimensional marts | Open |
| 03 | Courier Settlement DWH | Delivery platform courier payouts | Airflow, Python, PostgreSQL, MongoDB, SQL | Multi-source DWH, STG/DDS/CDM layers, SCD2, settlement mart | Open |
| 04 | Vertica Group Conversion Analytics | Social group conversion analytics | Airflow, Python, Vertica, SQL, S3-compatible storage | Analytical DB design, Data Vault links/satellites, export workflow | Open |
| 05 | Geo Data Lake User Marts | Location-based social analytics | PySpark, Airflow, HDFS, Parquet | Data Lake layers, geospatial joins, user/zone/friend marts | Open |
| 06 | Restaurant Promotion Streaming Service | Real-time restaurant campaign targeting | Spark Structured Streaming, Kafka, PostgreSQL, Python | Stream processing, joins with reference data, Kafka output | Open |
| 07 | Kafka Order DDS/CDM Microservices | Event-driven order analytics | Python, Flask, Kafka, PostgreSQL, Docker, Kubernetes | Microservices, manual Kafka commits, DDS/CDM marts | Open |
| 08 | Fintech Global Metrics DWH | Financial transaction metrics | Airflow, Python, pandas, boto3, Vertica, SQL | S3 ingestion, columnar DWH, daily global metrics mart | Open |

Skills Matrix

Skill 01 02 03 04 05 06 07 08
SQL data modeling X X X X X X
Incremental loading X X X X X X X
Data marts X X X X X X X
Airflow orchestration X X X X X
PostgreSQL X X X X X
Vertica X X
Spark / PySpark X X
Kafka X X
Docker / Kubernetes X
Data quality / validation X X X

Repository Structure

data-engineering-portfolio/
  README.md
  PROJECTS.md
  projects/
    01-craft-market-dwh-customer-datamart/
    ...
    08-fintech-vertica-global-metrics/

Each project folder contains its own README, the cleaned implementation files, and any notes needed to understand excluded data or environment-specific dependencies.

How to Review

Start with PROJECTS.md for a short project-by-project index.

Then open individual project README files for architecture notes, file structure, and realistic run instructions.

Data files, credentials, original archives, and saved course pages are intentionally not included.

Contact
