My Journey from Data Analyst to Data Engineer
Updated Apr 7, 2026
A Python-based ETL pipeline to clean, deduplicate, and structure raw shipping data
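As a rough illustration of what a clean, deduplicate, and structure step can look like, here is a minimal standard-library sketch. The field names (`shipment_id`, `destination`, `weight_kg`), the `Shipment` type, and the `clean_shipments` helper are all hypothetical; the repository's actual schema and logic are not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    # Hypothetical structured record; the real schema is not shown in the source.
    shipment_id: str
    destination: str
    weight_kg: float


def clean_shipments(raw_rows):
    """Normalize raw dict rows, drop invalid ones, and keep the first row per ID."""
    seen_ids = set()
    cleaned = []
    for row in raw_rows:
        # Clean: trim whitespace and normalize casing.
        shipment_id = (row.get("shipment_id") or "").strip().upper()
        destination = (row.get("destination") or "").strip().title()
        try:
            weight_kg = float(row.get("weight_kg", ""))
        except (TypeError, ValueError):
            continue  # Skip rows whose weight cannot be parsed.
        # Deduplicate: skip rows with a missing or already-seen ID.
        if not shipment_id or shipment_id in seen_ids:
            continue
        seen_ids.add(shipment_id)
        # Structure: emit a typed record instead of a loose dict.
        cleaned.append(Shipment(shipment_id, destination, weight_kg))
    return cleaned
```

Keeping the first occurrence of each ID is one possible policy; a real pipeline might instead keep the most recent row or merge duplicates.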
Production-grade, serverless AWS data pipeline that simulates autonomous-vehicle telemetry processing at fleet scale. This repository demonstrates end-to-end ingestion, distributed stream triage, columnar storage optimization with Apache Parquet, and data lakehouse partitioning. Terraform Infrastructure as Code (IaC) modules are planned for a future release.
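To make the partitioning idea concrete, the sketch below builds a Hive-style key prefix (`key=value` path segments) for a telemetry record. This is a common lakehouse convention, not necessarily the layout this repository uses; the `partition_prefix` helper, the `telemetry` root, and the choice of date, hour, and vehicle ID as partition keys are assumptions for illustration.

```python
from datetime import datetime, timezone


def partition_prefix(vehicle_id, epoch_seconds, root="telemetry"):
    """Return a Hive-style partition prefix for one telemetry record.

    Hypothetical layout: <root>/date=YYYY-MM-DD/hour=HH/vehicle_id=<id>/
    """
    # Use UTC so the same event always lands in the same partition.
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{root}/date={ts:%Y-%m-%d}/hour={ts:%H}/vehicle_id={vehicle_id}/"
```

Partitioning by time first tends to suit queries that scan a time window across the whole fleet; leading with the vehicle ID would favor per-vehicle lookups instead.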