This project implements a robust Medallion Architecture using SQL Server to transform raw CRM and ERP data into a structured Gold Layer for business intelligence.
The project follows a three-tier structure to ensure data lineage and quality:
- Bronze Layer: Raw data ingestion via
BULK INSERTfrom CSV files. - Silver Layer: Data cleaning, de-duplication, and standardization.
- Gold Layer: Dimensional modeling (Star Schema) with established relationships.
Raw data is loaded from a Docker-mounted volume into staging tables.
- CRM Data: Customer info, Product info, and Sales details.
- ERP Data: Category mapping, Location info, and additional Customer metadata.
This layer transforms messy raw data into a "Source of Truth".
- De-duplication: Used
ROW_NUMBER()to keep only the most recent record based oncst_create_date. - Standardization: Converted codes (M/F, S/M) into readable labels (Male/Female, Single/Married).
- Cross-Platform Fixes: Explicitly handled CRLF/LF issues by removing hidden
CHAR(13)andCHAR(10)characters that interfere with joins on Mac/Unix environments. - Business Logic: Derived product end-dates using the
LEAD()function for SCD Type 2 tracking.
The final layer creates a Star Schema for reporting.
- Dimensions:
dim_customersanddim_productscombine multiple sources into single, clean entities. - Facts:
fact_saleslinks transactions to dimensions using surrogate keys.
To ensure data consistency, the following relationships are established in the model:
- Fact Sales → Dim Products: Linked via
product_key(FK). - Fact Sales → Dim Customers: Linked via
customer_key(FK). - Dim Products → Category: Linked via
category_id.
- DBMS: Azure SQL Edge (Docker).
- IDE: Azure Data Studio.
- Language: T-SQL.