Costa Rica
Last updated: 2025-07-17
Table of References (Click to expand)
Table of Contents (Click to expand)
The medallion architecture is a data design pattern used in data lakes and lakehouses to organize data at different levels of refinement. It's a best practice for managing the data lifecycle and ensuring data quality
| Layer | Description |
|---|---|
| Bronze Layer | This layer contains the raw data ingested from various sources. The data is typically stored in its original format and is append-only. The Bronze layer acts as the landing zone for all incoming data. |
| Silver Layer | This layer contains cleaned and transformed data. The data in the Silver layer is often enriched with additional information and is structured in a way that makes it easier to query and analyze. This layer is also where data quality checks and transformations are applied. |
| Gold Layer | This layer contains curated and aggregated data that is ready for consumption by business intelligence and reporting tools. The data in the Gold layer is highly structured and optimized for performance. |
Note
This demo will be created step by step. Please note that Microsoft Fabric already assists by setting up the medallion flow for you.
Important
If you are not able to see the auto-create report option neither copilot be aware you need to enable AI features in your tenant, click here to see how.
Implementing a medallion architecture provides several benefits:
-
Data Quality: By organizing data into layers, you can apply quality checks and transformations in a structured manner, ensuring that the data in the Gold layer is reliable and ready for analysis.
-
Scalability: The architecture allows you to scale your data processing pipelines independently for each layer, providing flexibility and efficiency.
-
Performance: The Gold layer is optimized for performance, which means that your reporting and analytics queries will run faster.
-
Simplicity: It simplifies the data pipeline by breaking it down into smaller, manageable steps, each with a clear purpose.
-
Auditability: It provides a clear data lineage, making it easier to trace the origin of data and understand the transformations applied at each stage.
-
Create a Fabric Workspace: This will be your central hub for all activities.
-
Create Lakehouses: Set up three lakehouses for the Bronze, Silver, and Gold layers.
- In your Fabric workspace, create three lakehouses named
raw_Bronze,cleansed_Silver, andcurated_Gold.
Go to Data Engineering Create the 3 lakehouses 

- In your Fabric workspace, create three lakehouses named
-
Identify Data Sources:
-
Determine the sources from which you'll ingest data.
-
List all the data sources such as databases, APIs, file systems, etc.
Let's suppose you have your data locally as csv, you can upload your data to the raw layer:
You need to load the data as tables, either to a new table or to a existing one:
Load information as table To a new one or existing one 

-
-
Create Dataflows or Pipelines: Use Data Factory to create dataflows or pipelines that ingest data into the Bronze lakehouse.
-
In Data Factory, create a new pipeline.
-
Add activities to the pipeline to ingest data from the identified sources.
Image you want to extract data from your
sql database
Sample table:
CREATE TABLE dbo.Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50), BirthDate DATE, HireDate DATE, JobTitle VARCHAR(50), Salary DECIMAL(10, 4) ); INSERT INTO dbo.Employees (EmployeeID, FirstName, LastName, BirthDate, HireDate, JobTitle, Salary) VALUES (1, 'John', 'Doe', '1985-11-15', '2010-03-10', 'Software Engineer', 75000.0000), (2, 'Jane', 'Smith', '1990-05-22', '2012-07-18', 'Project Manager', 85000.0000), (3, 'Emily', 'Jones', '1988-04-17', '2014-06-25', 'Data Analyst', 65000.0000), (4, 'Michael', 'Brown', '1982-06-21', '2008-09-15', 'HR Specialist', 55000.0000), (5, 'Sarah', 'Davis', '1995-09-30', '2020-11-20', 'Marketing Specialist', 60000.0000);
-
Configure Data Ingestion: Set up the data ingestion process to load data into the Bronze layer in its raw format.
-
- Create Notebooks or Dataflows: Use Fabric's notebooks or dataflows to read data from the Bronze layer.
- Data Cleaning: Apply data cleaning steps to handle missing values, remove duplicates, and correct data types.
- Write code in the notebook to clean the data.
- Use functions like
dropna(),dropDuplicates(), andwithColumn()to clean the data.
- Data Enrichment: Enrich the data with additional information if needed.
- Join the data with reference tables or lookup values.
- Add new columns with enriched information.
- Write to Silver Layer: Write the cleaned and transformed data to the Silver lakehouse.
-
Use the
write.format("delta").save()method to save the data to thecleansed_Silverlakehouse.PySpark Code to Move Data from Bronze to Silver:
# Read data from the Bronze layer bronze_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-bronze-lakehousename>.Lakehouse/Tables/<table name>") # Perform transformations (if any) silver_df = bronze_df # Assuming no transformations for simplicity # Write data to the Silver layer silver_df.write.mode("overwrite").option("mergeSchema", "true").format("delta").save("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-silver-lakehousename>.Lakehouse/Tables/<table_name>")
If you want see more, click here to see a sample of the notebook.
-
- Read Data from Silver Layer: Use notebooks or dataflows to read data from the Silver lakehouse.
- In a new notebook, connect to the
cleansed_Silverlakehouse.
- In a new notebook, connect to the
- Apply Business Logic: Apply any additional business logic or aggregations.
- Write code to perform aggregations and calculations.
- Use functions like
groupBy(),agg(), andsum()to aggregate the data.
- Write to Gold Layer: Write the curated data to the Gold lakehouse.
-
Use the
write.format("delta").save()method to save the data to thecurated_Goldlakehouse.Before any changes:
Applying some transformations: If you want see more, click here to see a sample of the notebook.
PySpark Code to Move Data from Silver to Gold:
# Read data from the Silver layer silver_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-silver-lakehouse>.Lakehouse/Tables/<table name>") # Perform aggregations gold_df = silver_df.groupBy("Name").agg( sum("Count").alias("TotalCount"), avg("price").alias("AveragePrice"), avg("tax").alias("AverageTax") ) # Write data to the Gold layer gold_df.write.mode("overwrite").option("mergeSchema", "true").format("delta").save("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-gold-lakehouse name>.Lakehouse/Tables/<your table name>")
-
- Create Pipelines: Create pipelines to automate the movement of data from the Bronze layer to the Silver layer, and from the Silver layer to the Gold layer.
- In Data Factory, create a new pipeline.
- Add a copy activity to move data from the
raw_Bronzelakehouse to thecleansed_Silverlakehouse. - Add another copy activity to move data from the
cleansed_Silverlakehouse to thecurated_Goldlakehouse.
- Schedule Pipelines: Schedule these pipelines to run at appropriate intervals.
- Set up triggers in Data Factory to run the pipelines on a schedule that aligns with your data freshness requirements.
- Consider the frequency of data updates and the latency that is acceptable for your use case.
-
Configure SQL Analytics Endpoint:
- Validate if you have the SQL Analytics Endpoint configured, you can review it from workspace view, primarly is required for Gold layer to be accessible to your reporting tools.
-
Create a semantic models with the required tables for the report.
-
Create Power BI Reports: Use Power BI in Direct Lake Mode to create reports and dashboards from the data in the Gold layer.
-
Connect to the
curated_Goldlakehouse using the SQL analytics endpoint. -
Build reports and dashboards that provide insights into the data.
Create new report:
If you click
auto-create reportCopilot will create a draft report for you:
You can edit your report:
Also, you can levarage AI capabilities with Copilot:
-














