Last updated: 2025-07-17
The medallion architecture is a data design pattern used in data lakes and lakehouses to organize data at different levels of refinement. It's a best practice for managing the data lifecycle and ensuring data quality.
| Layer | Description |
|---|---|
| Bronze Layer | This layer contains the raw data ingested from various sources. The data is typically stored in its original format and is append-only. The Bronze layer acts as the landing zone for all incoming data. |
| Silver Layer | This layer contains cleaned and transformed data. The data in the Silver layer is often enriched with additional information and is structured in a way that makes it easier to query and analyze. This layer is also where data quality checks and transformations are applied. |
| Gold Layer | This layer contains curated and aggregated data that is ready for consumption by business intelligence and reporting tools. The data in the Gold layer is highly structured and optimized for performance. |
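To make the three layers concrete, here is a minimal end-to-end PySpark sketch, assuming a Fabric notebook where `spark` is preconfigured and `Files/`/`Tables/` paths resolve against the attached lakehouse; the file, table, and column names (`sales.csv`, `OrderID`, `Amount`, `Region`) are illustrative placeholders, not part of this demo's dataset:

```python
# Minimal end-to-end medallion sketch (all names below are placeholders)
from pyspark.sql.functions import col

# Bronze: land the raw file as-is (append-only)
raw_df = spark.read.option("header", "true").csv("Files/raw/sales.csv")
raw_df.write.format("delta").mode("append").save("Tables/bronze_sales")

# Silver: clean and standardize
silver_df = (
    spark.read.format("delta").load("Tables/bronze_sales")
    .dropDuplicates()                                    # remove exact duplicate rows
    .na.drop(subset=["OrderID"])                         # drop rows missing the key
    .withColumn("Amount", col("Amount").cast("decimal(10,2)"))
)
silver_df.write.format("delta").mode("overwrite").save("Tables/silver_sales")

# Gold: aggregate for reporting
gold_df = silver_df.groupBy("Region").sum("Amount")
gold_df.write.format("delta").mode("overwrite").save("Tables/gold_sales_by_region")
```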
> [!NOTE]
> This demo is built step by step. Microsoft Fabric can also set up the medallion flow for you automatically.
> [!IMPORTANT]
> If you cannot see the auto-create report option or Copilot, you need to enable AI features in your tenant. Click here to see how.
Implementing a medallion architecture provides several benefits:
- Data Quality: By organizing data into layers, you can apply quality checks and transformations in a structured manner, ensuring that the data in the Gold layer is reliable and ready for analysis.
- Scalability: The architecture allows you to scale your data processing pipelines independently for each layer, providing flexibility and efficiency.
- Performance: The Gold layer is optimized for performance, which means that your reporting and analytics queries will run faster.
- Simplicity: It simplifies the data pipeline by breaking it down into smaller, manageable steps, each with a clear purpose.
- Auditability: It provides a clear data lineage, making it easier to trace the origin of data and understand the transformations applied at each stage.
> [!NOTE]
> If you need more visual help, check out the video that walks you through the steps. Below is a detailed step-by-step guide.
- Create a Fabric Workspace: This will be your central hub for all activities.
  - Click on `Create a resource` and create your Fabric Capacity if you haven't done so already.
  - Go to the Fabric portal.
  - Click on `Workspaces`, then select your existing workspace or create a new one by clicking `New Workspace`.
  - Provide a name and other required details, then create the workspace.
  - Now, assign the Fabric Capacity to your workspace by clicking on `Workspace settings` and selecting the Fabric capacity under the license settings.

  Video: fabricCapacity.mp4
- Create Lakehouses: Set up three lakehouses for the Bronze, Silver, and Gold layers.
- Identify Data Sources:
  - Determine the sources from which you'll ingest data.
  - List all the data sources, such as databases, APIs, file systems, etc.

  Suppose your data lives locally as CSV files; you can upload it to the raw (Bronze) layer. Click to access sample files in case you don't have any data at the moment.

  You need to load the data as tables, either into a new table or an existing one.

  Video: IngestDataIntoBronzeLayer.mp4
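As a code-first alternative to the upload flow above, here is a minimal PySpark sketch, assuming a Fabric notebook attached to the Bronze lakehouse; the file path and table name (`Files/uploads/employees.csv`, `raw_employees`) are hypothetical:

```python
# Sketch: load an uploaded CSV from the lakehouse Files area into a Bronze table.
# File path and table name are hypothetical examples.
df = (
    spark.read
    .option("header", "true")       # first row holds column names
    .option("inferSchema", "true")  # let Spark infer column types
    .csv("Files/uploads/employees.csv")
)

# Append-only keeps Bronze a faithful landing zone for raw data
df.write.format("delta").mode("append").saveAsTable("raw_employees")
```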
- Suppose you need to extract data from your SQL database.
> [!NOTE]
> In case you don't have any data at the moment, follow this video to create a sample:
```sql
CREATE TABLE dbo.Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    BirthDate DATE,
    HireDate DATE,
    JobTitle VARCHAR(50),
    Salary DECIMAL(10, 4)
);

INSERT INTO dbo.Employees (EmployeeID, FirstName, LastName, BirthDate, HireDate, JobTitle, Salary)
VALUES
    (1, 'John', 'Doe', '1985-11-15', '2010-03-10', 'Software Engineer', 75000.0000),
    (2, 'Jane', 'Smith', '1990-05-22', '2012-07-18', 'Project Manager', 85000.0000),
    (3, 'Emily', 'Jones', '1988-04-17', '2014-06-25', 'Data Analyst', 65000.0000),
    (4, 'Michael', 'Brown', '1982-06-21', '2008-09-15', 'HR Specialist', 55000.0000),
    (5, 'Sarah', 'Davis', '1995-09-30', '2020-11-20', 'Marketing Specialist', 60000.0000);
```

Video: ConnectSQLServerCreateTable.mp4
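If you prefer to pull this data from a notebook rather than a pipeline, a minimal JDBC sketch follows; the server, database, credentials, and target table name are placeholders, and it assumes the SQL Server JDBC driver is available in your Spark environment:

```python
# Sketch: read the sample dbo.Employees table over JDBC and land it in Bronze.
# Server, database, and credentials are placeholders; assumes the SQL Server
# JDBC driver is available in your Spark environment.
employees_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-database>")
    .option("dbtable", "dbo.Employees")
    .option("user", "<your-user>")
    .option("password", "<your-password>")
    .load()
)

employees_df.write.format("delta").mode("append").saveAsTable("raw_employees")
```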
> [!IMPORTANT]
> Besides using Data pipelines to bring in your SQL information, you can also leverage Microsoft Fabric's mirrored SQL capability. This feature allows you to create a mirrored copy of your SQL database, improving data availability, reliability, and disaster recovery. By maintaining a synchronized copy of your database in a different location, it ensures that your data is always accessible, even in the event of a failure or outage.
> [!NOTE]
> The mirroring process can involve both inbound and outbound connections. Inbound connections refer to data coming into Azure from external sources, such as an on-premises SQL database being mirrored in Azure. Outbound connections refer to data going out from Azure to external destinations, such as mirroring an Azure SQL database to Microsoft Fabric. This setup allows for seamless data flow and integration across platforms, ensuring data consistency and availability.
>
> For example, both Azure SQL Database and Microsoft Fabric are Microsoft products, yet the concept of outbound connections still applies because the data moves from one service (Azure SQL Database) to another (Microsoft Fabric). The movement is considered outbound because the data leaves the Azure SQL Database environment and enters the Microsoft Fabric environment.
>
> Under a Zero Trust architecture, inbound and outbound connections are treated with the same level of scrutiny and security protocols: every access request is verified, least-privilege access is enforced, activity is continuously monitored, and networks are micro-segmented. By applying these principles, Azure ensures that both inbound and outbound connections are secure, reducing the risk of unauthorized access and data breaches.
Video: FabricMirroringSQL.mp4
- Create Dataflows or Pipelines: Use Data Factory to create dataflows or pipelines that ingest data into the Bronze lakehouse.
  - In Data Factory, create a new pipeline.
  - Add activities to the pipeline to ingest data from the identified sources.
- Configure Data Ingestion: Set up the data ingestion process to load data into the Bronze layer in its raw format.
- Create Notebooks or Dataflows: Use Fabric's notebooks or dataflows to read data from the Bronze layer.
- Data Cleaning: Apply data cleaning steps to handle missing values, remove duplicates, and correct data types.
  - Write code in the notebook to clean the data.
  - Use functions like `dropna()`, `dropDuplicates()`, and `withColumn()` to clean the data (a sketch combining these follows below).
- Data Enrichment: Enrich the data with additional information if needed.
  - Join the data with reference tables or lookup values.
  - Add new columns with enriched information.
- Write to Silver Layer: Write the cleaned and transformed data to the Silver lakehouse.
  - Use the `write.format("delta").save()` method to save the data to the `cleansed_Silver` lakehouse.

  PySpark Code to Move Data from Bronze to Silver:
```python
# Read data from the Bronze layer
bronze_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-bronze-lakehousename>.Lakehouse/Tables/<table name>")

# Perform transformations (if any)
silver_df = bronze_df  # Assuming no transformations for simplicity

# Write data to the Silver layer
silver_df.write.mode("overwrite").option("mergeSchema", "true").format("delta").save("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-silver-lakehousename>.Lakehouse/Tables/<table_name>")
```
If you want to see more, click here to see a sample of the notebook.
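Putting the cleaning and enrichment bullets above together, here is a hedged sketch; the reference table `dim_departments` and the `DepartmentID` join key are hypothetical additions for illustration and are not part of the sample `Employees` table:

```python
from pyspark.sql.functions import col, trim

# Sketch of the cleaning and enrichment steps above. The reference table
# dim_departments and the DepartmentID join key are hypothetical; adjust
# the names to match your own data.
bronze_df = spark.read.format("delta").load("Tables/raw_employees")

cleaned_df = (
    bronze_df
    .dropDuplicates(["EmployeeID"])                     # deduplicate by key
    .dropna(subset=["EmployeeID", "HireDate"])          # drop rows missing required fields
    .withColumn("FirstName", trim(col("FirstName")))    # normalize text
    .withColumn("Salary", col("Salary").cast("decimal(10,2)"))  # correct data types
)

# Enrichment: join against a reference table to add department details
departments_df = spark.read.format("delta").load("Tables/dim_departments")
silver_df = cleaned_df.join(departments_df, on="DepartmentID", how="left")

silver_df.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("cleansed_employees")
```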
- Read Data from Silver Layer: Use notebooks or dataflows to read data from the Silver lakehouse.
  - In a new notebook, connect to the `cleansed_Silver` lakehouse.
- Apply Business Logic: Apply any additional business logic or aggregations.
  - Write code to perform aggregations and calculations.
  - Use functions like `groupBy()`, `agg()`, and `sum()` to aggregate the data.
- Write to Gold Layer: Write the curated data to the Gold lakehouse.
  - Use the `write.format("delta").save()` method to save the data to the `curated_Gold` lakehouse.

  Applying some transformations (if you want to see more, click here to see a sample of the notebook):

  PySpark Code to Move Data from Silver to Gold:
```python
from pyspark.sql.functions import sum, avg

# Read data from the Silver layer
silver_df = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-silver-lakehouse>.Lakehouse/Tables/<table name>")

# Perform aggregations
gold_df = silver_df.groupBy("Name").agg(
    sum("Count").alias("TotalCount"),
    avg("price").alias("AveragePrice"),
    avg("tax").alias("AverageTax")
)

# Write data to the Gold layer
gold_df.write.mode("overwrite").option("mergeSchema", "true").format("delta").save("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-gold-lakehouse name>.Lakehouse/Tables/<your table name>")
```
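As an optional sanity check (not part of the original steps), you can read the Gold table back and inspect the aggregates; the path below reuses the same placeholders as the write above:

```python
# Optional sanity check: read the Gold table back and inspect the aggregates
gold_check = spark.read.format("delta").load("abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-gold-lakehouse name>.Lakehouse/Tables/<your table name>")
print(f"Rows in Gold table: {gold_check.count()}")
gold_check.show(10, truncate=False)
```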
- Create Pipelines: Create pipelines to automate the movement of data from the Bronze layer to the Silver layer, and from the Silver layer to the Gold layer.
  - In Data Factory, create a new pipeline.
  - Add a copy activity to move data from the `raw_Bronze` lakehouse to the `cleansed_Silver` lakehouse.
  - Add another copy activity to move data from the `cleansed_Silver` lakehouse to the `curated_Gold` lakehouse.
- Schedule Pipelines: Schedule these pipelines to run at appropriate intervals.
  - Set up triggers in Data Factory to run the pipelines on a schedule that aligns with your data freshness requirements.
  - Consider the frequency of data updates and the latency that is acceptable for your use case.
- Configure SQL Analytics Endpoint: Validate that the SQL analytics endpoint is configured; you can review it from the workspace view. It is primarily required for the Gold layer to be accessible to your reporting tools.
- Create a semantic model with the required tables for the report.
- Create Power BI Reports: Use Power BI in Direct Lake mode to create reports and dashboards from the data in the Gold layer.
  - Connect to the `curated_Gold` lakehouse using the SQL analytics endpoint.
  - Build reports and dashboards that provide insights into the data.

  Create a new report. If you click `auto-create report`, Copilot will create a draft report for you. You can then edit your report, and you can also leverage AI capabilities with Copilot.