Last updated: 2025-05-03
List of References
- Automated enterprise BI
- Data warehousing and analytics
- Pipeline Logic 1: Error Handling and Best Effort Step
- Source control in Azure Data Factory
- Register a Microsoft Entra app and create a service principal
- Accessing Azure KeyVault secret for Fabric CopyData Action
- Use Azure Key Vault secrets in pipeline activities - ADF
Table of Contents
- Clear Pipeline Structure
- Parameterization
- Incremental Loading
- Error Handling and Monitoring
- Security Measures
- Use Azure Key Vault
- Source Control
- Resource Management
- Testing and Validation
- Documentation
- Regular Updates
- Performance Tuning
- Recommended Training Modules on Microsoft Learn
- Architecture examples
## Clear Pipeline Structure

Ensure your pipelines are well-organized and easy to understand.
| Best Practice | Description | Example |
|---|---|---|
| Consistent Naming Conventions | Use clear and descriptive names for pipelines, activities, datasets, and linked services. | - Instead of Pipeline1, use CopySalesDataPipeline. - Name datasets like SalesDataSource and SalesDataSink. - Use descriptive names for activities like ExtractSalesData and LoadSalesData. |
| Modular Design | Break down complex workflows into smaller, reusable pipelines. | - Create separate pipelines for data extraction, transformation, and loading (ETL). - Use a master pipeline to orchestrate the execution of these smaller pipelines. |
| Annotations and Comments | Add descriptions and comments to activities and pipelines. | - Add an annotation to a Copy Data activity: Copy sales data from SQL to Blob Storage. - Use comments to explain complex logic or business rules within the pipeline. |
| Parameterization | Use parameters to make your pipelines more flexible and easier to manage. | - Define parameters for source and destination paths, allowing the same pipeline for different datasets. - Use parameters for file names, dates, and other dynamic values. |
| Organized Layout | Arrange activities in a logical sequence and avoid overlapping lines. | - Place activities in a left-to-right or top-to-bottom flow to visually represent the data flow. - Group related activities together and use containers for better organization. |
| Error Handling and Logging | Include error handling and logging activities to capture and manage errors. | - Add a Web Activity to log errors to a monitoring system. - Use Try-Catch blocks to handle errors gracefully and ensure the pipeline continues running. |
Example pipeline: `CopySalesDataPipeline`

```mermaid
graph TD
    A[LookupFiles] --> B[ForEachFile]
    B --> C[CopySalesData]
    B --> D[LogStatus]
```

Activities:

- Lookup Activity: Fetch the list of files to be processed.
- ForEach Activity: Iterate over the list of files.
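The lookup-then-iterate flow of `CopySalesDataPipeline` can be sketched in plain Python. The helper names below are hypothetical stand-ins for the Lookup, Copy Data, and ForEach activities, which are really configured in the pipeline designer:

```python
# Sketch of the CopySalesDataPipeline control flow in plain Python.
# lookup_files and copy_file are hypothetical stand-ins for the
# Lookup and Copy Data activities.

def lookup_files(source):
    """Lookup activity: return the list of files to process."""
    return [f for f in source if f.endswith(".csv")]

def copy_file(name):
    """Copy Data activity: pretend to copy one file, return a status."""
    return {"file": name, "status": "Succeeded"}

def run_pipeline(source):
    """ForEach activity: copy each file and collect a status log."""
    log = []
    for name in lookup_files(source):
        log.append(copy_file(name))
    return log

log = run_pipeline(["sales_2024.csv", "readme.txt", "sales_2025.csv"])
```

Keeping extraction, iteration, and logging as separate activities mirrors the modular design recommended above: each piece can be reused or tested on its own.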
## Parameterization

Use parameters to make your pipelines more flexible and easier to manage.
| Best Practice | Description | Example |
|---|---|---|
| Use Parameters | Define parameters to make your pipelines more flexible and easier to manage. | - Use parameters for source and destination paths, file names, and other configurable settings. - Define a parameter for the date range to dynamically filter data in a query. - Create a parameter for the environment (e.g., Dev, Test, Prod) to switch between different configurations. |
| Dynamic Content | Utilize dynamic content expressions to reference parameters within activities. | - @pipeline().parameters.SourcePath to dynamically set the source path in a Copy Data activity. - Use @concat('https://', pipeline().parameters.StorageAccountName, '.blob.core.windows.net/') to dynamically construct a URL. - Apply @formatDateTime(pipeline().parameters.ExecutionDate, 'yyyy-MM-dd') to format dates dynamically. |
| Global Parameters | Use global parameters for values that are used across multiple pipelines. | - Define a global parameter for a storage account name used in various pipelines. - Create a global parameter for a common API key used across multiple pipelines. - Use a global parameter for a base URL that is referenced in multiple activities. |
| Parameterize Datasets | Parameterize datasets to handle different data sources or destinations. | - Create a dataset with a parameterized file path to handle different file names dynamically. - Use parameters in datasets to switch between different databases or tables. - Define parameters for connection strings to dynamically connect to different data sources. |
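As a rough illustration, the dynamic content expressions above can be mimicked in plain Python; the parameter values here are made up:

```python
# Sketch of how dynamic content composes parameters into runtime values.
# Mirrors the @concat(...) and @formatDateTime(...) expressions above;
# the parameter values are illustrative.
from datetime import date

params = {
    "StorageAccountName": "contosodata",
    "ExecutionDate": date(2025, 5, 3),
    "Environment": "Dev",
}

# @concat('https://', pipeline().parameters.StorageAccountName, '.blob.core.windows.net/')
base_url = "https://" + params["StorageAccountName"] + ".blob.core.windows.net/"

# @formatDateTime(pipeline().parameters.ExecutionDate, 'yyyy-MM-dd')
partition = params["ExecutionDate"].strftime("%Y-%m-%d")

# A parameterized dataset path built from the pieces above.
source_path = f"{base_url}sales/{params['Environment']}/{partition}/"
```

Because every environment-specific value flows in through `params`, switching from Dev to Prod is a parameter change rather than a pipeline edit, which is the point of parameterization.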
## Incremental Loading

Implement incremental data loading to improve efficiency.
| Best Practice | Description | Example |
|---|---|---|
| Use Timestamps | Implement incremental loading using timestamps to load only new or changed data. | - Use a watermark table to track the last loaded timestamp and load data newer than this value. - Add a column to your source table to store the last modified timestamp and use it in your queries. |
| Change Data Capture (CDC) | Utilize CDC to capture and load only the changes made to the source data. | - Enable CDC on a SQL Server table to capture insert, update, and delete operations. - Use the CDC functions to retrieve changes and apply them to your destination. |
| Delta Loads | Perform delta loads to update only the changed data instead of full loads. | - Use a query to fetch only the rows that have changed since the last load. - Implement a mechanism to track changes, such as a version number or a change flag. |
| Partitioning | Partition large datasets to improve performance and manageability. | - Partition data by date or another logical key to facilitate incremental loading. - Use partitioned tables in your data warehouse to improve query performance and manageability. |
### Use Timestamps

Implement incremental loading using timestamps to load only new or changed data.
- Add a last-modified column:
  - Ensure your source table has a column to store the last modified timestamp.
  - Example: a `LastModified` column in your source table.
- Create a watermark table:
  - Create a table to track the last loaded timestamp.
  - Example: a `WatermarkTable` with columns `TableName` and `LastLoadedTimestamp`.
- Set up the pipeline:
  - Use a Lookup activity to get the last loaded timestamp from the watermark table.
  - Use a Copy Data activity to load data newer than the last loaded timestamp.
  - Example query: `SELECT * FROM SourceTable WHERE LastModified > @pipeline().parameters.LastLoadedTimestamp`
- Update the watermark table:
  - After loading the data, update the watermark table with the latest timestamp.
  - Use a Stored Procedure activity to update the `LastLoadedTimestamp` in the watermark table.
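The watermark steps above can be sketched in plain Python; the in-memory lists stand in for the SQL tables, and the names follow the example:

```python
# Sketch of the watermark pattern: load only rows newer than the stored
# timestamp, then advance the watermark. In-memory stand-ins for tables.
from datetime import datetime

watermark = {"SourceTable": datetime(2025, 5, 1)}  # WatermarkTable

source_rows = [
    {"id": 1, "LastModified": datetime(2025, 4, 30)},
    {"id": 2, "LastModified": datetime(2025, 5, 2)},
    {"id": 3, "LastModified": datetime(2025, 5, 3)},
]

def incremental_load(rows, table):
    # Lookup activity: read LastLoadedTimestamp from the watermark table.
    last_loaded = watermark[table]
    # Copy Data activity: SELECT * ... WHERE LastModified > @LastLoadedTimestamp
    new_rows = [r for r in rows if r["LastModified"] > last_loaded]
    # Stored Procedure activity: advance the watermark after a successful load.
    if new_rows:
        watermark[table] = max(r["LastModified"] for r in new_rows)
    return new_rows

loaded = incremental_load(source_rows, "SourceTable")
```

Note that the watermark is only advanced after the copy succeeds; advancing it first would silently drop rows if the load failed mid-run.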
### Change Data Capture (CDC)

Utilize CDC to capture and load only the changes made to the source data.
- Enable CDC on the source table:
  - Enable CDC on your SQL Server table.
  - Example: `EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'SourceTable', @role_name = NULL`
- Set up the pipeline:
  - Use a Lookup activity to get the changes from the CDC table.
  - Example query: `SELECT * FROM cdc.dbo_SourceTable_CT WHERE __$operation IN (1, 2, 4)` (1 = delete, 2 = insert, 4 = the after-image of an update; 3 is the before-image and is usually skipped).
- Process changes:
  - Use a ForEach activity to process each change.
  - Inside the ForEach activity, use Copy Data activities to apply the changes to the destination.
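The "apply the changes" step can be sketched in plain Python, assuming SQL Server's CDC operation codes (1 = delete, 2 = insert, 3/4 = update before/after images); the row shape is illustrative:

```python
# Sketch of applying CDC change rows to a destination table.
# Operation codes follow SQL Server CDC: 1 = delete, 2 = insert,
# 4 = after-image of an update (3, the before-image, is skipped).

def apply_changes(dest, changes):
    """Apply CDC rows (dicts with __$operation, id, amount) to dest."""
    for row in changes:
        op = row["__$operation"]
        if op == 1:
            dest.pop(row["id"], None)        # delete
        elif op in (2, 4):
            dest[row["id"]] = row["amount"]  # insert or update (after-image)
    return dest

dest = {1: 100, 2: 200}
changes = [
    {"__$operation": 1, "id": 1, "amount": None},  # delete row 1
    {"__$operation": 2, "id": 3, "amount": 300},   # insert row 3
    {"__$operation": 4, "id": 2, "amount": 250},   # update row 2
]
dest = apply_changes(dest, changes)
```

In a real pipeline the same per-operation branching is typically expressed with an If Condition or a mapping data flow's alter-row transformation rather than hand-written code.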
### Delta Loads

Perform delta loads to update only the changed data instead of full loads.
- Track changes:
  - Implement a mechanism to track changes, such as a version number or a change flag.
  - Example: add a `ChangeFlag` column to your source table.
- Set up the pipeline:
  - Use a Lookup activity to get the rows with changes.
  - Example query: `SELECT * FROM SourceTable WHERE ChangeFlag = 1`
- Load changed data:
  - Use a Copy Data activity to load only the changed data.
  - After loading, reset the `ChangeFlag` to 0.
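The change-flag pattern above can be sketched in plain Python; the in-memory rows stand in for `SourceTable`:

```python
# Sketch of the change-flag delta pattern: select flagged rows, load
# them, then reset the flag. Column names follow the example above.

rows = [
    {"id": 1, "ChangeFlag": 0, "value": "a"},
    {"id": 2, "ChangeFlag": 1, "value": "b"},
    {"id": 3, "ChangeFlag": 1, "value": "c"},
]

def delta_load(source):
    # Lookup: SELECT * FROM SourceTable WHERE ChangeFlag = 1
    changed = [r for r in source if r["ChangeFlag"] == 1]
    # Copy Data would load `changed`; afterwards, reset the flag.
    for r in changed:
        r["ChangeFlag"] = 0
    return changed

loaded = delta_load(rows)
```

Resetting the flag only after the load completes keeps the pattern safe to re-run: a failed load leaves the flags set, so the next run picks the same rows up again.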
### Partitioning

Partition large datasets to improve performance and manageability.
- Partition your data:
  - Partition your source table by a logical key, such as date.
  - Example: partition by `OrderDate`.
- Set up the pipeline:
  - Use a Lookup activity to get the list of partitions to process.
  - Example query: `SELECT DISTINCT OrderDate FROM SourceTable WHERE OrderDate > @pipeline().parameters.LastProcessedDate`
- Process each partition:
  - Use a ForEach activity to process each partition.
  - Inside the ForEach activity, use a Copy Data activity to load data for each partition.
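The partition-then-iterate steps can be sketched in plain Python; the in-memory rows stand in for `SourceTable`:

```python
# Sketch of partitioned incremental processing: list the distinct
# partitions newer than the last processed date, then load each one.
from datetime import date

rows = [
    {"OrderDate": date(2025, 5, 1), "amount": 10},
    {"OrderDate": date(2025, 5, 2), "amount": 20},
    {"OrderDate": date(2025, 5, 2), "amount": 30},
]

def partitions_to_process(source, last_processed):
    # Lookup: SELECT DISTINCT OrderDate ... WHERE OrderDate > @LastProcessedDate
    return sorted({r["OrderDate"] for r in source if r["OrderDate"] > last_processed})

def load_partition(source, day):
    # ForEach + Copy Data: load a single partition's rows.
    return [r for r in source if r["OrderDate"] == day]

parts = partitions_to_process(rows, date(2025, 4, 30))
loaded = [load_partition(rows, p) for p in parts]
```

Because each partition is an independent unit of work, the ForEach can also run partitions in parallel, which is the main performance win of this layout.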
## Error Handling and Monitoring

Set up robust error handling and monitoring to quickly identify and resolve issues.
| Best Practice | Description | Example |
|---|---|---|
| Try-Catch Activities | Use Try-Catch-style failure paths to handle errors gracefully within pipelines (ADF), or an If Condition block (Data Factory in Fabric). | - Use an If Condition activity to check for errors and handle them in a Catch branch. - Implement a Try activity to execute the main logic and a Catch activity to handle any errors that occur (ADF). - Add a Set Variable activity in the If branch to log the error message. |
| Retry Policies | Configure retry policies on activities to handle transient errors. | - Set a retry policy with 3 retries and a 5-minute interval for a Copy Data activity. - Configure a Lookup activity to retry twice with a 2-minute interval in case of failure. - Use the retry policy on a Web Activity to handle intermittent network issues. |
| Alerts and Notifications | Set up alerts and notifications to monitor pipeline runs and failures. | - Use Azure Monitor to create alerts for failed pipeline runs and send email notifications. - Configure alerts to trigger SMS notifications for critical pipeline failures. - Set up a Logic App to send Slack notifications when a pipeline fails. |
| Custom Logging | Implement custom logging to capture detailed error information. | - Use a Web Activity to log errors to an external logging service or database. - Implement an Azure Function to log detailed error information and call it from the pipeline. - Use a Set Variable activity to capture error details and write them to a log file in Azure Blob Storage. |
To implement an If Condition-based error handler:

- Create a pipeline.
- Add an If Condition activity:
  - In the Activities pane, search for `If Condition`.
  - Add your Copy Data activity or any other necessary activity. You may include an If Condition for process success or failure if required; the standard approach is to check for failure, but sometimes both conditions are used for logging purposes. Click the `If conditions` icon and connect each block to its respective status.
  - Configure the If Condition activity to check for specific conditions that indicate an error. For example, you can check the output of a previous activity.
- Define the True and False paths.
- Set a retry policy on activities that may hit transient failures.
- Enable logging: in the activity settings, enable the `Enable logging` option to capture detailed run information even when an activity fails.
- Add a Web Activity to your pipeline, and configure it to call an external service or function to handle errors, such as sending an email notification or logging the error in a monitoring system.

To monitor pipeline runs:

- In the ADF monitoring interface, navigate to the `Monitor` section (if you don't see it, click `...`).
- Check the status of individual activities within your pipelines for success, failure, and skipped activities, or search for a specific pipeline.
- Click on an activity to see its `Details`, and click on the `Pipeline Run ID`.
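A retry policy like the one configured on a Copy Data activity can be sketched in plain Python; the flaky activity is hypothetical, and the interval is shortened so the example runs instantly:

```python
# Sketch of a retry policy: up to 3 retries with a fixed interval,
# mirroring the "3 retries, 5-minute interval" example above.
import time

def with_retry(action, retries=3, interval_seconds=0.0):
    """Run `action`, retrying on RuntimeError up to `retries` times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return action(), attempts
        except RuntimeError:
            if attempts > retries:
                raise  # retries exhausted; surface the failure
            time.sleep(interval_seconds)  # wait before retrying

calls = {"n": 0}
def flaky_copy():
    """Hypothetical copy step that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "Succeeded"

result, attempts = with_retry(flaky_copy)
```

Retries only help with transient errors (network blips, throttling); a deterministic failure such as a bad query will simply fail `retries + 1` times, which is why the catch path above still matters.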
## Security Measures

Apply security best practices to protect your data.
| Best Practice | Description | Example |
|---|---|---|
| Access Control | Implement role-based access control (RBAC) to manage permissions. | - Assign specific roles to users and groups to control access to ADF resources. - Use Microsoft Entra ID (formerly Azure Active Directory, AAD) to manage user permissions and access levels. - Create custom roles for specific tasks, such as DataFactoryContributor for pipeline authors. |
| Data Encryption | Ensure data is encrypted at rest and in transit. | - Use Azure Storage encryption and HTTPS for data transfers. - Enable Transparent Data Encryption (TDE) for SQL databases used in ADF. - Use Azure Key Vault to manage encryption keys for data at rest. |
| Network Security | Use virtual networks and private endpoints to secure data access. | - Configure ADF to use a private endpoint for accessing data in a storage account. - Set up a virtual network (VNet) to isolate and secure ADF resources. - Use Network Security Groups (NSGs) to control inbound and outbound traffic to ADF. |
| Audit Logs | Enable auditing to track access and changes to ADF resources. | - Use Azure Monitor to collect and analyze audit logs for ADF activities. - Enable diagnostic settings to send logs to Azure Log Analytics, Event Hubs, or a storage account. - Regularly review audit logs to detect and respond to unauthorized access or changes. |
## Use Azure Key Vault

Store sensitive information such as connection strings, passwords, and API keys in Azure Key Vault to enhance security and manage secrets efficiently.
| Best Practice | Description | Example |
|---|---|---|
| Store Secrets | Store sensitive information such as connection strings and passwords in Key Vault. | - Store a database connection string in Azure Key Vault and reference it in ADF. - Save API keys and other credentials in Key Vault to avoid hardcoding them in pipelines. - Use Key Vault to store SAS tokens for accessing Azure Storage. |
| Access Policies | Configure access policies to control who can access secrets. | - Grant ADF managed identity access to specific secrets in Key Vault. - Set up access policies to allow only specific users or applications to retrieve secrets. - Use Key Vault access policies to restrict access based on roles and responsibilities. |
| Secure Access | Use managed identities to securely access Key Vault secrets. | - Configure ADF to use its managed identity to retrieve secrets from Key Vault. - Enable managed identity for ADF and grant it access to Key Vault secrets. - Use managed identities to avoid storing credentials in code or configuration files. |
| Rotate Secrets | Regularly rotate secrets to enhance security. | - Update secrets in Key Vault periodically and update references in ADF. - Implement a process to rotate secrets automatically using Azure Automation or Logic Apps. - Notify relevant teams when secrets are rotated to ensure they update their configurations. |
### Store Secrets

Store sensitive information such as connection strings, passwords, and API keys in Key Vault.
- Create an Azure Key Vault.
- Add secrets to the Key Vault.
- Reference the secrets in Data Factory in Microsoft Fabric.
### Access Policies

Configure access policies to control who can access secrets.
- Set up access policies in Key Vault:
  - In the Key Vault, navigate to the Access policies section.
  - Add an access policy to grant the Data Factory managed identity access to specific secrets.
  - Example: grant `Get` and `List` permissions to the Data Factory managed identity.
- Restrict access based on roles:
  - Define access policies to allow only specific users or applications to retrieve secrets.
  - Example: grant access to specific roles, such as `DataFactoryContributor`, for managing secrets.
### Secure Access

Use managed identities to securely access Key Vault secrets.
- Grant Key Vault access to the managed identity:
  - In the Key Vault, add an access policy to grant the Data Factory managed identity access to the required secrets.
  - Example: grant `Get` and `List` permissions to the managed identity.
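As a sketch, retrieving a secret with a managed identity uses the `azure-identity` and `azure-keyvault-secrets` packages (assumed installed); the vault and secret names below are placeholders:

```python
# Sketch of secret retrieval via managed identity. The vault name and
# secret name are placeholders; no credentials appear in the code.

def vault_url(vault_name: str) -> str:
    """Build the public-cloud Key Vault URL for a vault name."""
    return f"https://{vault_name}.vault.azure.net"

def get_secret(vault_name: str, secret_name: str) -> str:
    # Imported lazily so the helpers above work without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential picks up the managed identity when running
    # inside Azure, so nothing sensitive is stored in code or config.
    client = SecretClient(vault_url=vault_url(vault_name),
                          credential=DefaultAzureCredential())
    return client.get_secret(secret_name).value
```

A caller inside Azure would then do something like `conn = get_secret("my-keyvault", "DBConnectionString")`; locally, `DefaultAzureCredential` falls back to developer credentials such as the Azure CLI login.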
### Rotate Secrets

Regularly rotate secrets to enhance security.
- Update secrets in Key Vault:
  - Periodically update secrets in Key Vault to ensure they remain secure.
  - Example: update the `DBConnectionString` secret with a new connection string.
- Implement Automated Secret Rotation:
- Use Azure Automation or Logic Apps to automate the process of rotating secrets.
- Example: Create an Azure Automation runbook to update secrets and notify relevant teams.
- Notify Teams of Secret Rotation:
- Ensure that relevant teams are notified when secrets are rotated.
- Example: Use Logic Apps to send email notifications when secrets are updated.
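The check an Automation runbook might perform before rotating can be sketched in plain Python; the secret metadata and the 90-day window are illustrative:

```python
# Sketch of a rotation-due check: flag secrets whose last update is
# older than the rotation window. Secret names/dates are illustrative.
from datetime import datetime, timedelta

def rotation_due(last_updated, now, max_age_days=90):
    """Return True when a secret is older than the rotation window."""
    return now - last_updated > timedelta(days=max_age_days)

now = datetime(2025, 5, 3)
secrets = {
    "DBConnectionString": datetime(2025, 1, 1),
    "ApiKey": datetime(2025, 4, 20),
}
to_rotate = sorted(n for n, ts in secrets.items() if rotation_due(ts, now))
```

In practice the `last_updated` timestamps would come from the Key Vault secret properties, and the runbook would rotate each flagged secret and then notify the owning teams.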
## Source Control

Benefits of Git Integration:
- Version Control: Track and audit changes, and revert to previous versions if needed.
- Collaboration: Multiple team members can work on the same project simultaneously.
- Incremental Saves: Save partial changes without publishing them live.
- Change Tracking: Easily identify who made changes and when.
- Create a Git Repository:
- You can use either Azure Repos or GitHub for your Git repository.
- Create a new repository if you don't already have one.
- Configure Git Integration in Data Factory:
  - Open your Data Factory in Microsoft Fabric.
  - Go to the `Workspace settings` tab.
  - Under `Git integration`, click on `Connect Git provider and account`.
  - Select the repository type (Azure DevOps Git or GitHub).
  - Provide the necessary details such as the repository name, collaboration branch, and publish branch.
  - Optionally, import existing resources into the repository.
- Branch Management:
  - Feature branch: create feature branches for development. These branches allow you to work on new features or changes without affecting the main codebase.
  - Collaboration branch: this branch is used for testing and reviewing changes before they are merged into the main branch.
  - Publish branch: the publish branch (e.g., `adf_publish`) contains the ARM templates of the live code.
- Develop and Save Changes:
- Work on your pipelines in the feature branch.
- Save changes incrementally in the Git repository.
- Use the Publish button to push changes from the collaboration branch to the publish branch, making them live.
- Collaboration and Review:
- Use pull requests to review and merge changes from feature branches to the collaboration branch.
- Collaborate with team members through code reviews and comments.
## Resource Management

Optimize resource usage to improve performance and reduce costs.
| Best Practice | Description | Example |
|---|---|---|
| Optimize Resource Usage | Monitor and optimize resource usage to improve performance and reduce costs. | - Use Azure Monitor to track resource usage and identify optimization opportunities. - Analyze pipeline execution times to find and eliminate bottlenecks. - Use cost analysis tools to identify and reduce unnecessary expenses. |
| Scaling | Scale ADF resources based on workload requirements. | - Use Azure Data Factory's scaling features to adjust resources during peak loads. - Implement auto-scaling for data flows to handle varying data volumes. - Adjust the number of Data Integration Units (DIUs) based on workload requirements. |
| Cost Management | Implement cost management practices to control expenses. | - Use Azure Cost Management to monitor and manage ADF costs. - Set budgets and alerts to avoid unexpected expenses. - Review and optimize the use of Data Integration Units (DIUs) to balance cost and performance. |
| Resource Tagging | Tag resources for better organization and cost tracking. | - Apply tags to ADF resources to categorize and track costs by project or department. - Use tags to identify and manage resources associated with specific business units. - Implement tagging policies to ensure consistent resource tagging across the organization. |
## Testing and Validation

Regularly test and validate your pipelines to ensure they work as expected.
| Best Practice | Description | Example |
|---|---|---|
| Unit Testing | Implement unit tests for individual pipeline components. | - Use Azure DevOps to create unit tests for ADF activities and datasets. - Test individual activities like Copy Data or Lookup to ensure they function correctly. - Validate parameter values and outputs for each activity. |
| Integration Testing | Perform integration testing to ensure end-to-end functionality. | - Test the entire data pipeline from source to destination to validate data flow. - Simulate real-world scenarios to ensure the pipeline handles data correctly. - Verify data transformations and loading processes in an integrated environment. |
| Validation Activities | Use validation activities to check data quality and integrity. | - Add a validation activity to verify the row count or data format after a Copy Data activity. - Implement data quality checks to ensure data accuracy and completeness. - Use custom scripts or functions to validate complex data transformations. |
| Automated Testing | Automate testing processes to ensure consistency and reliability. | - Use Azure DevOps pipelines to automate the testing of ADF pipelines. - Schedule automated tests to run after each deployment or code change. - Integrate automated testing with CI/CD pipelines to ensure continuous validation. |
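A post-copy validation activity of the kind described above can be sketched in plain Python; the loaded rows and expected values are illustrative:

```python
# Sketch of a validation step: check row count and required columns
# after a copy, returning a list of human-readable errors.

def validate_load(rows, expected_count, required_columns):
    """Return a list of validation errors (empty list means pass)."""
    errors = []
    if len(rows) != expected_count:
        errors.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        missing = required_columns - row.keys()
        if missing:
            errors.append(f"row {i} missing columns: {sorted(missing)}")
    return errors

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
errors = validate_load(rows, expected_count=2, required_columns={"id", "amount"})
```

Returning a list of errors rather than raising on the first one lets a monitoring step log every problem from a run at once, which is handy when a failed load breaks several checks.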
## Documentation

Maintain comprehensive documentation for your pipelines.
| Best Practice | Description | Example |
|---|---|---|
| Comprehensive Documentation | Maintain detailed documentation for all pipelines and activities. | - Document the purpose, inputs, outputs, and dependencies of each pipeline. - Include step-by-step instructions for setting up and running pipelines. - Provide examples and use cases to illustrate pipeline functionality. |
| Version Control | Use version control to manage documentation updates. | - Store documentation in a Git repository alongside the pipeline code. - Track changes to documentation to maintain an accurate history. - Use pull requests to review and approve documentation updates. |
| Annotations | Use annotations within ADF to provide context and explanations. | - Add annotations to activities to describe their function and any important details. - Use comments to explain complex logic or business rules within the pipeline. - Highlight key parameters and settings with annotations for easy reference. |
| Knowledge Sharing | Share documentation with the team to ensure everyone is informed. | - Use a shared platform like SharePoint or Confluence to store and share documentation. - Conduct regular training sessions to keep the team updated on best practices. - Encourage team members to contribute to and update the documentation. |
## Regular Updates

Keep your pipelines and ADF environment up to date.
| Best Practice | Description | Example |
|---|---|---|
| Keep ADF Up to Date | Regularly update ADF to the latest version to benefit from new features and fixes. | - Monitor Azure updates and apply them to your ADF environment. - Subscribe to Azure updates and announcements to stay informed. - Test updates in a staging environment before applying them to production. |
| Pipeline Maintenance | Regularly review and update pipelines to ensure they remain efficient and effective. | - Schedule periodic reviews of pipeline performance and make necessary adjustments. - Optimize pipeline logic and activities to improve efficiency. - Remove or update deprecated activities and datasets. |
| Dependency Management | Keep dependencies up to date to avoid compatibility issues. | - Update linked services and datasets to use the latest versions of data sources. - Regularly review and update external dependencies like libraries and APIs. - Ensure compatibility between ADF and other integrated services. |
| Security Patches | Apply security patches promptly to protect against vulnerabilities. | - Monitor security advisories and apply patches to ADF and related services. - Implement a patch management process to ensure timely updates. - Conduct regular security assessments to identify and address vulnerabilities. |
## Performance Tuning

Continuously monitor and tune performance.
| Best Practice | Description | Example |
|---|---|---|
| Monitor Performance | Continuously monitor pipeline performance to identify bottlenecks. | - Use Azure Monitor to track pipeline execution times and resource usage. - Set up alerts for performance thresholds to detect issues early. - Analyze performance metrics to identify and resolve bottlenecks. |
| Optimize Queries | Optimize queries used in activities to improve performance. | - Rewrite SQL queries to reduce execution time and resource consumption. - Use indexing and partitioning to improve query performance. - Avoid complex joins and subqueries that can slow down data processing. |
| Parallel Processing | Use parallel processing to speed up data processing tasks. | - Configure ForEach activities to run in parallel for faster execution. - Use parallelism in data flows to process large datasets more efficiently. - Balance parallel tasks to avoid overloading resources. |
| Resource Allocation | Allocate resources efficiently to balance performance and cost. | - Adjust the number of Data Integration Units (DIUs) based on workload requirements. - Use resource groups to manage and allocate resources effectively. - Monitor resource usage and adjust allocations to optimize performance. |
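The parallel ForEach pattern above can be sketched with a bounded worker pool in plain Python; the copy step is a hypothetical stand-in, and `batch_count` plays the role of the ForEach activity's batch-count setting:

```python
# Sketch of a parallel ForEach: process partitions concurrently with a
# bounded pool, mirroring the batch-count setting on a ForEach activity.
from concurrent.futures import ThreadPoolExecutor

def copy_partition(day):
    """Hypothetical copy step for one partition."""
    return {"partition": day, "status": "Succeeded"}

def run_parallel(partitions, batch_count=4):
    # batch_count bounds concurrency so downstream stores aren't overloaded.
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        return list(pool.map(copy_partition, partitions))

results = run_parallel(["2025-05-01", "2025-05-02", "2025-05-03"])
```

Capping the pool size is the "balance parallel tasks" advice in code form: unbounded parallelism tends to move the bottleneck to the source or sink rather than remove it.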
## Recommended Training Modules on Microsoft Learn

- Introductory training modules for Azure Data Factory
- Quickstart: Get started with Azure Data Factory
- Introduction to Azure Data Factory: This module covers the basics of ADF and how it can help integrate your data sources
- Integrate data with Azure Data Factory or Azure Synapse Pipeline: Learn about data integration patterns, setting up ADF, and creating pipelines
- Petabyte scale ingestion with Azure Data Factory or Azure Synapse Pipeline: Explore the different methods for transferring data between various data stores using Azure Data Factory.
- A categorized list of Azure Data Factory tutorials by scenarios
- Full list of Data Factory trainings
## Architecture examples

Consider a lakehouse or warehouse for storage.