Commit 0607427

format, change titles, order and some small context
1 parent 3478c7a commit 0607427

1 file changed

Lines changed: 74 additions & 75 deletions

Workloads-Specific/DataFactory/BestPractices.md

@@ -26,46 +26,38 @@ Last updated: 2025-04-21
2626
<details>
2727
<summary><b>Table of Contents</b> (Click to expand)</summary>
2828

29-
- [Architecture examples](#architecture-examples)
30-
- [Best Practices for ADF Pipelines](#best-practices-for-adf-pipelines)
31-
- [Clear Pipeline Structure](#clear-pipeline-structure)
32-
- [Example Pipeline Structure](#example-pipeline-structure)
33-
- [Parameterization](#parameterization)
34-
- [Incremental Loading](#incremental-loading)
35-
- [Use Timestamps](#use-timestamps)
36-
- [Change Data Capture CDC](#change-data-capture-cdc)
37-
- [Delta Loads](#delta-loads)
38-
- [Partitioning](#partitioning)
39-
- [Error Handling and Monitoring](#error-handling-and-monitoring)
40-
- [a. Use If Condition Activity](#a-use-if-condition-activity)
41-
- [b. Configure Activity Fault Tolerance](#b-configure-activity-fault-tolerance)
42-
- [c. Custom Error Handling: Use Web Activity for error handling](#c-custom-error-handling-use-web-activity-for-error-handling)
43-
- [d. Pipeline Monitoring: Monitor activity runs.](#d-pipeline-monitoring-monitor-activity-runs)
44-
- [Security Measures](#security-measures)
45-
- [Use Azure Key Vault](#use-azure-key-vault)
46-
- [Store Secrets](#store-secrets)
47-
- [Access Policies](#access-policies)
48-
- [Secure Access](#secure-access)
49-
- [Rotate Secrets](#rotate-secrets)
50-
- [Source Control](#source-control)
51-
- [Resource Management](#resource-management)
52-
- [Testing and Validation](#testing-and-validation)
53-
- [Documentation](#documentation)
54-
- [Regular Updates](#regular-updates)
55-
- [Performance Tuning](#performance-tuning)
29+
- [Clear Pipeline Structure](#clear-pipeline-structure)
30+
- [Example Pipeline Structure](#example-pipeline-structure)
31+
- [Parameterization](#parameterization)
32+
- [Incremental Loading](#incremental-loading)
33+
- [Use Timestamps](#use-timestamps)
34+
- [Change Data Capture CDC](#change-data-capture-cdc)
35+
- [Delta Loads](#delta-loads)
36+
- [Partitioning](#partitioning)
37+
- [Error Handling and Monitoring](#error-handling-and-monitoring)
38+
- [a. Use If Condition Activity](#a-use-if-condition-activity)
39+
- [b. Configure Activity Fault Tolerance](#b-configure-activity-fault-tolerance)
40+
- [c. Custom Error Handling: Use Web Activity for error handling](#c-custom-error-handling-use-web-activity-for-error-handling)
41+
- [d. Pipeline Monitoring: Monitor activity runs.](#d-pipeline-monitoring-monitor-activity-runs)
42+
- [Security Measures](#security-measures)
43+
- [Use Azure Key Vault](#use-azure-key-vault)
44+
- [Store Secrets](#store-secrets)
45+
- [Access Policies](#access-policies)
46+
- [Secure Access](#secure-access)
47+
- [Rotate Secrets](#rotate-secrets)
48+
- [Source Control](#source-control)
49+
- [Resource Management](#resource-management)
50+
- [Testing and Validation](#testing-and-validation)
51+
- [Documentation](#documentation)
52+
- [Regular Updates](#regular-updates)
53+
- [Performance Tuning](#performance-tuning)
5654
- [Recommended Training Modules on Microsoft Learn](#recommended-training-modules-on-microsoft-learn)
55+
- [Architecture examples](#architecture-examples)
5756

5857
</details>
5958

60-
## Architecture examples
61-
62-
<img width="550" alt="image" src="https://github.com/user-attachments/assets/42bbf7f5-eb6d-455b-886d-d8f665f0dfa0">
63-
64-
<img width="550" alt="image" src="https://github.com/user-attachments/assets/2c06eaaf-3689-48f3-8e97-7c9c128800d9">
65-
66-
## Best Practices for ADF Pipelines
6759

68-
### Clear Pipeline Structure
60+
## Clear Pipeline Structure
6961

7062
> Ensure your pipelines are well-organized and easy to understand.
7163
@@ -78,7 +70,7 @@ Last updated: 2025-04-21
7870
| **Organized Layout** | Arrange activities in a logical sequence and avoid overlapping lines. | - Place activities in a left-to-right or top-to-bottom flow to visually represent the data flow. <br/> - Group related activities together and use containers for better organization. |
7971
| **Error Handling and Logging**| Include error handling and logging activities to capture and manage errors. | - Add a Web Activity to log errors to a monitoring system. <br/> - Use a try-catch pattern (activities connected on the failure dependency path) to handle errors gracefully and keep the pipeline running. |
8072

81-
#### Example Pipeline Structure
73+
### Example Pipeline Structure
8274

8375
> Pipeline: CopySalesDataPipeline
8476
@@ -135,8 +127,8 @@ graph TD
135127

136128
<img width="550" alt="image" src="https://github.com/user-attachments/assets/63b0db12-8a4e-4dae-ac5a-cdf74ab6f7bf" />
137129

138-
### Parameterization
139-
>
130+
## Parameterization
131+
140132
> Use parameters to make your pipelines more flexible and easier to manage.
141133
142134
| **Best Practice** | **Description** | **Example** |
@@ -146,8 +138,8 @@ graph TD
146138
| **Global Parameters** | Use global parameters for values that are used across multiple pipelines. | - Define a global parameter for a storage account name used in various pipelines. <br/> - Create a global parameter for a common API key used across multiple pipelines. <br/> - Use a global parameter for a base URL that is referenced in multiple activities. |
147139
| **Parameterize Datasets** | Parameterize datasets to handle different data sources or destinations. | - Create a dataset with a parameterized file path to handle different file names dynamically. <br/> - Use parameters in datasets to switch between different databases or tables. <br/> - Define parameters for connection strings to dynamically connect to different data sources. |
148140

149-
### Incremental Loading
150-
>
141+
## Incremental Loading
142+
151143
> Implement incremental data loading to improve efficiency.
152144
153145
| **Best Practice** | **Description** | **Example** |
@@ -157,7 +149,7 @@ graph TD
157149
| **Delta Loads** | Perform delta loads to update only the changed data instead of full loads. | - Use a query to fetch only the rows that have changed since the last load. <br/> - Implement a mechanism to track changes, such as a version number or a change flag. |
158150
| **Partitioning** | Partition large datasets to improve performance and manageability. | - Partition data by date or another logical key to facilitate incremental loading. <br/> - Use partitioned tables in your data warehouse to improve query performance and manageability. |
159151

160-
#### Use Timestamps
152+
### Use Timestamps
161153

162154
> Implement incremental loading using timestamps to load only new or changed data.
163155
@@ -175,8 +167,8 @@ graph TD
175167
- After loading the data, update the watermark table with the latest timestamp.
176168
- Use a Stored Procedure activity to update the `LastLoadedTimestamp` in the watermark table.
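The watermark steps above can be sketched outside ADF as a small script. This is a minimal illustration, not the pipeline itself: it uses an in-memory SQLite database, and the table names `SourceSales`, `TargetSales`, `Watermark` and the column `LastModified` are assumptions made for the example. Only `LastLoadedTimestamp` comes from the text above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical source, target, and watermark tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SourceSales (Id INTEGER PRIMARY KEY, Amount REAL, LastModified TEXT);
        CREATE TABLE TargetSales (Id INTEGER PRIMARY KEY, Amount REAL, LastModified TEXT);
        CREATE TABLE Watermark (TableName TEXT PRIMARY KEY, LastLoadedTimestamp TEXT);
        INSERT INTO Watermark VALUES ('Sales', '2025-01-01T00:00:00');
        INSERT INTO SourceSales VALUES (1, 10.0, '2024-12-31T09:00:00');
        INSERT INTO SourceSales VALUES (2, 20.0, '2025-01-02T09:00:00');
        INSERT INTO SourceSales VALUES (3, 30.0, '2025-01-03T09:00:00');
        """
    )
    return conn


def incremental_load(conn):
    """Copy rows newer than the stored watermark, then advance the watermark."""
    cur = conn.cursor()
    # Lookup step: read the last loaded timestamp from the watermark table.
    (last_loaded,) = cur.execute(
        "SELECT LastLoadedTimestamp FROM Watermark WHERE TableName = 'Sales'"
    ).fetchone()
    # Copy step: select only rows modified after the watermark.
    # ISO 8601 strings compare correctly as text.
    rows = cur.execute(
        "SELECT Id, Amount, LastModified FROM SourceSales WHERE LastModified > ?",
        (last_loaded,),
    ).fetchall()
    cur.executemany("INSERT OR REPLACE INTO TargetSales VALUES (?, ?, ?)", rows)
    # Update step: move the watermark to the newest timestamp just loaded.
    if rows:
        new_mark = max(row[2] for row in rows)
        cur.execute(
            "UPDATE Watermark SET LastLoadedTimestamp = ? WHERE TableName = 'Sales'",
            (new_mark,),
        )
    conn.commit()
    return len(rows)
```

Running `incremental_load` twice on the demo database loads the two newer rows on the first call and nothing on the second, because the watermark has already advanced.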
177169

178-
#### Change Data Capture (CDC)
179-
>
170+
### Change Data Capture (CDC)
171+
180172
> Utilize CDC to capture and load only the changes made to the source data.
181173
182174
1. **Enable CDC on Source Table**:
@@ -189,8 +181,8 @@ graph TD
189181
- Use a ForEach activity to process each change.
190182
- Inside the ForEach activity, use Copy Data activities to apply the changes to the destination.
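As a rough illustration of the ForEach step above, the sketch below applies a list of captured changes to a destination one record at a time. The change-record shape (`operation`, `key`, `row`) is hypothetical and chosen for the example; real CDC output depends on the source system.

```python
def apply_changes(destination, changes):
    """Apply CDC-style change records to a destination dict keyed by primary key."""
    for change in changes:  # mirrors the ForEach activity over captured changes
        operation = change["operation"]
        key = change["key"]
        if operation in ("insert", "update"):
            # Inserts and updates both write the latest row image.
            destination[key] = change["row"]
        elif operation == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            destination.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return destination
```

Treating inserts and updates as the same upsert keeps the loop safe to rerun if a batch of changes is processed twice.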
191183

192-
#### Delta Loads
193-
>
184+
### Delta Loads
185+
194186
> Perform delta loads to update only the changed data instead of full loads.
195187
196188
1. **Track Changes**:
@@ -203,8 +195,8 @@ graph TD
203195
- Use a Copy Data activity to load only the changed data.
204196
- After loading, reset the `ChangeFlag` to 0.
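The flag-based delta load above might look like the following when reduced to plain SQL calls. This is a sketch under assumptions: it runs against an in-memory SQLite database, and the tables `SourceOrders` and `TargetOrders` are made up for the example. Only the `ChangeFlag` column and the reset to 0 come from the steps above.

```python
import sqlite3


def build_orders_db():
    """Create hypothetical source and target tables; rows 2 and 3 are flagged as changed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SourceOrders (Id INTEGER PRIMARY KEY, Amount REAL, ChangeFlag INTEGER);
        CREATE TABLE TargetOrders (Id INTEGER PRIMARY KEY, Amount REAL);
        INSERT INTO SourceOrders VALUES (1, 10.0, 0);
        INSERT INTO SourceOrders VALUES (2, 25.0, 1);
        INSERT INTO SourceOrders VALUES (3, 40.0, 1);
        """
    )
    return conn


def delta_load(conn):
    """Copy only flagged rows to the target, then reset their ChangeFlag to 0."""
    cur = conn.cursor()
    changed = cur.execute(
        "SELECT Id, Amount FROM SourceOrders WHERE ChangeFlag = 1 ORDER BY Id"
    ).fetchall()
    cur.executemany(
        "INSERT OR REPLACE INTO TargetOrders (Id, Amount) VALUES (?, ?)", changed
    )
    # Reset the flag only for the rows that were actually copied, so rows
    # flagged while the load was running are picked up on the next run.
    cur.executemany(
        "UPDATE SourceOrders SET ChangeFlag = 0 WHERE Id = ?",
        [(row[0],) for row in changed],
    )
    conn.commit()
    return [row[0] for row in changed]
```

Resetting the flag per copied row, rather than with a blanket `UPDATE`, is a design choice for the sketch; it avoids clearing flags on rows that changed after the `SELECT`.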
205197

206-
#### Partitioning
207-
>
198+
### Partitioning
199+
208200
> Partition large datasets to improve performance and manageability.
209201
210202
1. **Partition Your Data**:
@@ -217,8 +209,8 @@ graph TD
217209
- Use a ForEach activity to process each partition.
218210
- Inside the ForEach activity, use a Copy Data activity to load data for each partition.
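The per-partition loop above can be pictured with a short sketch. It assumes rows carry an ISO date string in a hypothetical `order_date` field and partitions them by month; the `load` callback stands in for the Copy Data activity and is also an assumption.

```python
from collections import defaultdict


def partition_by_month(rows):
    """Group rows by the 'YYYY-MM' prefix of their order_date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["order_date"][:7]].append(row)
    return dict(partitions)


def load_partitions(rows, load):
    """Call load(partition_key, partition_rows) once per partition, in key order."""
    results = {}
    for key, partition_rows in sorted(partition_by_month(rows).items()):
        # Each iteration corresponds to one pass of the ForEach activity.
        results[key] = load(key, partition_rows)
    return results
```

For example, `load_partitions(rows, lambda key, part: len(part))` returns a row count per month, which is a cheap way to check that every partition was visited.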
219211

220-
### Error Handling and Monitoring
221-
>
212+
## Error Handling and Monitoring
213+
222214
> Set up robust error handling and monitoring to quickly identify and resolve issues.
223215
224216
| **Best Practice** | **Description** | **Example** |
@@ -228,7 +220,7 @@ graph TD
228220
| **Alerts and Notifications** | Set up alerts and notifications to monitor pipeline runs and failures. | - Use Azure Monitor to create alerts for failed pipeline runs and send email notifications. <br/> - Configure alerts to trigger SMS notifications for critical pipeline failures. <br/> - Set up a Logic App to send Slack notifications when a pipeline fails. |
229221
| **Custom Logging** | Implement custom logging to capture detailed error information. | - Use a Web Activity to log errors to an external logging service or database. <br/> - Implement an Azure Function to log detailed error information and call it from the pipeline. <br/> - Use a Set Variable activity to capture error details and write them to a log file in Azure Blob Storage. |
230222

231-
#### a. **Use If Condition Activity**
223+
### a. **Use If Condition Activity**
232224

233225
1. **Create a Pipeline**:
234226
- Open Microsoft Fabric and navigate to Azure Data Factory.
@@ -255,7 +247,7 @@ graph TD
255247

256248
<img width="550" alt="image" src="https://github.com/user-attachments/assets/4d4f447e-0924-4f8c-8cb0-3dddc72ef85b" />
257249

258-
#### b. **Configure Activity Fault Tolerance**
250+
### b. **Configure Activity Fault Tolerance**
259251

260252
1. **Set Retry Policy**:
261253
- Select an activity within your pipeline.
@@ -267,7 +259,7 @@ graph TD
267259

268260
<img width="550" alt="image" src="https://github.com/user-attachments/assets/35184bf2-de54-40d8-8968-d160760bc0bd" />
269261

270-
#### c. **Custom Error Handling**: Use Web Activity for error handling
262+
### c. **Custom Error Handling**: Use Web Activity for error handling
271263

272264
- Add a Web Activity to your pipeline.
273265

@@ -277,7 +269,7 @@ graph TD
277269

278270
<img width="550" alt="image" src="https://github.com/user-attachments/assets/a16b6487-45c5-40b5-8cd8-82146eb5456d" />
279271

280-
#### d. **Pipeline Monitoring**: Monitor activity runs
272+
### d. **Pipeline Monitoring**: Monitor activity runs
281273

282274
- In the ADF monitoring interface, navigate to the `Monitor` section; if you don't see it, click `...`.
283275
- Check the status of individual activities within your pipelines (succeeded, failed, or skipped), or search for a specific pipeline.
@@ -287,8 +279,8 @@ graph TD
287279

288280
<img width="550" alt="image" src="https://github.com/user-attachments/assets/bd52e6c5-c530-4df9-bbaf-8640ebbde336" />
289281

290-
### Security Measures
291-
>
282+
## Security Measures
283+
292284
> Apply security best practices to protect your data.
293285
294286
| **Best Practice** | **Description** | **Example** |
@@ -298,8 +290,8 @@ graph TD
298290
| **Network Security** | Use virtual networks and private endpoints to secure data access. | - Configure ADF to use a private endpoint for accessing data in a storage account. <br/> - Set up a virtual network (VNet) to isolate and secure ADF resources. <br/> - Use Network Security Groups (NSGs) to control inbound and outbound traffic to ADF. |
299291
| **Audit Logs** | Enable auditing to track access and changes to ADF resources. | - Use Azure Monitor to collect and analyze audit logs for ADF activities. <br/> - Enable diagnostic settings to send logs to Azure Log Analytics, Event Hubs, or a storage account. <br/> - Regularly review audit logs to detect and respond to unauthorized access or changes. |
300292

301-
### Use Azure Key Vault
302-
>
293+
## Use Azure Key Vault
294+
303295
> Store sensitive information such as connection strings, passwords, and API keys in Azure Key Vault to enhance security and manage secrets efficiently.
304296
305297
| **Best Practice** | **Description** | **Example** |
@@ -309,7 +301,7 @@ graph TD
309301
| **Secure Access** | Use managed identities to securely access Key Vault secrets. | - Configure ADF to use its managed identity to retrieve secrets from Key Vault. <br/> - Enable managed identity for ADF and grant it access to Key Vault secrets. <br/> - Use managed identities to avoid storing credentials in code or configuration files. |
310302
| **Rotate Secrets** | Regularly rotate secrets to enhance security. | - Update secrets in Key Vault periodically and update references in ADF. <br/> - Implement a process to rotate secrets automatically using Azure Automation or Logic Apps. <br/> - Notify relevant teams when secrets are rotated to ensure they update their configurations. |
311303

312-
#### Store Secrets
304+
### Store Secrets
313305

314306
> Store sensitive information such as connection strings, passwords, and API keys in Key Vault.
315307
@@ -333,8 +325,8 @@ graph TD
333325

334326
<img width="550" alt="image" src="https://github.com/user-attachments/assets/92787052-7512-4e1a-b5d9-50d05ec8219c" />
335327

336-
#### Access Policies
337-
>
328+
### Access Policies
329+
338330
> Configure access policies to control who can access secrets.
339331
340332
1. **Set Up Access Policies in Key Vault**:
@@ -346,7 +338,7 @@ graph TD
346338
- Define access policies to allow only specific users or applications to retrieve secrets.
347339
- Example: Grant access to specific roles such as `DataFactoryContributor` for managing secrets.
348340

349-
#### Secure Access
341+
### Secure Access
350342

351343
> Use managed identities to securely access Key Vault secrets.
352344
@@ -355,8 +347,8 @@ graph TD
355347
- In the Key Vault, add an access policy to grant the Data Factory managed identity access to the required secrets.
356348
- Example: Grant `Get` and `List` permissions to the managed identity.
357349

358-
#### Rotate Secrets
359-
>
350+
### Rotate Secrets
351+
360352
> Regularly rotate secrets to enhance security.
361353
362354
1. **Update Secrets in Key Vault**:
@@ -369,10 +361,9 @@ graph TD
369361
- Ensure that relevant teams are notified when secrets are rotated.
370362
- Example: Use Logic Apps to send email notifications when secrets are updated.
371363

372-
### Source Control
364+
## Source Control
373365

374366
> Benefits of Git Integration: <br/>
375-
>
376367
> - **Version Control**: Track and audit changes, and revert to previous versions if needed. <br/>
377368
> - **Collaboration**: Multiple team members can work on the same project simultaneously. <br/>
378369
> - **Incremental Saves**: Save partial changes without publishing them live. <br/>
@@ -406,8 +397,8 @@ graph TD
406397
- Use pull requests to review and merge changes from feature branches to the collaboration branch.
407398
- Collaborate with team members through code reviews and comments.
408399

409-
### Resource Management
410-
>
400+
## Resource Management
401+
411402
> Optimize resource usage to improve performance and reduce costs.
412403
413404
| **Best Practice** | **Description** | **Example** |
@@ -417,8 +408,8 @@ graph TD
417408
| **Cost Management** | Implement cost management practices to control expenses. | - Use Azure Cost Management to monitor and manage ADF costs. <br/> - Set budgets and alerts to avoid unexpected expenses. <br/> - Review and optimize the use of Data Integration Units (DIUs) to balance cost and performance. |
418409
| **Resource Tagging** | Tag resources for better organization and cost tracking. | - Apply tags to ADF resources to categorize and track costs by project or department. <br/> - Use tags to identify and manage resources associated with specific business units. <br/> - Implement tagging policies to ensure consistent resource tagging across the organization. |
419410

420-
### Testing and Validation
421-
>
411+
## Testing and Validation
412+
422413
> Regularly test and validate your pipelines to ensure they work as expected.
423414
424415
| **Best Practice** | **Description** | **Example** |
@@ -428,8 +419,8 @@ graph TD
428419
| **Validation Activities** | Use validation activities to check data quality and integrity. | - Add a validation activity to verify the row count or data format after a Copy Data activity. <br/> - Implement data quality checks to ensure data accuracy and completeness. <br/> - Use custom scripts or functions to validate complex data transformations. |
429420
| **Automated Testing** | Automate testing processes to ensure consistency and reliability. | - Use Azure DevOps pipelines to automate the testing of ADF pipelines. <br/> - Schedule automated tests to run after each deployment or code change. <br/> - Integrate automated testing with CI/CD pipelines to ensure continuous validation. |
430421

431-
### Documentation
432-
>
422+
## Documentation
423+
433424
> Maintain comprehensive documentation for your pipelines.
434425
435426
| **Best Practice** | **Description** | **Example** |
@@ -439,8 +430,8 @@ graph TD
439430
| **Annotations** | Use annotations within ADF to provide context and explanations. | - Add annotations to activities to describe their function and any important details. <br/> - Use comments to explain complex logic or business rules within the pipeline. <br/> - Highlight key parameters and settings with annotations for easy reference. |
440431
| **Knowledge Sharing** | Share documentation with the team to ensure everyone is informed. | - Use a shared platform like SharePoint or Confluence to store and share documentation. <br/> - Conduct regular training sessions to keep the team updated on best practices. <br/> - Encourage team members to contribute to and update the documentation. |
441432

442-
### Regular Updates
443-
>
433+
## Regular Updates
434+
444435
> Keep your pipelines and ADF environment up to date.
445436
446437
| **Best Practice** | **Description** | **Example** |
@@ -450,8 +441,8 @@ graph TD
450441
| **Dependency Management** | Keep dependencies up to date to avoid compatibility issues. | - Update linked services and datasets to use the latest versions of data sources. <br/> - Regularly review and update external dependencies like libraries and APIs. <br/> - Ensure compatibility between ADF and other integrated services. |
451442
| **Security Patches** | Apply security patches promptly to protect against vulnerabilities. | - Monitor security advisories and apply patches to ADF and related services. <br/> - Implement a patch management process to ensure timely updates. <br/> - Conduct regular security assessments to identify and address vulnerabilities. |
452443

453-
### Performance Tuning
454-
>
444+
## Performance Tuning
445+
455446
> Continuously monitor and tune performance.
456447
457448
| **Best Practice** | **Description** | **Example** |
@@ -471,6 +462,14 @@ graph TD
471462
- [A categorized list of Azure Data Factory tutorials by scenarios](https://learn.microsoft.com/en-us/azure/data-factory/data-factory-tutorials)
472463
- [Full list of Data Factory trainings](https://learn.microsoft.com/en-us/training/browse/?expanded=azure&products=azure-data-factory)
473464

465+
## Architecture examples
466+
467+
> Consider a lakehouse or a warehouse for storage:
468+
469+
<img width="550" alt="image" src="https://github.com/user-attachments/assets/42bbf7f5-eb6d-455b-886d-d8f665f0dfa0">
470+
471+
<img width="550" alt="image" src="https://github.com/user-attachments/assets/2c06eaaf-3689-48f3-8e97-7c9c128800d9">
472+
474473
<div align="center">
475474
<h3 style="color: #4CAF50;">Total Visitors</h3>
476475
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
