docs/content/import_data/import_intro/import_vs_reimport.md — 8 additions, 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Import vs Reimport"
+title: "Reimport"
 description: "Learn how to import data manually, through the API, or via a connector"
 weight: 2
 aliases:
@@ -80,7 +80,13 @@ This header indicates the actions taken by an Import/Reimport.
 * **\# left untouched** shows the count of Open Findings which were unchanged by a Reimport (because they also existed in the incoming report).
 * **\# reactivated** shows any Closed Findings which were reopened by an incoming Reimport.
 
-## Reimport via API \- special note
+## Reimport Deduplication
+
+Reimport decides whether an incoming item matches an existing Finding using **[Reimport Deduplication](/triage_findings/finding_deduplication/about_deduplication/)** settings. This is separate from “Same Tool Deduplication” and “Cross Tool Deduplication,” which operate after Findings exist.
+
+If you are seeing Reimport close old Findings and create new Findings when only a minor attribute changes (for example, a line number shift), tune **Reimport Deduplication** for that tool to use stable identifiers that ignore those attributes (such as Unique ID From Tool).
+
+## Reimport via API - special note
 
 Note that the /reimport API endpoint can both **extend an existing Test** (apply the method in this article) **or create a new Test** with new data \- an initial call to `/import`, or setting up a Test in advance is not required.
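To make that note concrete, here is a sketch of building the form fields for a `/api/v2/reimport-scan/` call. The field names follow the standard DefectDojo v2 API, but the IDs, names, and the commented-out HTTP call are illustrative placeholders — verify against your instance's API docs:

```python
def build_reimport_payload(scan_type, test_id=None,
                           product_name=None, engagement_name=None):
    """Form fields for POST /api/v2/reimport-scan/ (sent as multipart
    form data alongside the report file)."""
    data = {"scan_type": scan_type, "active": True, "verified": False}
    if test_id is not None:
        # Extend an existing Test, per the method in this article.
        data["test"] = test_id
    else:
        # No Test set up in advance: let DefectDojo find or create one.
        data["product_name"] = product_name
        data["engagement_name"] = engagement_name
        data["auto_create_context"] = True
    return data

# Extend an existing Test (placeholder id):
payload = build_reimport_payload("Trivy Scan", test_id=42)

# ...or create a new Test from names (placeholder names):
payload = build_reimport_payload("Trivy Scan", product_name="My Product",
                                 engagement_name="Weekly Scans")

# The actual call, e.g. with the `requests` library (placeholder URL/token):
# with open("report.json", "rb") as f:
#     requests.post("https://defectdojo.example.com/api/v2/reimport-scan/",
#                   headers={"Authorization": "Token <api-key>"},
#                   data=payload, files={"file": f})
```

Passing `test` extends that Test in place; omitting it falls back to name-based matching, so neither an initial `/import` call nor a pre-created Test is required.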
docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md — 8 additions, 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: "Deduplication Tuning"
+title: "Deduplication Tuning (Open Source)"
 description: "Configure deduplication in DefectDojo Open Source: algorithms, hash fields, endpoints, and service"
 weight: 5
 audience: opensource
@@ -106,6 +106,10 @@ Notes:
 
 ## After changing deduplication settings
 
+After changing algorithms or Hash computation, you will need to **recompute hashes** for the affected parser/test type before the new matching behavior will apply consistently across existing data.
+
+Note: Recomputing hashes can be slow on large instances. Plan maintenance windows accordingly.
+
 - Changes to dedupe configuration (e.g., `HASHCODE_FIELDS_PER_SCANNER`, `HASH_CODE_FIELDS_ALWAYS`, `DEDUPLICATION_ALGORITHM_PER_PARSER`) are not applied retroactively automatically. To re-evaluate existing findings you must run the management command below.
 
 Run inside the uwsgi container. Example (hash codes only, no dedupe):
@@ -141,3 +145,6 @@ To help troubleshooting deduplication use the following tools:
 
 Deduplication Tuning is a DefectDojo Pro feature that gives you fine-grained control over how findings are deduplicated, allowing you to optimize duplicate detection for your specific security testing workflow.
 
 ## Deduplication Settings
@@ -41,6 +42,8 @@ Uses a combination of selected fields to generate a unique hash. When selected,
 #### Unique ID From Tool
 Leverages the security tool's own internal identifier for findings, ensuring perfect deduplication when the scanner provides reliable unique IDs.
+
+This algorithm can be useful when working with SAST scanners, or situations where a Finding can "move around" in source code as development progresses.
+
 #### Unique ID From Tool or Hash Code
 Attempts to use the tool's unique ID first, then falls back to the hash code if no unique ID is available. This provides the most flexible deduplication option.
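The fallback described above can be sketched as a small selection function. This is a toy model of the logic, not DefectDojo's internal code, and the field names are assumptions:

```python
def dedupe_key(finding: dict):
    """'Unique ID From Tool or Hash Code': prefer the scanner-supplied
    unique id; fall back to the computed hash code when it is absent."""
    uid = finding.get("unique_id_from_tool")
    if uid:
        return ("unique_id", uid)
    return ("hash_code", finding["hash_code"])

# Finding from a scanner that supplies a reliable id (placeholder values):
a = {"unique_id_from_tool": "SCANNER-1234", "hash_code": "aaa"}
# Finding from a scanner that supplies no id:
b = {"unique_id_from_tool": None, "hash_code": "aaa"}

print(dedupe_key(a))  # matched by the tool's own id
print(dedupe_key(b))  # no id from the tool, so the hash code is used
```

Two Findings are treated as duplicates when their keys match, which is why this option works even when only some findings in a report carry a unique id.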
@@ -60,7 +63,11 @@ Unlike Same Tool Deduplication, Cross Tool Deduplication only supports the Hash
 
 ## Reimport Deduplication
 
-Reimport Deduplication Settings are specifically designed for reimporting data using Universal Parsers or the Generic Parser.
+**⚠️ Reimport processes can completely discard Findings before they are recorded. This can lead to data loss if set incorrectly, so Reimport Deduplication settings should be adjusted with caution.**
+
+Reimport Deduplication Settings can be used to set an algorithm for Universal Parsers, or for a Generic Findings Import Parser.
+
+Reimport Deduplication cannot be adjusted for other tools by default. Users who want to adjust the Reimport Deduplication algorithm for other tools in their instance should reach out to [DefectDojo Support](mailto:support@defectdojo.com) for assistance.
 
@@ -74,6 +81,8 @@ The same three algorithm options are available for Reimport Deduplication as for
 - Unique ID From Tool
 - Unique ID From Tool or Hash Code
+
+Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution.
+
 ## Deduplication Best Practices
 
 For optimal results with Deduplication Tuning:
@@ -85,3 +94,7 @@ For optimal results with Deduplication Tuning:
 - **Avoid overly broad deduplication**: Cross-tool deduplication with too few hash fields may result in false duplicates
 
 By tuning deduplication settings to your specific tools, you can significantly reduce duplicate noise.
+
+## Locked Findings
+
+Whenever Deduplication Settings are changed for a given tool, Deduplication hashes will need to be re-calculated for that tool across the entire DefectDojo instance. During this process, Findings of this tool will be "locked", and their Deduplication Algorithm cannot be changed again until the recalculation is complete.
docs/content/triage_findings/finding_deduplication/about_deduplication.md — 32 additions, 5 deletions
@@ -26,13 +26,29 @@ By default, these Tests would need to be nested under the same Product for Dedup
 
 Duplicate Findings are set as Inactive by default. This does not mean the Duplicate Finding itself is Inactive. Rather, this is so that your team only has a single active Finding to work on and remediate, with the implication being that once the original Finding is Mitigated, the Duplicates will also be Mitigated.
 
-## Deduplication vs Reimport
+## Reimport Deduplication
 
-Deduplication and Reimport are similar processes but they have a key difference:
+Deduplication and Reimport are similar processes, but they use different algorithms to identify Finding matches.
 
-* When you Reimport to a Test, the Reimport process looks at incoming Findings, **filters and discards any matches**. Those matches will never be created as Findings or Finding Duplicates.
-* Deduplication is applied 'passively' on Findings that have already been created. It will identify duplicates in scope and **label them**, but it will not delete or discard the Finding unless 'Delete Deduplicate Findings' is enabled.
-* The 'reimport' action of discarding a Finding always happens before deduplication; DefectDojo **cannot deduplicate Findings that are never created** as a result of Reimport's filtering.
+* When you Reimport to a Test, the Reimport process looks at incoming Findings, **compares hash codes, and then discards any matches**. Those matches will never be created as Findings or Finding Duplicates.
+
+However, any Findings that remain after Reimport Deduplication are still subject to Same-Tool Deduplication. So if you use a narrower scope for Same-Tool Deduplication, you can end up with Duplicates within a Reimport pipeline.
+
+### Example
+
+Here's a tool with a Reimport Deduplication algorithm which is different from the Same-Tool Deduplication algorithm.
+
+| Deduplication Algorithm | Hash Code Fields |
+| ----- | ---- |
+| Reimport | Title, CWE, Severity, Description, Line Number |
+| Same-Tool | Title, CWE, Severity, Description |
+
+Let's say you had a Finding in DefectDojo with a given line number. You re-scanned your environment and the line number of that vulnerability changed. You reimport to the same Test. Here's what will happen during reimport and deduplication:
+
+* During Reimport, the Finding will not be matched to any Findings that already exist, because the line number is different. So a new Finding will be created in the Test.
+* After Reimport is complete, the Same-Tool Deduplication algorithm will run. Same-Tool Deduplication does not consider line number in this configuration, so the new Finding will be labelled as a duplicate.
+
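That example can be sketched with a toy hash computation. This is illustrative only — DefectDojo's real hashing code, field separators, and field names may differ:

```python
import hashlib

def hash_code(finding, fields):
    """Hash the concatenation of the configured fields (toy model)."""
    joined = "|".join(str(finding.get(f, "")) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# The two configurations from the table above:
reimport_fields  = ["Title", "CWE", "Severity", "Description", "Line Number"]
same_tool_fields = ["Title", "CWE", "Severity", "Description"]

# Placeholder Finding, before and after a rescan shifted the line number:
before = {"Title": "SQL Injection", "CWE": 89, "Severity": "High",
          "Description": "User input reaches a raw query", "Line Number": 42}
after = {**before, "Line Number": 57}  # same issue, line moved

# Reimport hashes differ, so a new Finding is created on reimport...
assert hash_code(before, reimport_fields) != hash_code(after, reimport_fields)
# ...but the Same-Tool hashes match, so it is then labelled a duplicate.
assert hash_code(before, same_tool_fields) == hash_code(after, same_tool_fields)
```

Dropping unstable fields such as line number from the Reimport configuration would make the two phases agree, so the incoming match would be discarded during Reimport instead.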
+Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution.
 
 ## When are duplicates appropriate?
@@ -119,3 +135,14 @@ For example, let’s say that you had your Maximum Duplicates field set to ‘1
 ### Applying this setting
 
 Applying **Delete Deduplicate Findings** will begin a deletion process immediately. This setting can be applied on the **System Settings** page. See Enabling Deduplication for more information.
+
+## Troubleshooting Deduplication
+
+Sometimes, Deduplication does not work as expected. Here are some examples of ways that Deduplication might not be working correctly, along with possible solutions.
+
+| What you see | Most likely cause | What to tune |
+| --- | --- | --- |
+| Reimport closes an old Finding and creates a new one when only the line number changed | Reimport matching uses unstable fields (for example, line number) | **Reimport Deduplication** (prefer stable IDs or stable hash fields) |
+| Multiple Findings are created in the same Test that you believe should be duplicates | Deduplication matching is not configured for that tool or scope | **Same Tool Deduplication** (and consider “Delete Deduplicate Findings” behavior) |
+| Duplicates are created across different tools | Cross-tool matching is disabled or too strict | **Cross Tool Deduplication (Pro only)** (hash-based matching) |
+| Excess duplicates of the same Finding are being created, across Tests | Asset Hierarchy is not set up correctly | [Consider Reimport for continual testing](/triage_findings/finding_deduplication/avoid_excess_duplicates/) |
-One of DefectDojo’s strengths is that the data model can accommodate many different use\-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow.
+One of DefectDojo’s strengths is that the data model can accommodate many different use-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow.
 
 By default, DefectDojo does not delete any duplicate Findings that are created. Each Finding is considered to be a separate instance of a vulnerability. So in this case, **Duplicate Findings** can be an indicator that a process change is required to your workflow.