The `DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE` algorithm looks at existing findings with the same `unique_id_from_tool` or `hash_code` value and assesses whether the new/current finding is a duplicate of one of those findings.
What happens is that only the first candidate is considered. That candidate is selected as the original only if the endpoints also match. If they do not match, deduplication stops and the finding is not marked as a duplicate.
What should happen is that the algorithm continues with the next finding from the list of findings with the same `unique_id_from_tool` or `hash_code` value: there might be one that does have identical endpoints, and the current finding would be a duplicate of that existing finding.
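The faulty control flow can be reproduced with a toy loop (all names here are illustrative, not the real DefectDojo API; the actual code is quoted below):

```python
# Toy reproduction of the current behavior: the loop gives up after the
# first candidate, even when a later candidate has matching endpoints.

def pick_original(new_finding, candidates):
    for cand in candidates:
        if cand["endpoints"] == new_finding["endpoints"]:
            return cand  # endpoints match: this candidate is the original
        break  # unconditional break: candidates after the first are never tried
    return None  # finding is not marked as a duplicate

new = {"endpoints": ["https://b.example"]}
candidates = [
    {"id": 1, "endpoints": ["https://a.example"]},  # same hash_code, different endpoints
    {"id": 2, "endpoints": ["https://b.example"]},  # would match, but is never reached
]
print(pick_original(new, candidates))  # None, although candidate 2 matches
```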
The code where it "stops" processing is the `break` statement at the end of this loop:
```python
def deduplicate_uid_or_hash_code(new_finding):
    if new_finding.test.engagement.deduplication_on_engagement:
        existing_findings = Finding.objects.filter(
            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
            # unique_id_from_tool can only apply to the same test_type because it is parser dependent
            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
            test__engagement=new_finding.test.engagement).exclude(
            id=new_finding.id).exclude(
            duplicate=True).order_by("id")
    else:
        # same without "test__engagement=new_finding.test.engagement" condition
        existing_findings = Finding.objects.filter(
            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
            test__engagement__product=new_finding.test.engagement.product).exclude(
            id=new_finding.id).exclude(
            duplicate=True).order_by("id")

    deduplicationLogger.debug("Found "
        + str(len(existing_findings)) + " findings with either the same unique_id_from_tool or hash_code")

    for find in existing_findings:
        if is_deduplication_on_engagement_mismatch(new_finding, find):
            deduplicationLogger.debug(
                "deduplication_on_engagement_mismatch, skipping dedupe.")
            continue
        try:
            if are_endpoints_duplicates(new_finding, find):
                set_duplicate(new_finding, find)
        except Exception as e:
            deduplicationLogger.debug(str(e))
            continue
        break
```
(django-DefectDojo/dojo/utils.py, lines 493 to 523 in 7c0d92a)
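One possible fix (a sketch, not a tested patch) is to break only after `set_duplicate` has actually run, so the loop otherwise advances to the next candidate. The stub helpers below stand in for the real DefectDojo functions of the same names:

```python
# Self-contained sketch of the proposed loop, assuming the fix is to break
# only once the finding has been marked as a duplicate. All helpers are
# simplified stand-ins for the real DefectDojo code.

def is_deduplication_on_engagement_mismatch(new_finding, find):
    return False  # stub: engagement settings never conflict in this sketch

def are_endpoints_duplicates(new_finding, find):
    return find["endpoints"] == new_finding["endpoints"]  # stub comparison

duplicates = []

def set_duplicate(new_finding, find):
    duplicates.append((new_finding["id"], find["id"]))  # stub: record the match

def deduplicate_candidates(new_finding, existing_findings):
    for find in existing_findings:
        if is_deduplication_on_engagement_mismatch(new_finding, find):
            continue  # dedupe not allowed across these engagements; skip
        try:
            if are_endpoints_duplicates(new_finding, find):
                set_duplicate(new_finding, find)
                break  # an original was found; stop scanning candidates
        except Exception:
            continue
        # no unconditional break: the next candidate may still have
        # matching endpoints

new = {"id": 9, "endpoints": ["https://b.example"]}
existing = [
    {"id": 1, "endpoints": ["https://a.example"]},
    {"id": 2, "endpoints": ["https://b.example"]},
]
deduplicate_candidates(new, existing)
print(duplicates)  # [(9, 2)]: the second candidate, not the first, is the original
```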