UNIQUE_ID_FROM_TOOL_OR_HASH_CODE only considers the first possible match when deduplicating #13497

@valentijnscholten

Description

The DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE algorithm looks at existing findings with the same unique_id_from_tool or hash_code value and assesses if the new/current finding is a duplicate of one of those findings.

What happens is that only the first possible candidate is considered. That candidate is selected as the original if its endpoints also match. If they do not match, deduplication stops and the finding is not marked as a duplicate.

What should happen is that the algorithm continues with the next finding in the list of findings with the same unique_id_from_tool or hash_code value. One of them might have identical endpoints, in which case the current finding is a duplicate of that existing finding.

The code where it "stops" processing is this break statement at the end:

def deduplicate_uid_or_hash_code(new_finding):
    if new_finding.test.engagement.deduplication_on_engagement:
        existing_findings = Finding.objects.filter(
            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
            # unique_id_from_tool can only apply to the same test_type because it is parser dependent
            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
            test__engagement=new_finding.test.engagement).exclude(
                id=new_finding.id).exclude(
                    duplicate=True).order_by("id")
    else:
        # same without "test__engagement=new_finding.test.engagement" condition
        existing_findings = Finding.objects.filter(
            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
            test__engagement__product=new_finding.test.engagement.product).exclude(
                id=new_finding.id).exclude(
                    duplicate=True).order_by("id")
    deduplicationLogger.debug("Found "
        + str(len(existing_findings)) + " findings with either the same unique_id_from_tool or hash_code")
    for find in existing_findings:
        if is_deduplication_on_engagement_mismatch(new_finding, find):
            deduplicationLogger.debug(
                "deduplication_on_engagement_mismatch, skipping dedupe.")
            continue
        try:
            if are_endpoints_duplicates(new_finding, find):
                set_duplicate(new_finding, find)
        except Exception as e:
            deduplicationLogger.debug(str(e))
            continue
        break
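
A minimal, self-contained sketch of the difference, not the actual DefectDojo code. The `Finding` dataclass, the `duplicate_of` field, the simplified `are_endpoints_duplicates` check (plain set equality) and both `dedupe_*` function names are hypothetical stand-ins chosen for illustration. The only point is where the `break` sits relative to the endpoint check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    # Hypothetical stand-in for the Django model; only the fields needed here.
    id: int
    hash_code: str
    endpoints: frozenset = field(default_factory=frozenset)
    duplicate_of: Optional[int] = None


def are_endpoints_duplicates(new_finding, candidate):
    # Simplifying assumption: endpoints match when the two sets are equal.
    return new_finding.endpoints == candidate.endpoints


def dedupe_first_candidate_only(new_finding, existing_findings):
    # Mirrors the reported behaviour: the unconditional break ends the loop
    # after the first candidate, whether or not its endpoints matched.
    for find in existing_findings:
        if are_endpoints_duplicates(new_finding, find):
            new_finding.duplicate_of = find.id
        break


def dedupe_all_candidates(new_finding, existing_findings):
    # Proposed behaviour: stop only once a candidate has been accepted.
    for find in existing_findings:
        if are_endpoints_duplicates(new_finding, find):
            new_finding.duplicate_of = find.id
            break


# Two existing findings share the hash_code; only the second has matching endpoints.
candidates = [
    Finding(id=1, hash_code="abc", endpoints=frozenset({"https://a.example"})),
    Finding(id=2, hash_code="abc", endpoints=frozenset({"https://b.example"})),
]

old = Finding(id=3, hash_code="abc", endpoints=frozenset({"https://b.example"}))
dedupe_first_candidate_only(old, candidates)
print(old.duplicate_of)  # None: the second candidate was never examined

new = Finding(id=3, hash_code="abc", endpoints=frozenset({"https://b.example"}))
dedupe_all_candidates(new, candidates)
print(new.duplicate_of)  # 2
```

In the real function, the equivalent change would presumably be to move the `break` inside the `if are_endpoints_duplicates(...)` branch, directly after `set_duplicate(new_finding, find)`, so that a candidate with non-matching endpoints falls through to the next iteration.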
