`UNIQUE_ID_FROM_TOOL_OR_HASH_CODE` only considers the first possible match when deduplicating #13497

The `DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE` algorithm looks at existing findings with the same `unique_id_from_tool` or `hash_code` value and assesses if the new/current finding is a duplicate of one of those findings.

What happens is that only the first possible candidate is considered. That candidate is selected as the `original` if the endpoints also match. If they do not match, deduplication stops and the finding is not marked as a duplicate.

What should happen is that the algorithm continues with the next finding from the list of findings with the same `unique_id_from_tool` or `hash_code` value. There might be one that does have identical endpoints, in which case the current finding is a duplicate of that existing finding.

The code where it "stops" processing is this `break` statement at the end:

https://github.com/DefectDojo/django-DefectDojo/blob/7c0d92a36cafd011c3491e965a3915bee7c48d60/dojo/utils.py#L493-L523


	def deduplicate_uid_or_hash_code(new_finding):
	    if new_finding.test.engagement.deduplication_on_engagement:
	        existing_findings = Finding.objects.filter(
	            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
	            # unique_id_from_tool can only apply to the same test_type because it is parser dependent
	            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
	            test__engagement=new_finding.test.engagement).exclude(
	                id=new_finding.id).exclude(
	                    duplicate=True).order_by("id")
	    else:
	        # same without "test__engagement=new_finding.test.engagement" condition
	        existing_findings = Finding.objects.filter(
	            (Q(hash_code__isnull=False) & Q(hash_code=new_finding.hash_code))
	            | (Q(unique_id_from_tool__isnull=False) & Q(unique_id_from_tool=new_finding.unique_id_from_tool) & Q(test__test_type=new_finding.test.test_type)),
	            test__engagement__product=new_finding.test.engagement.product).exclude(
	                id=new_finding.id).exclude(
	                    duplicate=True).order_by("id")
	    deduplicationLogger.debug("Found "
	        + str(len(existing_findings)) + " findings with either the same unique_id_from_tool or hash_code")
	    for find in existing_findings:
	        if is_deduplication_on_engagement_mismatch(new_finding, find):
	            deduplicationLogger.debug(
	                "deduplication_on_engagement_mismatch, skipping dedupe.")
	            continue
	        try:
	            if are_endpoints_duplicates(new_finding, find):
	                set_duplicate(new_finding, find)
	        except Exception as e:
	            deduplicationLogger.debug(str(e))
	            continue
	        break
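
To illustrate the difference between the current and the expected behaviour, here is a minimal sketch that runs without Django. Everything in it is a stand-in invented for illustration: `FakeFinding`, `endpoints_match`, `dedupe_first_only` and `dedupe_all_candidates` are hypothetical names, and the endpoint comparison is assumed to be plain set equality, which is simpler than what `are_endpoints_duplicates` really does. It is not a patch for `dojo/utils.py`, only a model of where the `break` sits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFinding:
    """Hypothetical stand-in for the Finding model (not the real class)."""
    id: int
    endpoints: frozenset
    duplicate_of: Optional[int] = None


def endpoints_match(new: FakeFinding, existing: FakeFinding) -> bool:
    # Assumption: endpoints match when both sets are identical.
    return new.endpoints == existing.endpoints


def dedupe_first_only(new: FakeFinding, candidates: list) -> None:
    """Models the reported behaviour: the loop ends after one candidate."""
    for cand in candidates:
        if endpoints_match(new, cand):
            new.duplicate_of = cand.id
        break  # unconditional, so later candidates are never examined


def dedupe_all_candidates(new: FakeFinding, candidates: list) -> None:
    """Models the expected behaviour: stop only once an original is found."""
    for cand in candidates:
        if endpoints_match(new, cand):
            new.duplicate_of = cand.id
            break  # reached only after a successful match


# Two existing findings share the hash_code, but only the second one
# has the same endpoints as the incoming finding.
candidates = [
    FakeFinding(1, frozenset({"https://a.example"})),
    FakeFinding(2, frozenset({"https://b.example"})),
]

missed = FakeFinding(3, frozenset({"https://b.example"}))
dedupe_first_only(missed, candidates)

matched = FakeFinding(4, frozenset({"https://b.example"}))
dedupe_all_candidates(matched, candidates)

print(missed.duplicate_of, matched.duplicate_of)
```

In this model the only change is that the `break` moves inside the `if`, so the loop keeps going until a candidate with matching endpoints is found or the list is exhausted.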
