cross-scanner deduplication incorrect endpoint parsing #10215

@macsoun

Bug description
I tried to set up deduplication between two scanners (in my case, Nessus and Nuclei).
My configuration in settings.dist.py:

DEDUPLICATION_ALGORITHM_PER_PARSER = {
    'Tenable Scan': DEDUPE_ALGO_HASH_CODE,
    'Nuclei Scan': DEDUPE_ALGO_HASH_CODE,
}

HASHCODE_FIELDS_PER_SCANNER = {
    'Tenable Scan': ['component_name', 'severity'],
    'Nuclei Scan': ['component_name', 'severity'],
} 

HASH_CODE_FIELDS_ALWAYS = []
DEDUPE_ALGO_ENDPOINT_FIELDS = ['host', 'port', 'path']

I imported test findings that share the same component_name and severity, and whose endpoints have the same host, port and path. The hash_code is the same in both findings, but neither finding is marked as a duplicate after deduplication runs. Logs from dojo.specific-loggers.deduplication end after this line:

django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:256] Starting deduplication by endpoint fields for finding 40958 with urls [DecodedURL(url=URL.from_text('tcp://10.20.197.218:6379'))] and finding 40957 with urls [DecodedURL(url=URL.from_text('10.20.197.218:6379'))]

The only difference was in the endpoints. While debugging, I found that the are_urls_equal function in django-DefectDojo/dojo/utils.py returns False. The endpoints passed into that function are parsed by the Python hyperlink module (hyperlink.parse(str(e))), and an endpoint without a scheme is parsed incorrectly: the host ends up in the scheme and the port ends up in the path.

# endpoint with scheme
>>> import hyperlink
>>> e = hyperlink.parse("tcp://10.20.197.218:6379")
>>> e.scheme
'tcp'
>>> e.host
'10.20.197.218'
>>> e.port
6379

# endpoint without scheme
>>> e = hyperlink.parse("10.20.197.218:6379")
>>> e.scheme
'10.20.197.218'
>>> e.host
''
>>> e.port  # None, so the REPL prints nothing
>>> e.path
('6379',)
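
To make the mismatch concrete, here is a minimal sketch of a field-by-field comparison over the values from the session above. ParsedEndpoint and fields_equal are hypothetical names for illustration, not DefectDojo code, and the empty path for the tcp:// URL is an assumption since the session does not print it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedEndpoint:
    """Hypothetical stand-in for the attributes of a parsed URL."""
    scheme: str
    host: str
    port: Optional[int]
    path: Tuple[str, ...]


def fields_equal(a, b, fields):
    """Return True only if every listed attribute matches on both endpoints."""
    return all(getattr(a, f) == getattr(b, f) for f in fields)


ENDPOINT_FIELDS = ["host", "port", "path"]

# Values copied from the REPL session; path=() for the first URL is assumed.
tenable = ParsedEndpoint(scheme="tcp", host="10.20.197.218", port=6379, path=())
nuclei = ParsedEndpoint(scheme="10.20.197.218", host="", port=None, path=("6379",))

# Collect the fields that differ between the two parsed endpoints.
mismatched = [f for f in ENDPOINT_FIELDS if getattr(tenable, f) != getattr(nuclei, f)]
print(fields_equal(tenable, nuclei, ENDPOINT_FIELDS))
print(mismatched)
```

Under these assumptions all three configured endpoint fields differ, even though both strings describe the same host and port.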

I think this can be fixed in the get_endpoints_as_url function in django-DefectDojo/dojo/utils.py by prepending // to the endpoint when the scheme is missing, as the tool parsers already do.
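
A rough sketch of that idea follows. It is not a patch for get_endpoints_as_url: with_authority_marker and host_port_path are hypothetical helpers, and the standard library's urllib.parse stands in for hyperlink so the snippet runs without extra dependencies. I would expect hyperlink.parse to treat the //host:port form the same way, but that is not verified here.

```python
from urllib.parse import urlsplit


def with_authority_marker(endpoint: str) -> str:
    """Prefix scheme-less endpoints with '//' so host:port is read as the authority."""
    if "://" in endpoint or endpoint.startswith("//"):
        return endpoint
    return "//" + endpoint


def host_port_path(endpoint: str) -> tuple:
    """Parse the normalized endpoint and return the fields used for comparison."""
    parts = urlsplit(with_authority_marker(endpoint))
    return (parts.hostname, parts.port, parts.path)


with_scheme = host_port_path("tcp://10.20.197.218:6379")
without_scheme = host_port_path("10.20.197.218:6379")
print(with_scheme == without_scheme)
```

After normalization both endpoints yield the same host, port and path, so a comparison on those fields would succeed.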

Steps to reproduce
Steps to reproduce the behavior:

  1. Apply the configuration shown in the description
  2. Enable deduplication
  3. Import the attached Redis sample files from Nuclei and Nessus
  4. The second finding is not marked as a duplicate

Logs

django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [titlecase:201] Redis - Default Logins
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2074] using HASHCODE_FIELDS_PER_SCANNER for test_type.name: Nuclei Scan
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2082] HASHCODE_FIELDS_PER_SCANNER is: ['severity', 'component_name']
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2091] using HASHCODE_ALLOWS_NULL_CWE for test_type.name: Nuclei Scan
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2099] HASHCODE_ALLOWS_NULL_CWE is: True
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2633] computing hash_code for finding id 40957 based on: severity, component_name
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2650] severity : Critical
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2650] component_name : redis
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2651] compute_hash_code - fields_to_hash = Criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.models:2734] fields_to_hash      : Criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.models:2735] fields_to_hash lower: criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2999] Hash_code computed for finding: 56d70dd7468d3a76c5282c21dfb6d96dfc41e0e87b087946e20b612df22da60d
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.models:3028] Saving finding of id 40957 dedupe_option:True (self.pk is not None)
...
django-defectdojo-celeryworker-1  | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:282] dedupe for: 40957:Redis - Default Logins
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.notifications.helper:78] creating personal notifications for event: test_added
django-defectdojo-celeryworker-1  | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2057] using DEDUPLICATION_ALGORITHM_PER_PARSER for test_type.name: Nuclei Scan
django-defectdojo-celeryworker-1  | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2065] DEDUPLICATION_ALGORITHM_PER_PARSER is: hash_code
django-defectdojo-celeryworker-1  | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:285] deduplication algorithm: hash_code
django-defectdojo-uwsgi-1         | [11/May/2024 16:15:22] DEBUG [dojo.notifications.helper:93] Filtering users for the product Deduplication Test
django-defectdojo-celeryworker-1  | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:469] Found 0 findings with the same hash_code
...
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [titlecase:201] Redis Server Unprotected by Password Authentication
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2074] using HASHCODE_FIELDS_PER_SCANNER for test_type.name: Tenable Scan
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2082] HASHCODE_FIELDS_PER_SCANNER is: ['severity', 'component_name']
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2091] using HASHCODE_ALLOWS_NULL_CWE for test_type.name: Tenable Scan
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2099] HASHCODE_ALLOWS_NULL_CWE is: True
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2633] computing hash_code for finding id 40958 based on: severity, component_name
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2650] severity : Critical
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2650] component_name : redis
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2651] compute_hash_code - fields_to_hash = Criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.models:2734] fields_to_hash      : Criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.models:2735] fields_to_hash lower: criticalredis
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2999] Hash_code computed for finding: 56d70dd7468d3a76c5282c21dfb6d96dfc41e0e87b087946e20b612df22da60d
django-defectdojo-uwsgi-1         | [11/May/2024 16:22:42] DEBUG [dojo.models:3028] Saving finding of id 40958 dedupe_option:True (self.pk is not None)
...
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:282] dedupe for: 40958:Redis Server Unprotected by Password Authentication
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2057] using DEDUPLICATION_ALGORITHM_PER_PARSER for test_type.name: Tenable Scan
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2065] DEDUPLICATION_ALGORITHM_PER_PARSER is: hash_code
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:285] deduplication algorithm: hash_code
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:469] Found 1 findings with the same hash_code
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:256] Starting deduplication by endpoint fields for finding 40958 with urls [DecodedURL(url=URL.from_text('tcp://10.20.197.218:6379'))] and finding 40957 with urls [DecodedURL(url=URL.from_text('10.20.197.218:6379'))]
django-defectdojo-celeryworker-1  | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:215] Check if url tcp://10.20.197.218:6379 and url 10.20.197.218:6379 are equal in terms of ['host', 'port', 'path'].
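
The hash computation visible in the logs above can be approximated as follows. This is a sketch, not DefectDojo's code: sketch_hash_code is a hypothetical function, and SHA-256 is an assumption suggested only by the 64-character hex digest, so the output may not match the logged value. The point is that the differing titles do not affect the result when only severity and component_name are hashed.

```python
import hashlib


def sketch_hash_code(finding: dict, fields: list) -> str:
    """Concatenate the configured fields, lowercase them, and hash the result."""
    fields_to_hash = "".join(str(finding[f]) for f in fields)
    # SHA-256 is assumed here; the logs only show a 64-character hex digest.
    return hashlib.sha256(fields_to_hash.lower().encode("utf-8")).hexdigest()


# Field order as reported by the "computing hash_code ... based on" log lines.
fields = ["severity", "component_name"]

nuclei_finding = {
    "title": "Redis - Default Logins",
    "severity": "Critical",
    "component_name": "redis",
}
tenable_finding = {
    "title": "Redis Server Unprotected by Password Authentication",
    "severity": "Critical",
    "component_name": "redis",
}

same_hash = sketch_hash_code(nuclei_finding, fields) == sketch_hash_code(tenable_finding, fields)
print(same_hash)
```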

Sample scan files
redis_dedupe_examlpe.nuclei.json
redis_dedupe_example.nessus.csv
