Bug description
I tried to set up deduplication between two scanners (in my case between Nessus and Nuclei).
My config settings.dist.py:
DEDUPLICATION_ALGORITHM_PER_PARSER = {
'Tenable Scan': DEDUPE_ALGO_HASH_CODE,
'Nuclei Scan': DEDUPE_ALGO_HASH_CODE,
}
HASHCODE_FIELDS_PER_SCANNER = {
'Tenable Scan': ['component_name', 'severity'],
'Nuclei Scan': ['component_name', 'severity'],
}
HASH_CODE_FIELDS_ALWAYS = []
DEDUPE_ALGO_ENDPOINT_FIELDS = ['host', 'port', 'path']
I loaded test findings with the same component_name and severity, and with the same host, port, and path in the endpoint. I noticed that the hash_code is the same for both findings, but the finding is not marked as a duplicate after deduplication runs. Logs from dojo.specific-loggers.deduplication end after this line:
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:256] Starting deduplication by endpoint fields for finding 40958 with urls [DecodedURL(url=URL.from_text('tcp://10.20.197.218:6379'))] and finding 40957 with urls [DecodedURL(url=URL.from_text('10.20.197.218:6379'))]
The only difference was in the endpoints. While debugging, I discovered that the function are_urls_equal in django-DefectDojo/dojo/utils.py returns False. The endpoints passed into that function are parsed by the Python hyperlink module (hyperlink.parse(str(e))), and when an endpoint has no scheme, it is parsed incorrectly.
# endpoint with scheme
>>> import hyperlink
>>> e = hyperlink.parse("tcp://10.20.197.218:6379")
>>> e.scheme
'tcp'
>>> e.host
'10.20.197.218'
>>> e.port
6379
# endpoint without scheme
>>> e = hyperlink.parse("10.20.197.218:6379")
>>> e.scheme
'10.20.197.218'
>>> e.host
''
>>> e.port
>>> e.path
('6379',)
I think this can be fixed by prepending // to the endpoint when the scheme is missing in the get_endpoints_as_url function in django-DefectDojo/dojo/utils.py, as the tool parsers already do.
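A minimal sketch of the proposed normalization, using the stdlib urllib.parse as a stand-in for hyperlink (both treat a bare "host:port" string as having no recognizable network location until "//" is prepended). The function names normalize_endpoint and are_host_port_equal are hypothetical illustrations, not the actual DefectDojo code:

```python
from urllib.parse import urlsplit

def normalize_endpoint(endpoint: str) -> str:
    # Prepend "//" when there is no scheme so the parser treats the
    # leading component as the network location rather than a scheme
    # or path component.
    if "://" not in endpoint:
        return "//" + endpoint
    return endpoint

def are_host_port_equal(a: str, b: str) -> bool:
    # Hypothetical comparison mirroring the host/port part of the
    # endpoint-fields check; illustration only.
    ua = urlsplit(normalize_endpoint(a))
    ub = urlsplit(normalize_endpoint(b))
    return ua.hostname == ub.hostname and ua.port == ub.port

# Without normalization the bare endpoint has no recognizable host:
print(urlsplit("10.20.197.218:6379").hostname)  # None
# With normalization both endpoints compare equal on host and port:
print(are_host_port_equal("tcp://10.20.197.218:6379", "10.20.197.218:6379"))  # True
```

The same guard applied before hyperlink.parse in get_endpoints_as_url would make the scheme-less endpoint parse its host and port correctly, so the endpoint-fields comparison could succeed.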
Steps to reproduce
Steps to reproduce the behavior:
- Set up the config as in my description
- Enable deduplication
- Import the attached Redis examples from the Nuclei and Nessus scanners
- Deduplication won't work
Logs
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [titlecase:201] Redis - Default Logins
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2074] using HASHCODE_FIELDS_PER_SCANNER for test_type.name: Nuclei Scan
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2082] HASHCODE_FIELDS_PER_SCANNER is: ['severity', 'component_name']
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2091] using HASHCODE_ALLOWS_NULL_CWE for test_type.name: Nuclei Scan
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2099] HASHCODE_ALLOWS_NULL_CWE is: True
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2633] computing hash_code for finding id 40957 based on: severity, component_name
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2650] severity : Critical
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2650] component_name : redis
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2651] compute_hash_code - fields_to_hash = Criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.models:2734] fields_to_hash : Criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.models:2735] fields_to_hash lower: criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2999] Hash_code computed for finding: 56d70dd7468d3a76c5282c21dfb6d96dfc41e0e87b087946e20b612df22da60d
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.models:3028] Saving finding of id 40957 dedupe_option:True (self.pk is not None)
...
django-defectdojo-celeryworker-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:282] dedupe for: 40957:Redis - Default Logins
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.notifications.helper:78] creating personal notifications for event: test_added
django-defectdojo-celeryworker-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2057] using DEDUPLICATION_ALGORITHM_PER_PARSER for test_type.name: Nuclei Scan
django-defectdojo-celeryworker-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:2065] DEDUPLICATION_ALGORITHM_PER_PARSER is: hash_code
django-defectdojo-celeryworker-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:285] deduplication algorithm: hash_code
django-defectdojo-uwsgi-1 | [11/May/2024 16:15:22] DEBUG [dojo.notifications.helper:93] Filtering users for the product Deduplication Test
django-defectdojo-celeryworker-1 | [11/May/2024 16:15:22] DEBUG [dojo.specific-loggers.deduplication:469] Found 0 findings with the same hash_code
...
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [titlecase:201] Redis Server Unprotected by Password Authentication
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2074] using HASHCODE_FIELDS_PER_SCANNER for test_type.name: Tenable Scan
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2082] HASHCODE_FIELDS_PER_SCANNER is: ['severity', 'component_name']
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2091] using HASHCODE_ALLOWS_NULL_CWE for test_type.name: Tenable Scan
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2099] HASHCODE_ALLOWS_NULL_CWE is: True
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2633] computing hash_code for finding id 40958 based on: severity, component_name
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2650] severity : Critical
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2650] component_name : redis
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2651] compute_hash_code - fields_to_hash = Criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.models:2734] fields_to_hash : Criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.models:2735] fields_to_hash lower: criticalredis
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2999] Hash_code computed for finding: 56d70dd7468d3a76c5282c21dfb6d96dfc41e0e87b087946e20b612df22da60d
django-defectdojo-uwsgi-1 | [11/May/2024 16:22:42] DEBUG [dojo.models:3028] Saving finding of id 40958 dedupe_option:True (self.pk is not None)
...
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:282] dedupe for: 40958:Redis Server Unprotected by Password Authentication
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2057] using DEDUPLICATION_ALGORITHM_PER_PARSER for test_type.name: Tenable Scan
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:2065] DEDUPLICATION_ALGORITHM_PER_PARSER is: hash_code
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:285] deduplication algorithm: hash_code
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:469] Found 1 findings with the same hash_code
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:256] Starting deduplication by endpoint fields for finding 40958 with urls [DecodedURL(url=URL.from_text('tcp://10.20.197.218:6379'))] and finding 40957 with urls [DecodedURL(url=URL.from_text('10.20.197.218:6379'))]
django-defectdojo-celeryworker-1 | [11/May/2024 16:22:42] DEBUG [dojo.specific-loggers.deduplication:215] Check if url tcp://10.20.197.218:6379 and url 10.20.197.218:6379 are equal in terms of ['host', 'port', 'path'].
Sample scan files
redis_dedupe_examlpe.nuclei.json
redis_dedupe_example.nessus.csv