Commit 0b2050c

OpenVAS parser improvements (DefectDojo#13214)

* feat: openvas parser version 2
* bugfix: fixed openvas linting
* bugfix: fixed dryrun findings
* bugfix: fix failing tests
1 parent 5b70f65 commit 0b2050c

17 files changed

Lines changed: 1030 additions & 101 deletions

docs/content/en/connecting_your_tools/parsers/file/openvas.md

Lines changed: 30 additions & 7 deletions
@@ -2,16 +2,39 @@
 title: "OpenVAS Parser"
 toc_hide: true
 ---
-You can either upload the exported results of an OpenVAS Scan in a .csv or .xml format.
+You can upload the results of an OpenVAS/Greenbone report in either .csv or .xml format.
 
 ### Sample Scan Data
 Sample OpenVAS scans can be found [here](https://github.com/DefectDojo/django-DefectDojo/tree/master/unittests/scans/openvas).
 
-### Default Deduplication Hashcode Fields
-By default, DefectDojo identifies duplicate Findings using these [hashcode fields](https://docs.defectdojo.com/en/working_with_findings/finding_deduplication/about_deduplication/):
+### Parser versions
+The OpenVAS parser has two versions: version 2 and the legacy version (version 1). Only version 2 should be used going forward, and this documentation assumes version 2.
+
+Version 2 comes with a number of improvements:
+- Uses a hash code algorithm for deduplication.
+- Parses XML and CSV reports more consistently.
+- Combines findings whose only differences lie in fields that cannot be hashed reliably because their values change between scans (e.g. fields containing timestamps or packet IDs). This prevents duplicates when the same vulnerability is found multiple times on the same endpoint.
+- Covers more report values.
+- Adds a heuristic for fix_available detection.
+- Updates the mapping to DefectDojo fields compared to version 1.
+
+### Deduplication Algorithm
+By default, parser v2 identifies duplicate findings using the following [hashcode fields](https://docs.defectdojo.com/en/working_with_findings/finding_deduplication/about_deduplication/):
 
 - title
-- cwe
-- line
-- file path
-- description
+- severity
+- vuln_id_from_tool
+- endpoints
+
+The legacy version (version 1) uses the legacy deduplication algorithm.
+
+### CSV and XML differences and similarities
+The parser attempts to handle XML and CSV files in the same way. However, this is not always possible. The differences between the two formats are:
+
+- EPSS scores and percentiles are only available in the CSV format.
+- CVSS vectors are only available in the XML format.
+- The CSV parser always reports the CVSS score as CVSS v3.
+- The references produced by the CSV parser never contain URLs.
+
+If no supported CVSS version is detected, the score (if present) is recorded as a CVSS v3 score, even if this is incorrect.
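As a rough illustration of how the documented hashcode fields drive deduplication, two findings that agree on all four fields would collapse into one. This is a sketch only, not DefectDojo's actual implementation; the OID-style `vuln_id_from_tool` and the hosts are made-up example values:

```python
import hashlib


def doc_hashcode(title: str, severity: str, vuln_id_from_tool: str, endpoints: list[str]) -> str:
    """Join the documented hashcode fields and hash them; duplicates share the same hash."""
    parts = [title, severity, vuln_id_from_tool, *sorted(endpoints)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


a = doc_hashcode("Outdated OpenSSL", "High", "1.3.6.1.4.1.25623.1.0.100001", ["10.0.0.5"])
b = doc_hashcode("Outdated OpenSSL", "High", "1.3.6.1.4.1.25623.1.0.100001", ["10.0.0.5"])
c = doc_hashcode("Outdated OpenSSL", "High", "1.3.6.1.4.1.25623.1.0.100001", ["10.0.0.6"])
print(a == b)  # True: identical fields, treated as duplicates
print(a == c)  # False: different endpoint, distinct finding
```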

dojo/settings/settings.dist.py

Lines changed: 3 additions & 0 deletions
@@ -1355,6 +1355,7 @@ def saml2_attrib_map_format(din):
     "Qualys Hacker Guardian Scan": ["title", "severity", "description"],
     "Cyberwatch scan (Galeax)": ["title", "description", "severity"],
     "Cycognito Scan": ["title", "severity"],
+    "OpenVAS Parser v2": ["title", "severity", "vuln_id_from_tool", "endpoints"],
 }
 
 # Override the hardcoded settings here via the env var
@@ -1426,6 +1427,7 @@ def saml2_attrib_map_format(din):
     "HCL AppScan on Cloud SAST XML": True,
     "AWS Inspector2 Scan": True,
     "Cyberwatch scan (Galeax)": True,
+    "OpenVAS Parser v2": True,
 }
 
 # List of fields that are known to be usable in hash_code computation)
@@ -1612,6 +1614,7 @@ def saml2_attrib_map_format(din):
     "Red Hat Satellite": DEDUPE_ALGO_HASH_CODE,
     "Qualys Hacker Guardian Scan": DEDUPE_ALGO_HASH_CODE,
     "Cyberwatch scan (Galeax)": DEDUPE_ALGO_HASH_CODE,
+    "OpenVAS Parser v2": DEDUPE_ALGO_HASH_CODE,
 }
 
 # Override the hardcoded settings here via the env var

dojo/tools/factory.py

Lines changed: 6 additions & 1 deletion
@@ -119,7 +119,12 @@ def requires_tool_type(scan_type):
         module = import_module(f"dojo.tools.{module_name}.parser")
         for attribute_name in dir(module):
             attribute = getattr(module, attribute_name)
-            if isclass(attribute) and attribute_name.lower() == module_name.replace("_", "") + "parser":
+            # Allow parser class names with an optional v[number] suffix (e.g., OpenVASParser, OpenVASParserV2)
+            expected_base = module_name.replace("_", "") + "parser"
+            if isclass(attribute) and (
+                attribute_name.lower() == expected_base or
+                re.match(rf"^{re.escape(expected_base)}v\d+$", attribute_name.lower())
+            ):
                 register(attribute)
     except:
         logger.exception("failed to load %s", module_name)
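The relaxed class-name check above can be exercised in isolation. A minimal sketch, with the matching logic pulled out into a standalone helper (the helper name is mine, not part of the codebase):

```python
import re


def matches_parser_class(module_name: str, attribute_name: str) -> bool:
    """Mirror the factory check: exact '<module>parser' name, or an optional v<number> suffix."""
    expected_base = module_name.replace("_", "") + "parser"
    name = attribute_name.lower()
    return name == expected_base or bool(re.match(rf"^{re.escape(expected_base)}v\d+$", name))


# For the "openvas" module, both the legacy class and the v2 class are registered:
print(matches_parser_class("openvas", "OpenVASParser"))    # True
print(matches_parser_class("openvas", "OpenVASParserV2"))  # True
# Helper classes that merely contain the word "parser" are still ignored:
print(matches_parser_class("openvas", "OpenVASCSVParser"))  # False
```

`re.escape` keeps module names with regex metacharacters safe, and the `$` anchor ensures only a pure numeric suffix (`v2`, `v10`, ...) is accepted.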

dojo/tools/openvas/parser.py

Lines changed: 22 additions & 2 deletions
@@ -1,5 +1,7 @@
-from dojo.tools.openvas.csv_parser import OpenVASCSVParser
-from dojo.tools.openvas.xml_parser import OpenVASXMLParser
+from dojo.tools.openvas.parser_v1.csv_parser import OpenVASCSVParser
+from dojo.tools.openvas.parser_v1.xml_parser import OpenVASXMLParser
+from dojo.tools.openvas.parser_v2.csv_parser import get_findings_from_csv
+from dojo.tools.openvas.parser_v2.xml_parser import get_findings_from_xml
 
 
 class OpenVASParser:
@@ -18,3 +20,21 @@ def get_findings(self, filename, test):
         if str(filename.name).endswith(".xml"):
             return OpenVASXMLParser().get_findings(filename, test)
         return None
+
+
+class OpenVASParserV2:
+    def get_scan_types(self):
+        return ["OpenVAS Parser v2"]
+
+    def get_label_for_scan_types(self, scan_type):
+        return scan_type
+
+    def get_description_for_scan_types(self, scan_type):
+        return "Import CSV or XML output of Greenbone OpenVAS report."
+
+    def get_findings(self, file, test):
+        if str(file.name).endswith(".csv"):
+            return get_findings_from_csv(file, test)
+        if str(file.name).endswith(".xml"):
+            return get_findings_from_xml(file, test)
+        return None

dojo/tools/openvas/parser_v1/__init__.py

Whitespace-only changes.
File renamed without changes.
File renamed without changes.

dojo/tools/openvas/parser_v2/__init__.py

Whitespace-only changes.
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+import hashlib
+from dataclasses import dataclass
+
+from dojo.models import Endpoint, Finding
+
+
+@dataclass
+class OpenVASFindingAuxData:
+
+    """Dataclass to contain all information added to a finding later"""
+
+    references: list[str]
+    summary: str = ""
+    qod: str = ""
+    openvas_result: str = ""
+    fallback_cvss_score: float | None = None
+
+
+def setup_finding(test) -> tuple[Finding, OpenVASFindingAuxData]:
+    """Base setup and init for findings and auxiliary data"""
+    finding = Finding(test=test, dynamic_finding=True, static_finding=False, severity="Info", nb_occurences=1, cwe=None)
+    finding.unsaved_vulnerability_ids = []
+    finding.unsaved_endpoints = [Endpoint()]
+
+    aux_info = OpenVASFindingAuxData([])
+
+    return finding, aux_info
+
+
+def is_valid_severity(severity: str) -> bool:
+    valid_severity = ("Info", "Low", "Medium", "High", "Critical")
+    return severity in valid_severity
+
+
+def cleanup_openvas_text(text: str) -> str:
+    """Removes unnecessary DefectDojo newlines"""
+    return text.replace("\n ", " ")
+
+
+def escape_restructured_text(text: str) -> str:
+    """Changes text so that reStructuredText symbols are not interpreted"""
+    # OpenVAS likes to include markdown-like tables in some fields.
+    # DefectDojo uses reStructuredText, which causes them to be rendered incorrectly.
+    text = text.replace("```", "")
+    return f"```\n{text}\n```"
+
+
+def postprocess_finding(finding: Finding, aux_info: OpenVASFindingAuxData):
+    """Update finding with AuxData content"""
+    if aux_info.openvas_result:
+        finding.steps_to_reproduce = escape_restructured_text(cleanup_openvas_text(aux_info.openvas_result))
+    if aux_info.summary:
+        finding.description += f"\n**Summary**: {cleanup_openvas_text(aux_info.summary)}"
+    if aux_info.qod:
+        finding.description += f"\n**QoD**: {aux_info.qod}"
+    if len(aux_info.references) > 0:
+        finding.references = "\n".join(["- " + ref for ref in aux_info.references])
+    # fallback in case no CVSS version is detected
+    if aux_info.fallback_cvss_score and not finding.cvssv3_score and not finding.cvssv4_score:
+        finding.cvssv3_score = aux_info.fallback_cvss_score
+
+    # heuristic for fix_available detection
+    if finding.mitigation:
+        search_terms = ["Update to version", "The vendor has released updates"]
+        if any(text in finding.mitigation for text in search_terms):
+            finding.fix_available = True
+
+
+def deduplicate(dupes: dict[str, Finding], finding: Finding):
+    """Combine multiple OpenVAS findings into one DefectDojo finding with potentially multiple endpoints"""
+    finding_hash = gen_finding_hash(finding)
+
+    if finding_hash not in dupes:
+        dupes[finding_hash] = finding
+    else:
+        # OpenVAS does not combine multiple findings into one.
+        # E.g. if two vulnerable Java runtimes are present on the host, this is reported as two findings.
+        # The only way to differentiate these findings when they are based on the same vulnerability
+        # is the data mapped to steps_to_reproduce.
+        # However, we cannot hash this field, as it can contain data that changes between scans,
+        # e.g. timestamps or packet IDs.
+        # We therefore combine them into one DefectDojo finding, because duplicates during reimport cause
+        # https://github.com/DefectDojo/django-DefectDojo/issues/3958
+        org = dupes[finding_hash]
+        org.nb_occurences += 1
+        if org.steps_to_reproduce != finding.steps_to_reproduce:
+            if "Endpoint" in org.steps_to_reproduce:
+                org.steps_to_reproduce += "\n---------------------------------------\n"
+                org.steps_to_reproduce += f"**Endpoint**: {finding.unsaved_endpoints[0].host}\n"
+                org.steps_to_reproduce += finding.steps_to_reproduce
+            else:
+                tmp = org.steps_to_reproduce
+                org.steps_to_reproduce = f"**Endpoint**: {org.unsaved_endpoints[0].host}\n"
+                org.steps_to_reproduce += tmp
+
+        # combine identical findings on different hosts into one with multiple hosts
+        endpoint = finding.unsaved_endpoints[0]
+        if endpoint not in org.unsaved_endpoints:
+            org.unsaved_endpoints += finding.unsaved_endpoints
+
+
+def gen_finding_hash(finding: Finding) -> str:
+    """Generate a hash for a finding, used to deduplicate findings inside the current report"""
+    endpoint = finding.unsaved_endpoints[0]
+    hash_data = [
+        str(endpoint),
+        finding.title,
+        finding.vuln_id_from_tool,
+        finding.severity,
+    ]
+    return hashlib.sha256("|".join(hash_data).encode("utf-8")).hexdigest()
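The merge behaviour of `deduplicate` and `gen_finding_hash` can be sketched without DefectDojo installed. `SimpleFinding` below is a simplified stand-in for `dojo.models.Finding` (and the merge keeps only the occurrence count and endpoint list, not the steps_to_reproduce handling):

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SimpleFinding:
    # simplified stand-in for dojo.models.Finding
    title: str
    severity: str
    vuln_id_from_tool: str
    endpoints: list[str] = field(default_factory=list)
    nb_occurences: int = 1


def finding_hash(f: SimpleFinding) -> str:
    # same field choice as gen_finding_hash: first endpoint, title, tool id, severity
    data = [f.endpoints[0], f.title, f.vuln_id_from_tool, f.severity]
    return hashlib.sha256("|".join(data).encode("utf-8")).hexdigest()


def dedup(dupes: dict[str, SimpleFinding], f: SimpleFinding) -> None:
    h = finding_hash(f)
    if h not in dupes:
        dupes[h] = f
    else:
        org = dupes[h]
        org.nb_occurences += 1  # same vulnerability reported again
        for ep in f.endpoints:
            if ep not in org.endpoints:
                org.endpoints.append(ep)


dupes: dict[str, SimpleFinding] = {}
dedup(dupes, SimpleFinding("Vulnerable Java runtime", "High", "oid-1", ["10.0.0.5:443"]))
dedup(dupes, SimpleFinding("Vulnerable Java runtime", "High", "oid-1", ["10.0.0.5:443"]))
merged = next(iter(dupes.values()))
print(len(dupes), merged.nb_occurences)  # 1 2
```

Because the hash includes the first endpoint, only repeat reports against the same endpoint collapse; the second report bumps `nb_occurences` instead of creating a duplicate finding.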
