Parser proposal: Xygeni JSON reports (single parser, multiple scan types) #14755

# Parser proposal: Xygeni JSON reports

**Scanner Name**

Xygeni — Software Supply Chain Security platform with multiple scanners
(SAST, SCA, secrets, IaC, CI/CD, DAST, suspect-deps, code-tampering).
Site: <https://xygeni.io> · Docs: <https://docs.xygeni.io>

**Sample File(s)**

Format: JSON, one report per scanner, all sharing a common `metadata` envelope.
Trimmed inline samples are included under each per-kind section below
(SAST, SCA, Secrets). Full anonymized sample reports for the three phase-1
scan types will be attached as comments on this issue and checked into
`unittests/scans/xygeni/` as part of the phase-1 PR.

## About Xygeni

[Xygeni](https://xygeni.io) is a platform for improving the Software Supply
Chain Security posture of organizations. It provides a set of scanners, each
specialized in a different software security domain: code vulnerabilities
(SAST), vulnerabilities in open-source components (SCA), hard-coded secrets,
flaws in IaC templates, vulnerabilities in web applications (DAST),
misconfigurations in version control and CI/CD systems, and malicious
behavior in owned and third-party components.

## Proposal

Each Xygeni scanner emits a JSON report with a shared `metadata` envelope and a
kind-specific payload. We'd like to add a single first-party parser at
`dojo/tools/xygeni/` that dispatches on `metadata.scanType` and routes to a
per-kind handler. This mirrors established precedent in `dojo/tools/`
(`rusty_hog`, `anchore_grype`, `checkmarx`, `sonarqube`, `mobsf`) and avoids
adding one near-duplicate `xygeni_*` parser per scanner.
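A minimal sketch of the `metadata.scanType` dispatch, in Python since DefectDojo parsers are Python. The handler functions are hypothetical placeholders that only return finding titles; in the proposed layout the real handlers in `sast.py`, `sca.py`, and `secrets.py` would build DefectDojo `Finding` objects. The dispatch keys are the `scanType` values from the common backbone table.

```python
import json
from typing import Any, Callable, Dict, List

Report = Dict[str, Any]


# Placeholder handlers (hypothetical): each returns finding titles only.
# In the proposed layout they would live in sast.py, sca.py and secrets.py
# and return DefectDojo Finding objects instead.
def handle_sast(report: Report) -> List[str]:
    return [v["detector"] for v in report.get("vulnerabilities", [])]


def handle_sca(report: Report) -> List[str]:
    # One entry per dependencies[].vulnerabilities[] item; cve falls back to id.
    return [
        v.get("cve") or v["id"]
        for dep in report.get("dependencies", [])
        for v in dep.get("vulnerabilities", [])
    ]


def handle_secrets(report: Report) -> List[str]:
    return [s["type"] for s in report.get("secrets", [])]


# Keys are the metadata.scanType values listed in the common backbone table.
HANDLERS: Dict[str, Callable[[Report], List[str]]] = {
    "sast": handle_sast,
    "deps": handle_sca,
    "secrets": handle_secrets,
}


def dispatch(raw: str) -> List[str]:
    """Parse a Xygeni JSON report and route it by metadata.scanType."""
    report = json.loads(raw)
    scan_type = report.get("metadata", {}).get("scanType")
    handler = HANDLERS.get(scan_type)
    if handler is None:
        raise ValueError(f"Unsupported Xygeni scanType: {scan_type!r}")
    return handler(report)
```

In this sketch an unrecognized `scanType` (for instance a phase-2 report whose handler is not registered yet) raises instead of silently importing zero findings.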

The intent of this issue is to obtain pre-approval on:

1. **The shape** — one parser with multiple scan types, dispatched on
   `metadata.scanType`.
2. **The phase-1 mappings** — the field-by-field translations from the SAST,
   SCA, and Secrets JSON to DefectDojo `Finding` objects, documented below.
3. **The phasing** — approving the structure now and growing it through
   focused follow-up PRs, each adding one or more additional scan types when
   that grouping is natural.

The parser will follow the [Contribute to Parsers](https://docs.defectdojo.com/get_started/contributing/how-to-write-a-parser/)
recommendations from the DefectDojo documentation.

### In scope (this proposal and the phase-1 PR that follows)

- A single parser package at `dojo/tools/xygeni/` exposing three scan types:
  `Xygeni SAST Scan`, `Xygeni SCA Scan`, `Xygeni Secrets Scan`.
- A per-kind handler module for each (`sast.py`, `sca.py`, `secrets.py`)
  invoked from a thin `XygeniParser.get_findings()`.
- Severity, dedup, and CWE/CVE conversion utilities shared across the three
  kinds (`_common.py`).
- Unit tests under `unittests/tools/test_xygeni_parser.py`.
- Real Xygeni sample reports checked in as test fixtures under
  `unittests/scans/xygeni/{sast,sca,secrets}_*.json` (empty report,
  multi-finding report covering all severities, plus targeted edge-case
  fixtures per kind).
- A documentation page at
  `docs/content/en/connecting_your_tools/parsers/file/xygeni.md` covering all
  three scan types and pointing to the Xygeni CLI commands that produce the
  matching JSON.
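The three phase-1 scan types could be registered roughly as below. This assumes the `get_scan_types()` / `get_label_for_scan_types()` / `get_description_for_scan_types()` methods described in DefectDojo's parser contributor guide; `get_findings()` is left out, and the `expected_metadata_scan_type()` helper and the description wording are hypothetical.

```python
from typing import Dict, List

# Maps each DefectDojo scan type exposed by the parser to the
# metadata.scanType value its reports are expected to carry.
SCAN_TYPES: Dict[str, str] = {
    "Xygeni SAST Scan": "sast",
    "Xygeni SCA Scan": "deps",
    "Xygeni Secrets Scan": "secrets",
}


class XygeniParser:
    """Scan-type registration only; get_findings() is omitted in this sketch."""

    def get_scan_types(self) -> List[str]:
        return list(SCAN_TYPES)

    def get_label_for_scan_types(self, scan_type: str) -> str:
        return scan_type

    def get_description_for_scan_types(self, scan_type: str) -> str:
        # Hypothetical wording; the real text would point to the matching CLI command.
        return (
            f"Import the JSON report of a {scan_type} "
            f"(metadata.scanType = {SCAN_TYPES[scan_type]!r})."
        )

    def expected_metadata_scan_type(self, scan_type: str) -> str:
        # Hypothetical helper: lets get_findings() reject, e.g., an SCA report
        # uploaded under "Xygeni SAST Scan".
        return SCAN_TYPES[scan_type]
```

Keeping a single table from DefectDojo scan type to `metadata.scanType` makes such a mismatch check cheap; it is an optional addition on top of the dispatch itself.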

### Out of scope (future follow-up PRs, listed for context only)

These Xygeni scan kinds are *not* part of this proposal. We list them so the
maintainers can see the full direction and approve the parser structure once.
They will be delivered through follow-up PRs that extend
`XygeniParser.get_scan_types()` and add the corresponding handlers, fixtures,
and docs sections:

- `Xygeni IaC Scan` — Terraform / CloudFormation / Kubernetes / Dockerfile flaws.
- `Xygeni CICD Misconfig Scan` — pipeline and SCM misconfigurations.
- `Xygeni DAST Scan` — web-application dynamic findings.
- `Xygeni Suspect Dependencies Scan` — typosquatting / anomaly / malware signals.
- `Xygeni Code Tampering Scan` — code-integrity violations.

## Common backbone

Every Xygeni report has the same `metadata` envelope and a stable per-finding
backbone:

| Xygeni field      | DefectDojo `Finding` field | Notes                                    |
| ----------------- | -------------------------- | ---------------------------------------- |
| `metadata.scanType` | (dispatch only)          | `sast` / `deps` / `secrets` / ...        |
| `<finding>.uniqueHash` | `unique_id_from_tool`  | Vendor-stable; guarantees re-import dedup |
| `<finding>.issueId`    | `vuln_id_from_tool`    |                                          |
| `<finding>.severity`   | `severity`             | Titlecased: `critical→Critical`, `high→High`, `medium→Medium`, `low→Low`, `info→Info` |

`location.{filepath, beginLine, endLine, code}` is shared by SAST and Secrets.
SCA uses package coordinates instead. DAST uses URL+method (out of scope here).
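A sketch of the shared backbone helpers that could live in `_common.py`. Assumptions: a missing or unrecognized severity falls back to `Info`, and only `filepath` and `beginLine` are mapped from `location` (`endLine` and `code` are left to the per-kind handlers). The function names are hypothetical.

```python
from typing import Any, Dict, Optional

# Backbone severity mapping: Xygeni lowercase -> DefectDojo title case.
SEVERITY_MAP = {
    "critical": "Critical",
    "high": "High",
    "medium": "Medium",
    "low": "Low",
    "info": "Info",
}


def normalize_severity(raw: Optional[str]) -> str:
    # Assumption: missing or unrecognized values fall back to "Info".
    return SEVERITY_MAP.get((raw or "").strip().lower(), "Info")


def backbone_fields(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Fields shared by every finding kind, keyed by DefectDojo Finding attribute."""
    return {
        "unique_id_from_tool": finding.get("uniqueHash"),
        "vuln_id_from_tool": finding.get("issueId"),
        "severity": normalize_severity(finding.get("severity")),
    }


def location_fields(finding: Dict[str, Any]) -> Dict[str, Any]:
    """SAST and Secrets only: SCA findings carry package coordinates instead."""
    loc = finding.get("location") or {}
    return {"file_path": loc.get("filepath"), "line": loc.get("beginLine")}
```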

---

## SAST — `Xygeni SAST Scan`

`vulnerabilities[]` is the primary array. `detector` is the rule id.
A subset of findings carry a SARIF-style `codeFlows[]` block with source/sink
frames and a data path; the parser renders that into the description and
populates DefectDojo's SAST source/sink fields.

Sample finding (taint flow, critical severity):

```json
{
  "metadata": {"scanType": "sast", "format": "sast-xygeni"},
  "vulnerabilities": [{
    "detector": "python.code_injection_deserialization",
    "kind": "injection",
    "severity": "critical",
    "confidence": "high",
    "language": "python",
    "location": {
      "filepath": "main.py", "beginLine": 36,
      "code": "pickle.loads(decoded_data)"
    },
    "cwe": 502,
    "cwes": ["CWE-502"],
    "tags": ["CWE:502", "OWASP:2021:A8"],
    "explanation": "Untrusted input deserialized via pickle.loads enables RCE.",
    "codeFlows": [{
      "frames": [
        {"kind": "source", "...": "..."},
        {"kind": "sink",   "...": "..."}
      ]
    }],
    "uniqueHash": "N0JJTPOJPJBHZw0haLys5Q",
    "issueId": "SAS.injection.python.code_injection_deserialization.main.py.36"
  }]
}
```
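The `codeFlows` handling described above (pick the first source and sink frames, render the flow into the description) could look like this. The sample elides frame contents (`"...": "..."`), so this sketch assumes each frame carries a `location` object shaped like the finding's own `location`; that shape, the use of the sink's `code` snippet as `sast_sink_object`, and the function names are all hypothetical.

```python
from typing import Any, Dict, List, Optional

Frame = Dict[str, Any]


def first_frame(finding: Dict[str, Any], kind: str) -> Optional[Frame]:
    """First frame with the given kind ("source" or "sink") across all codeFlows."""
    for flow in finding.get("codeFlows") or []:
        for frame in flow.get("frames") or []:
            if frame.get("kind") == kind:
                return frame
    return None


def source_sink_fields(finding: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: frames carry a location block shaped like the finding's own.
    fields: Dict[str, Any] = {}
    source = first_frame(finding, "source")
    if source is not None:
        loc = source.get("location") or {}
        fields["sast_source_file_path"] = loc.get("filepath")
        fields["sast_source_line"] = loc.get("beginLine")
    sink = first_frame(finding, "sink")
    if sink is not None:
        # Hypothetical choice: use the sink's code snippet as the sink object.
        fields["sast_sink_object"] = (sink.get("location") or {}).get("code")
    return fields


def render_code_flow(flow: Dict[str, Any]) -> str:
    """Numbered, Markdown-friendly rendering of one flow for the description."""
    lines: List[str] = []
    for i, frame in enumerate(flow.get("frames") or [], start=1):
        loc = frame.get("location") or {}
        lines.append(
            f"{i}. {frame.get('kind', 'step')}: "
            f"{loc.get('filepath')}:{loc.get('beginLine')} `{loc.get('code', '')}`"
        )
    return "\n".join(lines)
```

Findings without `codeFlows` simply get no source/sink fields, matching the "when present" wording in the table.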

Field mapping:

| DefectDojo `Finding`      | Xygeni source                                                |
| ------------------------- | ------------------------------------------------------------ |
| `title`                   | `detector`                                                   |
| `description`             | `explanation` + `location.code` + rendered `codeFlows`       |
| `file_path`               | `location.filepath`                                          |
| `line`                    | `location.beginLine`                                         |
| `cwe`                     | `cwe` (numeric)                                              |
| `sast_source_file_path` / `sast_source_line` | first `codeFlows[].frames[]` source, when present |
| `sast_sink_object`        | first `codeFlows[].frames[]` sink, when present              |
| `static_finding`          | `True`                                                       |
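A sketch of the scalar SAST mappings in the table, applied to one `vulnerabilities[]` entry. It returns a plain dict keyed by `Finding` attribute rather than a real `Finding`, omits the `codeFlows` fields, and the `Code:` prefix in the description is a hypothetical formatting choice.

```python
from typing import Any, Dict


def sast_fields(vuln: Dict[str, Any]) -> Dict[str, Any]:
    """Scalar SAST mappings from the table; codeFlows handling is omitted here."""
    loc = vuln.get("location") or {}
    description = vuln.get("explanation") or ""
    if loc.get("code"):
        # Hypothetical formatting: append the flagged snippet after the explanation.
        description += f"\n\nCode: {loc['code']}"
    cwe = vuln.get("cwe")
    return {
        "title": vuln.get("detector"),
        "description": description,
        "file_path": loc.get("filepath"),
        "line": loc.get("beginLine"),
        "cwe": int(cwe) if cwe is not None else None,
        "static_finding": True,
    }
```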

---

## SCA — `Xygeni SCA Scan`

Top-level `dependencies[]`, each with nested `vulnerabilities[]` (CVE/GHSA).
One `Finding` per `dependencies[].vulnerabilities[]` entry.
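Flattening the nested arrays into one pair per future `Finding` is a small generator; the name `iter_sca_vulns` is hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple

Obj = Dict[str, Any]


def iter_sca_vulns(report: Obj) -> Iterator[Tuple[Obj, Obj]]:
    """Yield (dependency, vulnerability) pairs: one pair per future Finding."""
    for dep in report.get("dependencies") or []:
        for vuln in dep.get("vulnerabilities") or []:
            yield dep, vuln
```

A dependency with an empty `vulnerabilities[]` yields nothing, so clean packages produce no findings.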

Sample finding:

```json
{
  "metadata": {"scanType": "deps", "format": "deps-xygeni"},
  "dependencies": [{
    "name": "cookie", "version": "0.5.0", "ecosystem": "npm",
    "vulnerabilities": [{
      "id": "CVE-2024-47764",
      "cve": "CVE-2024-47764",
      "severity": "low",
      "fixedVersion": "0.7.0",
      "aliases": ["GHSA-pxg6-pf52-xh8x"],
      "overallCvssScore": -1.0,
      "references": [
        "https://github.com/jshttp/cookie/security/advisories/GHSA-pxg6-pf52-xh8x"
      ],
      "uniqueHash": "CVE-2024-47764#:cookie:0.5.0:javascript",
      "issueId": "SCA.CVE-2024-47764",
      "description": "..."
    }]
  }]
}
```

Field mapping:

| DefectDojo `Finding`  | Xygeni source                                |
| --------------------- | -------------------------------------------- |
| `title`               | `cve` (fall back to `id`)                    |
| `description`         | `description`                                |
| `cve`                 | `cve`                                        |
| `cwe`                 | `cwes[0]` if present                         |
| `cvssv3_score`        | `overallCvssScore` when ≥ 0                  |
| `mitigation`          | `"Upgrade to {fixedVersion}"`                |
| `references`          | `references` joined                          |
| `component_name`      | parent `dependencies[].name`                 |
| `component_version`   | parent `dependencies[].version`              |
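A sketch of the SCA table applied to one (dependency, vulnerability) pair. Assumptions, repeated in the comments: `cwes[0]` may be a `"CWE-N"` string or a number, `references` are joined with newlines, and `mitigation` stays empty when `fixedVersion` is missing. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def parse_cwe(value: Any) -> Optional[int]:
    """Accepts "CWE-79", "79" or 79 (assumed formats); returns None otherwise."""
    text = str(value).strip().upper()
    if text.startswith("CWE-"):
        text = text[len("CWE-"):]
    return int(text) if text.isdigit() else None


def sca_fields(dep: Dict[str, Any], vuln: Dict[str, Any]) -> Dict[str, Any]:
    score = vuln.get("overallCvssScore")
    cwes = vuln.get("cwes") or []
    fixed = vuln.get("fixedVersion")
    return {
        "title": vuln.get("cve") or vuln.get("id"),
        "description": vuln.get("description") or "",
        "cve": vuln.get("cve"),
        "cwe": parse_cwe(cwes[0]) if cwes else None,
        # Per the table, scores below 0 (the sample uses -1.0) are dropped.
        "cvssv3_score": score if score is not None and score >= 0 else None,
        # Assumption: no mitigation text when no fixed version is known.
        "mitigation": f"Upgrade to {fixed}" if fixed else None,
        # Assumption: references joined one per line.
        "references": "\n".join(vuln.get("references") or []),
        "component_name": dep.get("name"),
        "component_version": dep.get("version"),
    }
```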

---

## Secrets — `Xygeni Secrets Scan`

`secrets[]` is the primary array. The Xygeni report already redacts the secret
value in both `secret` and `location.code` — the raw value never appears, so
the parser surfaces those fields as-is.

Sample finding:

```json
{
  "metadata": {"scanType": "secrets", "format": "secrets-xygeni"},
  "secrets": [{
    "secret": "AKIA****REDACTED****",
    "hash": "9d5e...",
    "type": "aws_access_key",
    "detector": "aws-access-key",
    "severity": "high",
    "confidence": "high",
    "location": {
      "filepath": "aws.properties", "beginLine": 7,
      "code": "aws.access.key=AKIA****"
    },
    "description": "AWS access key ID detected.",
    "tags": ["secret:aws", "cwe:798"],
    "uniqueHash": "abc...",
    "issueId": "SECRETS.aws-access-key.aws.properties:7"
  }]
}
```

Field mapping:

| DefectDojo `Finding`  | Xygeni source                                       |
| --------------------- | --------------------------------------------------- |
| `title`               | `"{type} secret detected in {filename}"`            |
| `description`         | `description` + `location.code`                     |
| `file_path`           | `location.filepath`                                 |
| `line`                | `location.beginLine`                                |
| `cwe`                 | first `cwe:N` tag, else `798`                       |
| `mitigation`          | `"Rotate this {type} secret and remove from history."` |
| `static_finding`      | `True`                                              |
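A sketch of the Secrets table. Assumptions: `{filename}` in the title is the basename of `location.filepath`, a missing `type` becomes `unknown`, and `cwe:N` tags are matched case-insensitively (the SAST sample uses `CWE:502`, the Secrets sample `cwe:798`). The helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Iterable, Optional

DEFAULT_SECRET_CWE = 798  # CWE-798: Use of Hard-coded Credentials


def cwe_from_tags(tags: Optional[Iterable[str]]) -> int:
    """First "cwe:N" tag (case-insensitive), else CWE-798."""
    for tag in tags or []:
        prefix, _, value = tag.partition(":")
        if prefix.lower() == "cwe" and value.isdigit():
            return int(value)
    return DEFAULT_SECRET_CWE


def secrets_fields(secret: Dict[str, Any]) -> Dict[str, Any]:
    loc = secret.get("location") or {}
    secret_type = secret.get("type") or "unknown"
    # Assumption: {filename} in the title is the basename of location.filepath.
    filename = PurePosixPath(loc.get("filepath") or "").name
    description = secret.get("description") or ""
    if loc.get("code"):
        # location.code is already redacted by Xygeni, so it is surfaced as-is.
        description += f"\n\nCode: {loc['code']}"
    return {
        "title": f"{secret_type} secret detected in {filename}",
        "description": description,
        "file_path": loc.get("filepath"),
        "line": loc.get("beginLine"),
        "cwe": cwe_from_tags(secret.get("tags")),
        "mitigation": f"Rotate this {secret_type} secret and remove from history.",
        "static_finding": True,
    }
```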

---

## Layout

```
dojo/tools/xygeni/
├── __init__.py
├── parser.py        # XygeniParser, dispatches on metadata.scanType
├── sast.py
├── sca.py
├── secrets.py
└── _common.py       # severity map, dedup helpers

unittests/scans/xygeni/{sast,sca,secrets}_*.json
unittests/tools/test_xygeni_parser.py
docs/content/en/connecting_your_tools/parsers/file/xygeni.md
```

PRs originate from `xygeni/django-DefectDojo` (public org fork) against `dev`.

## Questions

1. Is one parser dispatching on `metadata.scanType` preferred, given the
   `rusty_hog` / `anchore_grype` precedent? Or should we split into
   `xygeni_sast` / `xygeni_sca` / `xygeni_secrets`?
2. Any objection to setting `vuln_id_from_tool = issueId` alongside
   `unique_id_from_tool = uniqueHash`?
3. OK to approve this structure now, with the phase-2 scan types
   (IaC / CICD / DAST / suspect-deps / code-tampering) added to the same
   parser in follow-up PRs?

---

References:

- [DefectDojo parser contributor guide](https://docs.defectdojo.com/en/open_source/contributing/how-to-write-a-parser/)
- [Xygeni docs](https://docs.xygeni.io)

