Parser proposal: Xygeni JSON reports (single parser, multiple scan types) #14755

@lmrb-1968

Parser proposal: Xygeni JSON reports

Scanner Name

Xygeni — Software Supply Chain Security platform with multiple scanners
(SAST, SCA, secrets, IaC, CI/CD, DAST, suspect-deps, code-tampering).
Site: https://xygeni.io · Docs: https://docs.xygeni.io

Sample File(s)

Format: JSON, one report per scanner, all sharing a common metadata envelope.
Trimmed inline samples are included under each per-kind section below
(SAST, SCA, Secrets). Full anonymized sample reports for the three phase-1
scan types will be attached as comments on this issue, and checked into
unittests/scans/xygeni/ as part of the phase-1 PR.

About Xygeni

Xygeni is a platform for improving an organization's software supply chain
security posture. It provides a set of scanners, each specialized in one
software security domain: code vulnerabilities (SAST), vulnerabilities in
open-source components (SCA), hard-coded secrets, flaws in IaC templates,
vulnerabilities in web applications (DAST), misconfigurations in version
control and CI/CD systems, and malicious behavior in owned and third-party
components.

Proposal

Each Xygeni scanner emits a JSON report with a shared metadata envelope and a
kind-specific payload. We'd like to add a single first-party parser at
dojo/tools/xygeni/ that dispatches on metadata.scanType and routes to a
per-kind handler. This mirrors established precedent in dojo/tools/
(rusty_hog, anchore_grype, checkmarx, sonarqube, mobsf) and avoids
adding one near-duplicate xygeni_* parser per scanner.
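
As an illustration of the dispatch idea, here is a minimal sketch in plain Python. It is not the proposed implementation: the handler names (parse_sast, parse_sca, parse_secrets) and the HANDLERS table are hypothetical, the handlers return plain dicts rather than DefectDojo Finding objects, and get_findings() takes the report text directly instead of DefectDojo's file and test arguments.

```python
import json
from typing import Callable

# Hypothetical per-kind handlers. In the proposed layout these would live in
# sast.py, sca.py and secrets.py and build DefectDojo Finding objects.
def parse_sast(report: dict) -> list:
    return [{"kind": "sast", "raw": v} for v in report.get("vulnerabilities", [])]

def parse_sca(report: dict) -> list:
    # One entry per dependencies[].vulnerabilities[] item.
    return [
        {"kind": "sca", "raw": v}
        for dep in report.get("dependencies", [])
        for v in dep.get("vulnerabilities", [])
    ]

def parse_secrets(report: dict) -> list:
    return [{"kind": "secrets", "raw": s} for s in report.get("secrets", [])]

# metadata.scanType value -> handler ("deps" is the value SCA reports carry).
HANDLERS: dict[str, Callable[[dict], list]] = {
    "sast": parse_sast,
    "deps": parse_sca,
    "secrets": parse_secrets,
}

def get_findings(report_text: str) -> list:
    """Route a Xygeni JSON report to the handler for its scanType."""
    report = json.loads(report_text)
    scan_type = report.get("metadata", {}).get("scanType")
    handler = HANDLERS.get(scan_type)
    if handler is None:
        # Failing loudly on unknown kinds is an assumption of this sketch.
        raise ValueError(f"Unsupported Xygeni scanType: {scan_type!r}")
    return handler(report)
```

With this shape, adding a scan type in a follow-up PR amounts to one new handler and one new entry in the lookup table.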

The intent of this issue is to obtain pre-approval on:

  1. The shape — one parser with multiple scan types, dispatched on
    metadata.scanType.
  2. The phase-1 mappings — the field-by-field translations from the SAST,
    SCA, and Secrets JSON to DefectDojo Finding objects, documented below.
  3. The phasing — approving the structure now and growing it through
    focused follow-up PRs, each adding one or more additional scan types when
    that grouping is natural.

The parser will follow the "Contribute to Parsers" recommendations from the
DefectDojo documentation.

In scope (this proposal and the phase-1 PR that follows)

  • A single parser package at dojo/tools/xygeni/ exposing three scan types:
    Xygeni SAST Scan, Xygeni SCA Scan, Xygeni Secrets Scan.
  • A per-kind handler module for each (sast.py, sca.py, secrets.py)
    invoked from a thin XygeniParser.get_findings().
  • Severity, dedup, and CWE/CVE conversion utilities shared across the three
    kinds (_common.py).
  • Unit tests under unittests/tools/test_xygeni_parser.py.
  • Real Xygeni sample reports checked in as test fixtures under
    unittests/scans/xygeni/{sast,sca,secrets}_*.json (empty report,
    multi-finding report covering all severities, plus targeted edge-case
    fixtures per kind).
  • A documentation page at
    docs/content/en/connecting_your_tools/parsers/file/xygeni.md covering all
    three scan types and pointing to the Xygeni CLI commands that produce the
    matching JSON.

Out of scope (future follow-up PRs, listed for context only)

These Xygeni scan kinds are not part of this proposal. We mention them so
the maintainers can see the full direction and approve the parser structure
once. Each will arrive in a follow-up PR that extends
XygeniParser.get_scan_types() and adds the corresponding handler, fixtures,
and docs section:

  • Xygeni IaC Scan — Terraform / CloudFormation / Kubernetes / Dockerfile flaws.
  • Xygeni CICD Misconfig Scan — pipeline and SCM misconfigurations.
  • Xygeni DAST Scan — web-application dynamic findings.
  • Xygeni Suspect Dependencies Scan — typosquatting / anomaly / malware signals.
  • Xygeni Code Tampering Scan — code-integrity violations.

Common backbone

Every Xygeni report has the same metadata envelope and a stable per-finding
backbone:

| Xygeni field | DefectDojo Finding field | Notes |
|---|---|---|
| metadata.scanType | (dispatch only) | sast / deps / secrets / ... |
| <finding>.uniqueHash | unique_id_from_tool | Vendor-stable; guarantees re-import dedup |
| <finding>.issueId | vuln_id_from_tool | |
| <finding>.severity | severity | Titlecased: critical→Critical, high→High, medium→Medium, low→Low, info→Info |

location.{filepath, beginLine, endLine, code} is shared by SAST and Secrets.
SCA uses package coordinates instead. DAST uses URL+method (out of scope here).
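
The shared backbone could be handled by a small helper along these lines. This is a sketch under assumptions: the names SEVERITY_MAP, to_dojo_severity and backbone_fields are hypothetical, and falling back to "Info" for a missing or unrecognized severity is a choice made here, not something the report format specifies.

```python
# Xygeni severity (lowercase) -> DefectDojo severity (titlecased).
SEVERITY_MAP = {
    "critical": "Critical",
    "high": "High",
    "medium": "Medium",
    "low": "Low",
    "info": "Info",
}

def to_dojo_severity(value, default="Info"):
    """Normalize a Xygeni severity string; the default is an assumption."""
    if not isinstance(value, str):
        return default
    return SEVERITY_MAP.get(value.strip().lower(), default)

def backbone_fields(finding: dict) -> dict:
    """Extract the fields every Xygeni finding shares, keyed by Finding field."""
    return {
        "unique_id_from_tool": finding.get("uniqueHash"),
        "vuln_id_from_tool": finding.get("issueId"),
        "severity": to_dojo_severity(finding.get("severity")),
    }
```

Each per-kind handler would merge this dict with its own kind-specific fields before constructing the Finding.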


SAST — Xygeni SAST Scan

vulnerabilities[] is the primary array. detector is the rule id.
A subset of findings carry a SARIF-style codeFlows[] block with source/sink
frames and a data path; the parser renders that into the description and
populates DefectDojo's SAST source/sink fields.

Sample finding (taint flow, critical-severity):

{
  "metadata": {"scanType": "sast", "format": "sast-xygeni"},
  "vulnerabilities": [{
    "detector": "python.code_injection_deserialization",
    "kind": "injection",
    "severity": "critical",
    "confidence": "high",
    "language": "python",
    "location": {
      "filepath": "main.py", "beginLine": 36,
      "code": "pickle.loads(decoded_data)"
    },
    "cwe": 502,
    "cwes": ["CWE-502"],
    "tags": ["CWE:502", "OWASP:2021:A8"],
    "explanation": "Untrusted input deserialized via pickle.loads enables RCE.",
    "codeFlows": [{
      "frames": [
        {"kind": "source", "...": "..."},
        {"kind": "sink",   "...": "..."}
      ]
    }],
    "uniqueHash": "N0JJTPOJPJBHZw0haLys5Q",
    "issueId": "SAS.injection.python.code_injection_deserialization.main.py.36"
  }]
}

Field mapping:

| DefectDojo Finding | Xygeni source |
|---|---|
| title | detector |
| description | explanation + location.code + rendered codeFlows |
| file_path | location.filepath |
| line | location.beginLine |
| cwe | cwe (numeric) |
| sast_source_file_path / sast_source_line | first codeFlows[].frames[] source, when present |
| sast_sink_object | first codeFlows[].frames[] sink, when present |
| static_finding | True |
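
A rough sketch of how this mapping might look, using a plain dict in place of a DefectDojo Finding. The function names are hypothetical. Because the sample elides the inner fields of each frame, the sketch only locates the first source and sink frames and returns them whole under the placeholder keys source_frame and sink_frame; it does not guess how they translate into the sast_* fields or into the rendered description.

```python
def first_frame(vuln: dict, kind: str):
    """Return the first codeFlows[].frames[] entry of the given kind, or None."""
    for flow in vuln.get("codeFlows") or []:
        for frame in flow.get("frames") or []:
            if frame.get("kind") == kind:
                return frame
    return None

def sast_to_finding(vuln: dict) -> dict:
    """Map one vulnerabilities[] entry to Finding-like fields (plain dict)."""
    location = vuln.get("location") or {}
    parts = [vuln.get("explanation") or ""]
    if location.get("code"):
        parts.append("Code: " + location["code"])
    return {
        "title": vuln.get("detector"),
        "description": "\n\n".join(p for p in parts if p),
        "file_path": location.get("filepath"),
        "line": location.get("beginLine"),
        "cwe": vuln.get("cwe"),
        "static_finding": True,
        # Placeholder keys, not Finding fields: the frames are kept whole
        # because their inner structure is elided in the sample.
        "source_frame": first_frame(vuln, "source"),
        "sink_frame": first_frame(vuln, "sink"),
    }
```

Findings without a codeFlows block simply get None for both frames, which matches the "when present" wording in the table.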

SCA — Xygeni SCA Scan

Top-level dependencies[], each with nested vulnerabilities[] (CVE/GHSA).
One Finding per dependencies[].vulnerabilities[] entry.

Sample finding:

{
  "metadata": {"scanType": "deps", "format": "deps-xygeni"},
  "dependencies": [{
    "name": "cookie", "version": "0.5.0", "ecosystem": "npm",
    "vulnerabilities": [{
      "id": "CVE-2024-47764",
      "cve": "CVE-2024-47764",
      "severity": "low",
      "fixedVersion": "0.7.0",
      "aliases": ["GHSA-pxg6-pf52-xh8x"],
      "overallCvssScore": -1.0,
      "references": [
        "https://github.com/jshttp/cookie/security/advisories/GHSA-pxg6-pf52-xh8x"
      ],
      "uniqueHash": "CVE-2024-47764#:cookie:0.5.0:javascript",
      "issueId": "SCA.CVE-2024-47764",
      "description": "..."
    }]
  }]
}

Field mapping:

| DefectDojo Finding | Xygeni source |
|---|---|
| title | cve (fall back to id) |
| description | description |
| cve | cve |
| cwe | cwes[0] if present |
| cvssv3_score | overallCvssScore when ≥ 0 |
| mitigation | "Upgrade to {fixedVersion}" |
| references | references joined |
| component_name | parent dependencies[].name |
| component_version | parent dependencies[].version |
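
The table above could translate into something like the following sketch, again with plain dicts standing in for Finding objects. Several details are assumptions rather than facts from the report format: the helper names, the idea that SCA cwes entries look like the "CWE-502" strings in the SAST sample, joining references with newlines, and omitting the mitigation when fixedVersion is absent.

```python
import re

def parse_cwe(value):
    """Pull the numeric id out of a value such as "CWE-502" (assumed format)."""
    match = re.search(r"(\d+)", str(value))
    return int(match.group(1)) if match else None

def sca_findings(report: dict) -> list:
    """Emit one Finding-like dict per dependencies[].vulnerabilities[] entry."""
    findings = []
    for dep in report.get("dependencies") or []:
        for vuln in dep.get("vulnerabilities") or []:
            score = vuln.get("overallCvssScore")
            cwes = vuln.get("cwes") or []
            fixed = vuln.get("fixedVersion")
            has_score = isinstance(score, (int, float)) and score >= 0
            findings.append({
                "title": vuln.get("cve") or vuln.get("id"),
                "description": vuln.get("description"),
                "cve": vuln.get("cve"),
                "cwe": parse_cwe(cwes[0]) if cwes else None,
                # Negative scores (e.g. -1.0) mean "no score" and are dropped.
                "cvssv3_score": score if has_score else None,
                "mitigation": f"Upgrade to {fixed}" if fixed else None,
                "references": "\n".join(vuln.get("references") or []),
                "component_name": dep.get("name"),
                "component_version": dep.get("version"),
            })
    return findings
```

Applied to the sample above, the -1.0 score yields no cvssv3_score and the mitigation reads "Upgrade to 0.7.0".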

Secrets — Xygeni Secrets Scan

secrets[] is the primary array. The Xygeni report already redacts the secret
value in both secret and location.code — the raw value never appears, so
the parser surfaces those fields as-is.

Sample finding:

{
  "metadata": {"scanType": "secrets", "format": "secrets-xygeni"},
  "secrets": [{
    "secret": "AKIA****REDACTED****",
    "hash": "9d5e...",
    "type": "aws_access_key",
    "detector": "aws-access-key",
    "severity": "high",
    "confidence": "high",
    "location": {
      "filepath": "aws.properties", "beginLine": 7,
      "code": "aws.access.key=AKIA****"
    },
    "description": "AWS access key ID detected.",
    "tags": ["secret:aws", "cwe:798"],
    "uniqueHash": "abc...",
    "issueId": "SECRETS.aws-access-key.aws.properties:7"
  }]
}

Field mapping:

| DefectDojo Finding | Xygeni source |
|---|---|
| title | "{type} secret detected in {filename}" |
| description | description + location.code |
| file_path | location.filepath |
| line | location.beginLine |
| cwe | first cwe:N tag, else 798 |
| mitigation | "Rotate this {type} secret and remove from history." |
| static_finding | True |
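
A minimal sketch of the Secrets mapping, under a few stated assumptions: the helper names are hypothetical, {filename} is taken to mean the last path component of location.filepath, the cwe:N tag is matched case-insensitively (the SAST sample uses "CWE:502", the Secrets sample "cwe:798"), and a missing type falls back to "unknown". The already-redacted secret and location.code values are passed through unchanged.

```python
import posixpath
import re

CWE_TAG = re.compile(r"^cwe:(\d+)$", re.IGNORECASE)

def cwe_from_tags(tags, default=798):
    """Return the number from the first cwe:N tag, else the default (798)."""
    for tag in tags or []:
        match = CWE_TAG.match(str(tag).strip())
        if match:
            return int(match.group(1))
    return default

def secret_to_finding(secret: dict) -> dict:
    """Map one secrets[] entry to Finding-like fields (plain dict)."""
    location = secret.get("location") or {}
    filepath = location.get("filepath") or ""
    secret_type = secret.get("type") or "unknown"  # fallback is an assumption
    parts = [secret.get("description") or "", location.get("code") or ""]
    return {
        "title": f"{secret_type} secret detected in {posixpath.basename(filepath)}",
        "description": "\n\n".join(p for p in parts if p),
        "file_path": filepath or None,
        "line": location.get("beginLine"),
        "cwe": cwe_from_tags(secret.get("tags")),
        "mitigation": f"Rotate this {secret_type} secret and remove from history.",
        "static_finding": True,
    }
```

For the sample finding this produces the title "aws_access_key secret detected in aws.properties" with CWE 798.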

Layout

dojo/tools/xygeni/
├── __init__.py
├── parser.py        # XygeniParser, dispatches on metadata.scanType
├── sast.py
├── sca.py
├── secrets.py
└── _common.py       # severity map, dedup helpers

unittests/scans/xygeni/{sast,sca,secrets}_*.json
unittests/tools/test_xygeni_parser.py
docs/content/en/connecting_your_tools/parsers/file/xygeni.md

PRs originate from xygeni/django-DefectDojo (public org fork) against dev.

Questions

  1. Is one parser dispatching on metadata.scanType preferred, given the
    rusty_hog / anchore_grype precedent? Or should we split into
    xygeni_sast / xygeni_sca / xygeni_secrets?
  2. Any objection to setting vuln_id_from_tool = issueId alongside
    unique_id_from_tool = uniqueHash?
  3. OK to approve this structure now, with the phase-2 scan types
    (IaC / CICD / DAST / suspect-deps / code-tampering) added to the same
    parser in follow-up PRs?

References: DefectDojo parser contributor guide · Xygeni docs
