feat(parsers): add Xygeni JSON parser (SAST, SCA, Secrets) #14769

Draft
lmrb-1968 wants to merge 2 commits into DefectDojo:dev from xygeni:xygeni-parser

lmrb-1968 commented Apr 28, 2026

Description

This PR adds a single first-party parser for Xygeni JSON reports under
dojo/tools/xygeni/. It dispatches on metadata.scanType and exposes three
scan types: Xygeni SAST Scan, Xygeni SCA Scan, Xygeni Secrets Scan. The
pattern mirrors rusty_hog, anchore_grype, checkmarx, sonarqube, and
mobsf.
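
The dispatch step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names `resolve_scan_type` and `parse_report`, the raw `scanType` values (`"sast"`, `"sca"`, `"secrets"`), and the choice of `ValueError` are all assumptions made for the example. Only the three scan type names and the `metadata.scanType` key come from the description.

```python
import json

# Hypothetical mapping from raw metadata.scanType values to the three scan
# type names listed in the PR. The raw keys are assumed for illustration.
SCAN_TYPE_NAMES = {
    "sast": "Xygeni SAST Scan",
    "sca": "Xygeni SCA Scan",
    "secrets": "Xygeni Secrets Scan",
}


def resolve_scan_type(report: dict) -> str:
    """Return the scan type name for a parsed report, or raise ValueError."""
    metadata = report.get("metadata") or {}
    raw = metadata.get("scanType")
    if not raw:
        raise ValueError("Xygeni report is missing metadata.scanType")
    try:
        return SCAN_TYPE_NAMES[str(raw).lower()]
    except KeyError:
        raise ValueError(f"Unsupported Xygeni scan type: {raw!r}") from None


def parse_report(file_obj) -> tuple[str, dict]:
    """Load a JSON report and pair it with its resolved scan type name."""
    report = json.load(file_obj)
    return resolve_scan_type(report), report
```

A real parser would go on to hand the report to a per-kind routine that builds findings; that part depends on the payload shapes documented in #14755 and is left out here.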

Xygeni is a Software Supply Chain Security platform that
emits a JSON report per scanner. The full pre-approval discussion (with
field-mapping tables and example JSON per kind) is at #14755.

Opened as a draft because pre-approval is still pending — happy to wait
for maintainer feedback before any further action. The implementation is
provided here so reviewers can evaluate the concrete shape if helpful.

Test results

unittests/tools/test_xygeni_parser.py covers:

  • empty report per kind (3 fixtures)
  • many-finding report per kind against real anonymized fixtures
    (501 SAST + 50 SCA + 61 Secrets findings)
  • dispatch on metadata.scanType for a synthetic minimal report
  • error paths: missing metadata.scanType, unsupported scan type
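
For readers unfamiliar with how such cases are usually written, the dispatch and error-path tests might look roughly like this. It is a sketch only: `scan_kind` is a hypothetical stand-in for the parser's dispatch step, and the accepted values and the `ValueError` exception type are assumptions, not taken from the PR's test file.

```python
import unittest


def scan_kind(report: dict) -> str:
    """Hypothetical stand-in for the parser's dispatch on metadata.scanType."""
    kind = (report.get("metadata") or {}).get("scanType")
    if kind is None:
        raise ValueError("missing metadata.scanType")
    if kind not in {"sast", "sca", "secrets"}:  # assumed raw values
        raise ValueError(f"unsupported scan type: {kind}")
    return kind


class TestDispatchErrorPaths(unittest.TestCase):
    def test_minimal_report_dispatches(self):
        # Synthetic minimal report: only the field used for dispatch.
        self.assertEqual(scan_kind({"metadata": {"scanType": "sca"}}), "sca")

    def test_missing_scan_type(self):
        with self.assertRaises(ValueError):
            scan_kind({"metadata": {}})

    def test_unsupported_scan_type(self):
        with self.assertRaises(ValueError):
            scan_kind({"metadata": {"scanType": "iac"}})
```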

Run via the project's docker-compose unit-tests setup against real Postgres:

docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml run --rm \
  --entrypoint /bin/bash uwsgi -lc \
  '. /secret-file-loader.sh; . /reach_database.sh; cd /app;
   unset DD_DATABASE_URL DD_CELERY_BROKER_URL;
   wait_for_database_to_be_reachable;
   python3 manage.py migrate --no-input;
   python3 manage.py test unittests.tools.test_xygeni_parser --keepdb -v 2'

Result: Ran 10 tests in 0.084s — OK. ruff check is clean against
dojo/tools/xygeni/ and unittests/tools/test_xygeni_parser.py.

Documentation

Added at docs/content/supported_tools/parsers/file/xygeni.md — covers all
three scan types, the common metadata envelope, the per-kind payload shapes,
and links to the sample fixtures.

Checklist

  • Rebased on the latest dev.
  • Targets dev (new parser).
  • Ruff compliant (line-length 120, target py313).
  • Python 3.13 compliant.
  • Documentation page added.
  • No model changes — no migration needed.
  • Unit tests added.
  • Label Import Scans requested — cannot self-apply as a non-collaborator.

Pre-approval: #14755

Add a single first-party parser at dojo/tools/xygeni/ that handles three
Xygeni JSON report kinds (SAST, SCA, Secrets) by dispatching on
metadata.scanType. Mirrors the multi-scan-type pattern of rusty_hog,
anchore_grype, checkmarx and sonarqube.

Pre-approval: DefectDojo#14755
@valentijnscholten (Member) commented:

At first glance it looks good, but shouldn't settings.dist.py be updated to set the dedupe algorithm to UNIQUE_ID_FROM_TOOL (or UNIQUE_ID_FROM_TOOL_OR_HASH)?

valentijnscholten added this to the 2.58.0 milestone Apr 28, 2026

Wire the three Xygeni scan types into DEDUPLICATION_ALGORITHM_PER_PARSER
in settings.dist.py so re-imports dedup against the vendor-stable
uniqueHash instead of the legacy heuristic:

- Xygeni SAST Scan, Xygeni Secrets Scan: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL.
- Xygeni SCA Scan: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE with
  HASHCODE_FIELDS_PER_SCANNER set to (vulnerability_ids, component_name,
  component_version) and HASHCODE_ALLOWS_NULL_CWE: True, enabling
  cross-tool dedup with other SCA parsers when a CVE matches a package
  at the same version.

Document the per-scan-type algorithm in the parser docs page.

Refs: DefectDojo#14755

github-actions bot added the settings_changes label (Needs changes to settings.py based on changes in settings.dist.py included in this PR) Apr 30, 2026

lmrb-1968 (Author) commented Apr 30, 2026

Good catch, thanks @valentijnscholten, we forgot to register the parser there entirely!
Pushed an update wiring the dedup algorithm in dojo/settings/settings.dist.py:

  • Xygeni SAST Scan: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL. Vendor-specific titles (detector strings like python.code_injection_deserialization) make cross-tool dedup almost impossible, and a line-based fallback breaks on unrelated edits. Staying with unique_id_from_tool only is the best choice.

  • Xygeni SCA Scan: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE with hash fields ["vulnerability_ids", "component_name", "component_version"] and HASHCODE_ALLOWS_NULL_CWE: True (Xygeni's cwes[] is empty for some advisories). This is a real cross-tool dedup win: the same CVE on the same package@version emitted by Xygeni and by another tool can collapse into one finding.

  • Xygeni Secrets Scan: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL. No _OR_HASH_CODE precedent for secrets parsers upstream.
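
As a rough picture of what the three bullets above amount to in settings, the entries might look like the sketch below. Only the Xygeni keys are shown; in settings.dist.py they would be added to the existing dictionaries rather than defined from scratch. The two constants are redefined locally so the snippet runs on its own, and their string values are an assumption about what settings.dist.py uses.

```python
# Algorithm identifiers as named in this PR. The string values are assumed
# here so the snippet is self-contained; settings.dist.py defines its own.
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL = "unique_id_from_tool"
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE = "unique_id_from_tool_or_hash_code"

# Per-scan-type dedup algorithm (Xygeni entries only).
DEDUPLICATION_ALGORITHM_PER_PARSER = {
    "Xygeni SAST Scan": DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL,
    "Xygeni SCA Scan": DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE,
    "Xygeni Secrets Scan": DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL,
}

# Hash fields apply only to the scan type that falls back to a hash code.
HASHCODE_FIELDS_PER_SCANNER = {
    "Xygeni SCA Scan": ["vulnerability_ids", "component_name", "component_version"],
}

# Some Xygeni advisories carry an empty cwes[] list, so a null CWE is allowed.
HASHCODE_ALLOWS_NULL_CWE = {
    "Xygeni SCA Scan": True,
}
```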

Updated docs/content/supported_tools/parsers/file/xygeni.md to document the per-scan-type dedup behavior.

Ran the Xygeni test suite via the project's docker-compose.override.unit_tests.yml setup - all 10 tests still pass.

Labels

docs, parser, settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), unittests
