diff --git a/docs/content/en/open_source/upgrading/2.53.md b/docs/content/en/open_source/upgrading/2.53.md index b6970b87fc9..191a9f83025 100644 --- a/docs/content/en/open_source/upgrading/2.53.md +++ b/docs/content/en/open_source/upgrading/2.53.md @@ -2,7 +2,7 @@ title: 'Upgrading to DefectDojo Version 2.53.x' toc_hide: true weight: -20251103 -description: "Helm chart: changes for initializer annotations + Replaced Redis with Valkey + HPA & PDB support" +description: "Helm chart: changes for initializer annotations + Replaced Redis with Valkey + HPA & PDB support + Batch Deduplication" --- ## Helm Chart Changes @@ -17,9 +17,9 @@ Added Helm chart support for Celery and Django deployments for Horizontal Pod Au ### Breaking changes -#### Valkey +#### Valkey -##### Renamed values +##### Renamed values HELM values had been changed to the following: - `createRedisSecret` → `createValkeySecret` @@ -40,7 +40,7 @@ If an external Redis instance is being used, set the parameter `valkey.enabled` 0. As always, perform a backup of your instance 1. If you would like to be 100% sure that you do not miss any async event (triggered deduplication, email notification, ...) it is recommended to perform the following substeps (if your system is not in production and/or you are willing to miss some notifications or postpone deduplication to a later time, feel free to skip these substeps) 0. Perform the following steps with your previous version of HELM chart (not with the upgraded one - you might lose your data) - 1. Downscale all producers of async tasks: + 1. 
Downscale all producers of async tasks:
       - Set `django.replicas` to 0 (if you used HPA, adjust it based on your needs)
       - Set `celery.beat.replicas` to 0 (if you used HPA, adjust it based on your needs)
       - Do not change `celery.worker.replicas` (they are responsible for processing your async tasks)
@@ -89,4 +89,26 @@ Both `extraAnnotations` and `initializer.podAnnotations` will now be properly ap
 Reimport will update existing findings `fix_available` and `fix_version` fields based on the incoming scan report.
 
+## Batch Deduplication
+
+Before 2.53.0, DefectDojo deduplicated new or updated findings one by one. This works well for small imports and has the benefit of an easy-to-understand codebase and test suite. For larger imports, however, performance is poor and resource usage is very high. A 1000+ finding import can cause a Celery worker to spend minutes on deduplication.
+
+PR [13491](https://github.com/DefectDojo/django-DefectDojo/pull/13491) changes the deduplication process for import and reimport to run in batches. The biggest benefit is that there is now 1 database query per batch (1000 findings) instead of 1 query per finding (1000 queries).
+
+A quick test with the `jfrog_xray_unified/very_many_vulns.json` sample scan (10k findings) shows a large improvement in deduplication time. Note that the goal is not only performance but also reducing the resources (cloud cost) needed to run DefectDojo.
+
+initial import (no duplicates):
+| branch | import time | dedupe time | total time |
+|--------|:-----------:|:-----------:|:-----------:|
+| dev | ~200s | ~400s | ~600s |
+| dedupe-batching | ~190s | _~12s_ | ~200s |
+
+second import into the same product (all duplicates):
+initial import (no duplicates):
+| branch | import time | dedupe time | total time |
+|--------|:-----------:|:-----------:|:-----------:|
+| dev | ~200s | ~400s | ~600s |
+| dedupe-batching | ~190s | _~180s_ | ~370s |
+
+
 There are no other special instructions for upgrading to 2.53.x. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.53.0) for the contents of the release.
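
Illustrative note on the batching idea described in the Batch Deduplication section: the sketch below contrasts one lookup per finding with one lookup per batch. It is not the code from PR 13491. The `finding` table, the `hash_code` column, and all function names are assumptions made for this example, and it uses the standard-library `sqlite3` module rather than the Django ORM.

```python
import sqlite3

# The notes above mention batches of 1000 findings. This sketch defaults to 500
# so the IN (...) list stays below the 999-parameter limit of older SQLite builds.
BATCH_SIZE = 500


def make_db(existing_hashes):
    """Create an in-memory table of already-stored findings (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE finding (id INTEGER PRIMARY KEY, hash_code TEXT)")
    conn.executemany(
        "INSERT INTO finding (hash_code) VALUES (?)",
        [(h,) for h in existing_hashes],
    )
    return conn


def find_duplicates_one_by_one(conn, incoming_hashes):
    """Issue one SELECT per incoming finding; return (duplicates, query_count)."""
    duplicates, queries = set(), 0
    for h in incoming_hashes:
        row = conn.execute(
            "SELECT 1 FROM finding WHERE hash_code = ? LIMIT 1", (h,)
        ).fetchone()
        queries += 1
        if row is not None:
            duplicates.add(h)
    return duplicates, queries


def find_duplicates_batched(conn, incoming_hashes, batch_size=BATCH_SIZE):
    """Issue one SELECT per batch; return (duplicates, query_count)."""
    duplicates, queries = set(), 0
    for start in range(0, len(incoming_hashes), batch_size):
        batch = incoming_hashes[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            "SELECT DISTINCT hash_code FROM finding "
            f"WHERE hash_code IN ({placeholders})",
            batch,
        ).fetchall()
        queries += 1
        duplicates.update(r[0] for r in rows)
    return duplicates, queries
```

Both functions return the same set of duplicates; only the number of round trips to the database differs, which is the effect the timing tables above measure on a real import.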