
Commit 1316419

docs: add dedupe batching note to 2.53 upgrade notes (#13914)
1 parent 20917e8 commit 1316419

1 file changed

Lines changed: 26 additions & 4 deletions


docs/content/en/open_source/upgrading/2.53.md

@@ -2,7 +2,7 @@
 title: 'Upgrading to DefectDojo Version 2.53.x'
 toc_hide: true
 weight: -20251103
-description: "Helm chart: changes for initializer annotations + Replaced Redis with Valkey + HPA & PDB support"
+description: "Helm chart: changes for initializer annotations + Replaced Redis with Valkey + HPA & PDB support + Batch Deduplication"
 ---

 ## Helm Chart Changes
@@ -17,9 +17,9 @@ Added Helm chart support for Celery and Django deployments for Horizontal Pod Au

 ### Breaking changes

-#### Valkey
+#### Valkey

-##### Renamed values
+##### Renamed values

 Helm values have been renamed as follows:
 - `createRedisSecret` → `createValkeySecret`
@@ -40,7 +40,7 @@ If an external Redis instance is being used, set the parameter `valkey.enabled`
 0. As always, perform a backup of your instance
 1. If you would like to be 100% sure that you do not miss any async event (triggered deduplication, email notification, ...) it is recommended to perform the following substeps (if your system is not in production and/or you are willing to miss some notifications or postpone deduplication to a later time, feel free to skip these substeps)
 0. Perform the following steps with your previous version of HELM chart (not with the upgraded one - you might lose your data)
-1. Downscale all producers of async tasks:
+1. Downscale all producers of async tasks:
 - Set `django.replicas` to 0 (if you used HPA, adjust it based on your needs)
 - Set `celery.beat.replicas` to 0 (if you used HPA, adjust it based on your needs)
 - Do not change `celery.worker.replicas` (they are responsible for processing your async tasks)
@@ -89,4 +89,26 @@ Both `extraAnnotations` and `initializer.podAnnotations` will now be properly ap

 Reimport will update existing findings `fix_available` and `fix_version` fields based on the incoming scan report.

+## Batch Deduplication
+
+Before 2.53.0, DefectDojo deduplicated new or updated findings one by one. This works well for small imports and has the benefit of an easy-to-understand codebase and test suite. For larger imports, however, performance is poor and resource usage is very high. An import of 1000+ findings can cause a Celery worker to spend minutes on deduplication.
+
+PR [13491](https://github.com/DefectDojo/django-DefectDojo/pull/13491) changes the deduplication process for import and reimport so that it runs in batches. The biggest benefit is that there is now 1 database query per batch (1000 findings) instead of 1 query per finding (1000 queries).
+
+A quick test with the `jfrog_xray_unified/very_many_vulns.json` sample scan (10k findings) shows a large improvement in deduplication time. Note that the goal is not only better performance but also a reduction in the resources (cloud cost) needed to run DefectDojo.
+
+initial import (no duplicates):
+| branch | import time | dedupe time | total time |
+|--------|:-----------:|:-----------:|:-----------:|
+| dev | ~200s | ~400s | ~600s |
+| dedupe-batching | ~190s | _~12s_ | ~200s |
+
+second import into the same product (all duplicates):
+| branch | import time | dedupe time | total time |
+|--------|:-----------:|:-----------:|:-----------:|
+| dev | ~200s | ~400s | ~600s |
+| dedupe-batching | ~190s | _~180s_ | ~370s |
+
+
 There are no other special instructions for upgrading to 2.53.x. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.53.0) for the contents of the release.
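
The "Batch Deduplication" note added in this commit contrasts one database query per finding with one query per batch of 1000 findings. The following is a minimal sketch of that idea only, not DefectDojo's actual implementation: `FakeFindingStore`, `find_existing`, `dedupe_one_by_one`, `dedupe_batched`, and the `hash-N` values are all hypothetical names made up for illustration, and the in-memory store simply counts lookups in place of a real database.

```python
from itertools import islice

BATCH_SIZE = 1000  # batch size quoted in the note; everything else is illustrative


class FakeFindingStore:
    """Hypothetical in-memory stand-in for the findings table; counts lookups."""

    def __init__(self, existing_hash_codes):
        self._existing = set(existing_hash_codes)
        self.query_count = 0

    def find_existing(self, hash_codes):
        """Simulate one query of the form `WHERE hash_code IN (...)`."""
        self.query_count += 1
        return self._existing.intersection(hash_codes)


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def dedupe_one_by_one(store, hash_codes):
    """Look up each finding separately: one query per finding."""
    duplicates = set()
    for code in hash_codes:
        duplicates |= store.find_existing([code])
    return duplicates


def dedupe_batched(store, hash_codes, batch_size=BATCH_SIZE):
    """Look up a whole batch at once: one query per batch."""
    duplicates = set()
    for chunk in batched(hash_codes, batch_size):
        duplicates |= store.find_existing(chunk)
    return duplicates


# 10,000 incoming findings, half of which already exist in the store.
incoming = [f"hash-{i}" for i in range(10_000)]
already_stored = [f"hash-{i}" for i in range(0, 10_000, 2)]

slow_store = FakeFindingStore(already_stored)
slow_result = dedupe_one_by_one(slow_store, incoming)

fast_store = FakeFindingStore(already_stored)
fast_result = dedupe_batched(fast_store, incoming)

print(slow_store.query_count, fast_store.query_count)  # 10000 10
print(slow_result == fast_result, len(fast_result))    # True 5000
```

Both strategies identify the same duplicates; only the number of round trips differs. In a real database the per-query overhead is what the batching avoids, which this sketch does not attempt to measure.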
