Reduce and optimize number of product grading calls using a Chord #12914

valentijnscholten merged 59 commits into DefectDojo:dev
This pull request has conflicts, please resolve those before we can evaluate the pull request.

Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request introduces three issues:

- unhandled exceptions when reading System_Settings in calculate_grade_internal, which can crash Celery tasks and halt imports;
- a race condition where asynchronous finding-processing group tasks may not complete before product grading runs, leading to inconsistent product grades;
- potential resource exhaustion from building an in-memory list of all parsed findings (cleaned_findings), which can cause DoS on large reports.
Denial of Service due to Unhandled Exception in dojo/utils.py

| Vulnerability | Denial of Service due to Unhandled Exception |
|---|---|
| Description | The calculate_grade_internal function, which is part of critical asynchronous workflows (like finding deletion and scan imports), calls System_Settings.objects.get() without handling the potential DoesNotExist or MultipleObjectsReturned exceptions. If the System_Settings table is empty or contains more than one entry, this causes an unhandled exception and the Celery task fails. Since these tasks are often part of larger chains (e.g., chords in scan imports), a failure here can halt the entire import process or prevent product grading, causing a denial of service for these features. |
django-DefectDojo/dojo/utils.py
Lines 1577 to 1579 in 5b45aaf
Inconsistent Product Grade due to Race Condition in dojo/finding/helper.py
| Vulnerability | Inconsistent Product Grade due to Race Condition |
|---|---|
| Description | The new asynchronous finding processing logic introduces a race condition that can lead to an inconsistent product grade. When system_settings.enable_product_grade is disabled, the maybe_launch_post_processing_chord function dispatches finding processing tasks using a Celery group. A group does not wait for its constituent tasks to complete before returning. Immediately after this dispatch, the main process calls perform_product_grading synchronously. This means the product grade can be calculated based on stale data, as the finding modifications (creations, updates, deletions) initiated by the group tasks may not have finished executing in the background. This results in an inaccurate security grade for the product until all asynchronous tasks eventually complete. |
django-DefectDojo/dojo/finding/helper.py
Lines 379 to 413 in 5b45aaf
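The timing problem can be illustrated without Celery. In this minimal sketch (all names hypothetical), grading immediately after a group-style dispatch sees stale data, while joining the workers first, which is effectively what a chord callback guarantees, sees the finished state:

```python
import threading

findings = []
lock = threading.Lock()

def process_finding(i, delay_event):
    delay_event.wait()          # simulate slow background work
    with lock:
        findings.append(i)

def grade():
    with lock:
        return len(findings)    # "grade" derived from current findings

delay = threading.Event()
workers = [threading.Thread(target=process_finding, args=(i, delay))
           for i in range(5)]
for w in workers:
    w.start()                   # group-style: dispatch and return immediately

stale_grade = grade()           # runs before tasks complete -> sees 0 findings

delay.set()                     # let the "background tasks" finish
for w in workers:
    w.join()
chord_grade = grade()           # chord-style: grade only after all complete -> 5

print(stale_grade, chord_grade)  # -> 0 5
```

The threads block on an unset event, so the "stale" read deterministically happens before any finding is recorded.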
Denial of Service (DoS) via Resource Exhaustion in dojo/importers/default_reimporter.py
| Vulnerability | Denial of Service (DoS) via Resource Exhaustion |
|---|---|
| Description | The process_findings function in both default_reimporter.py and default_importer.py creates an in-memory list called cleaned_findings and populates it with all findings parsed from an uploaded report. This means that if a user uploads a report containing a very large number of findings, the application will attempt to load all of them into memory, potentially exhausting available server resources and leading to a Denial of Service. |
django-DefectDojo/dojo/importers/default_reimporter.py
Lines 202 to 205 in 5b45aaf
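A hedged sketch of the alternative: consuming parsed findings through a generator in small batches keeps memory bounded regardless of report size. The parse_report function, the batch size, and the dict shape are illustrative, not the importer's real API.

```python
def parse_report(lines):
    # Yield findings one at a time instead of materializing a full list.
    for line in lines:
        if line.strip():
            yield {"title": line.strip()}

def process_findings(report_lines, batch_size=2):
    batch, processed = [], 0
    for finding in parse_report(report_lines):
        batch.append(finding)
        if len(batch) >= batch_size:
            processed += len(batch)   # persist/process the small batch
            batch.clear()             # memory stays bounded by batch_size
    processed += len(batch)           # flush the final partial batch
    return processed

print(process_findings(["a", "b", "", "c"]))  # -> 3
```

Peak memory is proportional to batch_size, not to the number of findings in the uploaded report.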
All finding details can be found in the DryRun Security Dashboard.
Maffooch left a comment:

I need to learn more about these celery chords! They seem incredible
…efectDojo#12914)

* test cases: fix caching of system settings
* fix tests
* fix caching for github
* fix caching for github
* simplify cache loading
* post process only when needed
* set tags on (re)import
* rebase set tags
* reduce save with options
* update counts, reduce saves with options
* importers: do not save again, but postprocess directly
* update counts
* optimize hash_code setting
* fix counts
* set hash code for new findings in reimport
* make smaller second save work
* make smaller second save work - add no_options
* update query counts
* update counts
* remove logging
* perf3b: compute hash_code on first save
* fix cve for reimport
* ruff
* fix no async
* Merge remote-tracking branch 'upstream/dev' into perf3-reduce-saves
* make smaller second save work
* fix cve for reimport
* initial
* fix counts
* fix counts
* simplify
* simplify
* refactor to await results
* handle reimport and close old findings
* update query and task counts
* switch back to chords
* simplify
* respect system settings product grading enabled
* finding/test delete grading only if enabled
* optimize asyn_dupe_delete grading
* cleanup comments
* fix merge artifact
* fix loop
* simplify loop
* fix reimport loop
* revert settings changes
* revert settings changes
* update counts
* extract product grading method call
* cleanup model deletes
* product grade logging fix
* extract chord orchestration into method
* fix model traversal
The current implementation of import and reimport executes product grading for each finding that is imported. This results in lots of background celery tasks that put a non-trivial strain on the database. In large imports it can mean minutes of extra processing purely for repeatedly grading the same product.
The same happens in the `async_dupe_delete` task that runs every minute, and when a product or engagement is deleted. This PR optimizes these cases. The import/reimport case uses a celery chord to reduce this to only a couple of product grading calls.
By design the chord will only start processing once all tasks have been submitted to it. That's why we ramp up processing with multiple chords, starting with small sizes and increasing exponentially up to 1024.
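The ramp-up described above can be sketched as a small batching helper, where each batch would become one chord. Note the starting size of 8 is an assumption for illustration; only the 1024 cap is stated in this PR.

```python
def chord_batches(items, start=8, cap=1024):
    # Split items into batches whose sizes double each step, up to `cap`.
    size, i, batches = start, 0, []
    while i < len(items):
        batches.append(items[i:i + size])
        i += size
        size = min(size * 2, cap)   # exponential ramp-up with a ceiling
    return batches

sizes = [len(b) for b in chord_batches(list(range(100)), start=8)]
print(sizes)  # -> [8, 16, 32, 44]
```

Small early batches let the first chord start quickly, while later, larger batches amortize per-chord overhead on big imports.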
If product grading is disabled we use a `group` and skip product grading. To be able to use a chord, we need to generate signatures, which are similar to lambdas that can be passed around (to celery tasks/chords/groups). This needs a little custom code because we have the `@dojo_async_task` decorator which wraps around `@app.task`. If we use similar celery constructs more in the future, we may want to remove `@dojo_async_task` and/or replace it with something simpler.

Notes:
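The signature idea can be illustrated without a Celery broker: `functools.partial` plays the role of `task.si(...)`, and a toy `chord` helper runs the body callback only after every header "task" has produced a result. All names here are hypothetical; this is a sketch of the concept, not Celery's implementation.

```python
from functools import partial

def chord(header_sigs, body_sig):
    # Run every header signature to completion first...
    results = [sig() for sig in header_sigs]
    # ...then invoke the body callback exactly once with all results.
    return body_sig(results)

def post_process(finding_id):
    return finding_id * 10          # stand-in for per-finding post-processing

def grade_product(results):
    return f"graded after {len(results)} findings"

# Like building post_process.si(i) for each finding, then chord(...)(callback):
sigs = [partial(post_process, i) for i in range(3)]
print(chord(sigs, grade_product))  # -> graded after 3 findings
```

A signature bundles the task and its arguments without executing it, which is what lets the chord defer both the header tasks and the grading callback.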