Reduce and optimize number of product grading calls using a Chord#12914

Merged
valentijnscholten merged 59 commits into DefectDojo:dev from
valentijnscholten:perf4-chord-grade
Sep 29, 2025
Conversation

@valentijnscholten
Member

@valentijnscholten valentijnscholten commented Aug 4, 2025

The current implementation of import and reimport executes product grading for each finding that is imported. This results in lots of background Celery tasks, putting a non-trivial strain on the database. For large imports this can mean minutes of extra processing purely for repeatedly grading the same product.
The same happens in the async_dupe_delete task that runs every minute, and when a product or engagement is deleted.

This PR optimizes these cases. The import/reimport case uses a Celery chord to reduce this to only a couple of product grading calls.

By design, a chord only starts processing once all of its tasks have been submitted. That's why we ramp up processing with multiple chords, starting with small batch sizes and increasing exponentially up to 1024.
If product grading is disabled, we use a plain group and skip product grading entirely.
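The ramp-up can be sketched as follows. This is an illustrative helper, not the merged code: the function name and exact batching scheme are assumptions, only the exponential growth and the 1024 cap come from the description above.

```python
# Illustrative sketch: compute exponentially growing batch boundaries so the
# early chords are small and results start flowing before all tasks have been
# submitted. In Celery, each (start, end) slice would then be wrapped in
# something like chord(signatures[start:end])(grading_callback).
def ramping_batches(n_items, max_batch=1024):
    """Return (start, end) index pairs with sizes 1, 2, 4, ... capped at max_batch."""
    batches = []
    size, start = 1, 0
    while start < n_items:
        end = min(start + size, n_items)
        batches.append((start, end))
        start = end
        size = min(size * 2, max_batch)
    return batches
```

With 10 findings this yields batches of sizes 1, 2, 4, 3: the first chord (and its grading callback) can finish while the later, larger ones are still being submitted.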

To be able to use a chord, we need to generate signatures, which are similar to lambdas that can be passed around (to Celery tasks/chords/groups). This needs a little custom code because our @dojo_async_task decorator wraps around @app.task. If we use similar Celery constructs more in the future, we may want to remove @dojo_async_task and/or replace it with something simpler.
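A minimal sketch of how a signature mode on such a decorator could work. This is hypothetical, the real @dojo_async_task is more involved: when signature=True the wrapper returns a signature instead of enqueueing the task immediately.

```python
# Hypothetical sketch of a dual-mode async decorator; not DefectDojo's actual code.
def dojo_async_task_sketch(func=None, *, signature=False):
    def decorator(task):
        # `task` is assumed to already be a Celery task (wrapped by @app.task),
        # so it exposes .si() for immutable signatures and .delay() to enqueue.
        def wrapper(*args, **kwargs):
            if signature:
                return task.si(*args, **kwargs)  # build a signature for chords/groups
            return task.delay(*args, **kwargs)   # enqueue immediately, as before
        return wrapper
    return decorator(func) if func is not None else decorator
```

Supporting both `@dojo_async_task_sketch` and `@dojo_async_task_sketch(signature=True)` is what forces the `func=None` dance; this is part of why the PR description suggests eventually replacing the decorator with something simpler.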

Notes:

  • Instead of a chord, I also tried launching the tasks directly and awaiting their results. But this involved setting timeouts, which is hard to get right for large imports or busy servers. I considered batching/chaining, but then we would just be reimplementing Celery. So I settled on the multi-chord approach. A single-chord approach would defer the start of post-processing until after all findings had been created.
  • I also tried generating and yielding signatures and using the generator to construct a chord. This appears to have worked in Celery 4, but no longer works in Celery 5. So multi-chord it is.
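The workaround for the generator issue amounts to materializing the signatures up front. The sketch below uses hypothetical names; the Celery 4 vs 5 behavior is as described in the note above, not independently verified here.

```python
# Sketch: build the chord header as a concrete list instead of a lazy generator.
def build_chord_header(signature_factory, finding_ids):
    # The list comprehension forces every signature to exist before the chord
    # is constructed, which is what the generator-based approach did not do.
    return [signature_factory(fid) for fid in finding_ids]
```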

@github-actions
Contributor

github-actions Bot commented Aug 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

github-actions Bot commented Aug 4, 2025

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions
Contributor

github-actions Bot commented Sep 6, 2025

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@valentijnscholten valentijnscholten marked this pull request as ready for review September 19, 2025 20:09
@dryrunsecurity

dryrunsecurity Bot commented Sep 19, 2025

DryRun Security

This pull request introduces three issues: (1) unhandled exceptions when reading System_Settings in calculate_grade_internal, which can crash Celery tasks and halt imports; (2) a race condition where asynchronous finding-processing group tasks may not complete before product grading runs, leading to inconsistent product grades; and (3) potential resource exhaustion from building an in-memory list of all parsed findings (cleaned_findings), which can cause a DoS on large reports.

Denial of Service due to Unhandled Exception in dojo/utils.py
Vulnerability Denial of Service due to Unhandled Exception
Description The calculate_grade_internal function, which is part of critical asynchronous workflows (like finding deletion and scan imports), calls System_Settings.objects.get() without handling potential DoesNotExist or MultipleObjectsReturned exceptions. If the System_Settings table is empty or contains more than one entry, this will cause an unhandled exception, leading to the Celery task failing. Since these tasks are often part of larger chains (e.g., chords in scan imports), a failure here can halt the entire import process or prevent product grading, causing a denial of service for these features.

system_settings = System_Settings.objects.get()
if not product:
    logger.warning("ignoring calculate product for product None!")
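One defensive pattern for this class of issue, shown purely as an illustration and not as the project's actual fix: fetch the singleton row without letting DoesNotExist or MultipleObjectsReturned escape. Here `manager` stands in for `System_Settings.objects` so the sketch stays framework-agnostic.

```python
# Hypothetical hardening sketch; `manager` stands in for System_Settings.objects.
def get_singleton(manager, default):
    rows = list(manager.all()[:2])   # fetch at most two rows to detect duplicates cheaply
    if not rows:
        return default               # empty table: fall back instead of raising
    return rows[0]                   # one or more rows: pick the first deterministically
```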

Inconsistent Product Grade due to Race Condition in dojo/finding/helper.py
Vulnerability Inconsistent Product Grade due to Race Condition
Description The new asynchronous finding processing logic introduces a race condition that can lead to an inconsistent product grade. When system_settings.enable_product_grade is disabled, the maybe_launch_post_processing_chord function dispatches finding processing tasks using a Celery group. A group does not wait for its constituent tasks to complete before returning. Immediately after this dispatch, the main process calls perform_product_grading synchronously. This means the product grade can be calculated based on stale data, as the finding modifications (creations, updates, deletions) initiated by the group tasks may not have finished executing in the background. This results in an inaccurate security grade for the product until all asynchronous tasks eventually complete.

finding_group.findings.add(*findings)


@dojo_model_to_id
@dojo_async_task(signature=True)
@app.task
@dojo_model_from_id
def post_process_finding_save_signature(finding, dedupe_option=True, rules_option=True, product_grading_option=True,  # noqa: FBT002
                                        issue_updater_option=True, push_to_jira=False, user=None, *args, **kwargs):  # noqa: FBT002 - hard to fix nicely here; would be nice to fix this universally
    """
    Returns a task signature for post-processing a finding. This is useful for creating task signatures
    that can be used in chords or groups, or to await results. We need this extra method because of our
    dojo_async_task decorator. If we use more of these Celery features, we should probably move away from
    that decorator.
    """
    return post_process_finding_save_internal(finding, dedupe_option, rules_option, product_grading_option,
                                              issue_updater_option, push_to_jira, user, *args, **kwargs)


@dojo_model_to_id
@dojo_async_task
@app.task
@dojo_model_from_id
def post_process_finding_save(finding, dedupe_option=True, rules_option=True, product_grading_option=True,  # noqa: FBT002
                              issue_updater_option=True, push_to_jira=False, user=None, *args, **kwargs):  # noqa: FBT002
    return post_process_finding_save_internal(finding, dedupe_option, rules_option, product_grading_option,
                                              issue_updater_option, push_to_jira, user, *args, **kwargs)


def post_process_finding_save_internal(finding, dedupe_option=True, rules_option=True, product_grading_option=True,  # noqa: FBT002
                                       issue_updater_option=True, push_to_jira=False, user=None, *args, **kwargs):  # noqa: FBT002
    if not finding:
        logger.warning("post_process_finding_save called with finding==None, skipping post processing")
        return

Denial of Service (DoS) via Resource Exhaustion in dojo/importers/default_reimporter.py
Vulnerability Denial of Service (DoS) via Resource Exhaustion
Description The process_findings function in both default_reimporter.py and default_importer.py creates an in-memory list called cleaned_findings and populates it with all findings parsed from an uploaded report. This means that if a user uploads a report containing a very large number of findings, the application will attempt to load all of them into memory, potentially exhausting available server resources and leading to a Denial of Service.

cleaned_findings.append(sanitized)

for idx, unsaved_finding in enumerate(cleaned_findings):
    is_final = idx == len(cleaned_findings) - 1
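A common way to avoid materializing the whole list while still knowing which finding is the last one (which the loop above needs for `is_final`) is a one-item lookahead over the parser's iterator. This is a generic sketch of the pattern, not the importer's actual code:

```python
# Sketch: stream findings with a one-item lookahead so `is_final` is known
# without holding the entire report in memory at once.
def with_is_final(iterable):
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty report: nothing to yield
    for item in it:
        yield prev, False  # there is at least one more item after prev
        prev = item
    yield prev, True       # the iterator is exhausted, so prev is last
```

Whether this is worth doing depends on whether the parser itself already returns a fully materialized list, in which case the `cleaned_findings` list adds little extra memory.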


All finding details can be found in the DryRun Security Dashboard.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Contributor

@Maffooch Maffooch left a comment


I need to learn more about these celery chords! They seem incredible

Comment thread dojo/importers/default_reimporter.py Outdated
Comment thread dojo/importers/default_importer.py
Comment thread dojo/models.py Outdated
Comment thread dojo/models.py Outdated
Contributor

@Maffooch Maffooch left a comment


Very nice :)

Contributor

@mtesauro mtesauro left a comment


Approved

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

1 similar comment
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@valentijnscholten valentijnscholten merged commit 927e261 into DefectDojo:dev Sep 29, 2025
88 checks passed
Maffooch pushed a commit to valentijnscholten/django-DefectDojo that referenced this pull request Feb 16, 2026
…efectDojo#12914)

* test cases: fix caching of system settings

* fix tests

* fix caching for github

* fix caching for github

* simplify cache loading

* post process only when needed

* set tags on (re)import

* rebase set tags

* reduce save with options

* update counts, reduce saves with options

* importers: do not save again, but postprocess directly

* update counts

* optimize hash_code setting

* fix counts

* set hash code for new findings in reimport

* make smaller second save work

* make smaller second save work - add no_options

* update query counts

* update counts

* remove logging

* perf3b: compute hash_code on first save

* fix cve for reimport

* ruff

* fix no async

* Merge remote-tracking branch 'upstream/dev' into perf3-reduce-saves

* make smaller second save work

* fix cve for reimport

* initial

* fix counts

* fix counts

* simplify

* simplify

* refactor to await results

* handle reimport and close old findings

* update query and task counts

* switch back to chords

* simplify

* respect system settings product grading enabled

* finding/test delete grading only if enabled

* optimize asyn_dupe_delete grading

* cleanup comments

* fix merge artifact

* fix loop

* simplify loop

* fix reimport loop

* revert settings changes

* revert settings changes

* update counts

* extract product grading method call

* cleanup model deletes

* product grade logging fix

* extract chord orchestration into method

* fix model traversal
