Reduce and optimize number of product grading calls using a Chord #12914

valentijnscholten merged 59 commits into DefectDojo:dev
This pull request has conflicts, please resolve those before we can evaluate the pull request.

Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request introduces three issues:

- unhandled exceptions when reading System_Settings in calculate_grade_internal, which can crash Celery tasks and halt imports;
- a race condition where asynchronous finding-processing group tasks may not complete before product grading runs, leading to inconsistent product grades;
- potential resource exhaustion from building an in-memory list of all parsed findings (cleaned_findings), which can cause DoS on large reports.
Denial of Service due to Unhandled Exception in dojo/utils.py

| Vulnerability | Denial of Service due to Unhandled Exception |
|---|---|
| Description | The calculate_grade_internal function, which is part of critical asynchronous workflows (like finding deletion and scan imports), calls System_Settings.objects.get() without handling the potential DoesNotExist or MultipleObjectsReturned exceptions. If the System_Settings table is empty or contains more than one entry, this causes an unhandled exception and the Celery task fails. Since these tasks are often part of larger chains (e.g., chords in scan imports), a failure here can halt the entire import process or prevent product grading, causing a denial of service for these features. |
django-DefectDojo/dojo/utils.py
Lines 1577 to 1579 in 5b45aaf
Inconsistent Product Grade due to Race Condition in dojo/finding/helper.py
| Vulnerability | Inconsistent Product Grade due to Race Condition |
|---|---|
| Description | The new asynchronous finding processing logic introduces a race condition that can lead to an inconsistent product grade. When system_settings.enable_product_grade is disabled, the maybe_launch_post_processing_chord function dispatches finding processing tasks using a Celery group. A group does not wait for its constituent tasks to complete before returning. Immediately after this dispatch, the main process calls perform_product_grading synchronously. This means the product grade can be calculated based on stale data, as the finding modifications (creations, updates, deletions) initiated by the group tasks may not have finished executing in the background. This results in an inaccurate security grade for the product until all asynchronous tasks eventually complete. |
django-DefectDojo/dojo/finding/helper.py
Lines 379 to 413 in 5b45aaf
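The timing problem can be illustrated without Celery. In this minimal sketch (all names hypothetical), grading immediately after a group-style dispatch sees stale data, while joining the workers first, which is effectively what a chord callback guarantees, sees the finished state:

```python
import threading

findings = []
lock = threading.Lock()

def process_finding(i, delay_event):
    delay_event.wait()          # simulate slow background work
    with lock:
        findings.append(i)

def grade():
    with lock:
        return len(findings)    # "grade" derived from current findings

delay = threading.Event()
workers = [threading.Thread(target=process_finding, args=(i, delay))
           for i in range(5)]
for w in workers:
    w.start()                   # group-style: dispatch and return immediately

stale_grade = grade()           # runs before tasks complete -> sees 0 findings

delay.set()                     # let the "background tasks" finish
for w in workers:
    w.join()
chord_grade = grade()           # chord-style: grade only after all complete -> 5

print(stale_grade, chord_grade)  # -> 0 5
```

The threads block on an unset event, so the "stale" read deterministically happens before any finding is recorded.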
Denial of Service (DoS) via Resource Exhaustion in dojo/importers/default_reimporter.py
| Vulnerability | Denial of Service (DoS) via Resource Exhaustion |
|---|---|
| Description | The process_findings function in both default_reimporter.py and default_importer.py creates an in-memory list called cleaned_findings and populates it with all findings parsed from an uploaded report. This means that if a user uploads a report containing a very large number of findings, the application will attempt to load all of them into memory, potentially exhausting available server resources and leading to a Denial of Service. |
django-DefectDojo/dojo/importers/default_reimporter.py
Lines 202 to 205 in 5b45aaf
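A hedged sketch of the alternative: consuming parsed findings through a generator in small batches keeps memory bounded regardless of report size. The parse_report function, the batch size, and the dict shape are illustrative, not the importer's real API.

```python
def parse_report(lines):
    # Yield findings one at a time instead of materializing a full list.
    for line in lines:
        if line.strip():
            yield {"title": line.strip()}

def process_findings(report_lines, batch_size=2):
    batch, processed = [], 0
    for finding in parse_report(report_lines):
        batch.append(finding)
        if len(batch) >= batch_size:
            processed += len(batch)   # persist/process the small batch
            batch.clear()             # memory stays bounded by batch_size
    processed += len(batch)           # flush the final partial batch
    return processed

print(process_findings(["a", "b", "", "c"]))  # -> 3
```

Peak memory is proportional to batch_size, not to the number of findings in the uploaded report.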
All finding details can be found in the DryRun Security Dashboard.
Maffooch left a comment:

I need to learn more about these celery chords! They seem incredible
…efectDojo#12914)

* test cases: fix caching of system settings
* fix tests
* fix caching for github
* fix caching for github
* simplify cache loading
* post process only when needed
* set tags on (re)import
* rebase set tags
* reduce save with options
* update counts, reduce saves with options
* importers: do not save again, but postprocess directly
* update counts
* optimize hash_code setting
* fix counts
* set hash code for new findings in reimport
* make smaller second save work
* make smaller second save work - add no_options
* update query counts
* update counts
* remove logging
* perf3b: compute hash_code on first save
* fix cve for reimport
* ruff
* fix no async
* Merge remote-tracking branch 'upstream/dev' into perf3-reduce-saves
* make smaller second save work
* fix cve for reimport
* initial
* fix counts
* fix counts
* simplify
* simplify
* refactor to await results
* handle reimport and close old findings
* update query and task counts
* switch back to chords
* simplify
* respect system settings product grading enabled
* finding/test delete grading only if enabled
* optimize asyn_dupe_delete grading
* cleanup comments
* fix merge artifact
* fix loop
* simplify loop
* fix reimport loop
* revert settings changes
* revert settings changes
* update counts
* extract product grading method call
* cleanup model deletes
* product grade logging fix
* extract chord orchestration into method
* fix model traversal
The current implementation of import and reimport executes product grading for each finding that is imported. This results in lots of background celery tasks that put a non-trivial strain on the database. In large imports it can mean minutes of extra processing purely for repeatedly grading the same product.
The same happens in the `async_dupe_delete` task that runs every minute, and when a product or engagement is deleted. This PR optimizes these cases. The import/reimport case uses a celery chord to reduce this to only a couple of product grading calls.
By design the chord will only start processing once all tasks have been submitted to it. That's why we ramp up processing with multiple chords, starting with small sizes and increasing exponentially up to 1024.
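The ramp-up described above can be sketched as a small batching helper, where each batch would become one chord. Note the starting size of 8 is an assumption for illustration; only the 1024 cap is stated in this PR.

```python
def chord_batches(items, start=8, cap=1024):
    # Split items into batches whose sizes double each step, up to `cap`.
    size, i, batches = start, 0, []
    while i < len(items):
        batches.append(items[i:i + size])
        i += size
        size = min(size * 2, cap)   # exponential ramp-up with a ceiling
    return batches

sizes = [len(b) for b in chord_batches(list(range(100)), start=8)]
print(sizes)  # -> [8, 16, 32, 44]
```

Small early batches let the first chord start quickly, while later, larger batches amortize per-chord overhead on big imports.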
If product grading is disabled we use a `group` and skip product grading. To be able to use a chord, we need to generate signatures, which are similar to lambdas that can be passed around (to celery tasks/chords/groups). This needs a little custom code because we have the `@dojo_async_task` decorator which wraps around `@app.task`. If we use similar celery constructs more in the future, we may want to remove `@dojo_async_task` and/or replace it with something simpler.

Notes:
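The signature idea can be illustrated without a Celery broker: `functools.partial` plays the role of `task.si(...)`, and a toy `chord` helper runs the body callback only after every header "task" has produced a result. All names here are hypothetical; this is a sketch of the concept, not Celery's implementation.

```python
from functools import partial

def chord(header_sigs, body_sig):
    # Run every header signature to completion first...
    results = [sig() for sig in header_sigs]
    # ...then invoke the body callback exactly once with all results.
    return body_sig(results)

def post_process(finding_id):
    return finding_id * 10          # stand-in for per-finding post-processing

def grade_product(results):
    return f"graded after {len(results)} findings"

# Like building post_process.si(i) for each finding, then chord(...)(callback):
sigs = [partial(post_process, i) for i in range(3)]
print(chord(sigs, grade_product))  # -> graded after 3 findings
```

A signature bundles the task and its arguments without executing it, which is what lets the chord defer both the header tasks and the grading callback.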