
Release: Merge release into master from: release/2.51.1 #13421

Merged
rossops merged 37 commits into master from release/2.51.1 on Oct 14, 2025

Conversation

@github-actions Contributor

Release triggered by rossops

valentijnscholten and others added 30 commits October 6, 2025 12:21

Release: Merge back 2.51.0 into bugfix from: master-into-bugfix/2.51.0-2.52.0-dev
Bumps [django](https://github.com/django/django) from 5.1.12 to 5.1.13.
- [Commits](django/django@5.1.12...5.1.13)

---
updated-dependencies:
- dependency-name: django
  dependency-version: 5.1.13
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* watson middleware: skip logging if no instances updated

valentijnscholten and others added 7 commits October 12, 2025 19:02
feat(helm): Add support for automountServiceAccountToken

add unit tests to test importer deduplication
fix: handle broken endpoints when <StartURL> includes a port number in Acunetix XML parser

pghistory improvements: backfill and "empty" changes
@dryrunsecurity

dryrunsecurity Bot commented Oct 14, 2025

DryRun Security

🔴 Risk threshold exceeded.

This pull request contains several security concerns: sensitive edits in dojo/user/views.py, verbose DB query logging in pghistory_backfill.py that can expose sensitive data if run in production, use of a third-party GitHub Action that can push changes (supply-chain risk), and multiple potential SQL injection issues in pghistory_backfill_simple.py and pghistory_backfill_fast.py where table/identifier names are interpolated into SQL without proper validation or quoting.

🔴 Configured Codepaths Edit in dojo/user/views.py
Vulnerability: Configured Codepaths Edit
Description: Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml.
🟡 Potential SQL Injection in dojo/management/commands/pghistory_backfill_fast.py
Vulnerability: Potential SQL Injection
Description: The command constructs and executes multiple raw SQL statements by interpolating table names and SQL fragments into query strings (e.g. f"SELECT COUNT(*) FROM {table_name}"). Table names are derived from the model_name parameter (which can be influenced via the --model CLI argument) and are inserted directly into SQL without validation or proper quoting. Although most value parameters use parameterized queries, any SQL constructed with f-strings or string concatenation (especially identifiers like table names or the COPY SQL) can allow injection or cause malformed SQL if an attacker controls model_name or other derived strings. Additionally, queries that embed event_table_name and table_name directly (including the COPY statement) are executed against the database without sanitizing or quoting identifiers.

"""
Management command to backfill existing data into django-pghistory using COPY.
This command creates initial snapshots for all existing records in tracked models
using PostgreSQL COPY for maximum performance.
"""
import io
import logging
import time
from django.conf import settings
from django.core.management.base import BaseCommand
from django.db import connection
from django.utils import timezone
logger = logging.getLogger(__name__)
class Command(BaseCommand):
help = "Backfill existing data into django-pghistory using COPY"
def add_arguments(self, parser):
parser.add_argument(
"--model",
type=str,
help='Specific model to backfill (e.g., "Finding", "Product")',
)
parser.add_argument(
"--batch-size",
type=int,
default=10000,
help="Number of records to process in each batch (default: 10000)",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Show what would be done without actually creating events",
)
parser.add_argument(
"--log-queries",
action="store_true",
help="Enable database query logging (default: enabled)",
)
parser.add_argument(
"--no-log-queries",
action="store_true",
help="Disable database query logging",
)
    def get_excluded_fields(self, model_name):
        """Get the list of excluded fields for a specific model from pghistory configuration."""
        # Define excluded fields for each model (matching auditlog.py)
        excluded_fields_map = {
            "Dojo_User": ["password"],
            "Product": ["updated"],  # This is the key change
            "Cred_User": ["password"],
            "Notification_Webhooks": ["header_name", "header_value"],
        }
        return excluded_fields_map.get(model_name, [])
    def process_model_with_copy(self, model_name, batch_size, dry_run):
        """Process a single model using COPY operations with raw SQL."""
        try:
            # Get table names using raw SQL
            # Handle special cases for table naming
            if model_name == "Dojo_User":
                table_name = "dojo_dojo_user"
                event_table_name = "dojo_dojo_userevent"
            elif model_name == "Product_Type":
                table_name = "dojo_product_type"
                event_table_name = "dojo_product_typeevent"
            elif model_name == "Finding_Group":
                table_name = "dojo_finding_group"
                event_table_name = "dojo_finding_groupevent"
            elif model_name == "Risk_Acceptance":
                table_name = "dojo_risk_acceptance"
                event_table_name = "dojo_risk_acceptanceevent"
            elif model_name == "Finding_Template":
                table_name = "dojo_finding_template"
                event_table_name = "dojo_finding_templateevent"
            elif model_name == "Cred_User":
                table_name = "dojo_cred_user"
                event_table_name = "dojo_cred_userevent"
            elif model_name == "Notification_Webhooks":
                table_name = "dojo_notification_webhooks"
                event_table_name = "dojo_notification_webhooksevent"
            else:
                table_name = f"dojo_{model_name.lower()}"
                event_table_name = f"dojo_{model_name.lower()}event"

            # Check if tables exist
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT EXISTS (
                        SELECT FROM information_schema.tables
                        WHERE table_name = %s
                    )
                """, [table_name])
                table_exists = cursor.fetchone()[0]

                cursor.execute("""
                    SELECT EXISTS (
                        SELECT FROM information_schema.tables
                        WHERE table_name = %s
                    )
                """, [event_table_name])
                event_table_exists = cursor.fetchone()[0]

            if not table_exists:
                self.stdout.write(f" Table {table_name} not found")
                return 0, 0.0

            if not event_table_exists:
                self.stdout.write(
                    self.style.ERROR(
                        f" Event table {event_table_name} not found. "
                        f"Is {model_name} tracked by pghistory?",
                    ),
                )
                return 0, 0.0
            # Get total count using raw SQL
            with connection.cursor() as cursor:
                cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
                total_count = cursor.fetchone()[0]

            if total_count == 0:
                self.stdout.write(f" No records found for {model_name}")
                return 0, 0.0

            self.stdout.write(f" Found {total_count:,} records")

            # Get excluded fields
            excluded_fields = self.get_excluded_fields(model_name)

            # Check if records already have initial_import events using raw SQL
            with connection.cursor() as cursor:
                cursor.execute(f"SELECT COUNT(*) FROM {event_table_name} WHERE pgh_label = 'initial_import'")
                existing_count = cursor.fetchone()[0]

            # Get records that need backfill using raw SQL
            with connection.cursor() as cursor:
                cursor.execute(f"""
                    SELECT COUNT(*) FROM {table_name} t
                    WHERE NOT EXISTS (
                        SELECT 1 FROM {event_table_name} e
                        WHERE e.pgh_obj_id = t.id AND e.pgh_label = 'initial_import'
                    )
                """)
                backfill_count = cursor.fetchone()[0]

            # Log the breakdown
            self.stdout.write(f" Records with initial_import events: {existing_count:,}")
            self.stdout.write(f" Records needing initial_import events: {backfill_count:,}")

            if backfill_count == 0:
                self.stdout.write(
                    self.style.SUCCESS(f" ✓ All {total_count:,} records already have initial_import events"),
                )
                return total_count, 0.0

            if dry_run:
                self.stdout.write(f" Would process {backfill_count:,} records using COPY...")
                return backfill_count, 0.0

            # Get event table columns using raw SQL (excluding auto-generated pgh_id)
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT column_name
                    FROM information_schema.columns
                    WHERE table_name = %s AND column_name != 'pgh_id'
                    ORDER BY ordinal_position
                """, [event_table_name])
                event_columns = [row[0] for row in cursor.fetchall()]

            # Get all IDs that need backfill first
            with connection.cursor() as cursor:
                cursor.execute(f"""
                    SELECT t.id FROM {table_name} t
                    WHERE NOT EXISTS (
                        SELECT 1 FROM {event_table_name} e
                        WHERE e.pgh_obj_id = t.id AND e.pgh_label = 'initial_import'
                    )
                    ORDER BY t.id
                """)
                ids_to_process = [row[0] for row in cursor.fetchall()]

            if not ids_to_process:
                self.stdout.write(" No records need backfill")
                return 0, 0.0
            # Process records in batches using raw SQL
            processed = 0
            batch_start_time = time.time()
            model_start_time = time.time()  # Track model start time

            # Get column names for the source table
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT column_name
                    FROM information_schema.columns
                    WHERE table_name = %s
                    ORDER BY ordinal_position
                """, [table_name])
                source_columns = [row[0] for row in cursor.fetchall()]

            # Filter out excluded fields from source columns
            source_columns = [col for col in source_columns if col not in excluded_fields]

            # Process in batches
            consecutive_failures = 0
            max_failures = 3
            for i in range(0, len(ids_to_process), batch_size):
                batch_ids = ids_to_process[i:i + batch_size]

                # Log progress every 10 batches
                if i > 0 and i % (batch_size * 10) == 0:
                    self.stdout.write(f" Processing batch starting at index {i:,}...")

                # Get batch of records using raw SQL with specific IDs
                columns_str = ", ".join(source_columns)
                placeholders = ", ".join(["%s"] * len(batch_ids))
                query = f"""
                    SELECT {columns_str} FROM {table_name} t
                    WHERE t.id IN ({placeholders})
                    ORDER BY t.id
                """

                with connection.cursor() as cursor:
                    cursor.execute(query, batch_ids)
                    batch_rows = cursor.fetchall()

                if not batch_rows:
                    self.stdout.write(f" No records found for batch at index {i}")
                    continue
                # Use PostgreSQL COPY as described in the article
                try:
                    # Prepare data for COPY using a custom file-like object
                    class FileLikeObject:
                        def __init__(self):
                            self.data = io.BytesIO()

                        def write(self, data):
                            return self.data.write(data)

                        def read(self, size=-1):
                            return self.data.read(size)

                        def seek(self, pos):
                            return self.data.seek(pos)

                        def tell(self):
                            return self.data.tell()

                        def __len__(self):
                            return len(self.data.getvalue())

                        def getvalue(self):
                            return self.data.getvalue()

                    copy_buffer = FileLikeObject()
                    for row in batch_rows:
                        row_data = []
                        # Create a mapping of source columns to values
                        source_values = {}
                        for idx, value in enumerate(row):
                            field_name = source_columns[idx]
                            # Convert value to string for COPY
                            if value is None:
                                source_values[field_name] = ""
                            elif isinstance(value, bool):
                                source_values[field_name] = "t" if value else "f"
                            elif hasattr(value, "isoformat"):  # datetime objects
                                source_values[field_name] = value.isoformat()
                            else:
                                source_values[field_name] = str(value)

                        # Build row data in the order of event_columns
                        for col in event_columns:
                            if col == "pgh_created_at":
                                row_data.append(timezone.now().isoformat())
                            elif col == "pgh_label":
                                row_data.append("initial_import")
                            elif col == "pgh_obj_id":
                                row_data.append(str(row[0]) if row[0] is not None else "")  # Assuming first column is id
                            elif col == "pgh_context_id":
                                row_data.append("")  # Empty for backfilled events
                            elif col in source_values:
                                row_data.append(source_values[col])
                            else:
                                row_data.append("")  # Default empty value

                        # Write tab-separated row to buffer as bytes
                        copy_buffer.write(("\t".join(row_data) + "\n").encode("utf-8"))

                    copy_buffer.seek(0)

                    # Debug: Show what we're about to copy
                    self.stdout.write(f" Batch {i // batch_size + 1}: Writing to table: {event_table_name}")

                    # Use PostgreSQL COPY with psycopg3 syntax
                    with connection.cursor() as cursor:
                        # Get the underlying raw cursor to bypass Django's wrapper
                        raw_cursor = cursor.cursor

                        # Use the copy method (psycopg3 syntax)
                        copy_sql = f"COPY {event_table_name} ({', '.join(event_columns)}) FROM STDIN WITH (FORMAT text, DELIMITER E'\\t')"
                        try:
                            # Use psycopg3 copy syntax as per documentation
                            # Prepare data as list of tuples for write_row()
                            records = []
                            for row in batch_rows:
                                row_data = []
                                # Create a mapping of source columns to values
                                source_values = {}
                                for idx, value in enumerate(row):
                                    field_name = source_columns[idx]
                                    source_values[field_name] = value

                                # Build row data in the order of event_columns
                                for col in event_columns:
                                    if col == "pgh_created_at":
                                        row_data.append(timezone.now())
                                    elif col == "pgh_label":
                                        row_data.append("initial_import")
                                    elif col == "pgh_obj_id":
                                        row_data.append(row[0])  # Assuming first column is id
                                    elif col == "pgh_context_id":
                                        row_data.append(None)  # Empty for backfilled events
                                    elif col in source_values:
                                        row_data.append(source_values[col])
                                    else:
                                        row_data.append(None)  # Default NULL value

                                records.append(tuple(row_data))

                            # Use COPY with write_row() as per psycopg3 docs
                            with raw_cursor.copy(copy_sql) as copy:
                                for record in records:
                                    copy.write_row(record)

                            self.stdout.write(" COPY operation completed using write_row")

                            # Commit the transaction to persist the data
                            raw_cursor.connection.commit()

                            # Debug: Check if data was inserted
                            raw_cursor.execute(f"SELECT COUNT(*) FROM {event_table_name} WHERE pgh_label = 'initial_import'")
                            count = raw_cursor.fetchone()[0]
                            self.stdout.write(f" Records in event table after batch: {count}")
                        except Exception as copy_error:
                            self.stdout.write(f" COPY error: {copy_error}")
                            # Try to get more details about the error
                            raw_cursor.execute("SELECT * FROM pg_stat_activity WHERE state = 'active'")
                            self.stdout.write(f" Active queries: {raw_cursor.fetchall()}")
                            raise

                    batch_processed = len(batch_rows)
                    processed += batch_processed
                    consecutive_failures = 0  # Reset failure counter on success

                    # Calculate timing
                    batch_end_time = time.time()
                    batch_duration = batch_end_time - batch_start_time
                    batch_records_per_second = batch_processed / batch_duration if batch_duration > 0 else 0

                    # Log progress
                    progress = (processed / backfill_count) * 100
                    self.stdout.write(f" Processed {processed:,}/{backfill_count:,} records ({progress:.1f}%) - "
                                      f"Last batch: {batch_duration:.2f}s ({batch_records_per_second:.1f} records/sec)")

                    batch_start_time = time.time()  # Reset for next batch

                except Exception as e:
                    consecutive_failures += 1
                    logger.error(f"Bulk insert failed for {model_name} batch: {e}")
                    self.stdout.write(f" Bulk insert failed: {e}")
                    # Log more details about the error
                    self.stdout.write(f" Processed {processed:,} records before failure")
                    if consecutive_failures >= max_failures:
                        self.stdout.write(f" Too many consecutive failures ({consecutive_failures}), stopping processing")
                        break
                    # Continue with next batch instead of breaking
                    continue
            # Calculate total timing
            model_end_time = time.time()
            total_duration = model_end_time - model_start_time
            records_per_second = processed / total_duration if total_duration > 0 else 0

            self.stdout.write(
                self.style.SUCCESS(
                    f" ✓ Completed {model_name}: {processed:,} records in {total_duration:.2f}s "
                    f"({records_per_second:.1f} records/sec)",
                ),
            )
            return processed, records_per_second  # noqa: TRY300

        except Exception as e:
            self.stdout.write(
                self.style.ERROR(f" ✗ Failed to process {model_name}: {e}"),
            )
            logger.exception(f"Error processing {model_name}")
            return 0, 0.0
    def enable_db_logging(self):
        """Enable database query logging for this command."""
        # Store original DEBUG setting
        self.original_debug = settings.DEBUG

        # Configure database query logging
        db_logger = logging.getLogger("django.db.backends")
        db_logger.setLevel(logging.DEBUG)

        # Add a handler if one doesn't exist
        if not db_logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter(
                "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            )
            handler.setFormatter(formatter)
            db_logger.addHandler(handler)

        # Also enable the SQL logger specifically
        sql_logger = logging.getLogger("django.db.backends.sql")
        sql_logger.setLevel(logging.DEBUG)
        # Ensure the root logger propagates to our handlers
        if not sql_logger.handlers:
            sql_logger.addHandler(handler)

        # Enable query logging in Django settings
        settings.DEBUG = True

        self.stdout.write(
            self.style.SUCCESS("Database query logging enabled"),
        )
    def disable_db_logging(self):
        """Disable database query logging."""
        # Restore original DEBUG setting
        settings.DEBUG = self.original_debug

        # Disable query logging by setting a higher level
        logging.getLogger("django.db.backends").setLevel(logging.INFO)
        logging.getLogger("django.db.backends.sql").setLevel(logging.INFO)

        self.stdout.write(
            self.style.SUCCESS("Database query logging disabled"),
        )
    def handle(self, *args, **options):
        if not settings.ENABLE_AUDITLOG or settings.AUDITLOG_TYPE != "django-pghistory":
            self.stdout.write(
                self.style.WARNING(
                    "pghistory is not enabled. Set DD_ENABLE_AUDITLOG=True and "
                    "DD_AUDITLOG_TYPE=django-pghistory",
                ),
            )
            return

        # Check if we can use COPY (PostgreSQL only)
        if settings.DATABASES["default"]["ENGINE"] != "django.db.backends.postgresql":
            self.stdout.write(
                self.style.ERROR(
                    "COPY operations only available with PostgreSQL. "
                    "Please use the original pghistory_backfill command instead.",
                ),
            )
            return

        # Enable database query logging based on options
        enable_query_logging = not options.get("no_log_queries")
        if enable_query_logging:
            self.enable_db_logging()
        else:
            self.stdout.write(
                self.style.WARNING("Database query logging disabled"),
            )

        # Models that are tracked by pghistory
        tracked_models = [
            "Dojo_User", "Endpoint", "Engagement", "Finding", "Finding_Group",
            "Product_Type", "Product", "Test", "Risk_Acceptance",
            "Finding_Template", "Cred_User", "Notification_Webhooks",
        ]

        specific_model = options.get("model")
        if specific_model:
            if specific_model not in tracked_models:
                self.stdout.write(
                    self.style.ERROR(
                        f'Model "{specific_model}" is not tracked by pghistory. '
                        f'Available models: {", ".join(tracked_models)}',
                    ),
                )
                return
            tracked_models = [specific_model]

        batch_size = options["batch_size"]
        dry_run = options["dry_run"]

        if dry_run:
            self.stdout.write(
                self.style.WARNING("DRY RUN MODE - No events will be created"),
            )

        total_processed = 0
        total_start_time = time.time()

        self.stdout.write(f"Starting backfill for {len(tracked_models)} model(s) using PostgreSQL COPY...")

        for model_name in tracked_models:
            time.time()
            self.stdout.write(f"\nProcessing {model_name}...")

            processed, _ = self.process_model_with_copy(
                model_name, batch_size, dry_run,
            )
            total_processed += processed

        # Calculate total timing
        total_end_time = time.time()
        total_duration = total_end_time - total_start_time
        total_records_per_second = total_processed / total_duration if total_duration > 0 else 0

        # Disable database query logging if it was enabled
        if enable_query_logging:
            self.disable_db_logging()

        self.stdout.write(
            self.style.SUCCESS(
                f"\nBACKFILL COMPLETE: Processed {total_processed:,} records in {total_duration:.2f}s "
                f"({total_records_per_second:.1f} records/sec)",
            ),
        )
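
The finding above is about identifier interpolation rather than value interpolation. A minimal sketch of one way to constrain those identifiers, not part of this PR: resolve the table name from Django model metadata, check the model against an allowlist, and quote the identifier with the backend's own quoting helper. The count_rows helper and ALLOWED_MODELS allowlist below are hypothetical names used only for illustration.

from django.apps import apps
from django.db import connection

ALLOWED_MODELS = {"Finding", "Product", "Test"}  # hypothetical allowlist of tracked models


def count_rows(model_name):
    """Count rows for a tracked model without f-string table-name interpolation."""
    if model_name not in ALLOWED_MODELS:
        msg = f"Unsupported model: {model_name}"
        raise ValueError(msg)
    # Derive the table name from model metadata instead of building it by hand
    table = apps.get_model("dojo", model_name)._meta.db_table
    quoted = connection.ops.quote_name(table)  # backend-specific identifier quoting
    with connection.cursor() as cursor:
        cursor.execute(f"SELECT COUNT(*) FROM {quoted}")
        return cursor.fetchone()[0]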

🟡 Potential SQL Injection in dojo/management/commands/pghistory_backfill_simple.py
Vulnerability: Potential SQL Injection
Description: The command constructs SQL strings by interpolating table names directly into f-strings (e.g. f"SELECT COUNT(*) FROM {table_name}", f"SELECT t.id FROM {table_name} t ... {event_table_name} ...", and the INSERT INTO {event_table_name} ... FROM {table_name} t ... statement). While parameterized queries are used for values (e.g. cursor.execute(..., [event_table_name]) when querying information_schema and for passing id arrays), table names and column lists are not passed as query parameters and come from apps.get_model(model_name) output. If any of the model_name values or the resolved table names could be influenced by user input (via the --models argument or other misconfiguration), or if apps.get_model were tricked into returning a model with a malicious db_table value, an attacker could inject SQL through those interpolated identifiers. SQL identifiers cannot be parameterized via the DB API, so they must be validated or quoted safely. The vulnerable lines are the places where f-strings include table names directly in SQL statements.

import logging
import time
from django.apps import apps
from django.core.management.base import BaseCommand
from django.db import connection
logger = logging.getLogger(__name__)
class Command(BaseCommand):
help = "Backfill pghistory events using direct SQL INSERT - much simpler and faster!"
def add_arguments(self, parser):
parser.add_argument(
"--batch-size",
type=int,
default=10000,
help="Number of records to process in each batch",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Show what would be processed without making changes",
)
parser.add_argument(
"--models",
nargs="+",
help="Specific models to process (default: all configured models)",
)
    def handle(self, *args, **options):
        batch_size = options["batch_size"]
        dry_run = options["dry_run"]
        specific_models = options.get("models")

        # Define the models to process
        models_to_process = [
            "Test",
            "Product",
            "Finding",
            "Endpoint",
            "Dojo_User",
            "Product_Type",
            "Finding_Group",
            "Risk_Acceptance",
            "Finding_Template",
            "Cred_User",
            "Notification_Webhooks",
        ]

        if specific_models:
            models_to_process = [m for m in models_to_process if m in specific_models]

        self.stdout.write(
            self.style.SUCCESS(
                f"Starting backfill for {len(models_to_process)} model(s) using direct SQL INSERT...",
            ),
        )

        total_processed = 0
        total_start_time = time.time()

        for model_name in models_to_process:
            self.stdout.write(f"\nProcessing {model_name}...")
            processed, _records_per_second = self.process_model_simple(
                model_name, batch_size, dry_run,
            )
            total_processed += processed

        total_duration = time.time() - total_start_time
        total_records_per_second = total_processed / total_duration if total_duration > 0 else 0

        self.stdout.write(
            self.style.SUCCESS(
                f"\n✓ Backfill completed: {total_processed:,} total records in {total_duration:.2f}s "
                f"({total_records_per_second:.1f} records/sec)",
            ),
        )
    def get_excluded_fields(self, model_name):
        """Get the list of excluded fields for a specific model from pghistory configuration."""
        excluded_fields_map = {
            "Dojo_User": ["password"],
            "Product": ["updated"],
            "Cred_User": ["password"],
            "Notification_Webhooks": ["header_name", "header_value"],
        }
        return excluded_fields_map.get(model_name, [])
    def process_model_simple(self, model_name, batch_size, dry_run):
        """Process a single model using direct SQL INSERT - much simpler!"""
        try:
            # Get table names
            table_name, event_table_name = self.get_table_names(model_name)
            if not table_name or not event_table_name:
                self.stdout.write(f" Skipping {model_name}: table not found")
                return 0, 0.0

            # Check if event table exists
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT EXISTS (
                        SELECT 1 FROM information_schema.tables
                        WHERE table_name = %s
                    )
                """, [event_table_name])
                if not cursor.fetchone()[0]:
                    self.stdout.write(f" Skipping {model_name}: event table {event_table_name} not found")
                    return 0, 0.0

            # Get counts
            with connection.cursor() as cursor:
                cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
                total_count = cursor.fetchone()[0]

                cursor.execute(f"""
                    SELECT COUNT(*) FROM {table_name} t
                    WHERE NOT EXISTS (
                        SELECT 1 FROM {event_table_name} e
                        WHERE e.pgh_obj_id = t.id AND e.pgh_label = 'initial_import'
                    )
                """)
                backfill_count = cursor.fetchone()[0]

            if backfill_count == 0:
                self.stdout.write(f" No records need backfill for {model_name}")
                return 0, 0.0

            self.stdout.write(f" {backfill_count:,} records need backfill out of {total_count:,} total")

            if dry_run:
                self.stdout.write(f" [DRY RUN] Would process {backfill_count:,} records")
                return backfill_count, 0.0

            # Get source columns (excluding pghistory-specific ones)
            excluded_fields = self.get_excluded_fields(model_name)
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT column_name
                    FROM information_schema.columns
                    WHERE table_name = %s
                    ORDER BY ordinal_position
                """, [table_name])
                source_columns = [row[0] for row in cursor.fetchall()]

            # Filter out excluded fields
            source_columns = [col for col in source_columns if col not in excluded_fields]

            # Get event table columns (excluding pgh_id which is auto-generated)
            with connection.cursor() as cursor:
                cursor.execute("""
                    SELECT column_name
                    FROM information_schema.columns
                    WHERE table_name = %s AND column_name != 'pgh_id'
                    ORDER BY ordinal_position
                """, [event_table_name])
                event_columns = [row[0] for row in cursor.fetchall()]

            # Build the INSERT query - this is the magic!
            # We use INSERT INTO ... SELECT to directly generate the event data
            select_columns = []
            for col in event_columns:
                if col == "pgh_created_at":
                    select_columns.append("NOW() as pgh_created_at")
                elif col == "pgh_label":
                    select_columns.append("'initial_import' as pgh_label")
                elif col == "pgh_obj_id":
                    select_columns.append("t.id as pgh_obj_id")
                elif col == "pgh_context_id":
                    select_columns.append("NULL as pgh_context_id")
                elif col in source_columns:
                    select_columns.append(f"t.{col}")
                else:
                    select_columns.append("NULL as " + col)

            # Get all IDs that need backfill
            with connection.cursor() as cursor:
                cursor.execute(f"""
                    SELECT t.id FROM {table_name} t
                    WHERE NOT EXISTS (
                        SELECT 1 FROM {event_table_name} e
                        WHERE e.pgh_obj_id = t.id AND e.pgh_label = 'initial_import'
                    )
                    ORDER BY t.id
                """)
                ids_to_process = [row[0] for row in cursor.fetchall()]

            if not ids_to_process:
                self.stdout.write(" No records need backfill")
                return 0, 0.0
            # Process in batches using direct SQL
            processed = 0
            model_start_time = time.time()

            for i in range(0, len(ids_to_process), batch_size):
                batch_ids = ids_to_process[i:i + batch_size]

                # Log progress every 10 batches
                if i > 0 and i % (batch_size * 10) == 0:
                    self.stdout.write(f" Processing batch starting at index {i:,}...")

                # The magic happens here - direct SQL INSERT!
                insert_sql = f"""
                    INSERT INTO {event_table_name} ({', '.join(event_columns)})
                    SELECT {', '.join(select_columns)}
                    FROM {table_name} t
                    WHERE t.id = ANY(%s)
                    ORDER BY t.id
                """

                with connection.cursor() as cursor:
                    cursor.execute(insert_sql, [batch_ids])
                    batch_processed = cursor.rowcount
                    processed += batch_processed

                # Log progress every 10 batches
                if i > 0 and i % (batch_size * 10) == 0:
                    progress = (i + batch_size) / len(ids_to_process) * 100
                    self.stdout.write(f" Processed {processed:,}/{backfill_count:,} records ({progress:.1f}%)")

            # Calculate timing
            model_end_time = time.time()
            total_duration = model_end_time - model_start_time
            records_per_second = processed / total_duration if total_duration > 0 else 0

            self.stdout.write(
                self.style.SUCCESS(
                    f" ✓ Completed {model_name}: {processed:,} records in {total_duration:.2f}s "
                    f"({records_per_second:.1f} records/sec)",
                ),
            )
            return processed, records_per_second  # noqa: TRY300

        except Exception as e:
            self.stdout.write(
                self.style.ERROR(f" ✗ Failed to process {model_name}: {e}"),
            )
            logger.exception(f"Error processing {model_name}")
            return 0, 0.0
    def get_table_names(self, model_name):
        """Get the actual table names for a model using Django's model metadata."""
        try:
            # Get the Django model
            Model = apps.get_model("dojo", model_name)
            table_name = Model._meta.db_table

            # Get the corresponding Event model
            event_table_name = f"{model_name}Event"
            EventModel = apps.get_model("dojo", event_table_name)
            event_table_name = EventModel._meta.db_table

            return table_name, event_table_name  # noqa: TRY300
        except LookupError:
            # Model not found, return None
            return None, None
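
As the finding notes, SQL identifiers cannot be passed as DB-API parameters, so they have to be validated or composed as identifiers. A hedged sketch of the same INSERT ... SELECT built with psycopg's sql module follows; it assumes the psycopg 3 driver (the fast command above already drops to the raw psycopg cursor) and uses a hypothetical insert_initial_import helper with a simplified column set, so it is an illustration rather than a drop-in replacement.

from django.db import connection
from psycopg import sql


def insert_initial_import(table_name, event_table_name, shared_columns, batch_ids):
    """Backfill one batch of ids; identifiers are composed, values are parameterized."""
    cols = sql.SQL(", ").join(sql.Identifier(c) for c in shared_columns)
    query = sql.SQL(
        "INSERT INTO {} (pgh_created_at, pgh_label, pgh_obj_id, {}) "
        "SELECT NOW(), %s, t.id, {} FROM {} t WHERE t.id = ANY(%s)",
    ).format(
        sql.Identifier(event_table_name),
        cols,
        cols,
        sql.Identifier(table_name),
    )
    with connection.cursor() as cursor:
        # Execute on the underlying psycopg cursor, which accepts composed SQL
        cursor.cursor.execute(query, ["initial_import", batch_ids])
        return cursor.cursor.rowcount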

GitHub Actions Supply Chain Risk in .github/workflows/helm-docs-updates.yml
Vulnerability: GitHub Actions Supply Chain Risk
Description: The GitHub Actions workflow uses a third-party action, losisin/helm-docs-github-action, with git-push: true. This allows the action to push changes directly to the repository. If this third-party action is compromised, it could push malicious code to the repository, bypassing pull request reviews for automated branches like renovate/ and dependabot/.

name: Update HELM docs for Renovate & Dependabot

on:
  pull_request:
    branches:
      - master
      - dev
      - bugfix
      - release/**
      - hotfix/**

jobs:
  docs_updates:
    name: Update documentation
    runs-on: ubuntu-latest
    if: startsWith(github.head_ref, 'renovate/') || startsWith(github.head_ref, 'dependabot/')
    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Run helm-docs
        uses: losisin/helm-docs-github-action@a57fae5676e4c55a228ea654a1bcaec8dd3cf5b5 # v1.6.2
        with:
          chart-search-root: "helm/defectdojo"
          git-push: true

Information Disclosure via Verbose Database Query Logging in dojo/management/commands/pghistory_backfill.py
Vulnerability: Information Disclosure via Verbose Database Query Logging
Description: The pghistory_backfill.py management command enables verbose database query logging by default, setting settings.DEBUG = True and directing SQL queries (including parameters) to application logs. This can expose sensitive data from models like Dojo_User, Cred_User, Finding, and Engagement in plain text within logs if the command is run in a production environment without explicitly disabling logging.

        }
        return excluded_fields_map.get(model_name, [])

    def process_batch(self, event_model, event_records, model_name, dry_run, batch_start_time, processed, backfill_count, *, is_final_batch=False):
        """Process a batch of event records by bulk creating them in the database."""
        if not event_records:
            return 0, batch_start_time

        if dry_run:
            actually_created = len(event_records)
        else:
            try:
                attempted = len(event_records)
                # No need to pass batch_size since we're already batching ourselves
                created_objects = event_model.objects.bulk_create(event_records)
                actually_created = len(created_objects) if created_objects else 0
                if actually_created != attempted:
                    logger.warning(
                        f"bulk_create for {model_name}: attempted {attempted}, "
                        f"actually created {actually_created} ({attempted - actually_created} skipped)",
                    )
            except Exception:
                logger.exception(f"Failed to bulk create events for {model_name}")
                raise

        # Calculate timing after the actual database operation
        batch_end_time = time.time()
        batch_duration = batch_end_time - batch_start_time
        batch_records_per_second = len(event_records) / batch_duration if batch_duration > 0 else 0

        # Log batch timing
        if is_final_batch:
            self.stdout.write(f" Final batch: {batch_duration:.2f}s ({batch_records_per_second:.1f} records/sec)")
        else:
            progress = (processed + actually_created) / backfill_count * 100
            self.stdout.write(f" Processed {processed + actually_created:,}/{backfill_count:,} records needing backfill ({progress:.1f}%) - "
                              f"Last batch: {batch_duration:.2f}s ({batch_records_per_second:.1f} records/sec)")

        return actually_created, batch_end_time

    def enable_db_logging(self):
        """Enable database query logging for this command."""
        # Store original DEBUG setting
        self.original_debug = settings.DEBUG

        # Configure database query logging
        db_logger = logging.getLogger("django.db.backends")
        db_logger.setLevel(logging.DEBUG)

        # Add a handler if one doesn't exist
        if not db_logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter(
                "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            )
            handler.setFormatter(formatter)
            db_logger.addHandler(handler)

        # Also enable the SQL logger specifically
        sql_logger = logging.getLogger("django.db.backends.sql")
        sql_logger.setLevel(logging.DEBUG)
        # Ensure the root logger propagates to our handlers
        if not sql_logger.handlers:
            sql_logger.addHandler(handler)

        # Enable query logging in Django settings
        settings.DEBUG = True

        self.stdout.write(
            self.style.SUCCESS("Database query logging enabled"),
        )

    def disable_db_logging(self):
        """Disable database query logging."""
        # Restore original DEBUG setting
        settings.DEBUG = self.original_debug

        # Disable query logging by setting a higher level
        logging.getLogger("django.db.backends").setLevel(logging.INFO)
        logging.getLogger("django.db.backends.sql").setLevel(logging.INFO)

        self.stdout.write(
            self.style.SUCCESS("Database query logging disabled"),
        )

    def handle(self, *args, **options):
        if not settings.ENABLE_AUDITLOG or settings.AUDITLOG_TYPE != "django-pghistory":
            self.stdout.write(
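
The excerpt ends where the command starts writing output; the point of the finding is that enable_db_logging flips settings.DEBUG globally and streams every query (with parameters) to the log. A minimal sketch of a more contained approach, offered as an assumption rather than this PR's implementation: scope query logging to a context manager, use the connection's force_debug_cursor flag instead of settings.DEBUG, and restore the previous state afterwards. Query text still reaches the log, so it stays an explicit opt-in for non-production runs.

import contextlib
import logging

from django.db import connection


@contextlib.contextmanager
def sql_logging_enabled():
    """Log SQL for the duration of the block without touching settings.DEBUG."""
    db_logger = logging.getLogger("django.db.backends")
    previous_level = db_logger.level
    previous_force = connection.force_debug_cursor
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    db_logger.addHandler(handler)
    db_logger.setLevel(logging.DEBUG)
    connection.force_debug_cursor = True  # make Django record and log each query
    try:
        yield
    finally:
        connection.force_debug_cursor = previous_force
        db_logger.setLevel(previous_level)
        db_logger.removeHandler(handler)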

We've notified @mtesauro.


All finding details can be found in the DryRun Security Dashboard.

rossops closed this on Oct 14, 2025
rossops reopened this on Oct 14, 2025
rossops merged commit cba7d81 into master on Oct 14, 2025
146 of 147 checks passed
Maffooch pushed a commit to valentijnscholten/django-DefectDojo that referenced this pull request Feb 16, 2026
Release: Merge release into master from: release/2.51.1
