Skip to content

Optimize O(n²) algorithms and reduce redundant iterations#2778

Merged
koxudaxi merged 1 commit intomainfrom
perf/additional-optimizations
Dec 24, 2025
Merged

Optimize O(n²) algorithms and reduce redundant iterations#2778
koxudaxi merged 1 commit intomainfrom
perf/additional-optimizations

Conversation

@koxudaxi
Copy link
Copy Markdown
Owner

@koxudaxi koxudaxi commented Dec 24, 2025

Summary by CodeRabbit

  • Performance Improvements

    • Optimized duplicate/canonical model handling with batch removal for more efficient processing.
    • Streamlined self-reference checks for reduced iteration overhead.
  • Bug Fixes

    • Strengthened duplicate parameter detection to prevent name collisions.
    • Improved discriminator and subtype resolution for more accurate inheritance handling.
  • Refactor

    • Reworked duplicate/discriminator processing into clearer two-phase flows.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Refactors duplicate-model processing to use per-module sets and batch removals, adjusts self-reference detection in JSON Schema array parsing to a streaming check, and changes OpenAPI discriminator handling to a two-phase collection that resolves allOf-derived subtypes after initial discriminator discovery.

Changes

Cohort / File(s) Summary
Module duplicate handling
src/datamodel_code_generator/parser/base.py
Adds module_models_sets (per-module model sets) and models_to_remove; validates duplicate origins; defers/removes canonical and duplicate models in a batch pass instead of immediate per-item removals; preserves reuse/replace logic when collapsing is allowed.
JSONSchema self-reference check
src/datamodel_code_generator/parser/jsonschema.py
Replaces list-membership style self-reference check in parse_array with a generator-based any(...) check over field.data_type.all_data_types.
OpenAPI parameter & discriminator handling
src/datamodel_code_generator/parser/openapi.py
Adds seen_parameter_names to detect duplicate parameter names; rewrites _collect_discriminator_schemas into a two-phase flow that first collects direct discriminator schemas, then accumulates allOf $refs and links them as discriminator subtypes.

Sequence Diagram(s)

(omitted — changes are internal parser/data structure refinements without a new multi-component sequential flow)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
With sets I hop from module to module,
I mark the clones and bundle them humble.
Two-phase sniff for subtype trails,
I nibble references, follow the rails.
Batch-removed tidily — a tidy tumble!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objectives of the PR - optimizing algorithmic complexity from O(n²) to O(1) and reducing redundant iterations across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch perf/additional-optimizations

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c66d12a and c04171a.

📒 Files selected for processing (3)
  • src/datamodel_code_generator/parser/base.py
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/openapi.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/parser/openapi.py (1)
src/datamodel_code_generator/reference.py (2)
  • add (883-957)
  • get (959-961)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: 3.13 on Ubuntu
  • GitHub Check: 3.11 on macOS
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: 3.10 on Windows
  • GitHub Check: py312-black24 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (2)
src/datamodel_code_generator/parser/openapi.py (2)

679-679: Excellent optimization: O(1) duplicate detection.

The use of a set for tracking seen parameter names replaces an O(n) scan per parameter with O(1) membership checks, reducing overall complexity from O(n²) to O(n). The logic correctly checks for duplicates before adding the name.

Also applies to: 691-694


924-942: Effective two-phase optimization.

The refactored collection logic merges two full iterations into a single pass over all schemas (lines 926-936) plus a targeted pass over only schemas with allOf references (lines 938-942). This reduces redundant iterations when the number of schemas with allOf is significantly smaller than the total schema count.

The logic correctly:

  • Collects discriminator schemas that lack oneOf/anyOf (line 928-930)
  • Accumulates allOf references in the same pass (lines 932-936)
  • Resolves subtype relationships only for relevant schemas (lines 938-942)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.48%. Comparing base (09b36f2) to head (c04171a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2778      +/-   ##
==========================================
- Coverage   99.48%   99.48%   -0.01%     
==========================================
  Files          87       87              
  Lines       12897    12908      +11     
  Branches     1543     1547       +4     
==========================================
+ Hits        12831    12841      +10     
  Misses         35       35              
- Partials       31       32       +1     
Flag Coverage Δ
unittests 99.48% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Dec 24, 2025

CodSpeed Performance Report

Merging #2778 will not alter performance

Comparing perf/additional-optimizations (c04171a) with main (09b36f2)

Summary

✅ 73 untouched
⏩ 10 skipped1

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Performance optimizations:

1. Batch model removal in __create_shared_module_from_duplicates:
   - Use sets for O(1) membership checks
   - Collect models to remove and batch process at end
   - Avoids O(n²) list.remove() calls in loops

2. O(1) parameter name duplicate detection:
   - Use set to track seen parameter names
   - Replaces O(n) any() scan for each parameter

3. Single-pass discriminator schema collection:
   - Merge two iterations into one
   - Collect allOf refs during first pass
   - Only iterate schemas with allOf for subtype resolution

4. Early-exit self-reference check:
   - Replace list comprehension with any() generator
   - Stops at first match instead of building full list
@koxudaxi koxudaxi force-pushed the perf/additional-optimizations branch from c66d12a to c04171a Compare December 24, 2025 06:02
@koxudaxi koxudaxi enabled auto-merge (squash) December 24, 2025 06:04
@koxudaxi koxudaxi merged commit 0a41d43 into main Dec 24, 2025
33 of 34 checks passed
@koxudaxi koxudaxi deleted the perf/additional-optimizations branch December 24, 2025 06:05
@github-actions
Copy link
Copy Markdown
Contributor

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: PR #2778 contains pure performance optimizations with no breaking changes. The changes include: (1) Batch model removal using sets for O(1) lookups instead of O(n) list operations - same models are removed, just more efficiently; (2) Early-exit self-reference check using any() generator instead of list comprehension - identical boolean result; (3) O(1) parameter name duplicate detection using a set instead of any() scan - same validation behavior and exception; (4) Single-pass discriminator schema collection merging two iterations into one - same final data collected. All changes are algorithmic optimizations that produce identical output, which is why no test changes were needed. The generated code output, CLI/API behavior, templates, and error handling all remain unchanged.


This analysis was performed by Claude Code Action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant