Optimize O(n²) algorithms and reduce redundant iterations by koxudaxi · Pull Request #2778 · koxudaxi/datamodel-code-generator

koxudaxi · 2025-12-24T05:54:43Z

Summary by CodeRabbit

Performance Improvements
- Optimized duplicate/canonical model handling with batch removal for more efficient processing.
- Streamlined self-reference checks for reduced iteration overhead.
Bug Fixes
- Strengthened duplicate parameter detection to prevent name collisions.
- Improved discriminator and subtype resolution for more accurate inheritance handling.
Refactor
- Reworked duplicate/discriminator processing into clearer two-phase flows.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-24T05:54:53Z

📝 Walkthrough

Walkthrough

Refactors duplicate-model processing to use per-module sets and batch removals, adjusts self-reference detection in JSON Schema array parsing to a streaming check, and changes OpenAPI discriminator handling to a two-phase collection that resolves allOf-derived subtypes after initial discriminator discovery.

Changes

Cohort / File(s)	Summary
Module duplicate handling `src/datamodel_code_generator/parser/base.py`	Adds `module_models_sets` (per-module model sets) and `models_to_remove`; validates duplicate origins; defers/removes canonical and duplicate models in a batch pass instead of immediate per-item removals; preserves reuse/replace logic when collapsing is allowed.
JSONSchema self-reference check `src/datamodel_code_generator/parser/jsonschema.py`	Replaces list-membership style self-reference check in `parse_array` with a generator-based `any(...)` check over `field.data_type.all_data_types`.
OpenAPI parameter & discriminator handling `src/datamodel_code_generator/parser/openapi.py`	Adds `seen_parameter_names` to detect duplicate parameter names; rewrites `_collect_discriminator_schemas` into a two-phase flow that first collects direct discriminator schemas, then accumulates allOf `$ref`s and links them as discriminator subtypes.

Sequence Diagram(s)

(omitted — changes are internal parser/data structure refinements without a new multi-component sequential flow)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Fix discriminator with allOf without Literal type for Pydantic v2 #2722: Related discriminator handling changes in src/datamodel_code_generator/parser/openapi.py.
Refactor parser base post-processing for DRY and type-safe implementation #2730: Overlapping post-processing and duplicate-removal refactors in src/datamodel_code_generator/parser/base.py.

Poem

🐰
With sets I hop from module to module,
I mark the clones and bundle them humble.
Two-phase sniff for subtype trails,
I nibble references, follow the rails.
Batch-removed tidily — a tidy tumble!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main objectives of the PR - optimizing algorithmic complexity from O(n²) to O(1) and reducing redundant iterations across multiple files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch perf/additional-optimizations

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c66d12a and c04171a.

📒 Files selected for processing (3)

src/datamodel_code_generator/parser/base.py
src/datamodel_code_generator/parser/jsonschema.py
src/datamodel_code_generator/parser/openapi.py

🚧 Files skipped from review as they are similar to previous changes (2)

src/datamodel_code_generator/parser/jsonschema.py
src/datamodel_code_generator/parser/base.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/datamodel_code_generator/parser/openapi.py (1)

src/datamodel_code_generator/reference.py (2)

add (883-957)

get (959-961)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: 3.13 on Ubuntu
GitHub Check: 3.11 on macOS
GitHub Check: py312-black22 on Ubuntu
GitHub Check: 3.10 on Windows
GitHub Check: py312-black24 on Ubuntu
GitHub Check: 3.11 on Windows
GitHub Check: 3.12 on Windows
GitHub Check: 3.13 on Windows
GitHub Check: Analyze (python)
GitHub Check: benchmarks

🔇 Additional comments (2)

src/datamodel_code_generator/parser/openapi.py (2)

679-679: Excellent optimization: O(1) duplicate detection.

The use of a set for tracking seen parameter names replaces an O(n) scan per parameter with O(1) membership checks, reducing overall complexity from O(n²) to O(n). The logic correctly checks for duplicates before adding the name.

Also applies to: 691-694

924-942: Effective two-phase optimization.

The refactored collection logic merges two full iterations into a single pass over all schemas (lines 926-936) plus a targeted pass over only schemas with allOf references (lines 938-942). This reduces redundant iterations when the number of schemas with allOf is significantly smaller than the total schema count.

The logic correctly:

Collects discriminator schemas that lack oneOf/anyOf (line 928-930)

Accumulates allOf references in the same pass (lines 932-936)

Resolves subtype relationships only for relevant schemas (lines 938-942)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2025-12-24T05:59:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.48%. Comparing base (09b36f2) to head (c04171a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2778      +/-   ##
==========================================
- Coverage   99.48%   99.48%   -0.01%     
==========================================
  Files          87       87              
  Lines       12897    12908      +11     
  Branches     1543     1547       +4     
==========================================
+ Hits        12831    12841      +10     
  Misses         35       35              
- Partials       31       32       +1

Flag	Coverage Δ
unittests	`99.48% <100.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codspeed-hq · 2025-12-24T06:00:49Z

CodSpeed Performance Report

Merging #2778 will not alter performance

_{Comparing perf/additional-optimizations (c04171a) with main (09b36f2)}

Summary

✅ 73 untouched
⏩ 10 skipped¹

10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Performance optimizations: 1. Batch model removal in __create_shared_module_from_duplicates: - Use sets for O(1) membership checks - Collect models to remove and batch process at end - Avoids O(n²) list.remove() calls in loops 2. O(1) parameter name duplicate detection: - Use set to track seen parameter names - Replaces O(n) any() scan for each parameter 3. Single-pass discriminator schema collection: - Merge two iterations into one - Collect allOf refs during first pass - Only iterate schemas with allOf for subtype resolution 4. Early-exit self-reference check: - Replace list comprehension with any() generator - Stops at first match instead of building full list

github-actions · 2025-12-24T06:06:48Z

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: PR #2778 contains pure performance optimizations with no breaking changes. The changes include: (1) Batch model removal using sets for O(1) lookups instead of O(n) list operations - same models are removed, just more efficiently; (2) Early-exit self-reference check using any() generator instead of list comprehension - identical boolean result; (3) O(1) parameter name duplicate detection using a set instead of any() scan - same validation behavior and exception; (4) Single-pass discriminator schema collection merging two iterations into one - same final data collected. All changes are algorithmic optimizations that produce identical output, which is why no test changes were needed. The generated code output, CLI/API behavior, templates, and error handling all remain unchanged.

This analysis was performed by Claude Code Action

koxudaxi force-pushed the perf/additional-optimizations branch from c66d12a to c04171a Compare December 24, 2025 06:02

koxudaxi enabled auto-merge (squash) December 24, 2025 06:04

koxudaxi merged commit 0a41d43 into main Dec 24, 2025
33 of 34 checks passed

koxudaxi deleted the perf/additional-optimizations branch December 24, 2025 06:05

github-actions Bot added the breaking-change-analyzed label Dec 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize O(n²) algorithms and reduce redundant iterations#2778

Optimize O(n²) algorithms and reduce redundant iterations#2778
koxudaxi merged 1 commit intomainfrom
perf/additional-optimizations

koxudaxi commented Dec 24, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 24, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov Bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

koxudaxi commented Dec 24, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

codecov Bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq Bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #2778 will not alter performance

Summary

Footnotes

Uh oh!

Uh oh!

github-actions Bot commented Dec 24, 2025

Breaking Change Analysis

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

koxudaxi commented Dec 24, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 24, 2025 •

edited

Loading

codecov Bot commented Dec 24, 2025 •

edited

Loading

codspeed-hq Bot commented Dec 24, 2025 •

edited

Loading