Optimize deepcopy for empty lists#2862

Merged
koxudaxi merged 4 commits into main from perf/optimize-empty-list-deepcopy
Dec 30, 2025

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 30, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Improved data model field initialization in JSON Schema and OpenAPI parsers and core type definitions to prevent unintended state sharing between model instances when processing multiple schemas and API specifications, enhancing reliability and ensuring consistent behavior throughout the code generation workflow.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 30, 2025

Warning

Rate limit exceeded

@koxudaxi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 8 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 223a237 and 9a2307a.

📒 Files selected for processing (3)
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/openapi.py
  • src/datamodel_code_generator/types.py
📝 Walkthrough

Mutable default values are replaced with Field(default_factory=...) across three modules—JsonSchema parser, OpenAPI parser, and core types—to prevent shared-state issues where instances would unintentionally share the same list or dict objects.
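
The shared-state hazard named in the walkthrough can be illustrated with a stdlib-only sketch. It uses a plain class attribute and `dataclasses.field(default_factory=list)` as an analogy for Pydantic's `Field(default_factory=list)`; the class names `SharedDefault` and `FactoryDefault` are hypothetical and do not come from the PR. Pydantic's documentation describes copying non-hashable defaults for each instance, so the leak shown here is the general Python pitfall, not confirmed pre-PR behavior of these models.

```python
from dataclasses import dataclass, field


class SharedDefault:
    # Class-level list: instances that never reassign `items`
    # all mutate this single object.
    items: list = []


@dataclass
class FactoryDefault:
    # The factory is called once per instance, so each one gets its own list.
    items: list = field(default_factory=list)


a, b = SharedDefault(), SharedDefault()
a.items.append("x")
shared_leaks = b.items == ["x"]  # b observes a's append

c, d = FactoryDefault(), FactoryDefault()
c.items.append("x")
factory_isolated = d.items == []  # d is unaffected by c's append
```

Both flags evaluate to `True`: the class-level list is shared, while the factory-built lists are independent objects.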

Changes

Cohort / File(s) | Change Summary
JsonSchema Parser (src/datamodel_code_generator/parser/jsonschema.py) | Five fields converted: oneOf, anyOf, allOf, enum, and required now use Field(default_factory=list) instead of mutable default lists.
OpenAPI Parser (src/datamodel_code_generator/parser/openapi.py) | Thirteen field declarations across six objects converted: ParameterObject.content, HeaderObject.content, RequestBodyObject.content, ResponseObject.headers, ResponseObject.content, Operation.tags, Operation.parameters, Operation.responses, and multiple ComponentsObject fields now use Field(default_factory=dict) or Field(default_factory=list).
Core Types (src/datamodel_code_generator/types.py) | Four DataType fields converted: data_types, literals, enum_member_literals, and children now use Field(default_factory=list) instead of mutable default lists.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hopping through defaults, a fix most fine,
Factory-made lists—no sharing of mine!
Each instance now pristine, independent, free,
No mutable mishaps in Pydantic's spree! 🌿✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately reflects the main change: converting mutable default lists to Field(default_factory=list) across multiple files to optimize deepcopy performance for empty containers.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@github-actions
Contributor

github-actions Bot commented Dec 30, 2025

📚 Docs Preview: https://pr-2862.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Dec 30, 2025

CodSpeed Performance Report

Merging #2862 will not alter performance

Comparing perf/optimize-empty-list-deepcopy (9a2307a) with main (6dd1dbe)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped [1]

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.38%. Comparing base (6dd1dbe) to head (9a2307a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2862   +/-   ##
=======================================
  Coverage   99.38%   99.38%           
=======================================
  Files          91       91           
  Lines       15555    15555           
  Branches     1827     1827           
=======================================
  Hits        15460    15460           
  Misses         50       50           
  Partials       45       45           
Flag | Coverage Δ
unittests | 99.38% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/datamodel_code_generator/parser/jsonschema.py (1)

22-24: JsonSchemaObject list defaults via Field(default_factory=...) look correct and improve safety

Switching oneOf, anyOf, allOf, enum, and required to Field(default_factory=list) is consistent with how these fields are used:

  • Call sites treat oneOf/anyOf/allOf/enum as lists or via truthiness, so always having a list (possibly empty) matches expectations.
  • required is already normalized by validate_required() to [] for None/weird inputs, and other code only checks it via truthiness or iterates it; having an empty list instead of None preserves behavior. The obj.required is None branch in parse_object_fields() becomes dead but harmless.

Low‑priority nits:

  • You could simplify requires in parse_object_fields() since required is now always a list.
  • Ruff reports unused noqa codes on these lines (e.g., N815, UP045 not enabled); if you want a clean lint run, consider dropping or narrowing them.

Also applies to: 354-357, 361-361
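
The point about `required` always being a list can be sketched as follows. This is a minimal illustration under stated assumptions: `SchemaSketch` is a hypothetical stand-in for `JsonSchemaObject` built on a stdlib dataclass, and `required_names` is an invented helper, not the project's `parse_object_fields()`.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaSketch:
    # Hypothetical stand-in for JsonSchemaObject; only `required` is modelled.
    required: list[str] = field(default_factory=list)


def required_names(obj: SchemaSketch) -> set[str]:
    # No `is None` guard is needed: `required` is always a list,
    # possibly empty, so it can be iterated directly.
    return set(obj.required)


with_required = required_names(SchemaSketch(required=["id", "name"]))
without_required = required_names(SchemaSketch())
```

With an always-list default, the empty case falls out of ordinary iteration, which is why a branch testing `obj.required is None` would never run.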

src/datamodel_code_generator/types.py (1)

29-32: DataType list defaults via Field(default_factory=list) are consistent with existing usage

Using Field(default_factory=list) for data_types, literals, enum_member_literals, and children aligns with how these fields are treated everywhere:

  • They are always consumed as lists (iteration, len(...), truthiness), so an always‑list default is appropriate.
  • It avoids any chance of shared mutable defaults across instances, while remaining compatible with __deepcopy__, __init__’s parent propagation, and the dynamically created ContextDataType via create_model.

Minor lint note:

  • Ruff flags some noqa codes on these lines (e.g., UP007, UP045) as unused; you can drop or narrow them later if you want a quieter lint run.

Also applies to: 431-431, 445-447, 452-452
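
One plausible reading of the PR title is that building an empty list with a factory is cheaper than deep-copying an empty default. The sketch below compares the two in isolation using only the standard library; `via_deepcopy` and `via_factory` are hypothetical names, and whether Pydantic takes the deepcopy path for these fields depends on the Pydantic version, which is an assumption here and not something the PR states.

```python
import copy
import timeit

EMPTY: list = []


def via_deepcopy() -> list:
    # Generic deep copy: dispatches on type and tracks a memo dict,
    # even though there is nothing to copy.
    return copy.deepcopy(EMPTY)


def via_factory() -> list:
    # What default_factory=list does: call the constructor directly.
    return list()


deepcopy_s = timeit.timeit(via_deepcopy, number=100_000)
factory_s = timeit.timeit(via_factory, number=100_000)
print(f"deepcopy: {deepcopy_s:.4f}s  factory: {factory_s:.4f}s")
```

Both functions return a fresh empty list, so the choice between them changes cost, not behavior. Timings vary by interpreter and machine, so no expected figures are given.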

src/datamodel_code_generator/parser/openapi.py (1)

18-18: OpenAPI model container defaults via Field(default_factory=...) are safe and match call‑site expectations

The new defaults for content, headers, tags, parameters, responses, and all ComponentsObject dicts look correct:

  • Callers (parse_all_parameters, parse_request_body, parse_responses, parse_operation) already treat these attributes as dictionaries/lists and never distinguish between None and “no entries”.
  • Using Field(default_factory=dict/list) removes any possibility of shared mutable state between model instances and keeps semantics the same for “missing vs. empty” in the OpenAPI documents.

Optional lint cleanup:

  • Ruff reports several adjacent # noqa directives (e.g., UP007, UP045, N815) as unused; you can safely trim or narrow them in a follow‑up if you want cleaner lint output.

Also applies to: 124-137, 143-143, 151-152, 158-165, 171-175

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83776ca and 223a237.

📒 Files selected for processing (3)
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/openapi.py
  • src/datamodel_code_generator/types.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/datamodel_code_generator/parser/jsonschema.py (1)
tests/data/expected/parser/openapi/openapi_parser_parse_modular/bar.py (1)
  • Field (6-7)
src/datamodel_code_generator/types.py (1)
src/datamodel_code_generator/model/base.py (1)
  • DataModelFieldBase (151-469)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/jsonschema.py
  • 354-354: Unused noqa directive (non-enabled: N815). Remove unused noqa directive (RUF100)
  • 355-355: Unused noqa directive (non-enabled: N815). Remove unused noqa directive (RUF100)
  • 356-356: Unused noqa directive (non-enabled: N815). Remove unused noqa directive (RUF100)
  • 358-358: Unused noqa directive (non-enabled: N815, UP045). Remove unused noqa directive (RUF100)
  • 359-359: Unused noqa directive (non-enabled: N815, UP045). Remove unused noqa directive (RUF100)
  • 360-360: Unused noqa directive (non-enabled: UP007, UP045). Remove unused noqa directive (RUF100)

src/datamodel_code_generator/parser/openapi.py
  • 142-142: Unused noqa directive (non-enabled: UP045). Remove unused noqa directive (RUF100)
  • 150-150: Unused noqa directive (non-enabled: UP045). Remove unused noqa directive (RUF100)
  • 152-152: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 159-159: Unused noqa directive (non-enabled: UP045). Remove unused noqa directive (RUF100)
  • 160-160: Unused noqa directive (non-enabled: UP045). Remove unused noqa directive (RUF100)
  • 161-161: Unused noqa directive (non-enabled: N815, UP045). Remove unused noqa directive (RUF100)
  • 162-162: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 163-163: Unused noqa directive (non-enabled: N815, UP007, UP045). Remove unused noqa directive (RUF100)
  • 164-164: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 171-171: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 172-172: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 173-173: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 174-174: Unused noqa directive (non-enabled: N815, UP007). Remove unused noqa directive (RUF100)
  • 175-175: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)

src/datamodel_code_generator/types.py
  • 445-445: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)
  • 450-450: Unused noqa directive (non-enabled: UP045). Remove unused noqa directive (RUF100)
  • 451-451: Unused noqa directive (non-enabled: UP007). Remove unused noqa directive (RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: py312-isort7 on Ubuntu
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: benchmarks
  • GitHub Check: Analyze (python)

@koxudaxi koxudaxi force-pushed the perf/optimize-empty-list-deepcopy branch from 223a237 to 1c352e0 Compare December 30, 2025 08:51
@koxudaxi koxudaxi force-pushed the perf/optimize-empty-list-deepcopy branch from 1c352e0 to 96360b7 Compare December 30, 2025 09:52
@koxudaxi koxudaxi merged commit 8f18513 into main Dec 30, 2025
36 checks passed
@koxudaxi koxudaxi deleted the perf/optimize-empty-list-deepcopy branch December 30, 2025 12:57
@github-actions
Contributor

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR changes internal Pydantic model field defaults from mutable defaults (e.g., oneOf: list[JsonSchemaObject] = []) to factory defaults (e.g., oneOf: list[JsonSchemaObject] = Field(default_factory=list)). This is a pure internal optimization that prevents potential bugs from shared mutable default values between model instances. The change does not affect: (1) generated code output, (2) custom Jinja2 templates, (3) CLI or Python API, (4) user-visible behavior, (5) Python version support, or (6) error handling. Users will not notice any difference in functionality - the same fields exist with the same names and types, just initialized more safely.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.
