Fix set/frozenset duplicate output in x-python-type serialization#2849

Merged
koxudaxi merged 1 commit into main from fix/set-frozenset-duplicate-output
Dec 28, 2025
Conversation

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 28, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Added proper support for lowercase Python container type aliases (set, frozenset) in JSON Schema parsing and type serialization.
    • Improved type origin mapping to handle both lowercase and capitalized versions of Python type names.
    • Enhanced compatibility with various Python type naming conventions in generated schemas.

@coderabbitai

coderabbitai Bot commented Dec 28, 2025

Walkthrough

The PR updates Python set type handling by changing the preserved type origins from capitalized "Set"/"FrozenSet" to lowercase "set"/"frozenset" and adds corresponding JSON Schema parser support for the lowercase variants, enabling consistent handling of both naming conventions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Preserved Type Origins Update (`src/datamodel_code_generator/__main__.py`) | Updated the type origin mappings in `_init_preserved_type_origins` to use lowercase "set" and "frozenset" instead of capitalized "Set" and "FrozenSet". |
| JSON Schema Parser Support (`src/datamodel_code_generator/parser/jsonschema.py`) | Added lowercase "set" and "frozenset" entries to the `type_to_flag` mapping in `JsonSchemaObject._get_python_type_flags`, complementing the existing capitalized variants. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • #2837: Directly modifies the same preserved-type origin mappings for set/frozenset across the codebase.
  • #2841: Adds x-python-type compatibility logic that depends on proper handling of Set/FrozenSet type recognition in JSON Schema parsing.

Suggested labels

breaking-change-analyzed

Poem

🐰 Lowercase sets now hop with grace,
frozensets in their proper place,
No more CamelCase to slow the pace,
Python's types run the rabbit race! 🏃‍♂️

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing set/frozenset duplicate output in x-python-type serialization, which is reflected in the code changes updating type mappings for set and frozenset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2aebb77 and 469d83f.

📒 Files selected for processing (2)
  • src/datamodel_code_generator/__main__.py
  • src/datamodel_code_generator/parser/jsonschema.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: 3.10 on Windows
  • GitHub Check: py312-pydantic1 on Ubuntu
  • GitHub Check: py312-black24 on Ubuntu
  • GitHub Check: 3.10 on Ubuntu
  • GitHub Check: py312-isort6 on Ubuntu
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: benchmarks
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
src/datamodel_code_generator/__main__.py (1)

611-612: LGTM! Correct alignment with Python built-in type names.

The change from capitalized "Set"/"FrozenSet" to lowercase "set"/"frozenset" correctly aligns the serialization with Python's actual built-in type names. This is especially important since typing.Set and typing.FrozenSet are deprecated in Python 3.9+ in favor of the built-in set and frozenset types.

The corresponding parser changes in jsonschema.py ensure backward compatibility by supporting both uppercase and lowercase variants.
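The naming point above can be checked with the standard library alone. The following is a minimal sketch, assuming Python 3.9+ (needed for `set[str]`); the helper `origin_name` is hypothetical and is not part of datamodel-code-generator. It uses `typing.get_origin` to show that the deprecated `typing.Set`/`typing.FrozenSet` aliases and the built-in generics resolve to the same runtime classes, whose `__name__` is lowercase.

```python
import typing


def origin_name(tp: object) -> str:
    """Return the runtime class name behind a parameterized generic.

    Hypothetical helper for illustration only.
    """
    origin = typing.get_origin(tp)
    if origin is None:
        raise TypeError(f"{tp!r} is not a parameterized generic")
    return origin.__name__


# The deprecated typing aliases and the built-in generics share one origin.
print(origin_name(typing.Set[str]))        # set
print(origin_name(set[str]))               # set
print(origin_name(typing.FrozenSet[int]))  # frozenset
print(origin_name(frozenset[int]))         # frozenset
```

Because both spellings resolve to the same class, a serializer that writes the class's own name would naturally emit the lowercase form.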

src/datamodel_code_generator/parser/jsonschema.py (1)

1288-1299: LGTM! Backward-compatible parser support for lowercase type aliases.

The addition of lowercase "set" and "frozenset" aliases correctly enables parsing of the new serialization format while maintaining full backward compatibility. Existing schemas using uppercase "Set" and "FrozenSet" in x-python-type fields continue to work unchanged, as both variants map to identical container type flags.
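For illustration, the dual-spelling lookup described above can be sketched as follows. This is a sketch under stated assumptions, not the project's code: `TYPE_TO_FLAG`, the flag names `is_set` and `is_frozen_set`, and the helper `container_flag` are hypothetical stand-ins for the real `type_to_flag` mapping in `JsonSchemaObject._get_python_type_flags`, whose actual keys and values may differ.

```python
from typing import Optional

# Hypothetical mapping: both spellings point at the same container flag,
# so legacy ("Set") and new ("set") x-python-type values are treated alike.
TYPE_TO_FLAG = {
    "Set": "is_set",
    "FrozenSet": "is_frozen_set",
    "set": "is_set",
    "frozenset": "is_frozen_set",
}


def container_flag(x_python_type: str) -> Optional[str]:
    """Look up the flag for the outermost type name, e.g. 'Set[str]' -> 'is_set'."""
    origin = x_python_type.split("[", 1)[0].strip()
    return TYPE_TO_FLAG.get(origin)


print(container_flag("Set[str]"))        # is_set
print(container_flag("set[str]"))        # is_set
print(container_flag("frozenset[int]"))  # is_frozen_set
print(container_flag("list[int]"))       # None
```

Keeping the capitalized keys alongside the lowercase ones is what lets older schemas keep parsing after the serializer switches to lowercase output.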


@github-actions
Contributor

📚 Docs Preview: https://pr-2849.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Dec 28, 2025

CodSpeed Performance Report

Merging #2849 will improve performance by 23.31%

Comparing fix/set-frozenset-duplicate-output (469d83f) with main (2aebb77)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 11 improvements
⏩ 98 skipped (see footnote 1)

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | test_perf_multiple_files_input | 3.9 s | 3.2 s | +20.93% |
| WallTime | test_perf_complex_refs | 2.2 s | 1.8 s | +22.99% |
| WallTime | test_perf_duplicate_names | 1,048.7 ms | 869.2 ms | +20.65% |
| WallTime | test_perf_stripe_style_pydantic_v2 | 2.1 s | 1.8 s | +20.49% |
| WallTime | test_perf_graphql_style_pydantic_v2 | 877 ms | 727.3 ms | +20.58% |
| WallTime | test_perf_large_models_pydantic_v2 | 3.7 s | 3.1 s | +19.12% |
| WallTime | test_perf_deep_nested | 6.2 s | 5.4 s | +15.82% |
| WallTime | test_perf_aws_style_openapi_pydantic_v2 | 2.1 s | 1.7 s | +19.98% |
| WallTime | test_perf_all_options_enabled | 6.9 s | 5.8 s | +17.82% |
| WallTime | test_perf_openapi_large | 3.1 s | 2.5 s | +21.17% |
| WallTime | test_perf_kubernetes_style_pydantic_v2 | 2.8 s | 2.3 s | +23.31% |

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.50%. Comparing base (2aebb77) to head (469d83f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2849   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files          90       90           
  Lines       14489    14489           
  Branches     1736     1736           
=======================================
  Hits        14417    14417           
  Misses         37       37           
  Partials       35       35           
| Flag | Coverage Δ |
| --- | --- |
| unittests | 99.50% <ø> (ø) |

@koxudaxi koxudaxi merged commit 16f27ef into main Dec 28, 2025
37 checks passed
@koxudaxi koxudaxi deleted the fix/set-frozenset-duplicate-output branch December 28, 2025 20:44
@github-actions
Contributor

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR is a bug fix that changes the internal x-python-type serialization format from uppercase "Set"/"FrozenSet" to lowercase "set"/"frozenset" to match Python's modern convention. The change affects only the intermediate JSON Schema representation, not the final generated Python code output. Users' generated code remains unchanged (still produces set[str], frozenset[int], etc.). Additionally, the PR adds backward compatibility in the parser by supporting both uppercase and lowercase variants when reading JSON schemas, so existing schemas with x-python-type: "Set[str]" continue to work correctly.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.
