Fix reuse-foreign to compare with output type instead of input type #2854

Merged
koxudaxi merged 1 commit into main from fix/reuse-foreign-output-type-comparison
Dec 30, 2025

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 29, 2025

Summary by CodeRabbit

  • New Features

    • Added msgspec structure support in type discovery and classification.
    • Improved type reuse logic: each type is now matched against the output model format to decide whether it can be reused without conversion.
  • Tests

    • Extended test coverage for mixed-type reuse scenarios involving Pydantic models, TypedDict, dataclass, and enum types across various output formats.

@coderabbitai

coderabbitai Bot commented Dec 29, 2025

Walkthrough

This PR extends the datamodel-code-generator's type reuse logic to support mixed type families (Pydantic, dataclass, TypedDict, Enum, msgspec) when using the reuse-foreign strategy. It adds type family classification for msgspec, new reuse decision logic based on output model type, and comprehensive test coverage for nested mixed-type scenarios.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core type family and reuse logic | src/datamodel_code_generator/__main__.py | Extended _find_models_in_type to recognize BaseModel, Enum, dataclass, TypedDict, and msgspec structures. Added _TYPE_FAMILY_MSGSPEC constant and msgspec support to _get_type_family. Introduced _get_output_family() to map DataModelType to internal type-family strings and _should_reuse_type() to determine reuse eligibility across families (enums always reusable; others compared by family). Updated _filter_defs_by_strategy to accept output_model_type instead of input family string and use new reuse logic. Extended _load_model_schema signatures to accept and propagate output_model_type throughout call sites. |
| Test data for mixed-type reuse scenarios | tests/data/python/input_model/mixed_nested.py | New file introducing mixed-type test models: Enum Category, TypedDict NestedTypedDict, Pydantic NestedPydantic, dataclass NestedDataclass, and four root models (ModelWithTypedDict, ModelWithPydantic, ModelWithDataclass, ModelWithMixed) combining these types to validate reuse behavior across families. |
| Reuse-foreign strategy test coverage | tests/test_input_model.py | Updated and extended test expectations for test_input_model_ref_strategy_reuse_foreign and related cases to reflect new import/regeneration semantics: enums are always imported regardless of family; same-family types respect output model type; different-family types are regenerated. Added new test cases validating reuse-foreign behavior across TypedDict, dataclass, Pydantic, and msgspec output types with nested mixed-type scenarios. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant TypeDiscovery as Type Discovery
    participant FamilyClass as Family Classification
    participant ReuseLogic as Reuse Decision
    participant SchemaFilter as Schema Filtering
    
    User->>TypeDiscovery: Load input model with annotations
    TypeDiscovery->>TypeDiscovery: Scan for BaseModel, Enum, dataclass,<br/>TypedDict, msgspec structures
    TypeDiscovery->>FamilyClass: Provide discovered types
    
    FamilyClass->>FamilyClass: Map each type to family<br/>(PYDANTIC, DATACLASS,<br/>TYPEDDICT, MSGSPEC, ENUM)
    FamilyClass->>ReuseLogic: Provide source family & output type
    
    rect rgba(100, 200, 150, 0.3)
        Note over ReuseLogic: Reuse Decision Logic
        ReuseLogic->>ReuseLogic: Get output family from<br/>output_model_type
        ReuseLogic->>ReuseLogic: If Enum: always reuse
        ReuseLogic->>ReuseLogic: Else: compare source family<br/>vs output family
    end
    
    ReuseLogic->>SchemaFilter: Decision: reuse or regenerate?
    SchemaFilter->>SchemaFilter: Filter definitions by<br/>ReuseForeign strategy
    SchemaFilter->>User: Output schema with<br/>correct imports/regenerations

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #2850: Modifies the same type-family constants and reuse/filtering helper logic in __main__.py, extending model-type discovery and adding new type-family taxonomy.
  • PR #2851: Changes input-model schema generation in __main__.py via _load_model_schema and related input-model processing updates.
  • PR #2837: Modifies _load_model_schema function and its call sites to propagate metadata for type preservation during schema augmentation.

Suggested labels

breaking-change-analyzed

Poem

🐰 A rabbit hops through type-family trees,
Finding Enums, Pydantic, dataclasses with ease,
Deciding what reuses and what regenerates anew,
Mixed types no longer confuse the queue,
Schema reuse-foreign logic shines bright and true! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing reuse-foreign logic to compare with output type instead of input type, which is reflected throughout the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@github-actions
Contributor

📚 Docs Preview: https://pr-2854.datamodel-code-generator.pages.dev


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/datamodel_code_generator/__main__.py (1)

1008-1039: Remove unused noqa directive on line 1016.

Ruff's RUF100 check reports that # noqa: PLR0911 is unused because rule PLR0911 is not enabled, so the directive can be removed.

🔎 Proposed fix
-def _get_type_family(tp: type) -> str:  # noqa: PLR0911
+def _get_type_family(tp: type) -> str:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f7a6c9 and da23e93.

📒 Files selected for processing (3)
  • src/datamodel_code_generator/__main__.py
  • tests/data/python/input_model/mixed_nested.py
  • tests/test_input_model.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/enums.py (2)
  • DataModelType (48-56)
  • InputModelRefStrategy (199-210)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/__main__.py

1016-1016: Unused noqa directive (non-enabled: PLR0911)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (15)
tests/test_input_model.py (8)

780-797: LGTM! Test correctly validates reuse-foreign imports enum and same-family types.

The test verifies that when using reuse-foreign strategy with TypedDict output:

  • Status enum is imported (enums always reused)
  • Metadata class is present (TypedDict reused as same-family)
  • Address and User classes are regenerated

800-821: LGTM! Properly validates that Status TypeAlias is not regenerated when imported.

The test correctly asserts that Status: TypeAlias is not in the output when using reuse-foreign, confirming the enum is imported rather than regenerated.


967-988: LGTM! Test validates same-family TypedDict reuse.

This test correctly verifies that when output is TypedDict and input contains a nested TypedDict (NestedTypedDict), it's imported rather than regenerated (verified by assert "class NestedTypedDict" not in content).


990-1007: LGTM! Test validates cross-family regeneration.

Correctly tests that a Pydantic model (NestedPydantic) is regenerated when output type is TypedDict, since they belong to different families.


1010-1031: LGTM! Test validates same-family dataclass reuse.

Test correctly verifies that when output is dataclass and input contains a nested dataclass (NestedDataclass), it's imported rather than regenerated.


1033-1056: LGTM! Comprehensive test for mixed nested types.

This test validates the complete behavior with mixed types: TypedDict is reused (imported), while Pydantic and dataclass are regenerated when output is TypedDict.


1058-1079: LGTM! Validates Pydantic-to-Pydantic same-family reuse.

Test correctly verifies that Pydantic models are imported when output is also Pydantic BaseModel.


1081-1099: LGTM! Test validates msgspec output behavior.

This test correctly verifies that non-msgspec types (like Pydantic) are regenerated when output type is msgspec.Struct, confirming the new msgspec family handling works.

src/datamodel_code_generator/__main__.py (6)

924-943: LGTM! Extended type discovery to include TypedDict and msgspec.

The updated _find_models_in_type function now correctly identifies:

  • BaseModel subclasses
  • Enum subclasses
  • Dataclasses
  • TypedDict (via __required_keys__)
  • msgspec Structs (via __struct_fields__)

This enables proper classification for the reuse-foreign strategy.
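
The attribute checks listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's _find_models_in_type: the helper name classify_kind and its return labels are hypothetical, the FakeStruct class only imitates a msgspec Struct by exposing __struct_fields__, and the Pydantic branch (an issubclass check against BaseModel) is left out so the sketch needs no third-party import.

```python
import dataclasses
from enum import Enum
from typing import TypedDict


def classify_kind(tp: type) -> str:
    """Return a coarse label for a class using duck-typed markers (hypothetical helper)."""
    if isinstance(tp, type) and issubclass(tp, Enum):
        return "enum"
    if dataclasses.is_dataclass(tp):
        return "dataclass"
    # TypedDict classes expose __required_keys__.
    if hasattr(tp, "__required_keys__"):
        return "typeddict"
    # msgspec Struct classes expose __struct_fields__.
    if hasattr(tp, "__struct_fields__"):
        return "msgspec"
    return "other"


class Category(Enum):
    A = "a"


@dataclasses.dataclass
class NestedDataclass:
    value: int


class NestedTypedDict(TypedDict):
    value: int


class FakeStruct:
    # Stand-in for a msgspec Struct subclass (assumption for illustration only).
    __struct_fields__ = ("value",)


for tp in (Category, NestedDataclass, NestedTypedDict, FakeStruct, int):
    print(tp.__name__, "->", classify_kind(tp))
```

Checking for marker attributes rather than importing msgspec keeps detection working when the optional dependency is not installed, which may be one reason a duck-typed check is attractive here.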


1042-1057: LGTM! Proper mapping of DataModelType to type families.

The _get_output_family function correctly maps:

  • Pydantic variants (BaseModel, V2BaseModel, V2Dataclass) → pydantic
  • DataclassesDataclass → dataclass
  • TypingTypedDict → typeddict
  • MsgspecStruct → msgspec

This aligns with the DataModelType enum values shown in the relevant code snippets.
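
The mapping described above might look roughly like the following. It is a sketch only: the DataModelType class here is a local stand-in whose member names and values are assumed rather than copied from enums.py, and get_output_family is a hypothetical name for illustration.

```python
from enum import Enum


class DataModelType(Enum):
    """Local stand-in for the project's enum; names and values are assumptions."""

    PydanticBaseModel = "pydantic.BaseModel"
    PydanticV2BaseModel = "pydantic_v2.BaseModel"
    PydanticV2Dataclass = "pydantic_v2.dataclass"
    DataclassesDataclass = "dataclasses.dataclass"
    TypingTypedDict = "typing.TypedDict"
    MsgspecStruct = "msgspec.Struct"


# All Pydantic variants collapse into one family; the others map one-to-one.
_OUTPUT_FAMILY = {
    DataModelType.PydanticBaseModel: "pydantic",
    DataModelType.PydanticV2BaseModel: "pydantic",
    DataModelType.PydanticV2Dataclass: "pydantic",
    DataModelType.DataclassesDataclass: "dataclass",
    DataModelType.TypingTypedDict: "typeddict",
    DataModelType.MsgspecStruct: "msgspec",
}


def get_output_family(output_model_type: DataModelType) -> str:
    """Map an output model type to its type-family string (hypothetical helper)."""
    return _OUTPUT_FAMILY[output_model_type]


print(get_output_family(DataModelType.PydanticV2Dataclass))
print(get_output_family(DataModelType.TypingTypedDict))
```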


1060-1068: LGTM! Clean reuse decision logic.

The _should_reuse_type function correctly implements the reuse-foreign strategy:

  • Enums are always reusable (return True)
  • Other types are reusable only when source family matches output family

This is the core fix for comparing with output type instead of input type.
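
As a rough sketch of that rule, applied to the nested fixture types named in this review: the function and variable names below are hypothetical, and the family strings are assumed labels rather than the project's constants.

```python
def should_reuse_type(source_family: str, output_family: str) -> bool:
    """Enums are always reused; anything else only when the families match."""
    if source_family == "enum":
        return True
    return source_family == output_family


# Assumed families for the fixture types in mixed_nested.py.
FIXTURE_FAMILIES = {
    "Category": "enum",
    "NestedTypedDict": "typeddict",
    "NestedPydantic": "pydantic",
    "NestedDataclass": "dataclass",
}


def plan(output_family: str) -> dict[str, str]:
    """Label each fixture type as imported (reused) or regenerated."""
    return {
        name: "import" if should_reuse_type(family, output_family) else "regenerate"
        for name, family in FIXTURE_FAMILIES.items()
    }


for output_family in ("typeddict", "dataclass", "pydantic", "msgspec"):
    print(output_family, plan(output_family))
```

With TypedDict output, only Category and NestedTypedDict come out as imports, which matches the expectations described for the mixed nested test above.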


1071-1109: LGTM! Updated strategy filtering to use output model type.

The _filter_defs_by_strategy function now correctly:

  1. Derives output_family from output_model_type
  2. Uses _should_reuse_type(type_family, output_family) for the reuse decision

This is the key change that makes reuse-foreign compare against output type rather than input type.


1112-1116: LGTM! Function signature updated with new parameter.

The output_model_type parameter with a sensible default allows backward compatibility while enabling the new behavior.


1827-1832: LGTM! Correctly propagates output_model_type to schema loading.

The config.output_model_type is now passed through to _load_model_schema, ensuring the reuse-foreign strategy uses the correct output type for comparison.

tests/data/python/input_model/mixed_nested.py (1)

1-68: LGTM! Well-structured test fixtures for mixed-type reuse scenarios.

This test data file provides comprehensive coverage for the reuse-foreign strategy:

  • Category enum (always reused)
  • NestedTypedDict (reused when output is TypedDict)
  • NestedPydantic (reused when output is Pydantic)
  • NestedDataclass (reused when output is dataclass)
  • Composite models that combine these for testing mixed scenarios

The docstrings clearly document the expected behavior, making tests self-documenting.

@codspeed-hq

codspeed-hq Bot commented Dec 29, 2025

CodSpeed Performance Report

Merging #2854 will not alter performance

Comparing fix/reuse-foreign-output-type-comparison (da23e93) with main (0f7a6c9)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped [1]

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.50%. Comparing base (0f7a6c9) to head (da23e93).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2854   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files          90       90           
  Lines       14824    14869   +45     
  Branches     1777     1781    +4     
=======================================
+ Hits        14750    14795   +45     
  Misses         38       38           
  Partials       36       36           
| Flag | Coverage Δ |
|---|---|
| unittests | 99.50% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@koxudaxi koxudaxi merged commit c46b64f into main Dec 30, 2025
37 checks passed
@koxudaxi koxudaxi deleted the fix/reuse-foreign-output-type-comparison branch December 30, 2025 00:41
@github-actions
Contributor

Breaking Change Analysis

Result: Breaking changes detected

Reasoning: This PR changes the behavior of the --input-model-ref-strategy reuse-foreign option. Previously, "foreign" types were determined by comparing against the input model's type family - if you had a Pydantic model as input containing a dataclass field, the dataclass was considered "foreign" and would be imported/reused. After this change, "foreign" types are determined by comparing against the output model's type family. Now a type is only reused if it matches the output type family (or is an enum, which is always reused). This is a breaking change because users relying on the previous behavior of reuse-foreign will see different generated output: types that were previously imported may now be regenerated into the output type, and vice versa. The test changes confirm this - tests were updated to expect different output, showing that the behavior has observably changed.

Content for Release Notes

Default Behavior Changes

  • --input-model-ref-strategy reuse-foreign behavior changed - Previously, this strategy compared the source type family against the input model's family (e.g., if input was Pydantic, any non-Pydantic type like dataclass was considered "foreign" and reused). Now it compares against the output model's family. This means types that were previously imported/reused may now be regenerated, and vice versa. For example, when converting a Pydantic model containing a dataclass to TypedDict output, the dataclass was previously imported (it was "foreign" to Pydantic input), but now it will be regenerated (it's not the same family as TypedDict output). Enums are always reused regardless of output type. (Fix reuse-foreign to compare with output type instead of input type #2854)
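
The before/after difference can be illustrated with two small predicates. This is a sketch based only on the wording of the analysis above: both function names are hypothetical, and the old rule is assumed to be a plain "differs from the input family" test, which may not capture every detail of the previous implementation.

```python
def reused_before(source_family: str, input_family: str) -> bool:
    # Assumed old rule: a type outside the input model's family was "foreign" and imported.
    return source_family != input_family


def reused_after(source_family: str, output_family: str) -> bool:
    # New rule: enums are always reused; other types only if they match the output family.
    return source_family == "enum" or source_family == output_family


# Pydantic input containing a dataclass field, TypedDict output.
print(reused_before("dataclass", input_family="pydantic"))
print(reused_after("dataclass", output_family="typeddict"))

# Pydantic input containing a nested Pydantic model, Pydantic output.
print(reused_before("pydantic", input_family="pydantic"))
print(reused_after("pydantic", output_family="pydantic"))
```

The first pair shows a type that was imported before and is regenerated now; the second pair shows the reverse case.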

This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.
