Fix reuse-foreign to compare with output type instead of input type by koxudaxi · Pull Request #2854 · koxudaxi/datamodel-code-generator

koxudaxi · 2025-12-29T13:25:34Z

Summary by CodeRabbit

New Features
- Added msgspec Struct support in type discovery and classification.
- Type reuse now compares each type's family against the output model type, so a type is reused only when it can be imported without conversion.

Tests
- Extended test coverage for mixed-type reuse scenarios involving Pydantic models, TypedDict, dataclass, and enum types across various output formats.

coderabbitai · 2025-12-29T13:25:48Z

Walkthrough

This PR extends the datamodel-code-generator's type reuse logic to support mixed type families (Pydantic, dataclass, TypedDict, Enum, msgspec) when using the reuse-foreign strategy. It adds type family classification for msgspec, new reuse decision logic based on output model type, and comprehensive test coverage for nested mixed-type scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core type family and reuse logic: `src/datamodel_code_generator/__main__.py` | Extended `_find_models_in_type` to recognize BaseModel, Enum, dataclass, TypedDict, and msgspec structures. Added `_TYPE_FAMILY_MSGSPEC` constant and msgspec support to `_get_type_family`. Introduced `_get_output_family()` to map DataModelType to internal type-family strings and `_should_reuse_type()` to determine reuse eligibility across families (enums always reusable; others compared by family). Updated `_filter_defs_by_strategy` to accept `output_model_type` instead of input family string and use new reuse logic. Extended `_load_model_schema` signatures to accept and propagate `output_model_type` throughout call sites. |
| Test data for mixed-type reuse scenarios: `tests/data/python/input_model/mixed_nested.py` | New file introducing mixed-type test models: Enum `Category`, TypedDict `NestedTypedDict`, Pydantic `NestedPydantic`, dataclass `NestedDataclass`, and four root models (`ModelWithTypedDict`, `ModelWithPydantic`, `ModelWithDataclass`, `ModelWithMixed`) combining these types to validate reuse behavior across families. |
| Reuse-foreign strategy test coverage: `tests/test_input_model.py` | Updated and extended test expectations for `test_input_model_ref_strategy_reuse_foreign` and related cases to reflect new import/regeneration semantics: enums are always imported regardless of family; same-family types respect output model type; different-family types are regenerated. Added new test cases validating reuse-foreign behavior across TypedDict, dataclass, Pydantic, and msgspec output types with nested mixed-type scenarios. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant TypeDiscovery as Type Discovery
    participant FamilyClass as Family Classification
    participant ReuseLogic as Reuse Decision
    participant SchemaFilter as Schema Filtering
    
    User->>TypeDiscovery: Load input model with annotations
    TypeDiscovery->>TypeDiscovery: Scan for BaseModel, Enum, dataclass,<br/>TypedDict, msgspec structures
    TypeDiscovery->>FamilyClass: Provide discovered types
    
    FamilyClass->>FamilyClass: Map each type to family<br/>(PYDANTIC, DATACLASS,<br/>TYPEDDICT, MSGSPEC, ENUM)
    FamilyClass->>ReuseLogic: Provide source family & output type
    
    rect rgba(100, 200, 150, 0.3)
        Note over ReuseLogic: Reuse Decision Logic
        ReuseLogic->>ReuseLogic: Get output family from<br/>output_model_type
        ReuseLogic->>ReuseLogic: If Enum: always reuse
        ReuseLogic->>ReuseLogic: Else: compare source family<br/>vs output family
    end
    
    ReuseLogic->>SchemaFilter: Decision: reuse or regenerate?
    SchemaFilter->>SchemaFilter: Filter definitions by<br/>ReuseForeign strategy
    SchemaFilter->>User: Output schema with<br/>correct imports/regenerations

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

PR #2850: Modifies the same type-family constants and reuse/filtering helper logic in __main__.py, extending model-type discovery and adding new type-family taxonomy.
PR #2851: Changes input-model schema generation in __main__.py via _load_model_schema and related input-model processing updates.
PR #2837: Modifies _load_model_schema function and its call sites to propagate metadata for type preservation during schema augmentation.

Suggested labels

breaking-change-analyzed

Poem

🐰 A rabbit hops through type-family trees,
Finding Enums, Pydantic, dataclasses with ease,
Deciding what reuses and what regenerates anew,
Mixed types no longer confuse the queue,
Schema reuse-foreign logic shines bright and true! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing reuse-foreign logic to compare with output type instead of input type, which is reflected throughout the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

github-actions · 2025-12-29T13:26:18Z

📚 Docs Preview: https://pr-2854.datamodel-code-generator.pages.dev

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/datamodel_code_generator/__main__.py (1)
1008-1039: Remove unused noqa directive on line 1016.

The static analysis indicates that # noqa: PLR0911 is no longer needed since the rule isn't enabled or the function no longer triggers it.
🔎 Proposed fix
-def _get_type_family(tp: type) -> str:  # noqa: PLR0911
+def _get_type_family(tp: type) -> str:

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f7a6c9 and da23e93.

📒 Files selected for processing (3)

src/datamodel_code_generator/__main__.py
tests/data/python/input_model/mixed_nested.py
tests/test_input_model.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/datamodel_code_generator/__main__.py (1)

src/datamodel_code_generator/enums.py (2)

DataModelType (48-56)

InputModelRefStrategy (199-210)

🪛 Ruff (0.14.10)

src/datamodel_code_generator/__main__.py

1016-1016: Unused noqa directive (non-enabled: PLR0911)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: 3.10 on Windows
GitHub Check: 3.10 on macOS
GitHub Check: 3.12 on macOS
GitHub Check: 3.11 on Windows
GitHub Check: 3.12 on Windows
GitHub Check: 3.14 on Windows
GitHub Check: 3.13 on Windows
GitHub Check: Analyze (python)
GitHub Check: benchmarks

🔇 Additional comments (15)

tests/test_input_model.py (8)

780-797: LGTM! Test correctly validates reuse-foreign imports enum and same-family types.

The test verifies that when using reuse-foreign strategy with TypedDict output:

- Status enum is imported (enums always reused)
- Metadata class is present (TypedDict reused as same-family)
- Address and User classes are regenerated

800-821: LGTM! Properly validates that Status TypeAlias is not regenerated when imported.

The test correctly asserts that Status: TypeAlias is not in the output when using reuse-foreign, confirming the enum is imported rather than regenerated.

967-988: LGTM! Test validates same-family TypedDict reuse.

This test correctly verifies that when output is TypedDict and input contains a nested TypedDict (NestedTypedDict), it's imported rather than regenerated (verified by assert "class NestedTypedDict" not in content).

990-1007: LGTM! Test validates cross-family regeneration.

Correctly tests that a Pydantic model (NestedPydantic) is regenerated when output type is TypedDict, since they belong to different families.

1010-1031: LGTM! Test validates same-family dataclass reuse.

Test correctly verifies that when output is dataclass and input contains a nested dataclass (NestedDataclass), it's imported rather than regenerated.

1033-1056: LGTM! Comprehensive test for mixed nested types.

This test validates the complete behavior with mixed types: TypedDict is reused (imported), while Pydantic and dataclass are regenerated when output is TypedDict.

1058-1079: LGTM! Validates Pydantic-to-Pydantic same-family reuse.

Test correctly verifies that Pydantic models are imported when output is also Pydantic BaseModel.

1081-1099: LGTM! Test validates msgspec output behavior.

This test correctly verifies that non-msgspec types (like Pydantic) are regenerated when output type is msgspec.Struct, confirming the new msgspec family handling works.

src/datamodel_code_generator/__main__.py (6)

924-943: LGTM! Extended type discovery to include TypedDict and msgspec.

The updated _find_models_in_type function now correctly identifies:

- BaseModel subclasses
- Enum subclasses
- Dataclasses
- TypedDict (via __required_keys__)
- msgspec Structs (via __struct_fields__)

This enables proper classification for the reuse-foreign strategy.
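
The attribute checks listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's `_find_models_in_type` or `_get_type_family`: the name `classify_family`, the returned labels, the Pydantic attribute checks (`model_fields`, `__fields__`), and the `FakeStruct` stand-in for a msgspec Struct are all assumptions made for the example.

```python
import dataclasses
from enum import Enum
from typing import TypedDict


def classify_family(tp: type) -> str:
    """Return a coarse family label for a class (hypothetical helper)."""
    if isinstance(tp, type) and issubclass(tp, Enum):
        return "enum"
    # TypedDict classes carry __required_keys__.
    if hasattr(tp, "__required_keys__"):
        return "typeddict"
    # msgspec Structs carry __struct_fields__.
    if hasattr(tp, "__struct_fields__"):
        return "msgspec"
    # Assumption: Pydantic models are recognised by their field registries.
    if hasattr(tp, "model_fields") or hasattr(tp, "__fields__"):
        return "pydantic"
    if dataclasses.is_dataclass(tp):
        return "dataclass"
    return "other"


class Category(Enum):
    A = "a"


class NestedTypedDict(TypedDict):
    x: int


@dataclasses.dataclass
class NestedDataclass:
    x: int


class FakeStruct:
    # Stand-in so the example runs without msgspec installed.
    __struct_fields__ = ("x",)
```

Duck-typed checks like these avoid importing optional dependencies just to classify a type, at the cost of misclassifying any unrelated class that happens to define the same attribute.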

1042-1057: LGTM! Proper mapping of DataModelType to type families.

The _get_output_family function correctly maps:

- Pydantic variants (BaseModel, V2BaseModel, V2Dataclass) → pydantic
- DataclassesDataclass → dataclass
- TypingTypedDict → typeddict
- MsgspecStruct → msgspec

This aligns with the DataModelType enum values shown in the relevant code snippets.
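
The mapping described above could look roughly like the following. `OutputModelType` is a local stand-in for the project's `DataModelType` enum so the snippet runs on its own; its member names and string values, the dictionary name, and `get_output_family` are assumptions for illustration rather than the code in the PR.

```python
from enum import Enum


class OutputModelType(Enum):
    """Local stand-in for DataModelType (names and values are illustrative)."""

    PydanticBaseModel = "pydantic.BaseModel"
    PydanticV2BaseModel = "pydantic_v2.BaseModel"
    PydanticV2Dataclass = "pydantic_v2.dataclass"
    DataclassesDataclass = "dataclasses.dataclass"
    TypingTypedDict = "typing.TypedDict"
    MsgspecStruct = "msgspec.Struct"


# Every Pydantic variant collapses into the single "pydantic" family.
_OUTPUT_FAMILY = {
    OutputModelType.PydanticBaseModel: "pydantic",
    OutputModelType.PydanticV2BaseModel: "pydantic",
    OutputModelType.PydanticV2Dataclass: "pydantic",
    OutputModelType.DataclassesDataclass: "dataclass",
    OutputModelType.TypingTypedDict: "typeddict",
    OutputModelType.MsgspecStruct: "msgspec",
}


def get_output_family(output_model_type: OutputModelType) -> str:
    """Map an output model type to its type-family label."""
    return _OUTPUT_FAMILY[output_model_type]
```

A lookup table keeps the mapping exhaustive and easy to audit: adding a new output type without a family entry raises a `KeyError` instead of silently falling through.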

1060-1068: LGTM! Clean reuse decision logic.

The _should_reuse_type function correctly implements the reuse-foreign strategy:

- Enums are always reusable (return True)
- Other types are reusable only when source family matches output family

This is the core fix for comparing with output type instead of input type.
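
The two rules above fit in a few lines. The sketch below assumes families are plain strings such as `"enum"` and `"typeddict"`; the function name `should_reuse_type` and those labels are illustrative, not copied from the PR.

```python
def should_reuse_type(source_family: str, output_family: str) -> bool:
    """Decide whether a nested type can be imported instead of regenerated."""
    # Enums are reusable regardless of the output model type.
    if source_family == "enum":
        return True
    # Everything else is reusable only when the families match.
    return source_family == output_family
```

For example, `should_reuse_type("typeddict", "typeddict")` is true, while `should_reuse_type("pydantic", "typeddict")` is false and the Pydantic model would be regenerated.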

1071-1109: LGTM! Updated strategy filtering to use output model type.

The _filter_defs_by_strategy function now correctly:

- Derives output_family from output_model_type
- Uses _should_reuse_type(type_family, output_family) for the reuse decision

This is the key change that makes reuse-foreign compare against output type rather than input type.
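
As a rough picture of what the filtering step decides, the sketch below splits a set of nested definitions into reused and regenerated groups for a given output family. It is a simplification under stated assumptions: the real `_filter_defs_by_strategy` works on schema definitions, whereas `partition_defs` here is a hypothetical helper that takes a plain name-to-family mapping. The class names come from the `mixed_nested.py` fixture described in this review.

```python
from __future__ import annotations


def partition_defs(
    def_families: dict[str, str], output_family: str
) -> tuple[list[str], list[str]]:
    """Split definition names into (reused, regenerated) for one output family."""
    reused: list[str] = []
    regenerated: list[str] = []
    for name, family in def_families.items():
        # Same rule as the reuse decision: enums always, others by family match.
        if family == "enum" or family == output_family:
            reused.append(name)
        else:
            regenerated.append(name)
    return reused, regenerated


defs = {
    "Category": "enum",
    "NestedTypedDict": "typeddict",
    "NestedPydantic": "pydantic",
    "NestedDataclass": "dataclass",
}
reused, regenerated = partition_defs(defs, "typeddict")
```

With TypedDict output, `Category` and `NestedTypedDict` land in the reused group and the Pydantic and dataclass models are regenerated, which matches the expectations in the mixed nested test.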

1112-1116: LGTM! Function signature updated with new parameter.

The output_model_type parameter with a sensible default allows backward compatibility while enabling the new behavior.

1827-1832: LGTM! Correctly propagates output_model_type to schema loading.

The config.output_model_type is now passed through to _load_model_schema, ensuring the reuse-foreign strategy uses the correct output type for comparison.

tests/data/python/input_model/mixed_nested.py (1)

1-68: LGTM! Well-structured test fixtures for mixed-type reuse scenarios.

This test data file provides comprehensive coverage for the reuse-foreign strategy:

- Category enum (always reused)
- NestedTypedDict (reused when output is TypedDict)
- NestedPydantic (reused when output is Pydantic)
- NestedDataclass (reused when output is dataclass)
- Composite models that combine these for testing mixed scenarios

The docstrings clearly document the expected behavior, making tests self-documenting.

codspeed-hq · 2025-12-29T13:28:35Z

CodSpeed Performance Report

Merging #2854 will not alter performance

Comparing fix/reuse-foreign-output-type-comparison (da23e93) with main (0f7a6c9)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped¹

¹ 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

codecov · 2025-12-29T13:29:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.50%. Comparing base (0f7a6c9) to head (da23e93).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2854   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files          90       90           
  Lines       14824    14869   +45     
  Branches     1777     1781    +4     
=======================================
+ Hits        14750    14795   +45     
  Misses         38       38           
  Partials       36       36

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `99.50% <100.00%> (+<0.01%)` | ⬆️ |
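
The numbers in the diff can be checked by hand. The sketch below assumes coverage is computed as hits divided by the sum of hits, misses, and partials, which is consistent with the Lines row (14750 + 38 + 36 = 14824); the function name is made up for the example.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage of all tracked lines."""
    return 100 * hits / (hits + misses + partials)


base = coverage_percent(14750, 38, 36)  # main (0f7a6c9)
head = coverage_percent(14795, 38, 36)  # PR head (da23e93)
delta = head - base
```

Both values round to 99.50%, and the difference is positive but below 0.01 percentage points, matching the `+<0.01%` shown for the unittests flag.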

github-actions · 2025-12-30T00:42:56Z

Breaking Change Analysis

Result: Breaking changes detected

Reasoning: This PR changes the behavior of the --input-model-ref-strategy reuse-foreign option. Previously, "foreign" types were determined by comparing against the input model's type family: if a Pydantic input model contained a dataclass field, the dataclass was considered "foreign" and was imported/reused. After this change, "foreign" types are determined by comparing against the output model's type family, so a type is reused only if it matches the output type family (or is an enum, which is always reused). This is a breaking change because users relying on the previous behavior of reuse-foreign will see different generated output: types that were previously imported may now be regenerated into the output type, and vice versa. The updated test expectations confirm that the behavior has observably changed.

Content for Release Notes

Default Behavior Changes

--input-model-ref-strategy reuse-foreign behavior changed - Previously, this strategy compared the source type family against the input model's family (e.g., if input was Pydantic, any non-Pydantic type like dataclass was considered "foreign" and reused). Now it compares against the output model's family. This means types that were previously imported/reused may now be regenerated, and vice versa. For example, when converting a Pydantic model containing a dataclass to TypedDict output, the dataclass was previously imported (it was "foreign" to Pydantic input), but now it will be regenerated (it's not the same family as TypedDict output). Enums are always reused regardless of output type. (Fix reuse-foreign to compare with output type instead of input type #2854)
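
The difference between the two rules can be shown side by side for the example in the note. This is an illustrative sketch only: `reused_before` and `reused_after` are hypothetical names, families are plain strings, and the old rule's handling of enums is not described in the analysis, so it is not special-cased here.

```python
def reused_before(source_family: str, input_family: str) -> bool:
    """Old rule as described: a type foreign to the input family was reused."""
    return source_family != input_family


def reused_after(source_family: str, output_family: str) -> bool:
    """New rule: enums are always reused, others only on an output-family match."""
    return source_family == "enum" or source_family == output_family


# Pydantic input containing a dataclass, converted to TypedDict output.
before = reused_before("dataclass", input_family="pydantic")
after = reused_after("dataclass", output_family="typeddict")
```

Here `before` is true (the dataclass was imported) and `after` is false (it is now regenerated), which is the change the release note describes.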

This analysis was performed by Claude Code Action

github-actions · 2026-01-01T00:05:39Z

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.

Fix reuse-foreign to compare with output type instead of input type

da23e93

coderabbitai Bot reviewed Dec 29, 2025

koxudaxi merged commit c46b64f into main Dec 30, 2025
37 checks passed

koxudaxi deleted the fix/reuse-foreign-output-type-comparison branch December 30, 2025 00:41

github-actions Bot added breaking-change-analyzed breaking-change labels Dec 30, 2025

coderabbitai Bot mentioned this pull request Dec 31, 2025

Add multiple --input-model support with inheritance preservation #2881

Merged

koxudaxi merged 1 commit into main from fix/reuse-foreign-output-type-comparison