Add --input-model-ref-strategy option for controlling type reuse#2850
Add --input-model-ref-strategy option for controlling type reuse#2850
Conversation
|
Warning Rate limit exceeded@koxudaxi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 33 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (14)
📝 WalkthroughWalkthroughAdds a new CLI option Changes
Sequence DiagramsequenceDiagram
autonumber
participant CLI as CLI Parser
participant Config as Config
participant Loader as _load_model_schema
participant Filter as _filter_defs_by_strategy
participant Parser as JsonSchemaParser
participant Output as Code Generator
CLI->>Config: parse --input-model-ref-strategy
Config->>Loader: pass input_model_ref_strategy
Loader->>Loader: generate JSON Schema from input model
Loader->>Loader: collect nested types (pydantic, enum, dataclass, typedict)
Loader->>Filter: filter/annotate $defs using strategy
alt strategy marks reuse
Filter->>Filter: classify type families, add `x-python-import` to reused defs
Filter->>Loader: return modified schema
end
Loader->>Parser: provide annotated schema
Parser->>Parser: detect `x-python-import` on $ref
alt x-python-import present
Parser->>Parser: create Import, mark ref as handled, skip generation
else
Parser->>Parser: generate model from schema
end
Parser->>Output: emit code and imports
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
📚 Docs Preview: https://pr-2850.datamodel-code-generator.pages.dev |
CodSpeed Performance ReportMerging #2850 will not alter performanceComparing
|
a99405b to
1d6e5a3
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
tests/test_input_model.py (1)
609-801: Ref-strategy tests correctly exercise all strategies and key behaviorsThe new tests cover default vs explicit
regenerate-all,reuse-foreignandreuse-all(including “no regeneration” checks), plus dataclass/TypedDict inputs and the “ignored without --input-model” case. This gives good confidence in the new option’s semantics.If this area grows further, consider parametrizing the nested_models/dataclass_nested cases to reduce repetition, but that’s purely optional.
src/datamodel_code_generator/parser/jsonschema.py (1)
1223-1237: x-python-import handling is aligned with reuse strategies; consider a couple of small robustness tweaksThe new behavior in
get_ref_data_typeandparse_raw_objachieves the intended effect:
$reftargets whose resolved schema hasextras["x-python-import"]are now turned into direct imports viaImport.from_full_path(...)andself.data_type.from_import(import_), with no model generation.- Definitions whose raw value contains
x-python-importare marked asloadedvia_handle_python_import, so the later reserved-ref machinery won’t try to parse them into models, which fits the “reuse imported type” semantics.Two minor points you might want to tighten up:
Guard the parse_raw_obj fast-path on shape, not just key presence
Right now any dict with an
"x-python-import"key is skipped:if isinstance(raw, dict) and "x-python-import" in raw: self._handle_python_import(name, path) returnbut
get_ref_data_typeonly treats it specially whenref_schema.extras["x-python-import"]is adictwith bothmoduleandname. If someone supplied a malformedx-python-import(or used that key for a different extension), we’d end up with a loaded reference and no generated model, whileget_ref_data_typewould fall back to regular ref handling and never see a backing model. You could instead only short-circuit when the value is a proper dict with the expected keys, otherwise fall back to normal parsing.Confirm
self.data_typeis bound to the DataType class
self.data_typeis used elsewhere as a callable factory; here you’re callingself.data_type.from_import(import_). This assumesself.data_typeis a DataType class (or subclass) exposingfrom_import. That matches typical wiring in this project but is worth verifying; if it were ever rebound to a plain factory function, this would raise at runtime. An explicitDataType.from_import(import_)(or going throughself.data_type_managerif you prefer) would make the dependency clearer.These are non-blocking; the core logic and interactions with
model_resolver/reserved_refslook solid.Also applies to: 3536-3549
src/datamodel_code_generator/__main__.py (5)
447-447: Remove unused noqa directive.The
noqa: UP045directive is unused per static analysis. While the code is correct, removing unnecessary directives keeps the codebase clean.🔎 Proposed fix
- input_model_ref_strategy: Optional[InputModelRefStrategy] = None # noqa: UP045 + input_model_ref_strategy: Optional[InputModelRefStrategy] = NoneBased on static analysis hints.
686-709: Remove unused noqa directives.The function logic correctly handles collecting nested types from Pydantic models, dataclasses, and TypedDicts. However, the
noqa: PLC0415directive at line 688 is unused per static analysis.🔎 Proposed fix
- from dataclasses import is_dataclass # noqa: PLC0415 + from dataclasses import is_dataclassBased on static analysis hints.
712-726: Remove unused noqa directives.The enhanced logic to find BaseModel subclasses, Enums, and dataclasses looks correct. However, static analysis flags unused
noqa: PLC0415directives at lines 714, 715, and 716.🔎 Proposed fix
- from dataclasses import is_dataclass # noqa: PLC0415 - from enum import Enum as PyEnum # noqa: PLC0415 + from dataclasses import is_dataclass + from enum import Enum as PyEnum from typing import get_args # noqa: PLC0415Based on static analysis hints.
791-828: Remove unused_OUTPUT_TYPE_FAMILYglobal and unused noqa directives.The
_get_type_family()function implementation looks correct for determining type families. However:
- The
_OUTPUT_TYPE_FAMILYmapping defined at lines 797-804 is never used in the code (flagged by CodeQL).- The
noqa: PLC0415directives at lines 809-810 are unused per static analysis.If
_OUTPUT_TYPE_FAMILYwas intended for future use or comparison against output model types, consider either implementing that functionality or removing the unused mapping to reduce code clutter.🔎 Proposed fix
_TYPE_FAMILY_ENUM = "enum" _TYPE_FAMILY_PYDANTIC = "pydantic" _TYPE_FAMILY_DATACLASS = "dataclass" _TYPE_FAMILY_TYPEDDICT = "typeddict" _TYPE_FAMILY_OTHER = "other" -_OUTPUT_TYPE_FAMILY: dict[DataModelType, str] = { - DataModelType.PydanticBaseModel: _TYPE_FAMILY_PYDANTIC, - DataModelType.PydanticV2BaseModel: _TYPE_FAMILY_PYDANTIC, - DataModelType.DataclassesDataclass: _TYPE_FAMILY_DATACLASS, - DataModelType.PydanticV2Dataclass: _TYPE_FAMILY_PYDANTIC, - DataModelType.TypingTypedDict: _TYPE_FAMILY_TYPEDDICT, - DataModelType.MsgspecStruct: "msgspec", -} - def _get_type_family(tp: type) -> str: """Determine the type family of a Python type.""" - from dataclasses import is_dataclass # noqa: PLC0415 - from enum import Enum as PyEnum # noqa: PLC0415 + from dataclasses import is_dataclass + from enum import Enum as PyEnumBased on static analysis hints and CodeQL findings.
872-999: Remove unused noqa directive; otherwise LGTM.The enhanced
_load_model_schemafunction correctly applies the reference strategy for both Pydantic models and dataclass/TypedDict inputs. The logic consistently:
- Generates the JSON schema
- Adds Python type information
- Filters
$defsbased on the strategy when applicableHowever, the
noqa: PLR0912, PLR0914, PLR0915directive at line 872 is unused per static analysis.🔎 Proposed fix
-def _load_model_schema( # noqa: PLR0912, PLR0914, PLR0915 +def _load_model_schema( input_model: str, input_file_type: InputFileType, ref_strategy: InputModelRefStrategy | None = None,Based on static analysis hints.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/cli-reference/base-options.mddocs/cli-reference/index.mddocs/cli-reference/quick-reference.mdsrc/datamodel_code_generator/__init__.pysrc/datamodel_code_generator/__main__.pysrc/datamodel_code_generator/arguments.pysrc/datamodel_code_generator/cli_options.pysrc/datamodel_code_generator/enums.pysrc/datamodel_code_generator/parser/jsonschema.pysrc/datamodel_code_generator/prompt_data.pytests/data/python/input_model/dataclass_nested.pytests/data/python/input_model/nested_models.pytests/test_input_model.py
🧰 Additional context used
🧬 Code graph analysis (9)
src/datamodel_code_generator/enums.py (1)
src/datamodel_code_generator/model/enum.py (1)
Enum(39-121)
src/datamodel_code_generator/arguments.py (1)
src/datamodel_code_generator/enums.py (1)
InputModelRefStrategy(199-210)
tests/data/python/input_model/dataclass_nested.py (2)
src/datamodel_code_generator/model/enum.py (1)
Enum(39-121)src/datamodel_code_generator/model/base.py (1)
name(827-829)
tests/test_input_model.py (1)
src/datamodel_code_generator/__main__.py (2)
main(1355-1658)Exit(124-130)
src/datamodel_code_generator/__init__.py (1)
src/datamodel_code_generator/enums.py (1)
InputModelRefStrategy(199-210)
src/datamodel_code_generator/parser/jsonschema.py (3)
src/datamodel_code_generator/reference.py (4)
get(983-985)reference(76-78)add_ref(805-828)add(906-981)src/datamodel_code_generator/imports.py (2)
Import(20-38)from_full_path(35-38)src/datamodel_code_generator/types.py (2)
imports(482-536)from_import(358-382)
tests/data/python/input_model/nested_models.py (3)
src/datamodel_code_generator/model/enum.py (1)
Enum(39-121)src/datamodel_code_generator/model/typed_dict.py (1)
key(131-135)src/datamodel_code_generator/model/base.py (1)
name(827-829)
src/datamodel_code_generator/cli_options.py (1)
src/datamodel_code_generator/model/base.py (1)
name(827-829)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/enums.py (3)
InputModelRefStrategy(199-210)DataModelType(48-56)InputFileType(35-45)
🪛 GitHub Check: CodeQL
src/datamodel_code_generator/__main__.py
[notice] 797-797: Unused global variable
The global variable '_OUTPUT_TYPE_FAMILY' is not used.
🪛 Ruff (0.14.10)
src/datamodel_code_generator/__main__.py
447-447: Unused noqa directive (non-enabled: UP045)
Remove unused noqa directive
(RUF100)
688-688: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
714-714: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
715-715: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
716-716: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
809-809: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
810-810: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
872-872: Unused noqa directive (non-enabled: PLR0912, PLR0914, PLR0915)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: 3.10 on Windows
- GitHub Check: 3.13 on Windows
🔇 Additional comments (13)
docs/cli-reference/index.md (1)
11-11: LGTM! Documentation properly updated for new option.The Base Options count is correctly incremented to 7, and the new
--input-model-ref-strategyoption is properly added to the alphabetical index in the correct position.Also applies to: 110-110
tests/data/python/input_model/dataclass_nested.py (1)
1-31: LGTM! Well-structured test data for dataclass nesting.This test data file provides clear examples of nested types (Enum + dataclasses) for exercising the
--input-model-ref-strategyfeature. The structure is clean with proper type hints and descriptive docstrings explaining the purpose of each component.tests/data/python/input_model/nested_models.py (1)
1-40: LGTM! Excellent test data for mixed model types.This file provides comprehensive test coverage by mixing three different model families (Enum, dataclass, BaseModel) in a single structure. The
Usermodel references all three types, which effectively exercises the different reuse strategies. The docstrings clearly explain the intended behavior for each strategy.docs/cli-reference/quick-reference.md (1)
23-23: LGTM! Quick reference properly updated.The new option is correctly added to both the Base Options table and the alphabetical index with a clear, concise description.
Also applies to: 256-256
src/datamodel_code_generator/enums.py (1)
199-211: LGTM! Well-defined enum with clear documentation.The
InputModelRefStrategyenum follows the established patterns in this module:
- PascalCase member names with kebab-case string values
- Comprehensive docstring explaining each strategy's purpose
- Proper placement in the module and
__all__export listAlso applies to: 236-236
src/datamodel_code_generator/__init__.py (1)
42-42: LGTM! Proper public API exposure.The
InputModelRefStrategyenum is correctly imported from the enums module and added to the__all__export list, maintaining alphabetical ordering in both locations. This properly exposes it as part of the package's public API.Also applies to: 1059-1059
src/datamodel_code_generator/cli_options.py (1)
69-69: LGTM! CLI metadata properly registered.The new option is correctly registered in the CLI metadata system with the appropriate category (BASE) and is logically placed next to the related
--input-modeloption.docs/cli-reference/base-options.md (1)
11-11: Documentation is accurate and well-written.The documentation correctly identifies
regenerate-allas the default strategy. This is confirmed by the argument parser help text ("If not specified, defaults to regenerate-all behavior") and the code logic, whereNone(the default argument value) results in RegenerateAll behavior since the special filtering is only applied whenref_strategyis explicitly set to a non-default value.src/datamodel_code_generator/prompt_data.py (1)
66-66: New prompt description for--input-model-ref-strategyis consistentThe added OPTION_DESCRIPTIONS entry is concise and aligned with the CLI/help semantics of the new flag. No issues here.
src/datamodel_code_generator/arguments.py (1)
16-35: CLI wiring for--input-model-ref-strategylooks correct; verify default mappingImporting
InputModelRefStrategyand exposing--input-model-ref-strategyviabase_optionswithchoices=[s.value for s in InputModelRefStrategy]is consistent with how other enum-backed options are handled.The help text says the behavior defaults to
regenerate-all, while the argparse default isNone. That’s fine as long asConfig.input_model_ref_strategyor_load_model_schemamapsNonetoInputModelRefStrategy.RegenerateAllinternally. Please double-check that mapping so the documented default matches runtime behavior.Also applies to: 167-176
src/datamodel_code_generator/__main__.py (3)
62-62: LGTM!The import of
InputModelRefStrategyis correctly added and used throughout the file.
830-869: LGTM!The
_filter_defs_by_strategy()function correctly implements the filtering logic:
- RegenerateAll: No filtering (defensive check, though callers already exclude this case)
- ReuseAll: All nested types marked for reuse with
x-python-import- ReuseForeign: Only types from different families than the input model are marked for reuse
The implementation aligns with the documented strategy behaviors.
1585-1589: LGTM!The call to
_load_model_schemacorrectly passesconfig.input_model_ref_strategy, properly wiring the new configuration option through to the implementation.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2850 +/- ##
========================================
Coverage 99.50% 99.50%
========================================
Files 90 90
Lines 14489 14605 +116
Branches 1736 1748 +12
========================================
+ Hits 14417 14533 +116
Misses 37 37
Partials 35 35
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/cli-reference/base-options.mddocs/cli-reference/index.mddocs/cli-reference/quick-reference.mdsrc/datamodel_code_generator/__init__.pysrc/datamodel_code_generator/__main__.pysrc/datamodel_code_generator/arguments.pysrc/datamodel_code_generator/cli_options.pysrc/datamodel_code_generator/enums.pysrc/datamodel_code_generator/parser/jsonschema.pysrc/datamodel_code_generator/prompt_data.pytests/data/python/input_model/dataclass_nested.pytests/data/python/input_model/nested_models.pytests/test_input_model.py
🚧 Files skipped from review as they are similar to previous changes (9)
- src/datamodel_code_generator/init.py
- tests/data/python/input_model/nested_models.py
- tests/data/python/input_model/dataclass_nested.py
- src/datamodel_code_generator/parser/jsonschema.py
- src/datamodel_code_generator/cli_options.py
- src/datamodel_code_generator/enums.py
- docs/cli-reference/quick-reference.md
- src/datamodel_code_generator/prompt_data.py
- docs/cli-reference/base-options.md
🧰 Additional context used
🧬 Code graph analysis (3)
src/datamodel_code_generator/arguments.py (1)
src/datamodel_code_generator/enums.py (1)
InputModelRefStrategy(199-210)
tests/test_input_model.py (1)
src/datamodel_code_generator/__main__.py (2)
main(1355-1658)Exit(124-130)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/enums.py (2)
InputModelRefStrategy(199-210)DataModelType(48-56)
🪛 GitHub Check: CodeQL
src/datamodel_code_generator/__main__.py
[notice] 797-797: Unused global variable
The global variable '_OUTPUT_TYPE_FAMILY' is not used.
🪛 Ruff (0.14.10)
src/datamodel_code_generator/__main__.py
447-447: Unused noqa directive (non-enabled: UP045)
Remove unused noqa directive
(RUF100)
688-688: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
714-714: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
715-715: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
716-716: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
809-809: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
810-810: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
872-872: Unused noqa directive (non-enabled: PLR0912, PLR0914, PLR0915)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (14)
docs/cli-reference/index.md (2)
11-11: LGTM! Documentation update is accurate.The Base Options count has been correctly incremented from 6 to 7 to reflect the addition of the new
--input-model-ref-strategyoption.
110-110: LGTM! New option properly documented.The new
--input-model-ref-strategyoption has been correctly added to the alphabetical index under section I, with proper formatting and link to the base-options documentation.tests/test_input_model.py (2)
609-627: LGTM! Well-documented test with CLI documentation marker.The test includes proper CLI documentation metadata via the
@pytest.mark.cli_docdecorator, which helps generate user-facing documentation. The description clearly explains the three strategy options and their purposes.
628-803: LGTM! Comprehensive test coverage for the new feature.The test suite thoroughly covers all three strategies (regenerate-all, reuse-foreign, reuse-all) with both positive assertions (what should be present) and negative assertions (what should NOT be regenerated). Tests also cover:
- Multiple output model types (TypedDict, dataclass)
- Edge cases (strategy without --input-model)
- Different input model types (Pydantic models, dataclasses, TypedDict)
The use of helper functions like
run_input_model_and_assertkeeps tests concise and maintainable.src/datamodel_code_generator/arguments.py (2)
26-26: LGTM! Import correctly added.The
InputModelRefStrategyenum is properly imported alongside other enums in the appropriate section.
167-176: LGTM! Well-designed CLI argument with clear documentation.The new
--input-model-ref-strategyoption is properly implemented with:
- Clear help text explaining all three strategies
- Proper validation using enum values
- Sensible default of
Noneto distinguish unspecified from explicitly setThe help text effectively communicates the behavior of each strategy to users.
src/datamodel_code_generator/__main__.py (8)
62-62: LGTM! Import correctly added.The
InputModelRefStrategyimport is properly placed with other enum imports from the main package.
447-447: LGTM! Config field properly added.The
input_model_ref_strategyfield is correctly typed and positioned in the Config model.
686-727: LGTM! Enhanced type collection for ref strategy support.The functions
_collect_nested_modelsand_find_models_in_typehave been properly extended to:
- Collect Enum and dataclass types in addition to BaseModel subclasses
- Recursively discover nested types in dataclass and TypedDict via type hints
- Handle both Pydantic models (via
model_fields) and dataclass/TypedDict (via annotations)This enhancement is essential for the ref-strategy feature to identify which types can be reused vs. regenerated.
830-870: LGTM! Core strategy logic correctly implemented.The
_filter_defs_by_strategyfunction properly implements the three strategies:
- RegenerateAll: Returns schema unchanged (all types regenerated)
- ReuseForeign: Marks types from different families with
x-python-importfor reuse- ReuseAll: Marks all nested types with
x-python-importfor reuseThe logic correctly:
- Preserves unknown types (not in nested_models)
- Determines type families and applies strategy rules
- Uses
x-python-importannotation with module and name for import generation
872-876: LGTM! Function signature properly updated.The
_load_model_schemafunction signature correctly adds the optionalref_strategyparameter with proper type hints and documentation.
954-964: LGTM! Pydantic model path correctly implements ref strategy.The code properly:
- Adds x-python-type information for type preservation
- Only applies strategy filtering when a non-RegenerateAll strategy is specified
- Collects nested models including the main model if present in $defs
- Determines input family and applies the filter
982-996: LGTM! Dataclass/TypedDict path correctly implements ref strategy.Parallel to the Pydantic path, this properly:
- Adds x-python-type information via the generic function
- Applies strategy filtering when specified
- Handles both dataclass and TypedDict as obj_type
The implementation is consistent with the Pydantic path.
1585-1589: LGTM! Main function correctly wires the ref strategy.The
mainfunction properly passesconfig.input_model_ref_strategyto_load_model_schema, completing the end-to-end flow from CLI argument to schema loading logic.
0523e20 to
9212c59
Compare
69711cc to
ddf21d3
Compare
🤖 Generated by GitHub Actions
Breaking Change AnalysisResult: No breaking changes detected Reasoning: This PR is purely additive - it introduces a new optional CLI option This analysis was performed by Claude Code Action |
|
🎉 Released in 0.51.0 This PR is now available in the latest release. See the release notes for details. |
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.