Refactor parser base post-processing for DRY and type-safe implementation#2730
Refactor parser base post-processing for DRY and type-safe implementation#2730
Conversation
WalkthroughAdds import utility methods for alias resolution and unused-import removal, introduces model capability flags across several data-model implementations, refactors the parser into a module-based pipeline with ModuleContext/ParseConfig, and adds a recursive DataType.walk visitor with cycle protection. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI / Caller
participant Parser as Parser
participant Builder as _build_module_structure
participant Processor as _process_single_module
participant Finalizer as _finalize_modules
participant Renderer as _generate_module_output
participant FS as FileSystem
CLI->>Parser: parse(input, config)
Parser->>Parser: _prepare_parse_config(config)
Parser->>Builder: _build_module_structure(parse_config)
Builder-->>Parser: ModuleModels, internal map, ForwarderMap
Parser->>Processor: for each module -> _process_single_module(context)
Processor-->>Parser: ModuleContext (per-module)
Parser->>Finalizer: _finalize_modules(list(ModuleContext))
Finalizer-->>Parser: finalized contexts (imports cleaned, forwarders added)
Parser->>Renderer: _generate_module_output(ModuleContext)
Renderer-->>FS: write module files / __init__ exports
FS-->>CLI: output paths
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2730 +/- ##
=======================================
Coverage 99.35% 99.36%
=======================================
Files 83 83
Lines 12004 12062 +58
Branches 1452 1456 +4
=======================================
+ Hits 11927 11985 +58
Misses 45 45
Partials 32 32
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CodSpeed Performance ReportMerging #2730 will not alter performanceComparing Summary
Footnotes
|
77b7529 to
661dc23
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/datamodel_code_generator/parser/base.py (1)
2592-2657: Potential stale mapping forunused_modelsafter circular-module resolution
_build_module_structurebuildsmodel_to_module_modelsbefore calling__resolve_circular_imports, which can move models into synthetic_internalmodules and replace the per-module model lists. Later,_finalize_modulesusesmodel_to_module_modelsto locate and removeunused_models(from__collapse_root_models) and to update imports viamodule_to_import. For models that have been relocated into an_internalmodule, this mapping will still point at the original module and its pre-resolution list, so:
- The
models.remove(unused_model)call can operate on a list that is no longer the one used by anyModuleContext, leaving the unused model in the actual output.- Imports for the real module containing the unused model (e.g., the
_internalmodule) will not be cleaned up.This is an edge case (it requires both cross-module SCCs and collapsible root models), but it’s a real behavioral divergence.
Suggested refactor to bind removals to the final module structure
One approach is to derive the “model → context” mapping from
contextsinstead of carryingmodel_to_module_modelsthrough:- def _finalize_modules( - self, - contexts: list[ModuleContext], - unused_models: list[DataModel], - model_to_module_models: dict[DataModel, tuple[ModulePath, list[DataModel]]], - module_to_import: dict[ModulePath, Imports], - ) -> None: + def _finalize_modules( + self, + contexts: list[ModuleContext], + unused_models: list[DataModel], + ) -> None: """Finalize module processing: apply generic base class and remove unused imports.""" self.__apply_generic_base_class(contexts) - for ctx in contexts: - for model in ctx.models: - ctx.imports.append(model.imports) - - for unused_model in unused_models: - module, models = model_to_module_models[unused_model] - if unused_model in models: - imports = module_to_import[module] - imports.remove(unused_model.imports) - models.remove(unused_model) + # Bind each model to its final module context. + model_to_context: dict[DataModel, ModuleContext] = {} + for ctx in contexts: + for model in ctx.models: + model_to_context[model] = ctx + ctx.imports.append(model.imports) + + # Drop models that became unused after root-model collapsing, etc. + for unused_model in unused_models: + ctx = model_to_context.get(unused_model) + if not ctx: + continue + ctx.imports.remove(unused_model.imports) + try: + ctx.models.remove(unused_model) + except ValueError: + # Model already removed; ignore. + passand update the call site in
parse()accordingly:- self._finalize_modules(contexts, unused_models, model_to_module_models, module_to_import) + self._finalize_modules(contexts, unused_models)This ties removals to the post–
__resolve_circular_importsmodule layout and ensures unused models really disappear from both code and import sets.Also applies to: 2721-2745
🧹 Nitpick comments (3)
src/datamodel_code_generator/imports.py (1)
171-197: Alias-aware unused-import pruning looks correct for current usage
get_effective_nameplusremove_unusedcorrectly consider both the alias and original imported name, which matches how_collect_used_names_from_modelsbuildsused_names(based on aliases and simple identifiers). The loop that repeatedly callsremoveuntil the counter reaches zero respects the reference counting inImports. For typical module sizes this O(n²) behavior overreference_pathsis acceptable; you can revisit it if modules ever accumulate hundreds of distinct imports.src/datamodel_code_generator/parser/base.py (2)
2547-2590: ParseConfig helper looks good; consider cleaning up stalenoqadirectivesThe new
_prepare_parse_configcleanly centralizes computation ofuse_deferred_annotations, globalIMPORT_ANNOTATIONS, and the optionalCodeFormatter. The subsequentparse()refactor that delegates to this helper improves readability.Static analysis (RUF100) points out that several
# noqadirectives on these definitions (PLR0913,PLR0914,PLR0917,FBT001,FBT002) are now unused under the current Ruff configuration. If you don’t plan to enable those checks, you can drop the corresponding# noqacomments to keep the file tidy.Also applies to: 2843-2852
2547-2552: Ruff RUF100: severalnoqacomments can likely be droppedRuff reports unused
# noqadirectives forPLR0913,PLR0914,PLR0917,FBT001, andFBT002onparse,_prepare_parse_config,_process_single_module, and_generate_module_output. If your current Ruff configuration doesn’t enable these rules, you can safely remove the corresponding# noqacomments; otherwise consider enabling the checks so the directives remain meaningful.Also applies to: 2659-2668, 2749-2755, 2843-2848
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/datamodel_code_generator/imports.pysrc/datamodel_code_generator/model/base.pysrc/datamodel_code_generator/model/dataclass.pysrc/datamodel_code_generator/model/msgspec.pysrc/datamodel_code_generator/model/pydantic/base_model.pysrc/datamodel_code_generator/model/pydantic_v2/base_model.pysrc/datamodel_code_generator/parser/base.pysrc/datamodel_code_generator/types.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/parser/base.py (3)
src/datamodel_code_generator/model/base.py (5)
DataModel(470-760)imports(283-302)imports(642-647)path(739-741)name(658-660)src/datamodel_code_generator/imports.py (2)
remove(91-123)remove_unused(175-196)src/datamodel_code_generator/format.py (1)
format_code(272-291)
🪛 Ruff (0.14.8)
src/datamodel_code_generator/parser/base.py
2547-2547: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2549-2549: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2550-2550: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2552-2552: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2659-2659: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2749-2749: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2843-2843: Unused noqa directive (non-enabled: PLR0913, PLR0914, PLR0917)
Remove unused noqa directive
(RUF100)
2845-2845: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
2846-2846: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
2848-2848: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: 3.11 on Windows
- GitHub Check: 3.14 on Windows
- GitHub Check: 3.12 on macOS
- GitHub Check: py312-isort7 on Ubuntu
- GitHub Check: py312-pydantic1 on Ubuntu
- GitHub Check: 3.13 on Ubuntu
- GitHub Check: 3.12 on Windows
- GitHub Check: 3.13 on Windows
- GitHub Check: 3.10 on Windows
- GitHub Check: benchmarks
- GitHub Check: Analyze (python)
🔇 Additional comments (12)
src/datamodel_code_generator/model/dataclass.py (1)
43-46: Dataclass capability flags are consistent with new parser gatingExposing
SUPPORTS_DISCRIMINATORandSUPPORTS_KW_ONLYonDataClassaligns with the newDataModeldefaults and the parser’s feature-gated paths (__apply_discriminator_type,__get_dataclass_inherited_info). No behavioral issues spotted.src/datamodel_code_generator/model/msgspec.py (1)
108-117: Struct discriminator support flag fits the new feature-flag patternSetting
SUPPORTS_DISCRIMINATOR = TrueonStructmatches how the parser now gates discriminator handling onDataModel.SUPPORTS_DISCRIMINATORand avoids backend-specific isinstance checks.src/datamodel_code_generator/model/pydantic_v2/base_model.py (1)
170-179: Pydantic v2 capability flags correctly advertise supported features
SUPPORTS_DISCRIMINATOR,SUPPORTS_FIELD_RENAMING, andSUPPORTS_WRAPPED_DEFAULTbeingTruefor the v2BaseModellines up with how the parser now conditionally runs discriminator logic, field renaming, and root-model default wrapping.src/datamodel_code_generator/model/pydantic/base_model.py (1)
329-333: Pydantic v1 discriminator flag preserves existing behaviorAdding
SUPPORTS_DISCRIMINATOR = Trueto the v1BaseModelkeeps discriminator handling enabled under the new generic flag-based checks without changing previous semantics.src/datamodel_code_generator/types.py (1)
432-448:DataType.walkis a reasonable, cycle-safe traversal helperThe new
walkmethod visits each node once using anid-based visited set and correctly recurses throughdata_typesanddict_key. This is suitable for the import-usage collection in the parser without changing existingall_data_typesbehavior.src/datamodel_code_generator/model/base.py (1)
476-485: BaseDataModelcapability flags are a clean extensibility hookThe new
SUPPORTS_*class vars (discriminator, field renaming, wrapped default, kw-only) defaulting toFalseprovide a clear, type-level contract for parsers to query backend capabilities instead of hard-coding concrete model classes.src/datamodel_code_generator/parser/base.py (6)
104-129: ModuleContext/ParseConfig abstractions clarify the parsing pipelineIntroducing
ModulePath/ModuleModels/ForwarderMapaliases plusModuleContextandParseConfigmakes the later helpers (_build_module_structure, _process_single_module, _generate_module_output) much easier to reason about. The fields onModuleContext(module path, key, models, init flag, imports, resolver) are exactly what the downstream steps need.
2238-2256: UsingDataType.walkin_collect_used_names_from_modelsimproves import reachability analysisSwitching from manual iteration over
all_data_typestofield.data_type.walk(collect_data_type_names)ensures every reachableDataType(including nested dict keys) contributes its alias/type and reference short name toused_names, while avoiding cycles via thevisitedset inwalk. Combined withImports.remove_unused, this should safely prune genuinely unused imports without dropping ones referenced only from nested types.
2659-2720: Single-module processing pipeline is well-structured
_process_single_modulecleanly sequences all the per-module transformations that used to be inlined inparse(): shadowed-import aliasing, required-field overrides, unique-items→set conversion, cross-module import rewrites, enum extraction, reuse, root-model collapsing, default wrapping, sort/model-order handling, discriminator application, and type-alias updates. Returning aModuleContextencapsulates everything the later phases need (module path, models, imports, resolver) and makes the flow more testable.
2749-2815: Module output generation correctly composes imports, exports, and body
_generate_module_output:
- Builds imports from
future_imports_str, global parser imports, and per-module imports only whenwith_importis enabled.- Optionally injects
__all__-driven re-export imports for__init__.pybased on_collect_exports_for_initand collision strategy.- Emits either rendered models plus resolve-reference actions, or, when there are no models, a forwarder module based on
forwarder_map.The final optional formatting via
code_formattermatches the rest of the tool’s behavior.
2816-2842: Empty-init export generation and overall parse orchestration look coherent
_generate_empty_init_exportsfills in bare__init__.pyfiles with re-exports whenall_exports_scopeis set, reusing the same collision-resolution logic as non-empty modules. The refactoredparse()orchestrates:
- configuration (
_prepare_parse_config),- model sorting,
- module-structure construction (including SCC/internal-module handling),
- per-module processing and finalization,
- and per-module output plus any empty-
__init__exports.The post-processing that normalizes module keys when
treat_dot_as_moduleis on appears unchanged from the previous logic.Also applies to: 2898-2916
1271-1274: Feature-flag gating for discriminators, wrapped defaults, renaming, and kw-only is sound
__apply_discriminator_typechecksisinstance(discriminator_model, DataModel)andSUPPORTS_DISCRIMINATOR, decoupling parser logic from specific backend classes.__wrap_root_model_default_valuesis guarded byself.data_model_type.SUPPORTS_WRAPPED_DEFAULT, ensuring only backends that supportWrappedDefaultreceive this behavior.__change_field_nameruns only whenself.data_model_type.SUPPORTS_FIELD_RENAMINGisTrue, leaving legacy backends unchanged.__get_dataclass_inherited_infoearly-outs whenmodel.SUPPORTS_KW_ONLYis false, applying the dataclass field-order fix only to kw-only-capable models.All flags are properly defined on backend classes (False by default in base.py) and correctly set for each backend: Pydantic v2 supports all four, dataclass supports discriminators and kw-only, msgspec and Pydantic v1 support discriminators.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/datamodel_code_generator/parser/base.py (1)
2547-2590: Well-executed refactoring with improved modularity.The extraction of helper methods (
_prepare_parse_config,_build_module_structure,_process_single_module,_finalize_modules,_generate_module_output,_generate_empty_init_exports) significantly improves code organization by separating concerns into focused, testable units while the mainparsemethod now serves as a clear orchestrator.🔎 Optional cleanup: Remove unused noqa directives
Static analysis reports several unused
noqadirectives that can be safely removed:
- Line 2547: Remove
# noqa: PLR0913, PLR0917- Line 2549: Remove
# noqa: FBT001- Line 2550: Remove
# noqa: FBT001- Line 2552: Remove
# noqa: FBT001- Line 2659: Remove
# noqa: PLR0913, PLR0917- Line 2749: Remove
# noqa: PLR0913, PLR0917- Line 2843: Remove
# noqa: PLR0913, PLR0914, PLR0917- Line 2845: Remove
# noqa: FBT001, FBT002- Line 2846: Remove
# noqa: FBT001, FBT002- Line 2848: Remove
# noqa: FBT001, FBT002These directives are no longer necessary as the refactored methods don't trigger the corresponding linting rules.
Also applies to: 2592-2657, 2659-2719, 2721-2747, 2749-2814, 2816-2841, 2843-2923
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/datamodel_code_generator/imports.pysrc/datamodel_code_generator/model/base.pysrc/datamodel_code_generator/model/dataclass.pysrc/datamodel_code_generator/model/msgspec.pysrc/datamodel_code_generator/model/pydantic/base_model.pysrc/datamodel_code_generator/model/pydantic_v2/base_model.pysrc/datamodel_code_generator/parser/base.pysrc/datamodel_code_generator/types.py
🚧 Files skipped from review as they are similar to previous changes (4)
- src/datamodel_code_generator/model/pydantic_v2/base_model.py
- src/datamodel_code_generator/model/msgspec.py
- src/datamodel_code_generator/imports.py
- src/datamodel_code_generator/model/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/parser/base.py (4)
src/datamodel_code_generator/model/base.py (6)
DataModel(470-760)imports(283-302)imports(642-647)field(333-335)path(739-741)name(658-660)src/datamodel_code_generator/imports.py (7)
Imports(41-196)append(74-89)remove(91-123)remove_unused(175-196)add_export(146-150)dump_all(152-169)extract_future(130-144)src/datamodel_code_generator/format.py (2)
CodeFormatter(162-332)format_code(272-291)src/datamodel_code_generator/__init__.py (3)
ModuleSplitMode(303-309)AllExportsScope(249-257)AllExportsCollisionStrategy(260-270)
🪛 Ruff (0.14.8)
src/datamodel_code_generator/parser/base.py
2547-2547: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2549-2549: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2550-2550: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2552-2552: Unused noqa directive (non-enabled: FBT001)
Remove unused noqa directive
(RUF100)
2659-2659: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2749-2749: Unused noqa directive (non-enabled: PLR0913, PLR0917)
Remove unused noqa directive
(RUF100)
2843-2843: Unused noqa directive (non-enabled: PLR0913, PLR0914, PLR0917)
Remove unused noqa directive
(RUF100)
2845-2845: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
2846-2846: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
2848-2848: Unused noqa directive (non-enabled: FBT001, FBT002)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (7)
src/datamodel_code_generator/model/dataclass.py (1)
45-46: LGTM! Feature flags added for capability detection.The addition of
SUPPORTS_DISCRIMINATORandSUPPORTS_KW_ONLYflags enables the parser to detect dataclass capabilities at runtime, allowing the code generation pipeline to conditionally enable discriminator handling and kw_only field support based on the target model type.src/datamodel_code_generator/types.py (1)
432-448: LGTM! Well-implemented visitor pattern with cycle protection.The
walkmethod provides a clean top-down traversal of the DataType tree with proper cycle detection. The traversal order (self → data_types → dict_key) correctly implements the visitor pattern by visiting parent nodes before children, which differs intentionally from the existingall_data_typesiterator that yields children before parents.src/datamodel_code_generator/model/pydantic/base_model.py (1)
331-331: LGTM! Discriminator support flag added.The
SUPPORTS_DISCRIMINATORflag enables the parser to detect Pydantic v1 BaseModel's discriminator capabilities, consistent with the feature flag pattern introduced across other model types.src/datamodel_code_generator/parser/base.py (4)
104-129: LGTM! Well-designed abstractions for module processing.The new type aliases (
ModulePath,ModuleModels,ForwarderMap) and NamedTuples (ModuleContext,ParseConfig) provide clear, type-safe abstractions that improve code organization and make the refactored parsing pipeline easier to understand and maintain.
1271-1274: LGTM! Proper capability check for discriminator support.The addition of the
SUPPORTS_DISCRIMINATORcheck correctly gates discriminator type application based on the model's capabilities, preventing the feature from being applied to model types that don't support it.
1767-1767: LGTM! Capability checks correctly guard feature application.The capability flag checks at lines 1767 (
SUPPORTS_WRAPPED_DEFAULT), 1843 (SUPPORTS_FIELD_RENAMING), and 1901 (SUPPORTS_KW_ONLY) properly gate their respective features based on model type capabilities, ensuring features are only applied to compatible model types.Also applies to: 1843-1843, 1901-1901
2238-2256: LGTM! Excellent use of the new walk method.The
_collect_used_names_from_modelsmethod effectively uses the newDataType.walkmethod (line 2255) to collect all referenced names, replacing manual traversal with a cleaner, more maintainable visitor pattern.
Summary by CodeRabbit
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.