Add automatic handling of unserializable types in --input-model by koxudaxi · Pull Request #2851 · koxudaxi/datamodel-code-generator

koxudaxi · 2025-12-29T04:23:45Z

Summary by CodeRabbit

New Features
- Enhanced input-model feature to properly preserve and serialize complex Python typing annotations, including Callable signatures and Type references, in generated schemas.
Tests
- Added extensive test coverage for various Callable patterns, Type fields, nested callables, and union types in input-model processing.


coderabbitai · 2025-12-29T04:23:52Z

📝 Walkthrough


This pull request introduces a comprehensive mechanism to preserve and serialize complex Python typing information (including Callable, Type, Union, and generics) into JSON Schema via x-python-type annotations for Pydantic v2 models. It adds schema post-processing helpers, type override detection logic, and test coverage for edge cases involving Callable and custom types.
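As a rough illustration of the serialization step described above, the sketch below renders a typing annotation as a string of the kind that could be stored under x-python-type. The function name `serialize_type` and the exact output format are assumptions for illustration; the PR's own `_serialize_python_type_full` may differ in both.

```python
import types
from collections.abc import Callable as ABCCallable
from typing import Any, Callable, Optional, Type, Union, get_args, get_origin

# types.UnionType only exists on Python 3.10+ (the `int | None` syntax).
_UNION_TYPE = getattr(types, "UnionType", None)


def serialize_type(tp: Any) -> str:
    """Render an annotation as source-like text (hypothetical helper)."""
    if tp is type(None):
        return "None"
    origin, args = get_origin(tp), get_args(tp)
    if origin is None:
        # Plain classes such as int, str, or a user-defined class.
        return getattr(tp, "__name__", repr(tp))
    if origin is ABCCallable:
        if not args:
            return "Callable"
        params, ret = args
        if params is Ellipsis:
            rendered = "..."
        elif isinstance(params, list):
            rendered = "[" + ", ".join(serialize_type(p) for p in params) + "]"
        else:
            # e.g. a ParamSpec; fall back to its own rendering.
            rendered = serialize_type(params)
        return f"Callable[{rendered}, {serialize_type(ret)}]"
    if origin is Union or origin is _UNION_TYPE:
        return " | ".join(serialize_type(a) for a in args)
    if origin is type:
        return f"Type[{serialize_type(args[0])}]" if args else "Type"
    name = getattr(origin, "__name__", repr(origin))
    return f"{name}[{', '.join(serialize_type(a) for a in args)}]"


print(serialize_type(Callable[[str], int]))
print(serialize_type(Optional[Callable[..., int]]))
print(serialize_type(Type[int]))
```

The sketch leans on `typing.get_origin` and `typing.get_args`, which normalise both `typing.Callable[...]` and `collections.abc.Callable[...]` to the same origin, so one branch covers both spellings.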

Changes

Cohort / File(s)	Summary
Schema Serialization & Post-Processing `src/datamodel_code_generator/__main__.py`	Introduces _UNSERIALIZABLE_MARKER, serialization helpers (_serialize_python_type_full, _serialize_callable, _is_callable_origin, _is_type_origin, etc.), lazy-initialized InputModelJsonSchema class, and post-processing flow via _load_model_schema to annotate x-python-type for unserializable types in JSON schemas.
Type Override & Compatibility Logic `src/datamodel_code_generator/parser/jsonschema.py`	Adds "Type" to PYTHON_TYPE_IMPORTS, introduces PYTHON_TYPE_OVERRIDE_ALWAYS set with {"Callable", "Type"}, extends _is_compatible_python_type to check override requirements, adds _extract_all_type_names and _get_python_type_override to compute DataType overrides for incompatible types, and updates parse_item to apply overrides.
Test Model Definitions `tests/data/python/input_model/pydantic_models.py`	Adds six new Pydantic v2 models: ModelWithCallableTypes, NestedCallableModel, ModelWithNestedCallable, CustomClass, ModelWithCustomClass, and ModelWithUnionCallable to cover Callable signatures, Type fields, nested callables, and custom type handling.
Expected Output Update `tests/data/expected/main/jsonschema/x_python_type_no_schema_type.py`	Updates callback field type annotation from NotRequired[Any] to NotRequired[Callable[[str], str]] and replaces typing.Any import with collections.abc.Callable.
Test Coverage `tests/test_input_model.py`	Adds ten new test functions (test_input_model_callable_basic, test_input_model_callable_multi_param, test_input_model_variadic, test_input_model_no_param, test_input_model_callable_optional, test_input_model_type_field, test_input_model_nested_callable, test_input_model_nested_model_with_callable, test_input_model_custom_class, test_input_model_union_callable), all marked to skip on Pydantic v1.
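The parser-side check summarised in the table can be pictured roughly as follows. This is a minimal sketch, not the code in jsonschema.py: `OVERRIDE_ALWAYS`, `extract_type_names`, and `needs_override` are hypothetical stand-ins for PYTHON_TYPE_OVERRIDE_ALWAYS, _extract_all_type_names, and the override decision, and the regex-based name extraction is an assumption.

```python
import re

# Assumption: mirrors the {"Callable", "Type"} set named in the summary.
OVERRIDE_ALWAYS = frozenset({"Callable", "Type"})

# Matches bare Python identifiers inside a type string.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_type_names(type_str: str) -> set:
    """Collect every identifier that appears in an x-python-type string."""
    return set(_IDENTIFIER.findall(type_str))


def needs_override(type_str: str) -> bool:
    """True when the string mentions a type that JSON Schema cannot express."""
    return not OVERRIDE_ALWAYS.isdisjoint(extract_type_names(type_str))


print(needs_override("Callable[[str], str]"))  # mentions Callable
print(needs_override("list[int]"))             # no override-always type
```

Checking every name in the string, rather than only the outermost one, is what lets a nested annotation such as `dict[str, Callable[[int], bool]]` trigger the override as well.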

Sequence Diagram(s)

sequenceDiagram
    participant Input as Input Model <br/>(Pydantic v2)
    participant Generator as Schema Generator <br/>(Generator)
    participant Processor as Post-Processor <br/>(x-python-type Annotator)
    participant Parser as Parser <br/>(Override Check)
    participant Output as Generated <br/>Schema/DataType

    Input->>Generator: model_json_schema() via custom generator
    Generator->>Processor: raw JSON schema
    
    rect rgb(200, 220, 255)
    Note over Processor: _add_python_type_for_unserializable
    Processor->>Processor: traverse $defs & properties
    Processor->>Processor: detect unserializable types (Callable, Type, Union)
    Processor->>Processor: mark with _UNSERIALIZABLE_MARKER
    end
    
    Processor->>Processor: _add_python_type_info post-processing
    Processor->>Output: annotated schema (x-python-type fields)
    
    rect rgb(220, 255, 220)
    Note over Parser: During parsing
    Parser->>Parser: _is_compatible_python_type check
    alt Type in PYTHON_TYPE_OVERRIDE_ALWAYS
        Parser->>Parser: _get_python_type_override
        Parser->>Parser: build DataType override with imports
        Parser->>Output: return override DataType
    else Compatible type
        Parser->>Output: standard type resolution
    end
    end
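The post-processing half of the diagram can be sketched on plain dictionaries, without Pydantic. The marker key `x-unserializable`, the function names, and the shape of `types_by_model` are all assumptions made for this illustration; the PR's `_add_python_type_for_unserializable` works from the model's real annotations.

```python
from typing import Any, Dict

# Hypothetical marker key; stands in for _UNSERIALIZABLE_MARKER.
MARKER = "x-unserializable"


def annotate_model(model_schema: Dict[str, Any], field_types: Dict[str, str]) -> None:
    """Swap the marker on each flagged property for an x-python-type string."""
    for field, prop in model_schema.get("properties", {}).items():
        if not isinstance(prop, dict):
            continue
        # pop() removes the marker whether or not a type string is known.
        if prop.pop(MARKER, False) and field in field_types:
            prop["x-python-type"] = field_types[field]


def annotate_schema(schema: Dict[str, Any], types_by_model: Dict[str, Dict[str, str]]) -> None:
    """Process the root model, then every nested model under $defs."""
    annotate_model(schema, types_by_model.get(schema.get("title", ""), {}))
    for name, sub_schema in schema.get("$defs", {}).items():
        annotate_model(sub_schema, types_by_model.get(name, {}))


schema = {
    "title": "Outer",
    "properties": {"callback": {MARKER: True}, "name": {"type": "string"}},
    "$defs": {"Inner": {"properties": {"handler": {MARKER: True}}}},
}
annotate_schema(
    schema,
    {
        "Outer": {"callback": "Callable[[str], str]"},
        "Inner": {"handler": "Callable[..., int]"},
    },
)
print(schema["properties"]["callback"])
print(schema["$defs"]["Inner"]["properties"]["handler"])
```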

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

- Add --input-model-ref-strategy option for controlling type reuse (#2850): modifies _load_model_schema and JSON Schema post-processing in __main__.py to annotate x-python-type/x-python-import for special Python types.
- Support incompatible Python types in x-python-type extension (#2841): adds x-python-type override and compatibility logic (PYTHON_TYPE_OVERRIDE_ALWAYS, _get_python_type_override) to the jsonschema.py parser.
- Preserve Python types (Set, Mapping, Sequence) in --input-model (#2837): extends the input-model JSON Schema preservation flow to serialize Python-origin types into x-python-type annotations.

Suggested labels

breaking-change-analyzed

Poem

🐰 Twitching whiskers with joy!
Callables now dance through JSON's embrace,
Type[] fields preserved without a trace,
Unserializable types find their place—
Schema marshals complex typing grace! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding automatic handling of unserializable types (like Callable and Type) in the --input-model feature.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


github-actions · 2025-12-29T04:24:32Z

📚 Docs Preview: https://pr-2851.datamodel-code-generator.pages.dev

codspeed-hq · 2025-12-29T04:26:47Z

CodSpeed Performance Report

Merging #2851 will degrade performance by 17.32%

Comparing feature/input-model-callable-type-handling (fa4acb5) with main (055f8ed)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

❌ 11 regressions
⏩ 98 skipped¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	WallTime	`test_perf_multiple_files_input`	3.2 s	3.7 s	-15.58%
❌	WallTime	`test_perf_deep_nested`	5.2 s	6.3 s	-16.45%
❌	WallTime	`test_perf_complex_refs`	1.8 s	2.1 s	-15.67%
❌	WallTime	`test_perf_all_options_enabled`	5.7 s	6.7 s	-15.09%
❌	WallTime	`test_perf_duplicate_names`	865.7 ms	1,032.4 ms	-16.15%
❌	WallTime	`test_perf_kubernetes_style_pydantic_v2`	2.3 s	2.7 s	-15.93%
❌	WallTime	`test_perf_stripe_style_pydantic_v2`	1.8 s	2.1 s	-15.88%
❌	WallTime	`test_perf_openapi_large`	2.5 s	3 s	-16.57%
❌	WallTime	`test_perf_graphql_style_pydantic_v2`	715.7 ms	846.2 ms	-15.42%
❌	WallTime	`test_perf_aws_style_openapi_pydantic_v2`	1.7 s	2 s	-16.21%
❌	WallTime	`test_perf_large_models_pydantic_v2`	3.1 s	3.8 s	-17.32%
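The Efficiency column does not state its formula. The two rows reported in milliseconds are consistent with `BASE / HEAD - 1` rather than the relative slowdown `(HEAD - BASE) / BASE`; this is an inference from the figures above, not a documented CodSpeed definition, and the function names below are hypothetical.

```python
def efficiency_pct(base: float, head: float) -> float:
    """Assumed formula: how much less work per unit time HEAD does versus BASE."""
    return (base / head - 1.0) * 100.0


def slowdown_pct(base: float, head: float) -> float:
    """Relative increase in wall time, shown for comparison."""
    return (head - base) / base * 100.0


# Millisecond rows from the table (the second-level rows are too rounded to check).
for name, base_ms, head_ms in [
    ("test_perf_duplicate_names", 865.7, 1032.4),
    ("test_perf_graphql_style_pydantic_v2", 715.7, 846.2),
]:
    print(name, round(efficiency_pct(base_ms, head_ms), 2), round(slowdown_pct(base_ms, head_ms), 2))
```

Under this reading, a reported efficiency of roughly -16% corresponds to a wall-time increase of roughly 19%, so the two numbers should not be used interchangeably.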

¹ 98 benchmarks were skipped, so the baseline results were used instead.

codecov · 2025-12-29T04:27:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.49%. Comparing base (055f8ed) to head (fa4acb5).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2851      +/-   ##
==========================================
- Coverage   99.50%   99.49%   -0.01%     
==========================================
  Files          90       90              
  Lines       14605    14740     +135     
  Branches     1748     1771      +23     
==========================================
+ Hits        14533    14666     +133     
- Misses         37       38       +1     
- Partials       35       36       +1

Flag	Coverage Δ
unittests	`99.49% <100.00%> (-0.01%)`	⬇️


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/datamodel_code_generator/parser/jsonschema.py (1)
2732-2739: Drop unused noqa codes on parse_item definition

Ruff reports # noqa: PLR0911, PLR0912, PLR0914 here as unused (those rules aren’t enabled), causing RUF100. You can simply remove the directive to silence the warning while keeping behavior unchanged.
Proposed clean-up
-    def parse_item(  # noqa: PLR0911, PLR0912, PLR0914
+    def parse_item(
         self,
         name: str,
         item: JsonSchemaObject,
src/datamodel_code_generator/__main__.py (2)
602-815: Unserializable-type preservation pipeline is solid; add a small guard for items

The new marker-based flow (_UNSERIALIZABLE_MARKER, _serialize_python_type_full, _process_unserializable_property, _add_python_type_for_unserializable) cleanly annotates Pydantic v2 schemas with x-python-type for otherwise-unserializable annotations (Callable, Type, custom classes, nested generics) and aligns with the parser’s new override logic.

One defensive improvement:
In _process_unserializable_property, the items branch assumes prop["items"] is a dict:
elif "items" in prop and prop["items"].get(_UNSERIALIZABLE_MARKER):
    prop["x-python-type"] = _serialize_python_type_full(annotation)
    prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
JSON Schema allows items to be a list; guarding with isinstance(prop.get("items"), dict) avoids a potential AttributeError if Pydantic ever emits a non-dict items with the marker.
Proposed defensive fix for the items branch
-    elif "items" in prop and prop["items"].get(_UNSERIALIZABLE_MARKER):
-        prop["x-python-type"] = _serialize_python_type_full(annotation)
-        prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
+    elif isinstance(prop.get("items"), dict) and prop["items"].get(_UNSERIALIZABLE_MARKER):
+        prop["x-python-type"] = _serialize_python_type_full(annotation)
+        prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
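To see why the guard matters, the sketch below compares the unguarded and guarded checks on a property whose items value is a list (the tuple-style form permitted by older JSON Schema drafts). The marker key and both function names are hypothetical and only illustrate the reviewer's point.

```python
from typing import Any, Dict

# Hypothetical marker key; stands in for _UNSERIALIZABLE_MARKER.
MARKER = "x-unserializable"


def has_marked_items_unguarded(prop: Dict[str, Any]) -> bool:
    """Original shape of the check: assumes prop['items'] is a dict."""
    return "items" in prop and bool(prop["items"].get(MARKER))


def has_marked_items_guarded(prop: Dict[str, Any]) -> bool:
    """Suggested shape: only call .get() when items really is a dict."""
    items = prop.get("items")
    return isinstance(items, dict) and bool(items.get(MARKER))


list_items_prop = {"type": "array", "items": [{"type": "string"}, {"type": "integer"}]}

try:
    has_marked_items_unguarded(list_items_prop)
    unguarded_failed = False
except AttributeError:
    # list has no .get(), so the unguarded version raises here.
    unguarded_failed = True

print(unguarded_failed)
print(has_marked_items_guarded(list_items_prop))
print(has_marked_items_guarded({"items": {MARKER: True}}))
```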
606-707: Clean up unused noqa directives on new helpers

Ruff flags several of the new helpers for unused # noqa directives (e.g. PLR0911, PLC0415, PLR6301), resulting in RUF100. Since these rules aren’t enabled in your config, the suppressions are unnecessary and can be dropped without changing behavior.

Examples include:

Line 606: # noqa: PLR0911 on _serialize_python_type_full

Lines 618, 619, 665, 707, 738, 765: # noqa: PLC0415 / # noqa: PLR6301 on local imports and methods

You can either remove these comments or enable the corresponding rules in Ruff; removing them is simplest.
Illustrative clean-up (subset)
-def _serialize_python_type_full(tp: type) -> str:  # noqa: PLR0911
+def _serialize_python_type_full(tp: type) -> str:
@@
-    import types  # noqa: PLC0415
-    from typing import Union, get_args, get_origin  # noqa: PLC0415
+    import types
+    from typing import Union, get_args, get_origin
@@
-    from collections.abc import Callable as ABCCallable  # noqa: PLC0415
+    from collections.abc import Callable as ABCCallable
@@
-    from pydantic.json_schema import GenerateJsonSchema  # noqa: PLC0415
+    from pydantic.json_schema import GenerateJsonSchema
@@
-        def handle_invalid_for_json_schema(  # noqa: PLR6301
+        def handle_invalid_for_json_schema(
@@
-        def callable_schema(  # noqa: PLR6301
+        def callable_schema(
@@
-    from typing import get_origin  # noqa: PLC0415
+    from typing import get_origin
@@
-    from typing import Union, get_args, get_origin  # noqa: PLC0415
+    from typing import Union, get_args, get_origin
(Apply similarly to the remaining new helpers.)
Also applies to: 712-723, 738-742, 765-765

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 055f8ed and fa4acb5.

📒 Files selected for processing (5)

src/datamodel_code_generator/__main__.py
src/datamodel_code_generator/parser/jsonschema.py
tests/data/expected/main/jsonschema/x_python_type_no_schema_type.py
tests/data/python/input_model/pydantic_models.py
tests/test_input_model.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/datamodel_code_generator/__main__.py (3)
- src/datamodel_code_generator/model/base.py (1): name (827-829)
- src/datamodel_code_generator/reference.py (2): get (983-985), add (906-981)
- src/datamodel_code_generator/parser/base.py (1): add (2468-2471)

🪛 Ruff (0.14.10)

src/datamodel_code_generator/__main__.py

606-606: Unused noqa directive (non-enabled: PLR0911)