Add automatic handling of unserializable types in --input-model #2851
📝 Walkthrough
This pull request introduces a comprehensive mechanism to preserve and serialize complex Python typing information (including Callable, Type, Union, and generics) into JSON Schema via x-python-type annotations for Pydantic v2 models. It adds schema post-processing helpers, type override detection logic, and test coverage for edge cases involving Callable and custom types.
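As a rough illustration of what turning a typing annotation into an x-python-type string can involve, here is a minimal standard-library sketch. The function name `serialize_type` is hypothetical and is not the PR's `_serialize_python_type_full`; the sketch only assumes that the goal is a readable type string, and it targets Python 3.9.2 or later.

```python
import collections.abc
import types
from typing import Any, Callable, Optional, Union, get_args, get_origin

# types.UnionType (the `X | Y` form) only exists on Python 3.10+.
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}


def serialize_type(tp: Any) -> str:
    """Render a typing annotation as a string (hypothetical helper)."""
    if tp is Any:
        return "Any"
    if tp is type(None):
        return "None"
    if tp is Ellipsis:
        return "..."
    origin, args = get_origin(tp), get_args(tp)
    if origin is None:
        # Plain class such as `int` or a user-defined type.
        return getattr(tp, "__name__", repr(tp))
    if origin in _UNION_ORIGINS:
        return " | ".join(serialize_type(a) for a in args)
    if origin is collections.abc.Callable:
        if not args:
            return "Callable"
        params, ret = args[0], args[-1]
        if isinstance(params, list):
            rendered = "[" + ", ".join(serialize_type(p) for p in params) + "]"
        else:
            # Ellipsis (or another parameter expression).
            rendered = serialize_type(params)
        return f"Callable[{rendered}, {serialize_type(ret)}]"
    name = getattr(origin, "__name__", repr(origin))
    return f"{name}[{', '.join(serialize_type(a) for a in args)}]"


simple = serialize_type(Callable[[str], int])
optional = serialize_type(Optional[Callable[..., int]])
nested = serialize_type(list[Callable[[], None]])
```

The recursion matters for the nested cases the PR targets: a Callable inside a list or a Union is only preserved if every level of the annotation is walked.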
Sequence Diagram(s)
sequenceDiagram
participant Input as Input Model <br/>(Pydantic v2)
participant Generator as Schema Generator <br/>(Generator)
participant Processor as Post-Processor <br/>(x-python-type Annotator)
participant Parser as Parser <br/>(Override Check)
participant Output as Generated <br/>Schema/DataType
Input->>Generator: model_json_schema() via custom generator
Generator->>Processor: raw JSON schema
rect rgb(200, 220, 255)
Note over Processor: _add_python_type_for_unserializable
Processor->>Processor: traverse $defs & properties
Processor->>Processor: detect unserializable types (Callable, Type, Union)
Processor->>Processor: mark with _UNSERIALIZABLE_MARKER
end
Processor->>Processor: _add_python_type_info post-processing
Processor->>Output: annotated schema (x-python-type fields)
rect rgb(220, 255, 220)
Note over Parser: During parsing
Parser->>Parser: _is_compatible_python_type check
alt Type in PYTHON_TYPE_OVERRIDE_ALWAYS
Parser->>Parser: _get_python_type_override
Parser->>Parser: build DataType override with imports
Parser->>Output: return override DataType
else Compatible type
Parser->>Output: standard type resolution
end
end
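The post-processing half of the diagram can be pictured with plain dictionaries. The sketch below rests on stated assumptions: the marker key `x-unserializable` and the function `annotate_unserializable` are hypothetical stand-ins for `_UNSERIALIZABLE_MARKER` and `_add_python_type_for_unserializable`, and the type strings are passed in precomputed rather than derived from model annotations.

```python
from typing import Any

MARKER = "x-unserializable"  # hypothetical; the PR's actual marker value is not shown


def annotate_unserializable(
    schema: dict[str, Any],
    type_strings: dict[str, dict[str, str]],
) -> dict[str, Any]:
    """Swap marker flags for x-python-type in the root model and in $defs.

    `type_strings` maps model name -> field name -> serialized Python type.
    """
    models = {schema.get("title", ""): schema, **schema.get("$defs", {})}
    for model_name, model_schema in models.items():
        for field, prop in model_schema.get("properties", {}).items():
            # Only properties flagged by the generator are rewritten.
            if isinstance(prop, dict) and prop.pop(MARKER, False):
                prop["x-python-type"] = type_strings[model_name][field]
    return schema


schema = {
    "title": "Root",
    "properties": {"cb": {MARKER: True}, "n": {"type": "integer"}},
    "$defs": {"Child": {"properties": {"fn": {MARKER: True}}}},
}
annotated = annotate_unserializable(
    schema,
    {"Root": {"cb": "Callable[[str], int]"}, "Child": {"fn": "Callable[..., Any]"}},
)
```

Removing the marker while adding the annotation keeps the temporary flag from leaking into the schema that the parser later consumes.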
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
📚 Docs Preview: https://pr-2851.datamodel-code-generator.pages.dev
CodSpeed Performance Report
Merging #2851 will degrade performance by 17.32%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | test_perf_multiple_files_input | 3.2 s | 3.7 s | -15.58% |
| ❌ | WallTime | test_perf_deep_nested | 5.2 s | 6.3 s | -16.45% |
| ❌ | WallTime | test_perf_complex_refs | 1.8 s | 2.1 s | -15.67% |
| ❌ | WallTime | test_perf_all_options_enabled | 5.7 s | 6.7 s | -15.09% |
| ❌ | WallTime | test_perf_duplicate_names | 865.7 ms | 1,032.4 ms | -16.15% |
| ❌ | WallTime | test_perf_kubernetes_style_pydantic_v2 | 2.3 s | 2.7 s | -15.93% |
| ❌ | WallTime | test_perf_stripe_style_pydantic_v2 | 1.8 s | 2.1 s | -15.88% |
| ❌ | WallTime | test_perf_openapi_large | 2.5 s | 3 s | -16.57% |
| ❌ | WallTime | test_perf_graphql_style_pydantic_v2 | 715.7 ms | 846.2 ms | -15.42% |
| ❌ | WallTime | test_perf_aws_style_openapi_pydantic_v2 | 1.7 s | 2 s | -16.21% |
| ❌ | WallTime | test_perf_large_models_pydantic_v2 | 3.1 s | 3.8 s | -17.32% |
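The Efficiency figures appear consistent with BASE / HEAD - 1 rather than (BASE - HEAD) / BASE. This is an inference from the numbers in the report, not a statement of CodSpeed's documented formula. A quick check on the two benchmarks reported in milliseconds, which are the least rounded (the helper name `relative_efficiency` is hypothetical):

```python
def relative_efficiency(base: float, head: float) -> float:
    """Return BASE / HEAD - 1 as a percentage, rounded to two decimals."""
    return round((base / head - 1) * 100, 2)


# Values in milliseconds, taken from the report.
duplicate_names = relative_efficiency(865.7, 1032.4)  # test_perf_duplicate_names
graphql_style = relative_efficiency(715.7, 846.2)     # test_perf_graphql_style_pydantic_v2
print(duplicate_names, graphql_style)
```

Both results match the reported -16.15% and -15.42%. The benchmarks shown in seconds cannot be reproduced exactly because BASE and HEAD are displayed with only one decimal place.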
Footnotes
- 98 benchmarks were skipped, so the baseline results were used instead.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2851      +/-   ##
==========================================
- Coverage   99.50%   99.49%   -0.01%
==========================================
  Files          90       90
  Lines       14605    14740     +135
  Branches     1748     1771      +23
==========================================
+ Hits        14533    14666     +133
- Misses         37       38       +1
- Partials       35       36       +1
8835d01 to f4f4d11
f4f4d11 to fa4acb5
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/datamodel_code_generator/parser/jsonschema.py (1)
2732-2739: Drop unused `noqa` codes on `parse_item` definition
Ruff reports `# noqa: PLR0911, PLR0912, PLR0914` here as unused (those rules aren’t enabled), causing `RUF100`. You can simply remove the directive to silence the warning while keeping behavior unchanged.
Proposed clean-up
-    def parse_item(  # noqa: PLR0911, PLR0912, PLR0914
+    def parse_item(
         self,
         name: str,
         item: JsonSchemaObject,
src/datamodel_code_generator/__main__.py (2)
602-815: Unserializable-type preservation pipeline is solid; add a small guard for `items`
The new marker-based flow (`_UNSERIALIZABLE_MARKER`, `_serialize_python_type_full`, `_process_unserializable_property`, `_add_python_type_for_unserializable`) cleanly annotates Pydantic v2 schemas with `x-python-type` for otherwise-unserializable annotations (Callable, Type, custom classes, nested generics) and aligns with the parser’s new override logic.
One defensive improvement:
In `_process_unserializable_property`, the `items` branch assumes `prop["items"]` is a dict:
elif "items" in prop and prop["items"].get(_UNSERIALIZABLE_MARKER):
    prop["x-python-type"] = _serialize_python_type_full(annotation)
    prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
JSON Schema allows `items` to be a list; guarding with `isinstance(prop.get("items"), dict)` avoids a potential `AttributeError` if Pydantic ever emits a non-dict `items` with the marker.
Proposed defensive fix for the `items` branch
- elif "items" in prop and prop["items"].get(_UNSERIALIZABLE_MARKER):
-     prop["x-python-type"] = _serialize_python_type_full(annotation)
-     prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
+ elif isinstance(prop.get("items"), dict) and prop["items"].get(_UNSERIALIZABLE_MARKER):
+     prop["x-python-type"] = _serialize_python_type_full(annotation)
+     prop["items"].pop(_UNSERIALIZABLE_MARKER, None)
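To see why the guard matters, the following sketch reproduces the failure mode with plain dictionaries. The marker key and the helper `has_marked_items` are hypothetical; only the `isinstance` check mirrors the proposed fix.

```python
from typing import Any

MARKER = "x-unserializable"  # hypothetical marker key


def has_marked_items(prop: dict[str, Any]) -> bool:
    """True only when `items` is a single schema dict carrying the marker."""
    items = prop.get("items")
    return isinstance(items, dict) and bool(items.get(MARKER))


dict_items = {"type": "array", "items": {MARKER: True}}
list_items = {"type": "array", "items": [{"type": "string"}, {"type": "integer"}]}

# The unguarded form raises AttributeError here, because a list has no .get().
try:
    list_items["items"].get(MARKER)
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True

guarded_dict = has_marked_items(dict_items)
guarded_list = has_marked_items(list_items)
```

With the guard, the list-valued `items` case is skipped quietly instead of aborting schema post-processing.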
606-707: Clean up unused `noqa` directives on new helpers
Ruff flags several of the new helpers for unused `# noqa` directives (e.g. `PLR0911`, `PLC0415`, `PLR6301`), resulting in `RUF100`. Since these rules aren’t enabled in your config, the suppressions are unnecessary and can be dropped without changing behavior.
Examples include:
- Line 606: `# noqa: PLR0911` on `_serialize_python_type_full`
- Lines 618, 619, 665, 707, 738, 765: `# noqa: PLC0415` / `# noqa: PLR6301` on local imports and methods
You can either remove these comments or enable the corresponding rules in Ruff; removing them is simplest.
Illustrative clean-up (subset)
-def _serialize_python_type_full(tp: type) -> str:  # noqa: PLR0911
+def _serialize_python_type_full(tp: type) -> str:
@@
-    import types  # noqa: PLC0415
-    from typing import Union, get_args, get_origin  # noqa: PLC0415
+    import types
+    from typing import Union, get_args, get_origin
@@
-    from collections.abc import Callable as ABCCallable  # noqa: PLC0415
+    from collections.abc import Callable as ABCCallable
@@
-    from pydantic.json_schema import GenerateJsonSchema  # noqa: PLC0415
+    from pydantic.json_schema import GenerateJsonSchema
@@
-    def handle_invalid_for_json_schema(  # noqa: PLR6301
+    def handle_invalid_for_json_schema(
@@
-    def callable_schema(  # noqa: PLR6301
+    def callable_schema(
@@
-    from typing import get_origin  # noqa: PLC0415
+    from typing import get_origin
@@
-    from typing import Union, get_args, get_origin  # noqa: PLC0415
+    from typing import Union, get_args, get_origin
(Apply similarly to the remaining new helpers.)
Also applies to: 712-723, 738-742, 765-765
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/datamodel_code_generator/__main__.py
- src/datamodel_code_generator/parser/jsonschema.py
- tests/data/expected/main/jsonschema/x_python_type_no_schema_type.py
- tests/data/python/input_model/pydantic_models.py
- tests/test_input_model.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/datamodel_code_generator/__main__.py (3)
- src/datamodel_code_generator/model/base.py (1): name (827-829)
- src/datamodel_code_generator/reference.py (2): get (983-985), add (906-981)
- src/datamodel_code_generator/parser/base.py (1): add (2468-2471)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/__main__.py
606-606: Unused noqa directive (non-enabled: PLR0911)
Remove unused noqa directive
(RUF100)
618-618: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
619-619: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
665-665: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
707-707: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
712-712: Unused noqa directive (non-enabled: PLR6301)
Remove unused noqa directive
(RUF100)
723-723: Unused noqa directive (non-enabled: PLR6301)
Remove unused noqa directive
(RUF100)
738-738: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
765-765: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
src/datamodel_code_generator/parser/jsonschema.py
2732-2732: Unused noqa directive (non-enabled: PLR0911, PLR0912, PLR0914)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: 3.14 on Windows
- GitHub Check: 3.10 on Windows
- GitHub Check: 3.13 on Windows
- GitHub Check: 3.12 on Windows
- GitHub Check: 3.10 on macOS
- GitHub Check: 3.11 on Windows
- GitHub Check: 3.12 on macOS
- GitHub Check: 3.14 on macOS
- GitHub Check: 3.11 on macOS
- GitHub Check: 3.13 on macOS
- GitHub Check: Analyze (python)
- GitHub Check: benchmarks
🔇 Additional comments (8)
tests/data/python/input_model/pydantic_models.py (2)
5-6: LGTM! Proper imports for Callable and Type support.
The addition of `Callable` from `collections.abc` and `Any, Type` from `typing` correctly supports the new test models. Using `collections.abc.Callable` follows modern Python best practices.
44-88: Excellent test model coverage for Callable and unserializable types.
The new test models comprehensively cover edge cases:
- Various Callable signatures (single/multi-param, variadic, no-param)
- Optional and Union combinations with Callable
- Nested structures (`list[Callable[[str], int]]`)
- Cross-model references for `$defs` processing
- Custom arbitrary types with `model_config`
The models are well-structured, properly documented, and follow Pydantic v2 conventions.
tests/test_input_model.py (1)
609-723: Excellent comprehensive test coverage for Callable and unserializable types.
The new test suite thoroughly validates the preservation of complex type annotations:
✅ Comprehensive coverage:
- Various Callable signatures (multi-param, variadic, no-param, optional)
- Nested structures (`list[Callable[[str], int]]`)
- Union combinations with Callable
- Type[BaseModel] handling
- Custom arbitrary types
- Cross-model references for `$defs` processing
✅ Best practices:
- All tests properly gated with `SKIP_PYDANTIC_V1`
- Clear, descriptive docstrings
- Consistent use of helper functions
- Appropriate assertions for expected output
- Follows established test patterns
The test structure is well-organized and maintainable.
tests/data/expected/main/jsonschema/x_python_type_no_schema_type.py (2)
14-14: Excellent improvement in type preservation!
The change from `NotRequired[Any]` to `NotRequired[Callable[[str], str]]` successfully preserves the specific callable signature instead of falling back to a generic `Any` type. This aligns perfectly with the PR objective of handling unserializable types and provides better type safety for code using this generated model.
7-8: No changes needed.
The use of `collections.abc.Callable` is correct for this project, which targets Python >=3.10. PEP 585 generics like `collections.abc.Callable` are available in Python 3.9+, so there is no version compatibility issue.
Likely an incorrect or invalid review comment.
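A quick way to confirm the PEP 585 behavior on a given interpreter, sketched with the standard library only (the variable names are illustrative):

```python
import collections.abc
from typing import get_args, get_origin

# Subscripting the ABC directly works on Python 3.9+ (PEP 585).
alias = collections.abc.Callable[[str], str]

origin = get_origin(alias)          # the unsubscripted ABC
return_type = get_args(alias)[-1]   # the last argument is the return type
print(origin, return_type)
```

On interpreters older than 3.9 the subscription itself raises `TypeError`, which is why the project's Python >=3.10 floor makes the import safe.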
src/datamodel_code_generator/parser/jsonschema.py (2)
560-580: Callable/Type override wiring and imports look correct
Adding `"Type"` to `PYTHON_TYPE_IMPORTS` and introducing `PYTHON_TYPE_OVERRIDE_ALWAYS = {"Callable", "Type"}` cleanly aligns the parser with the new `x-python-type` producer: `Type[...]` and `Callable[...]` are now always routed through the override path with the right imports. No functional issues spotted here.
1354-1411: x-python-type override logic for Callable/Type is robust
The combination of `_is_compatible_python_type`, `_extract_all_type_names`, and `_get_python_type_override` correctly:
- Forces override when `Callable`/`Type` appear at the top level or nested inside `Union`/`Optional`.
- Handles fully-qualified names by stripping module prefixes and constructing appropriate `Import` objects.
- Adds nested imports for inner ABCs (e.g., `Iterable` in `Callable[[Iterable[str]], str]`) without disturbing JSON Schema–driven typing for other cases.
This should give the parser exactly the extra information produced by the new input-model schema generator without regressing existing path-based type resolution.
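The override decision described in this comment can be approximated as follows. This is a hedged sketch, not the parser's code: `extract_type_names`, `needs_override`, and `imports_for` are hypothetical names, the regex-based name extraction is an assumed technique, and the import table is a small illustrative subset rather than the real `PYTHON_TYPE_IMPORTS`.

```python
import re

# Mirrors the set quoted in the review comment.
OVERRIDE_ALWAYS = {"Callable", "Type"}

# Illustrative subset only; the real import table is not shown in this thread.
KNOWN_IMPORTS = {
    "Callable": "collections.abc",
    "Iterable": "collections.abc",
    "Type": "typing",
}

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def extract_type_names(type_str: str) -> set[str]:
    """Collect every identifier in a type string, dropping module prefixes."""
    return {name.rsplit(".", 1)[-1] for name in _NAME.findall(type_str)}


def needs_override(type_str: str) -> bool:
    """True when Callable or Type appears anywhere, including nested positions."""
    return bool(extract_type_names(type_str) & OVERRIDE_ALWAYS)


def imports_for(type_str: str) -> set[tuple[str, str]]:
    """Return (module, name) pairs for every known name in the type string."""
    names = extract_type_names(type_str)
    return {(KNOWN_IMPORTS[n], n) for n in names if n in KNOWN_IMPORTS}


qualified = needs_override("Optional[collections.abc.Callable[[str], int]]")
plain = needs_override("list[str]")
nested_imports = imports_for("Callable[[Iterable[str]], str]")
```

Scanning all names rather than only the outermost one is what lets a Callable wrapped in `Optional` or `Union` still trigger the override, and it is also how the inner `Iterable` import gets picked up.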
src/datamodel_code_generator/__main__.py (1)
1154-1167: Pydantic v2 schema generator customization integrates correctly; keep version compatibility in mind
Switching the Pydantic BaseModel path in `_load_model_schema` to:
- Use a custom `GenerateJsonSchema` subclass via `schema_generator=_get_input_model_json_schema_class()`, and
- Post-process with `_add_python_type_for_unserializable` before the existing `_add_python_type_info`
is a good way to preserve full Python typing information (especially for Callable and Type) for `--input-model`.
Because this relies on Pydantic’s `model_json_schema(schema_generator=...)` hook and specific generator method names (`handle_invalid_for_json_schema`, `callable_schema`), it’s worth ensuring your test matrix covers the supported Pydantic v2 range so that signature or behavior changes won’t silently regress this flow.
🎉 Released in 0.51.0
This PR is now available in the latest release. See the release notes for details.