Merged
25 changes: 25 additions & 0 deletions docs/cli-reference/base-options.md
@@ -8,6 +8,7 @@
| [`--input`](#input) | Specify the input schema file path. |
| [`--input-file-type`](#input-file-type) | Specify the input file type for code generation. |
| [`--input-model`](#input-model) | Import a Python type or dict schema from a module. |
| [`--input-model-ref-strategy`](#input-model-ref-strategy) | Strategy for referenced types when using --input-model. |
| [`--output`](#output) | Specify the destination path for generated Python code. |
| [`--url`](#url) | Fetch schema from URL with custom HTTP headers. |

@@ -241,6 +242,30 @@ Use the format `module:Object` or `path/to/file.py:Object` to specify the type.

---

## `--input-model-ref-strategy` {#input-model-ref-strategy}

Strategy for referenced types when using --input-model.

The `--input-model-ref-strategy` option determines whether to regenerate or import
referenced types. Use `regenerate-all` (default) to regenerate all types,
`reuse-foreign` to import types from different families (like enums when generating
dataclasses) while regenerating same-family types, or `reuse-all` to import all
referenced types directly.

!!! tip "Usage"

```bash
datamodel-codegen --input-model module:Object --input-model-ref-strategy reuse-foreign # (1)!
```

1. :material-arrow-left: `--input-model-ref-strategy` - the option documented here

??? example "Examples"

**Output:**

---

## `--output` {#output}

Specify the destination path for generated Python code.
3 changes: 2 additions & 1 deletion docs/cli-reference/index.md
@@ -8,7 +8,7 @@ This documentation is auto-generated from test cases.

| Category | Options | Description |
|----------|---------|-------------|
| 📁 [Base Options](base-options.md) | 6 | Input/output configuration |
| 📁 [Base Options](base-options.md) | 7 | Input/output configuration |
| 🔧 [Typing Customization](typing-customization.md) | 26 | Type annotation and import behavior |
| 🏷️ [Field Customization](field-customization.md) | 22 | Field naming and docstring behavior |
| 🏗️ [Model Customization](model-customization.md) | 36 | Model generation behavior |
@@ -107,6 +107,7 @@ This documentation is auto-generated from test cases.
- [`--input`](base-options.md#input)
- [`--input-file-type`](base-options.md#input-file-type)
- [`--input-model`](base-options.md#input-model)
- [`--input-model-ref-strategy`](base-options.md#input-model-ref-strategy)

### K {#k}

2 changes: 2 additions & 0 deletions docs/cli-reference/quick-reference.md
@@ -20,6 +20,7 @@ datamodel-codegen [OPTIONS]
| [`--input`](base-options.md#input) | Specify the input schema file path. |
| [`--input-file-type`](base-options.md#input-file-type) | Specify the input file type for code generation. |
| [`--input-model`](base-options.md#input-model) | Import a Python type or dict schema from a module. |
| [`--input-model-ref-strategy`](base-options.md#input-model-ref-strategy) | Strategy for referenced types when using --input-model. |
| [`--output`](base-options.md#output) | Specify the destination path for generated Python code. |
| [`--url`](base-options.md#url) | Fetch schema from URL with custom HTTP headers. |

@@ -252,6 +253,7 @@ All options sorted alphabetically:
- [`--input`](base-options.md#input) - Specify the input schema file path.
- [`--input-file-type`](base-options.md#input-file-type) - Specify the input file type for code generation.
- [`--input-model`](base-options.md#input-model) - Import a Python type or dict schema from a module.
- [`--input-model-ref-strategy`](base-options.md#input-model-ref-strategy) - Strategy for referenced types when using --input-model.
- [`--keep-model-order`](model-customization.md#keep-model-order) - Keep model definition order as specified in schema.
- [`--keyword-only`](model-customization.md#keyword-only) - Generate dataclasses with keyword-only fields (Python 3.10+)...
- [`--model-extra-keys`](model-customization.md#model-extra-keys) - Add model-level schema extensions to ConfigDict json_schema_...
2 changes: 2 additions & 0 deletions src/datamodel_code_generator/__init__.py
@@ -39,6 +39,7 @@
FieldTypeCollisionStrategy,
GraphQLScope,
InputFileType,
InputModelRefStrategy,
ModuleSplitMode,
NamingStrategy,
OpenAPIScope,
@@ -1055,6 +1056,7 @@ def infer_input_type(text: str) -> InputFileType:
"Error",
"GeneratedModules",
"InputFileType",
"InputModelRefStrategy",
"InvalidClassNameError",
"InvalidFileFormatError",
"LiteralType",
137 changes: 122 additions & 15 deletions src/datamodel_code_generator/__main__.py
@@ -59,6 +59,7 @@
Error,
FieldTypeCollisionStrategy,
InputFileType,
InputModelRefStrategy,
InvalidClassNameError,
ModuleSplitMode,
NamingStrategy,
@@ -443,6 +444,7 @@ def validate_all_exports_collision_strategy(cls, values: dict[str, Any]) -> dict

input: Optional[Union[Path, str]] = None # noqa: UP007, UP045
input_model: Optional[str] = None # noqa: UP045
input_model_ref_strategy: Optional[InputModelRefStrategy] = None # noqa: UP045
input_file_type: InputFileType = InputFileType.Auto
output_model_type: DataModelType = DataModelType.PydanticBaseModel
output: Optional[Path] = None # noqa: UP045
@@ -682,7 +684,7 @@ def _simple_type_name(tp: type) -> str:


def _collect_nested_models(model: type, visited: set[type] | None = None) -> dict[str, type]:
    """Collect all nested BaseModel subclasses from a model's fields."""
    """Collect all nested types (BaseModel, Enum, dataclass) from a model's fields."""
    if visited is None:
        visited = set()

@@ -691,24 +693,32 @@ def _collect_nested_models(model: type, visited: set[type] | None = None) -> dic
    visited.add(model)

    result: dict[str, type] = {}
    model_fields = getattr(model, "model_fields", None)
    if model_fields is None:  # pragma: no cover
        return result

    for field_info in model_fields.values():
        tp = field_info.annotation
        _find_models_in_type(tp, result, visited)
    model_fields = getattr(model, "model_fields", None)
    if model_fields is not None:
        for field_info in model_fields.values():
            tp = field_info.annotation
            _find_models_in_type(tp, result, visited)
    else:
        type_hints = _get_type_hints_safe(model)
        for tp in type_hints.values():
            _find_models_in_type(tp, result, visited)

    return result


def _find_models_in_type(tp: type, result: dict[str, type], visited: set[type]) -> None:
    """Recursively find BaseModel subclasses in a type annotation."""
    """Recursively find BaseModel subclasses, Enums, and dataclasses in a type annotation."""
    from dataclasses import is_dataclass  # noqa: PLC0415
    from enum import Enum as PyEnum  # noqa: PLC0415
    from typing import get_args  # noqa: PLC0415

    if isinstance(tp, type) and issubclass(tp, BaseModel) and tp not in visited:
        result[tp.__name__] = tp
        result.update(_collect_nested_models(tp, visited))
    if isinstance(tp, type) and tp not in visited:
        if issubclass(tp, BaseModel):
            result[tp.__name__] = tp
            result.update(_collect_nested_models(tp, visited))
        elif issubclass(tp, PyEnum) or is_dataclass(tp):
            result[tp.__name__] = tp

    for arg in get_args(tp):
        _find_models_in_type(arg, result, visited)
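
The traversal in `_collect_nested_models` and `_find_models_in_type` can be tried out without Pydantic. The following is a minimal sketch, not the PR's code: it uses only the standard library, the names `collect_referenced_types`, `_scan_annotation`, `Color`, `Wheel`, and `Car` are hypothetical, and it recurses into dataclasses, whereas the code above records Enums and dataclasses without descending into them.

```python
from dataclasses import dataclass, is_dataclass
from enum import Enum
from typing import Dict, List, Optional, Set, get_args, get_type_hints


def collect_referenced_types(root: type, seen: Optional[Set[type]] = None) -> Dict[str, type]:
    """Return {name: type} for every Enum or dataclass reachable from root's fields."""
    seen = set() if seen is None else seen
    if root in seen:
        return {}
    seen.add(root)

    found: Dict[str, type] = {}
    for annotation in get_type_hints(root).values():
        _scan_annotation(annotation, found, seen)
    return found


def _scan_annotation(annotation: object, found: Dict[str, type], seen: Set[type]) -> None:
    """Record the annotation if it is a class of interest, then scan its type arguments."""
    if isinstance(annotation, type) and annotation not in seen:
        if is_dataclass(annotation):
            found[annotation.__name__] = annotation
            found.update(collect_referenced_types(annotation, seen))
        elif issubclass(annotation, Enum):
            found[annotation.__name__] = annotation
    # get_args returns () for plain classes, so the recursion ends there.
    for arg in get_args(annotation):
        _scan_annotation(arg, found, seen)


class Color(Enum):
    RED = "red"


@dataclass
class Wheel:
    color: Color


@dataclass
class Car:
    wheels: List[Wheel]
    spare: Optional[Wheel] = None


referenced = collect_referenced_types(Car)
print(sorted(referenced))  # ['Color', 'Wheel']
```

The `seen` set plays the same role as `visited` above: it keeps a type that appears in several fields (here `Wheel`) from being walked twice and guards against self-referential models.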
@@ -776,15 +786,87 @@ def _add_python_type_info_generic(schema: dict[str, Any], obj: type) -> dict[str
return schema


def _load_model_schema( # noqa: PLR0912, PLR0915
_TYPE_FAMILY_ENUM = "enum"
_TYPE_FAMILY_PYDANTIC = "pydantic"
_TYPE_FAMILY_DATACLASS = "dataclass"
_TYPE_FAMILY_TYPEDDICT = "typeddict"
_TYPE_FAMILY_OTHER = "other"


def _get_type_family(tp: type) -> str:
    """Determine the type family of a Python type."""
    from dataclasses import is_dataclass  # noqa: PLC0415
    from enum import Enum as PyEnum  # noqa: PLC0415

    if isinstance(tp, type) and issubclass(tp, PyEnum):
        return _TYPE_FAMILY_ENUM

    if isinstance(tp, type) and issubclass(tp, BaseModel):
        return _TYPE_FAMILY_PYDANTIC

    if hasattr(tp, "__pydantic_fields__") and is_dataclass(tp):  # pragma: no cover
        return _TYPE_FAMILY_PYDANTIC

    if is_dataclass(tp):
        return _TYPE_FAMILY_DATACLASS

    if isinstance(tp, type) and hasattr(tp, "__required_keys__"):
        return _TYPE_FAMILY_TYPEDDICT

    return _TYPE_FAMILY_OTHER  # pragma: no cover
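
For readers who want to see how the family checks in `_get_type_family` behave on concrete classes, here is a rough standard-library sketch. It deliberately omits the two Pydantic branches so that it runs without Pydantic installed; `type_family` and the sample classes `Status`, `Point`, and `Movie` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, is_dataclass
from enum import Enum
from typing import TypedDict


def type_family(tp: type) -> str:
    """Classify tp as 'enum', 'dataclass', 'typeddict', or 'other'."""
    if isinstance(tp, type) and issubclass(tp, Enum):
        return "enum"
    if is_dataclass(tp):
        return "dataclass"
    # TypedDict classes carry no dedicated base class at runtime;
    # __required_keys__ is an attribute the typing module sets on them.
    if isinstance(tp, type) and hasattr(tp, "__required_keys__"):
        return "typeddict"
    return "other"


class Status(Enum):
    ACTIVE = 1


@dataclass
class Point:
    x: int


class Movie(TypedDict):
    title: str


families = {cls.__name__: type_family(cls) for cls in (Status, Point, Movie, int)}
print(families)
```

In the function above, the order of the checks matters: a Pydantic dataclass also satisfies `is_dataclass`, so the `__pydantic_fields__` test has to come before the plain dataclass test.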


def _filter_defs_by_strategy(
    schema: dict[str, Any],
    nested_models: dict[str, type],
    input_model_family: str,
    strategy: InputModelRefStrategy,
) -> dict[str, Any]:
    """Filter $defs based on ref strategy, marking reused types with x-python-import."""
    if strategy == InputModelRefStrategy.RegenerateAll:  # pragma: no cover
        return schema

    if "$defs" not in schema:  # pragma: no cover
        return schema

    new_defs: dict[str, Any] = {}

    for def_name, def_schema in schema["$defs"].items():
        if def_name not in nested_models:  # pragma: no cover
            new_defs[def_name] = def_schema
            continue

        nested_type = nested_models[def_name]
        type_family = _get_type_family(nested_type)

        should_reuse = strategy == InputModelRefStrategy.ReuseAll or (
            strategy == InputModelRefStrategy.ReuseForeign and type_family != input_model_family
        )

        if should_reuse:
            new_defs[def_name] = {
                "x-python-import": {
                    "module": nested_type.__module__,
                    "name": nested_type.__name__,
                },
            }
        else:
            new_defs[def_name] = def_schema

    return {**schema, "$defs": new_defs}
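
To make the effect of `_filter_defs_by_strategy` concrete, the sketch below applies the same reuse rule to a hand-written schema. It is an illustration under assumptions rather than the PR's implementation: `RefStrategy` is a stand-in for `InputModelRefStrategy`, `rewrite_defs` takes a precomputed `known_types` mapping instead of real classes, and the module name `myapp.models` is made up.

```python
from enum import Enum
from typing import Any, Dict


class RefStrategy(Enum):
    """Hypothetical stand-in for InputModelRefStrategy."""

    REGENERATE_ALL = "regenerate-all"
    REUSE_FOREIGN = "reuse-foreign"
    REUSE_ALL = "reuse-all"


def rewrite_defs(
    schema: Dict[str, Any],
    known_types: Dict[str, Dict[str, str]],
    input_family: str,
    strategy: RefStrategy,
) -> Dict[str, Any]:
    """Return a copy of schema whose reused $defs are replaced by import markers."""
    if strategy is RefStrategy.REGENERATE_ALL or "$defs" not in schema:
        return schema

    new_defs: Dict[str, Any] = {}
    for name, definition in schema["$defs"].items():
        info = known_types.get(name)
        # Only REUSE_ALL and REUSE_FOREIGN reach this point; the latter
        # reuses a type only when its family differs from the input's.
        reuse = info is not None and (
            strategy is RefStrategy.REUSE_ALL or info["family"] != input_family
        )
        if reuse:
            new_defs[name] = {"x-python-import": {"module": info["module"], "name": name}}
        else:
            new_defs[name] = definition
    return {**schema, "$defs": new_defs}


schema = {
    "type": "object",
    "$defs": {
        "Color": {"type": "string", "enum": ["red"]},
        "Wheel": {"type": "object"},
    },
}
known_types = {
    "Color": {"family": "enum", "module": "myapp.models"},
    "Wheel": {"family": "dataclass", "module": "myapp.models"},
}

foreign = rewrite_defs(schema, known_types, "dataclass", RefStrategy.REUSE_FOREIGN)
everything = rewrite_defs(schema, known_types, "dataclass", RefStrategy.REUSE_ALL)
print(foreign["$defs"])
print(everything["$defs"])
```

With a dataclass as the input model, `reuse-foreign` turns only the `Color` enum into an import marker and leaves `Wheel` to be regenerated, while `reuse-all` replaces both. The input `schema` dict is not mutated; a new top-level dict is returned, as in the function above.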


def _load_model_schema(  # noqa: PLR0912, PLR0914, PLR0915
    input_model: str,
    input_file_type: InputFileType,
    ref_strategy: InputModelRefStrategy | None = None,
) -> dict[str, object]:
    """Load schema from a Python import path.

    Args:
        input_model: Import path in 'module.path:ObjectName' format
        input_file_type: Current input file type setting for validation
        ref_strategy: Strategy for handling referenced types

    Returns:
        Schema dict
@@ -856,7 +938,17 @@ def _load_model_schema( # noqa: PLR0912, PLR0915
msg = "--input-model with Pydantic model requires Pydantic v2 runtime. Please upgrade Pydantic to v2."
raise Error(msg)
schema = obj.model_json_schema()
return _add_python_type_info(schema, obj)
schema = _add_python_type_info(schema, obj)

if ref_strategy and ref_strategy != InputModelRefStrategy.RegenerateAll:
nested_models = _collect_nested_models(obj)
model_name = getattr(obj, "__name__", None)
if model_name and "$defs" in schema and model_name in schema["$defs"]: # pragma: no cover
nested_models[model_name] = obj
input_family = _get_type_family(obj)
schema = _filter_defs_by_strategy(schema, nested_models, input_family, ref_strategy)

return schema

# Check for dataclass or TypedDict - use TypeAdapter
from dataclasses import is_dataclass # noqa: PLC0415
@@ -874,11 +966,22 @@ def _load_model_schema( # noqa: PLR0912, PLR0915
from pydantic import TypeAdapter # noqa: PLC0415

schema = TypeAdapter(obj).json_schema()
return _add_python_type_info_generic(schema, cast("type", obj))
schema = _add_python_type_info_generic(schema, cast("type", obj))

if ref_strategy and ref_strategy != InputModelRefStrategy.RegenerateAll:
obj_type = cast("type", obj)
nested_models = _collect_nested_models(obj_type)
obj_name = getattr(obj, "__name__", None)
if obj_name and "$defs" in schema and obj_name in schema["$defs"]: # pragma: no cover
nested_models[obj_name] = obj_type
input_family = _get_type_family(obj_type)
schema = _filter_defs_by_strategy(schema, nested_models, input_family, ref_strategy)
except ImportError as e:
msg = "--input-model with dataclass/TypedDict requires Pydantic v2 runtime."
raise Error(msg) from e

return schema

msg = f"{qualname!r} is not a supported type. Supported: dict, Pydantic v2 BaseModel, dataclass, TypedDict"
raise Error(msg)

@@ -1466,7 +1569,11 @@ def main(args: Sequence[str] | None = None) -> Exit: # noqa: PLR0911, PLR0912,
try:
input_: Path | str | ParseResult
if config.input_model:
schema = _load_model_schema(config.input_model, config.input_file_type)
schema = _load_model_schema(
config.input_model,
config.input_file_type,
config.input_model_ref_strategy,
)
input_ = json.dumps(schema)
if config.input_file_type == InputFileType.Auto:
config.input_file_type = InputFileType.JsonSchema
11 changes: 11 additions & 0 deletions src/datamodel_code_generator/arguments.py
@@ -23,6 +23,7 @@
DataModelType,
FieldTypeCollisionStrategy,
InputFileType,
InputModelRefStrategy,
ModuleSplitMode,
NamingStrategy,
OpenAPIScope,
@@ -163,6 +164,16 @@ def start_section(self, heading: str | None) -> None:
    "Cannot be used with --input or --url.",
    metavar="MODULE:NAME",
)
base_options.add_argument(
    "--input-model-ref-strategy",
    help="Strategy for referenced types in --input-model. "
    "'regenerate-all': Regenerate all types. "
    "'reuse-foreign': Reuse types from different families (Enum, etc.), regenerate same-family. "
    "'reuse-all': Reuse all referenced types via import. "
    "If not specified, defaults to regenerate-all behavior.",
    choices=[s.value for s in InputModelRefStrategy],
    default=None,
)
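
The pattern of deriving argparse `choices` from an Enum's values can be exercised in isolation. This is a small sketch under assumptions: `RefStrategy` stands in for `InputModelRefStrategy`, and `parse_strategy` is a hypothetical helper that converts the parsed string back to an Enum member by hand (in the PR that conversion is left to the typed `input_model_ref_strategy` field on the config model).

```python
import argparse
from enum import Enum
from typing import List, Optional


class RefStrategy(Enum):
    """Hypothetical stand-in for InputModelRefStrategy."""

    REGENERATE_ALL = "regenerate-all"
    REUSE_FOREIGN = "reuse-foreign"
    REUSE_ALL = "reuse-all"


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--input-model-ref-strategy",
    # Building choices from the Enum keeps the CLI and the Enum in sync.
    choices=[s.value for s in RefStrategy],
    default=None,
)


def parse_strategy(argv: List[str]) -> Optional[RefStrategy]:
    """Parse argv and return the matching Enum member, or None if the flag is absent."""
    raw = parser.parse_args(argv).input_model_ref_strategy
    # Enum lookup by value: RefStrategy("reuse-all") is RefStrategy.REUSE_ALL.
    return RefStrategy(raw) if raw is not None else None


chosen = parse_strategy(["--input-model-ref-strategy", "reuse-all"])
unset = parse_strategy([])
print(chosen, unset)
```

Keeping `default=None` rather than defaulting to `regenerate-all` lets the caller tell "flag omitted" apart from "flag set explicitly", which is the distinction the help text above describes.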

# ======================================================================================
# Customization options for generated models
1 change: 1 addition & 0 deletions src/datamodel_code_generator/cli_options.py
@@ -66,6 +66,7 @@ class CLIOptionMeta:
"--output": CLIOptionMeta(name="--output", category=OptionCategory.BASE),
"--url": CLIOptionMeta(name="--url", category=OptionCategory.BASE),
"--input-model": CLIOptionMeta(name="--input-model", category=OptionCategory.BASE),
"--input-model-ref-strategy": CLIOptionMeta(name="--input-model-ref-strategy", category=OptionCategory.BASE),
"--input-file-type": CLIOptionMeta(name="--input-file-type", category=OptionCategory.BASE),
"--encoding": CLIOptionMeta(name="--encoding", category=OptionCategory.BASE),
# ==========================================================================
15 changes: 15 additions & 0 deletions src/datamodel_code_generator/enums.py
@@ -196,6 +196,20 @@ class UnionMode(Enum):
    left_to_right = "left_to_right"


class InputModelRefStrategy(Enum):
    """Strategy for handling referenced types in --input-model.

    RegenerateAll: Regenerate all referenced types into target output type.
    ReuseForeign: Reuse types from different model families via import,
        regenerate same-family types into target output type.
    ReuseAll: Reuse all referenced types via import, no regeneration.
    """

    RegenerateAll = "regenerate-all"
    ReuseForeign = "reuse-foreign"
    ReuseAll = "reuse-all"


class StrictTypes(Enum):
    """Strict type options for generated models."""

@@ -219,6 +233,7 @@ class StrictTypes(Enum):
"FieldTypeCollisionStrategy",
"GraphQLScope",
"InputFileType",
"InputModelRefStrategy",
"ModuleSplitMode",
"NamingStrategy",
"OpenAPIScope",