
Add schema_features property to parsers for version detection#2929

Merged
koxudaxi merged 20 commits into main from feature/schema-object-refactoring on Jan 6, 2026

@koxudaxi
Owner

@koxudaxi koxudaxi commented Jan 5, 2026

Fixes: N/A

Summary by CodeRabbit

  • New Features

    • Added schema version hints and mode (lenient/strict) with version-aware parsing and feature detection for JSON Schema and OpenAPI.
  • CLI / Config

    • New CLI options and config fields to set schema version and validation mode.
  • Documentation

    • New docs describing supported schema/OpenAPI versions, behaviors, and CLI reference updates.
  • Tests

    • Expanded tests and E2E coverage for version detection, mode behavior, warnings, and error cases.


@coderabbitai

coderabbitai Bot commented Jan 5, 2026

Walkthrough

Adds version-aware schema handling across CLI, config, and parsers: new CLI options and config fields for schema version/mode, propagation into parser configs, Parser base refactor to support schema feature flags, and per-parser cached schema_features to detect/apply JSON Schema/OpenAPI capabilities with warnings in strict mode.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI & Top-level wiring: `src/datamodel_code_generator/__main__.py`, `src/datamodel_code_generator/arguments.py`, `src/datamodel_code_generator/cli_options.py`, `src/datamodel_code_generator/prompt_data.py` | Add --schema-version and --schema-version-mode options; add Config fields schema_version and schema_version_mode; propagate these values into generate/run and CLI metadata/help. |
| Config & Types: `src/datamodel_code_generator/config.py`, `src/datamodel_code_generator/_types/generate_config_dict.py`, `src/datamodel_code_generator/_types/parser_config_dicts.py` | Expose new config fields: schema_version, schema_version_mode, jsonschema_version, openapi_version; add typing for version enums and VersionMode. |
| Parser base & constructor changes: `src/datamodel_code_generator/parser/base.py`, `src/datamodel_code_generator/__init__.py` | Rework Parser generic to accept SchemaFeatures type param; add abstract schema_features property, dynamic _config_class_name resolver, and updated Parser constructor/_create_default_config to accept parser-specific config dicts and feature propagation. |
| JSON Schema parser: `src/datamodel_code_generator/parser/jsonschema.py` | Export JsonSchemaVersion; import JsonSchemaFeatures under TYPE_CHECKING; add schema_features cached_property and related logic to detect or honor the configured JSON Schema version, adjust schema_paths, strict-mode validations, and array/raw-object checks to be version-aware. |
| OpenAPI parser: `src/datamodel_code_generator/parser/openapi.py` | Add schema_features cached_property using OpenAPISchemaFeatures.from_openapi_version, use feature flags for nullable handling and version-mode dependent warnings; add _config_class_name. |
| GraphQL parser: `src/datamodel_code_generator/parser/graphql.py` | Conform GraphQLParser to new Parser generics; add _config_class_name and schema_features cached_property (defaults to Draft202012 features); remove legacy default-config helper. |
| Schema feature flags: `src/datamodel_code_generator/parser/schema_version.py` | Add exclusive_as_number flag to JsonSchemaFeatures and propagate into OpenAPI features; expand feature detection/constructors per version. |
| Tests & expected outputs: `tests/parser/test_schema_version.py`, `tests/parser/test_base.py`, `tests/parser/test_graphql.py`, `tests/main/test_public_api_signature_baseline.py`, `tests/data/expected/...` | Add extensive tests for version detection, strict/lenient behavior, CLI scenarios, public API signature updates, and update generated-expected artifacts to include new config fields and VersionMode alias. |
| Docs & navigation: `docs/supported_formats.md`, `docs/cli-reference/*`, `zensical.toml` | New supported_formats doc detailing versions/features, add CLI docs for --schema-version and --schema-version-mode, and add nav entry. |
| Misc (prompt/help/metadata): `src/datamodel_code_generator/prompt_data.py`, `src/datamodel_code_generator/__init__.py` | Add help text entries and option propagation for schema versioning; validation of provided schema_version into appropriate enums with error messages. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI
  participant Config
  participant AppInit as _create_parser_config
  participant Parser
  participant Detector as VersionDetector
  participant Features as SchemaFeatures

  User->>CLI: invoke generate (--schema-version, --schema-version-mode)
  CLI->>Config: load Config (schema_version, schema_version_mode)
  CLI->>AppInit: _create_parser_config(..., additional_options includes schema_version/schema_version_mode)
  AppInit->>Parser: instantiate Parser with config dict
  Parser->>Detector: detect_jsonschema_version / detect_openapi_version (from raw_obj or config)
  Detector-->>Features: map to JsonSchemaFeatures/OpenAPISchemaFeatures
  Features-->>Parser: return schema_features (cached_property)
  Parser->>Parser: apply feature flags (schema_paths, nullable, array handling)
  Parser-->>CLI: warnings/errors based on schema_version_mode
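
The flow in the diagram can be sketched in isolation. This is a minimal standalone illustration, not the project's code: `Features`, `detect_features`, `SketchParser`, and the marker table are hypothetical names, and the two flags only mirror the ones named in this PR (`definitions_key`, `exclusive_as_number`), with their meanings assumed from the JSON Schema drafts.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Features:
    """Hypothetical stand-in for JsonSchemaFeatures; field meanings are assumed."""

    definitions_key: str       # "$defs" from Draft 2019-09 on, "definitions" before
    exclusive_as_number: bool  # exclusiveMinimum/Maximum are numbers from Draft 6 on


# Hypothetical lookup table keyed by a substring of the `$schema` URI.
_FEATURES_BY_MARKER: Dict[str, Features] = {
    "draft-04": Features("definitions", False),
    "draft-07": Features("definitions", True),
    "2020-12": Features("$defs", True),
}
_DEFAULT = _FEATURES_BY_MARKER["2020-12"]


def detect_features(raw: Optional[Dict[str, Any]], configured: Optional[str] = None) -> Features:
    """Prefer an explicitly configured version, otherwise inspect `$schema`."""
    if configured is not None:
        return _FEATURES_BY_MARKER[configured]
    schema_uri = str((raw or {}).get("$schema", ""))
    for marker, features in _FEATURES_BY_MARKER.items():
        if marker in schema_uri:
            return features
    return _DEFAULT


class SketchParser:
    """Hypothetical parser that caches the detected features per instance."""

    def __init__(self, raw_obj: Optional[Dict[str, Any]] = None, schema_version: Optional[str] = None) -> None:
        self.raw_obj = raw_obj
        self.schema_version = schema_version

    @cached_property
    def schema_features(self) -> Features:
        return detect_features(self.raw_obj, self.schema_version)
```

A parser built from a document declaring `http://json-schema.org/draft-07/schema#` would report `definitions` as its definitions key in this sketch, while an explicit `schema_version` overrides whatever the document declares.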

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

breaking-change-analyzed

Suggested reviewers

  • ilovelinux

Poem

🐰 I hopped through options, sniffed each schema line,
Found versions hiding, flagged features fine.
Cached my findings, warned in strict mode,
Parsers now dance on a versioned road.
Bravo—little rabbit leaves a hopping sign! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Add schema_features property to parsers for version detection' accurately summarizes the main changes, which introduce a new schema_features property across multiple parser classes (JSONSchemaParser, OpenAPIParser, GraphQLParser) to enable version detection and version-aware behavior. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 98.77%, which is sufficient. The required threshold is 80.00%. |

@github-actions
Contributor

github-actions Bot commented Jan 5, 2026

📚 Docs Preview: https://pr-2929.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Jan 5, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched benchmarks
⏩ 98 skipped benchmarks [1]


Comparing feature/schema-object-refactoring (332cccd) with main (d4adf40)


Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

Base automatically changed from feature/format-registry-separation to feature/schema-version-detection January 6, 2026 14:57
Base automatically changed from feature/schema-version-detection to main January 6, 2026 15:43
@koxudaxi koxudaxi force-pushed the feature/schema-object-refactoring branch from 4c3c9e9 to 811cf13 Compare January 6, 2026 15:46

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/datamodel_code_generator/parser/openapi.py (1)

173-182: Remove unused noqa directive.

Static analysis indicates that the noqa: PLC0415 directive on line 176 is unnecessary since PLC0415 is not enabled. This applies to the import statement inside the method.

🔎 Proposed fix
     @cached_property
     def schema_features(self) -> OpenAPISchemaFeatures:
         """Get schema features based on detected OpenAPI version."""
-        from datamodel_code_generator.parser.schema_version import (  # noqa: PLC0415
+        from datamodel_code_generator.parser.schema_version import (
             OpenAPISchemaFeatures,
             detect_openapi_version,
         )

         version = detect_openapi_version(self.raw_obj) if self.raw_obj else OpenAPIVersion.Auto
         return OpenAPISchemaFeatures.from_openapi_version(version)
src/datamodel_code_generator/parser/jsonschema.py (1)

774-783: Remove unused noqa directive.

Static analysis indicates that the noqa: PLC0415 directive on line 777 is unnecessary since PLC0415 is not enabled. This mirrors the same issue in the OpenAPIParser.

🔎 Proposed fix
     @cached_property
     def schema_features(self) -> JsonSchemaFeatures:
         """Get schema features based on detected version."""
-        from datamodel_code_generator.parser.schema_version import (  # noqa: PLC0415
+        from datamodel_code_generator.parser.schema_version import (
             JsonSchemaFeatures,
             detect_jsonschema_version,
         )

         version = detect_jsonschema_version(self.raw_obj) if self.raw_obj else JsonSchemaVersion.Auto
         return JsonSchemaFeatures.from_version(version)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60f7335 and 811cf13.

📒 Files selected for processing (3)
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/openapi.py
  • tests/parser/test_schema_version.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/parser/test_schema_version.py (2)
src/datamodel_code_generator/parser/jsonschema.py (1)
  • schema_features (775-783)
src/datamodel_code_generator/parser/openapi.py (1)
  • schema_features (174-182)
src/datamodel_code_generator/parser/openapi.py (3)
src/datamodel_code_generator/enums.py (1)
  • OpenAPIVersion (257-265)
src/datamodel_code_generator/parser/schema_version.py (3)
  • OpenAPISchemaFeatures (85-123)
  • detect_openapi_version (168-182)
  • from_openapi_version (99-123)
src/datamodel_code_generator/parser/jsonschema.py (1)
  • schema_features (775-783)
src/datamodel_code_generator/parser/jsonschema.py (3)
src/datamodel_code_generator/enums.py (1)
  • JsonSchemaVersion (243-254)
src/datamodel_code_generator/parser/schema_version.py (3)
  • JsonSchemaFeatures (20-81)
  • detect_jsonschema_version (139-165)
  • from_version (43-81)
src/datamodel_code_generator/parser/openapi.py (1)
  • schema_features (174-182)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/openapi.py

176-176: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/parser/jsonschema.py

777-777: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: py312-isort5 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.11 on macOS
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (6)
src/datamodel_code_generator/parser/openapi.py (3)

14-14: LGTM!

The cached_property import is correctly added to support the new schema_features property.


30-30: LGTM!

The OpenAPIVersion import is correctly added for use in the fallback case when raw_obj is falsy.


50-50: LGTM!

Using TYPE_CHECKING for the OpenAPISchemaFeatures import is the correct pattern to avoid circular imports at runtime while maintaining proper type hints.

src/datamodel_code_generator/parser/jsonschema.py (2)

31-31: LGTM!

The JsonSchemaVersion import is correctly added for use in the fallback case when raw_obj is falsy.


91-91: LGTM!

Using TYPE_CHECKING for the JsonSchemaFeatures import is the correct pattern to avoid circular imports at runtime while maintaining proper type hints.
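
The import pattern praised above can be shown with the standard library alone. This is a minimal sketch under that assumption: `fractions.Fraction` stands in for `JsonSchemaFeatures`, and `half` is a hypothetical function, not part of the project.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this branch never runs at runtime.
    from fractions import Fraction


def half(value: int) -> Fraction:
    # Function-local import: resolved when the function is called,
    # not when the enclosing module is imported.
    from fractions import Fraction

    return Fraction(value, 2)
```

The annotation stays precise for type checkers, while the runtime import is deferred to the call site, which is what lets two modules reference each other's types without importing each other at load time.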

tests/parser/test_schema_version.py (1)

404-423: No issues found with the test pattern—the code is safe as written.

The tests correctly set raw_obj before accessing schema_features. Since schema_features is a @cached_property, it is lazily evaluated only on first access. The implementation includes guard clauses (if self.raw_obj else ...) that safely handle the case where raw_obj might be unset, and the test setup order ensures this is never reached with an uninitialized value.

The test pattern is straightforward and works correctly without requiring additional clarifying comments or refactoring.
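
The lazy-evaluation behavior described here can be reproduced with a small standalone stub. `LazyParser` and its string return value are hypothetical simplifications; only the `cached_property` mechanics and the `if self.raw_obj else ...` guard correspond to what the review describes.

```python
from functools import cached_property


class LazyParser:
    """Hypothetical parser stub; only the caching behavior is of interest."""

    def __init__(self) -> None:
        self.raw_obj = None
        self.detect_calls = 0

    @cached_property
    def schema_features(self) -> str:
        self.detect_calls += 1
        # Guard clause: fall back when raw_obj is unset or empty.
        return self.raw_obj.get("$schema", "auto") if self.raw_obj else "auto"


parser = LazyParser()
parser.raw_obj = {"$schema": "draft-07"}  # set before the first access
first = parser.schema_features
parser.raw_obj = {"$schema": "2020-12"}   # later changes are not observed
second = parser.schema_features
```

Both `first` and `second` hold the value computed on the first access, and the detection body runs once per instance. A fresh `LazyParser()` that is read before `raw_obj` is set takes the fallback branch instead.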

koxudaxi and others added 4 commits January 7, 2026 01:08
* Add --jsonschema-version and --openapi-version CLI options

* Add --schema-version and --schema-version-mode CLI options

* Regenerate CLI docs

* Add version-specific schema processing using schema_features

* Implement flag-based behavior control for schema version

* Add comprehensive version-specific feature checks with exclusive_as_number flag

* Replace getattr with direct config access for schema_version_mode
Generated by GitHub Actions
Comment thread src/datamodel_code_generator/parser/base.py Dismissed
Comment thread src/datamodel_code_generator/parser/base.py Dismissed
Comment thread src/datamodel_code_generator/parser/base.py Fixed
Comment thread src/datamodel_code_generator/parser/base.py Fixed
Comment thread src/datamodel_code_generator/parser/base.py Dismissed
@koxudaxi koxudaxi enabled auto-merge (squash) January 6, 2026 17:48
@koxudaxi koxudaxi merged commit c4d4b8d into main Jan 6, 2026
36 of 38 checks passed
@koxudaxi koxudaxi deleted the feature/schema-object-refactoring branch January 6, 2026 17:51
@github-actions
Contributor

github-actions Bot commented Jan 6, 2026

Breaking Change Analysis

Result: Breaking changes detected

Reasoning: This PR modifies the Parser base class signature by adding a second generic type parameter (SchemaFeaturesT) and introduces a new abstract property (schema_features) that all parser subclasses must implement. These are breaking changes for users who have created custom parser subclasses. However, the new CLI options (--schema-version, --schema-version-mode) and config fields are additive with None defaults, so they don't break existing usage. The default behavior remains unchanged since schema_version defaults to "auto" and schema_version_mode defaults to "lenient".

Content for Release Notes

Custom Template Update Required

  • Parser subclass signature change - The Parser base class now requires two generic type parameters: Parser[ParserConfigT, SchemaFeaturesT] instead of just Parser[ParserConfigT]. Custom parser subclasses must be updated to include the second type parameter. (#2929)

    # Before
    class MyCustomParser(Parser["MyParserConfig"]):
        ...
    
    # After
    class MyCustomParser(Parser["MyParserConfig", "JsonSchemaFeatures"]):
        ...
  • New abstract schema_features property required - Custom parser subclasses must now implement the schema_features abstract property that returns a JsonSchemaFeatures (or subclass) instance. (#2929)

    from functools import cached_property

    from datamodel_code_generator.enums import JsonSchemaVersion
    from datamodel_code_generator.parser.base import Parser
    from datamodel_code_generator.parser.schema_version import JsonSchemaFeatures

    class MyCustomParser(Parser["MyParserConfig", "JsonSchemaFeatures"]):
        @cached_property
        def schema_features(self) -> JsonSchemaFeatures:
            # Report Draft 2020-12 features regardless of the input document.
            return JsonSchemaFeatures.from_version(JsonSchemaVersion.Draft202012)
  • Parser _create_default_config refactored to use class variable - Subclasses that override _create_default_config should now set the _config_class_name class variable instead. The base implementation uses this variable to dynamically instantiate the correct config class. (#2929)

    # Before
    @classmethod
    def _create_default_config(cls, options: MyConfigDict) -> MyParserConfig:
        ...  # custom implementation

    # After
    from typing import ClassVar

    class MyCustomParser(Parser["MyParserConfig", "JsonSchemaFeatures"]):
        _config_class_name: ClassVar[str] = "MyParserConfig"
        # No need to override _create_default_config if using standard config creation

This analysis was performed by Claude Code Action


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/datamodel_code_generator/_types/generate_config_dict.py (1)

6-8: Duplicate TypedDict import.

TypedDict is imported from both typing (line 6) and typing_extensions (line 8). The typing_extensions version is preferred for closed=True support and broader compatibility. Remove the duplicate from typing.

🔎 Proposed fix
-from typing import TYPE_CHECKING, Any, TypedDict
+from typing import TYPE_CHECKING, Any

 from typing_extensions import NotRequired, TypedDict
src/datamodel_code_generator/_types/parser_config_dicts.py (1)

6-8: Duplicate TypedDict import (same issue as generate_config_dict.py).

TypedDict is imported from both typing (line 6) and typing_extensions (line 8). Remove the duplicate from typing.

🔎 Proposed fix
-from typing import TYPE_CHECKING, Any, TypeAlias, TypedDict
+from typing import TYPE_CHECKING, Any, TypeAlias

 from typing_extensions import NotRequired, TypedDict
🤖 Fix all issues with AI Agents
In @docs/cli-reference/base-options.md:
- Around line 569-613: Add two example subsections under the
`--schema-version-mode` section demonstrating the difference between lenient and
strict: create a JSON schema that declares draft-07 but uses a draft-2020+
feature such as "prefixItems"; show the datamodel-codegen invocation with
`--schema-version-mode strict` and note the expected warning output (mention the
warning text or that a warning about unsupported "prefixItems" or version
mismatch appears), then show the invocation with `--schema-version-mode lenient`
and state that generation proceeds without warnings and produces the same model;
reference `--schema-version-mode`, `strict`, `lenient`, and `prefixItems` so
readers can locate the related option and feature in the docs.
- Around line 331-567: Summary: The OpenAPI example is overly verbose and
contains a bogus error output, and neither example demonstrates how
--schema-version affects generation. Fix: replace the large OpenAPI YAML under
the "OpenAPI" example with a minimal single-endpoint spec (keep the section
title "OpenAPI") and provide actual generated output instead of `> **Error:**
File not found: openapi/api.py`; shorten or remove the 180-line spec in the diff
and ensure the Usage snippet still shows the datamodel-codegen invocation with
--schema-version; add a small JSON Schema example that uses a draft-specific
feature (e.g., exclusiveMinimum as boolean vs numeric) and include two short
generated outputs labeled for draft-04 and draft-07 to illustrate the effect of
the --schema-version option; update the examples sections ("OpenAPI" and "JSON
Schema") to explicitly mention --schema-version in their captions.

In @src/datamodel_code_generator/parser/jsonschema.py:
- Around line 781-790: _check_version_specific_features currently assumes raw is
a bool or dict and calls raw.get("type"), which raises AttributeError for other
YamlValue shapes (list, str, number); update _check_version_specific_features to
first check types (e.g., if isinstance(raw, bool): handle boolean branch; elif
isinstance(raw, dict): inspect raw.get(...); else: return/skip without accessing
attributes) so non-dict/non-bool values are safely ignored and errors remain
wrapped by _validate_schema_object (apply the same type-guard change to the
other copy of this logic around the 3648-3711 region).
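
The type-guard shape requested for `_check_version_specific_features` might look like the following. This is a standalone sketch, not the parser method: `check_version_specific`, its `boolean_schemas_supported` parameter, and the two message strings are hypothetical, chosen only to show the bool / dict / everything-else branching.

```python
from typing import Any, List


def check_version_specific(raw: Any, boolean_schemas_supported: bool) -> List[str]:
    """Return warning messages; skip shapes that are neither bool nor dict."""
    messages: List[str] = []
    if isinstance(raw, bool):
        # Boolean schemas (`true` / `false`) are handled on their own branch.
        if not boolean_schemas_supported:
            messages.append("boolean schema is not supported by this version")
    elif isinstance(raw, dict):
        # Only dicts are inspected with .get(), so no AttributeError is possible.
        if isinstance(raw.get("type"), list) and "null" in raw["type"]:
            messages.append("null in type array")
    # Lists, strings, and numbers fall through without any attribute access.
    return messages
```

With this ordering, a list or scalar value returns an empty result instead of raising, so any remaining validation errors stay the responsibility of the caller.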
🧹 Nitpick comments (9)
tests/parser/test_schema_version.py (2)

576-832: Warning-oriented tests are thorough; consider minor DRY helper

The version/mode‑specific warning tests (nullable keyword, null in type array, exclusiveMinimum forms, prefixItems, items as array, boolean schemas, and lenient mode) give excellent coverage of the new schema_features checks and modes.

There is a lot of repeated warnings.catch_warnings / filtering / extraction boilerplate; if this pattern expands further, consider a small helper (e.g. collect_warnings(func, category) or a fixture) to keep the tests concise and easier to maintain.
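
One possible shape for the suggested helper, using only the standard `warnings` module. The name `collect_warnings(func, category)` follows the suggestion above; the `noisy` function is a hypothetical stand-in for a parser call that emits a strict-mode warning.

```python
import warnings
from typing import Callable, List, Type


def collect_warnings(func: Callable[[], object], category: Type[Warning] = UserWarning) -> List[str]:
    """Run func and return the messages of recorded warnings matching category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return [str(w.message) for w in caught if issubclass(w.category, category)]


def noisy() -> None:
    # Stand-in for a parser call; emits one matching and one unrelated warning.
    warnings.warn("exclusiveMaximum as boolean", UserWarning)
    warnings.warn("unrelated", DeprecationWarning)


messages = collect_warnings(noisy)
```

A test could then reduce to a single assertion such as `assert any("exclusiveMaximum as boolean" in m for m in messages)`.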


1011-1083: Clean up temporary files created in E2E strict-mode tests

Both E2E tests use NamedTemporaryFile(..., delete=False) but never remove the file, which can leave junk in the temp directory over many runs.

You can delete the file after generate() completes while still inside the NamedTemporaryFile context:

Proposed cleanup for temp files
-    with tempfile.NamedTemporaryFile(encoding="utf-8", mode="w", suffix=".json", delete=False) as f:
-        json.dump(schema, f)
-        f.flush()
-
-        with warnings.catch_warnings(record=True) as w:
-            warnings.simplefilter("always")
-            result = generate(
-                Path(f.name),
+    with tempfile.NamedTemporaryFile(encoding="utf-8", mode="w", suffix=".json", delete=False) as f:
+        json.dump(schema, f)
+        f.flush()
+        tmp_path = Path(f.name)
+
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            result = generate(
+                tmp_path,
                 input_file_type=datamodel_code_generator.InputFileType.JsonSchema,
                 schema_version="draft-07",
                 schema_version_mode=VersionMode.Strict,
             )
             user_warnings = [x for x in w if issubclass(x.category, UserWarning)]
             assert any("exclusiveMaximum as boolean" in str(uw.message) for uw in user_warnings)
             assert result is not None
+        tmp_path.unlink(missing_ok=True)

Apply the same pattern to the Draft4 test.
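
An alternative worth considering: deleting a file while its handle is still open may fail on Windows, where an open file generally cannot be removed. The sketch below sidesteps that by writing into a `TemporaryDirectory`, so the file is closed before use and removed together with the directory. `with_schema_file` and its `use` callback are hypothetical names; in the tests above the callback would wrap the `generate(...)` call.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def with_schema_file(schema: Dict[str, Any], use: Callable[[Path], Any]) -> Any:
    """Write schema to a temporary file, call use(path), then clean up.

    The directory and the file inside it are removed when the with block exits,
    including when use() raises.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "schema.json"
        # write_text opens, writes, and closes the file before use() sees it.
        path.write_text(json.dumps(schema), encoding="utf-8")
        return use(path)
```

pytest's built-in `tmp_path` fixture would give the same cleanup guarantee with less code, if the tests are free to take a fixture argument.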

src/datamodel_code_generator/parser/graphql.py (1)

65-79: GraphQLParser integration with generic Parser and schema_features is consistent

Extending GraphQLParser to Parser["GraphQLParserConfig", "JsonSchemaFeatures"] and providing a cached schema_features property that returns Draft 2020‑12 defaults neatly aligns it with the new version/feature infrastructure while keeping GraphQL version‑agnostic.

Minor nit: the # noqa: PLC0415 markers on the local imports inside schema_features are flagged as unused by Ruff; they can be dropped without behavior change.

src/datamodel_code_generator/__main__.py (1)

72-72: Config/schema_version wiring into generate is correct; minor lint cleanup possible

Importing VersionMode, adding schema_version/schema_version_mode to Config, and forwarding both into generate() in run_generate_from_config line up with the new CLI options and parser behavior exercised in the tests. This keeps pyproject, CLI args, and programmatic use in sync.

The # noqa: UP045 markers on the new fields are reported as unused by Ruff; since they don’t affect behavior, you can safely remove them if you want a cleaner lint run.

Also applies to: 624-625, 1069-1071

src/datamodel_code_generator/parser/base.py (2)

715-760: Dynamic config resolution via _get_config_class is good; watch config/kwargs API change

Using _config_class_name and _get_config_class() to locate the concrete config model, and then having _create_default_config() build it for Pydantic v1/v2, is a solid refactor that removes the hard-coded ParserConfig dependency and lets each parser choose its config type.

Two minor points:

  1. Public API behavior change
    __init__ now raises ValueError when both config and keyword **options are provided. If any external code previously relied on “config plus overrides via kwargs”, this will now fail. If that usage is unsupported, the explicit error is fine; otherwise you may want to either:

    • Keep the old behavior (merge options into a copy of config), or
    • Document this as a breaking change.
  2. Unused # noqa: PLC0415 markers
    Ruff reports the # noqa: PLC0415 comments on the inline imports (importlib, types_module, model_base, is_pydantic_v2) as unused. They can be removed without changing behavior.

Also applies to: 761-785
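
Both points can be seen in a reduced, standalone form. Everything here is hypothetical (`BaseConfig`, `GraphQLConfig`, `BaseParser`, and the `_CONFIG_CLASSES` registry); in particular, the registry lookup is a simplification that stands in for the dynamic resolution the review describes, not a copy of `_get_config_class()`.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Optional


@dataclass
class BaseConfig:
    strict: bool = False


@dataclass
class GraphQLConfig(BaseConfig):
    pass


# Hypothetical registry used to resolve a config class from its name.
_CONFIG_CLASSES = {cls.__name__: cls for cls in (BaseConfig, GraphQLConfig)}


class BaseParser:
    _config_class_name: ClassVar[str] = "BaseConfig"

    def __init__(self, config: Optional[BaseConfig] = None, **options: Any) -> None:
        # Mirrors the behavior change noted above: the two inputs are exclusive.
        if config is not None and options:
            raise ValueError("Pass either config or keyword options, not both")
        self.config = config if config is not None else self._create_default_config(options)

    @classmethod
    def _create_default_config(cls, options: Dict[str, Any]) -> BaseConfig:
        # Subclasses change the result by overriding the class variable only.
        return _CONFIG_CLASSES[cls._config_class_name](**options)


class GraphQLLikeParser(BaseParser):
    _config_class_name: ClassVar[str] = "GraphQLConfig"
```

In this sketch `GraphQLLikeParser(strict=True).config` is a `GraphQLConfig`, while `BaseParser(config=BaseConfig(), strict=True)` raises `ValueError`, which is the case external callers relying on "config plus overrides" would hit.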


724-727: Inline imports are appropriate; only the noqa markers are noisy

The function-local imports in _get_config_class() and _create_default_config() are reasonable to avoid top-level import cycles and startup cost. Given static analysis now flags the # noqa: PLC0415 directives on those imports as unused, it’s safe to drop the # noqa comments while keeping the imports themselves.

Also applies to: 735-737

src/datamodel_code_generator/parser/openapi.py (2)

173-186: Remove unused # noqa: PLC0415 directive on runtime import

Ruff flags the # noqa: PLC0415 on the from datamodel_code_generator.parser.schema_version import ... line as unused (non‑enabled code). Since this import is intentional and there’s no active PLC0415 check, the directive can be dropped to satisfy linting without changing behavior.


219-241: Nullable handling for OpenAPI looks correct but mutates obj.type in-place

The new get_data_type() logic for OpenAPI nullable semantics (3.0 vs 3.1) is coherent: it normalizes nullable: true into type: [..., "null"] when strict_nullable is enabled, and warns in VersionMode.Strict for 3.1+.

One side effect is that obj.type is mutated in place. Given how JsonSchemaObject is used elsewhere, this is probably acceptable, but it is worth being aware that other code paths working on the same instance will now see the normalized type list rather than the original scalar type.

If you want to avoid side effects, you could instead clone obj locally before mutating type, but that’s optional and would be a trade‑off with performance.
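
A side-effect-free variant could copy the object before normalizing. The sketch below assumes a simplified, hypothetical `SchemaObject` dataclass in place of `JsonSchemaObject` and a free function `normalize_nullable` in place of the logic inside `get_data_type()`.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Union


@dataclass(frozen=True)
class SchemaObject:
    """Hypothetical stand-in for JsonSchemaObject with only the relevant fields."""

    type: Union[str, List[str], None] = None
    nullable: Optional[bool] = None


def normalize_nullable(obj: SchemaObject) -> SchemaObject:
    """Return a copy whose type list includes "null" when nullable is true."""
    if not obj.nullable or obj.type is None:
        return obj
    types = [obj.type] if isinstance(obj.type, str) else list(obj.type)
    if "null" not in types:
        types.append("null")
    # replace() builds a new instance, so the caller's object is untouched.
    return replace(obj, type=types)
```

The cost is one extra allocation per nullable schema object, which is the performance trade-off mentioned above.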

src/datamodel_code_generator/parser/jsonschema.py (1)

724-777: Version-aware schema_paths and schema_features design looks good; consider multi-document behavior and remove unused noqa

The new schema_paths / schema_features interplay is solid:

  • schema_features.definitions_key drives which of #/definitions vs #/$defs is primary.
  • schema_version_mode == VersionMode.Strict limits to the primary path; lenient mode keeps the previous “try both” behavior.
  • schema_features respects an explicit jsonschema_version first, then falls back to auto-detection via detect_jsonschema_version(self.raw_obj).

Two small follow-ups:

  1. Unused # noqa: PLC0415
    The inline import in schema_features is marked # noqa: PLC0415, but Ruff reports PLC0415 as non-enabled here. Since the import is intentional and there’s no active PLC0415 rule, you can drop the directive to silence RUF100.

  2. Per-instance caching across multiple documents (optional)
    schema_features is a cached_property based on self.raw_obj at first access. If one JsonSchemaParser instance is used to parse multiple top-level documents with different $schema declarations, only the first document’s detected version will be used for all subsequent files. If that scenario matters for your users, you might want to:

    • compute schema_features per root document in parse_raw() (e.g., not cached, or cached per source), or
    • document that a parser instance is intended to be used per logical schema version.

Also applies to: 781-790
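
The strict versus lenient path selection described in the bullets can be written as a small pure function. This is a sketch of the described behavior only; the free function `schema_paths(definitions_key, strict)` is hypothetical, whereas the real `schema_paths` is a parser property.

```python
from typing import List


def schema_paths(definitions_key: str, strict: bool) -> List[str]:
    """Return the reference roots to search, primary path first."""
    primary = f"#/{definitions_key}"
    if strict:
        # Strict mode: only the path that matches the schema version.
        return [primary]
    # Lenient mode: keep the "try both" behavior, primary path first.
    fallback = "#/definitions" if definitions_key == "$defs" else "#/$defs"
    return [primary, fallback]
```

For a Draft 7 document, whose definitions key is `definitions`, lenient mode yields `["#/definitions", "#/$defs"]`, while strict mode yields only `["#/definitions"]`.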

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 811cf13 and 332cccd.

⛔ Files ignored due to path filters (2)
  • docs/llms-full.txt is excluded by none and included by none
  • docs/llms.txt is excluded by none and included by none
📒 Files selected for processing (24)
  • docs/cli-reference/base-options.md
  • docs/cli-reference/index.md
  • docs/cli-reference/quick-reference.md
  • docs/supported_formats.md
  • src/datamodel_code_generator/__init__.py
  • src/datamodel_code_generator/__main__.py
  • src/datamodel_code_generator/_types/generate_config_dict.py
  • src/datamodel_code_generator/_types/parser_config_dicts.py
  • src/datamodel_code_generator/arguments.py
  • src/datamodel_code_generator/cli_options.py
  • src/datamodel_code_generator/config.py
  • src/datamodel_code_generator/parser/base.py
  • src/datamodel_code_generator/parser/graphql.py
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/parser/openapi.py
  • src/datamodel_code_generator/parser/schema_version.py
  • src/datamodel_code_generator/prompt_data.py
  • tests/data/expected/main/input_model/config_class.py
  • tests/data/expected/main/jsonschema/simple_string.py
  • tests/main/test_public_api_signature_baseline.py
  • tests/parser/test_base.py
  • tests/parser/test_graphql.py
  • tests/parser/test_schema_version.py
  • zensical.toml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T08:25:22.111Z
Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2890
File: tests/data/expected/main/jsonschema/ref_nullable_with_constraint.py:14-15
Timestamp: 2026-01-02T08:25:22.111Z
Learning: The datamodel-code-generator currently generates RootModel subclasses with an explicit `root` field annotation (e.g., `class StringType(RootModel[str]): root: str`). This is existing behavior of the code generator and should not be flagged as an issue introduced by new changes.

Applied to files:

  • tests/data/expected/main/jsonschema/simple_string.py
🧬 Code graph analysis (14)
tests/data/expected/main/input_model/config_class.py (2)
src/datamodel_code_generator/enums.py (1)
  • VersionMode (268-276)
src/datamodel_code_generator/model/type_alias.py (1)
  • TypeAlias (37-42)
src/datamodel_code_generator/_types/parser_config_dicts.py (1)
src/datamodel_code_generator/enums.py (3)
  • JsonSchemaVersion (243-254)
  • OpenAPIVersion (257-265)
  • VersionMode (268-276)
src/datamodel_code_generator/parser/graphql.py (3)
src/datamodel_code_generator/parser/schema_version.py (2)
  • JsonSchemaFeatures (20-87)
  • from_version (45-87)
src/datamodel_code_generator/parser/base.py (2)
  • Parser (694-3252)
  • schema_features (707-713)
src/datamodel_code_generator/enums.py (1)
  • JsonSchemaVersion (243-254)
tests/parser/test_graphql.py (3)
src/datamodel_code_generator/parser/schema_version.py (1)
  • JsonSchemaFeatures (20-87)
src/datamodel_code_generator/parser/graphql.py (2)
  • GraphQLParser (65-516)
  • schema_features (72-77)
tests/parser/test_base.py (1)
  • schema_features (44-49)
tests/parser/test_base.py (5)
src/datamodel_code_generator/parser/schema_version.py (2)
  • JsonSchemaFeatures (20-87)
  • from_version (45-87)
src/datamodel_code_generator/parser/jsonschema.py (1)
  • schema_features (779-790)
src/datamodel_code_generator/parser/base.py (1)
  • schema_features (707-713)
src/datamodel_code_generator/parser/graphql.py (1)
  • schema_features (72-77)
src/datamodel_code_generator/enums.py (1)
  • JsonSchemaVersion (243-254)
src/datamodel_code_generator/_types/generate_config_dict.py (1)
src/datamodel_code_generator/enums.py (1)
  • VersionMode (268-276)
src/datamodel_code_generator/parser/jsonschema.py (2)
src/datamodel_code_generator/enums.py (2)
  • JsonSchemaVersion (243-254)
  • VersionMode (268-276)
src/datamodel_code_generator/parser/schema_version.py (3)
  • JsonSchemaFeatures (20-87)
  • detect_jsonschema_version (149-179)
  • from_version (45-87)
src/datamodel_code_generator/__init__.py (3)
src/datamodel_code_generator/_types/parser_config_dicts.py (4)
  • GraphQLParserConfigDict (166-169)
  • JSONSchemaParserConfigDict (172-174)
  • OpenAPIParserConfigDict (177-182)
  • ParserConfigDict (46-163)
src/datamodel_code_generator/enums.py (3)
  • JsonSchemaVersion (243-254)
  • OpenAPIVersion (257-265)
  • InputFileType (35-45)
src/datamodel_code_generator/config.py (1)
  • JSONSchemaParserConfig (355-359)
src/datamodel_code_generator/config.py (1)
src/datamodel_code_generator/enums.py (3)
  • JsonSchemaVersion (243-254)
  • OpenAPIVersion (257-265)
  • VersionMode (268-276)
src/datamodel_code_generator/parser/openapi.py (3)
src/datamodel_code_generator/enums.py (2)
  • OpenAPIVersion (257-265)
  • VersionMode (268-276)
src/datamodel_code_generator/parser/schema_version.py (3)
  • OpenAPISchemaFeatures (91-133)
  • detect_openapi_version (182-196)
  • from_openapi_version (105-133)
src/datamodel_code_generator/parser/jsonschema.py (1)
  • schema_features (779-790)
tests/parser/test_schema_version.py (3)
src/datamodel_code_generator/parser/jsonschema.py (5)
  • JsonSchemaParser (567-3945)
  • schema_features (779-790)
  • schema_paths (751-776)
  • JsonSchemaObject (207-507)
  • get_data_type (1127-1156)
src/datamodel_code_generator/parser/openapi.py (3)
  • schema_features (174-185)
  • OpenAPIParser (168-825)
  • get_data_type (218-242)
src/datamodel_code_generator/enums.py (3)
  • JsonSchemaVersion (243-254)
  • OpenAPIVersion (257-265)
  • VersionMode (268-276)
src/datamodel_code_generator/arguments.py (1)
src/datamodel_code_generator/enums.py (1)
  • VersionMode (268-276)
tests/main/test_public_api_signature_baseline.py (1)
src/datamodel_code_generator/enums.py (1)
  • VersionMode (268-276)
src/datamodel_code_generator/__main__.py (1)
src/datamodel_code_generator/enums.py (1)
  • VersionMode (268-276)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/graphql.py

74-74: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)


75-75: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/parser/base.py

724-724: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)


735-735: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)


736-736: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)


737-737: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/parser/jsonschema.py

781-781: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/parser/openapi.py

176-176: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

src/datamodel_code_generator/__main__.py

624-624: Unused noqa directive (non-enabled: UP045)

Remove unused noqa directive

(RUF100)


625-625: Unused noqa directive (non-enabled: UP045)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.11 on Ubuntu
  • GitHub Check: 3.14 on Windows
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: 3.11 on macOS
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: benchmarks
  • GitHub Check: Analyze (python)
🔇 Additional comments (32)
zensical.toml (1)

32-32: LGTM!

The navigation entry addition is appropriate and correctly formatted. It provides a clear entry point for users to learn about the new schema version support feature.

src/datamodel_code_generator/prompt_data.py (1)

96-97: LGTM!

The option descriptions are clear and concise. They appropriately document the new schema version configuration options.

tests/parser/test_graphql.py (1)

150-173: LGTM!

The test comprehensively verifies that GraphQLParser exposes the schema_features property and returns the correct JsonSchemaFeatures instance with Draft 2020-12 feature flags. The use of inline_snapshot ensures precise validation of feature flag values.

docs/cli-reference/base-options.md (1)

13-14: LGTM!

The table entries are properly formatted and provide clear descriptions for the new options.

tests/data/expected/main/jsonschema/simple_string.py (1)

1-10: LGTM!

This test fixture correctly represents the expected output for a simple JSON Schema with a single required string field. The generated model follows standard Pydantic patterns with appropriate imports and type annotations.

docs/cli-reference/index.md (2)

11-11: LGTM! Base Options count correctly updated.

The count of 9 Base Options accurately reflects the addition of the two new schema-version options.


164-165: LGTM! New options documented in alphabetical order.

The two new schema-version options are correctly added to the S section with appropriate links to the base-options documentation.

tests/parser/test_base.py (3)

6-6: LGTM! Appropriate use of TYPE_CHECKING.

The TYPE_CHECKING guard correctly avoids runtime import overhead while maintaining type safety for the JsonSchemaFeatures annotation.

Also applies to: 13-14


43-49: LGTM! Correct implementation of abstract schema_features property.

The test stub correctly implements the schema_features property required by the Parser base class, returning a JsonSchemaFeatures instance based on Draft202012. This aligns with the pattern used in production parser implementations.


68-69: LGTM! Good test coverage for the new property.

The assertion validates that the schema_features property returns a correctly configured JsonSchemaFeatures instance with expected flag values.

tests/main/test_public_api_signature_baseline.py (2)

31-31: LGTM! VersionMode import added for new parameter types.

The import supports the new schema_version_mode parameter added to the public API baseline.


191-192: LGTM! Backward-compatible API extension.

The two new parameters correctly maintain backward compatibility by:

  • Being keyword-only (after the * marker)
  • Defaulting to None
  • Following the established type annotation pattern

This ensures existing callers continue to work without modification.
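
The backward-compatibility argument above can be illustrated with a small sketch. `generate_stub` is a hypothetical stand-in, not the project's real `generate()` signature; only the two parameter names come from this PR.

```python
from __future__ import annotations


def generate_stub(
    input_text: str,
    *,  # everything after this marker is keyword-only
    schema_version: str | None = None,
    schema_version_mode: str | None = None,
) -> dict:
    """Hypothetical stand-in that just echoes the options it received."""
    return {
        "input": input_text,
        "schema_version": schema_version,
        "schema_version_mode": schema_version_mode,
    }


# An existing caller that knows nothing about the new options still works.
old_style = generate_stub("{}")

# A new caller opts in explicitly by keyword.
new_style = generate_stub("{}", schema_version="draft-07", schema_version_mode="strict")
```

Because the new parameters are keyword-only and default to `None`, they cannot collide with positional arguments that older call sites already pass.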

tests/data/expected/main/input_model/config_class.py (2)

117-118: LGTM! VersionMode type alias correctly defined.

The Literal type accurately represents the two modes ('lenient', 'strict') from the VersionMode enum.


252-253: LGTM! GenerateConfig extended with schema version fields.

The two new fields are correctly defined:

  • Both use NotRequired for optional configuration
  • Types match the public API signature baseline
  • schema_version_mode correctly references the VersionMode type alias
src/datamodel_code_generator/cli_options.py (1)

73-74: LGTM! CLI option metadata correctly registered.

The two new schema-version options are properly categorized under BASE and follow the established pattern. The total count of 9 Base Options is consistent with the documentation.

src/datamodel_code_generator/_types/generate_config_dict.py (1)

172-173: New versioning fields added correctly.

The schema_version and schema_version_mode fields are properly typed and consistent with the GenerateConfig model in config.py.

docs/supported_formats.md (2)

1-92: Comprehensive version support documentation.

The documentation thoroughly covers JSON Schema and OpenAPI version support, feature compatibility matrices, version detection logic, and format mappings. This aligns well with the implementation in schema_version.py.


188-192: All referenced documentation files exist and links are valid.

The "See Also" links to supported-data-types.md, jsonschema.md, and openapi.md all exist in the docs/ directory. No broken links.

src/datamodel_code_generator/arguments.py (1)

988-1006: Well-structured CLI options for schema versioning.

The new --schema-version and --schema-version-mode options are appropriately placed in base_options, have clear help text, and correctly derive choices from VersionMode. The default of None for both enables auto-detection with lenient behavior.

src/datamodel_code_generator/config.py (2)

211-212: GenerateConfig versioning fields added correctly.

Using str | None for schema_version allows flexible CLI input while the parser-specific configs use typed enums. The schema_version_mode properly uses the VersionMode enum type.


355-369: Parser config hierarchy correctly propagates versioning fields.

JSONSchemaParserConfig defines jsonschema_version and schema_version_mode, while OpenAPIParserConfig extends it with openapi_version. This inheritance ensures schema_version_mode is available to both parsers.

src/datamodel_code_generator/parser/schema_version.py (3)

36-43: exclusive_as_number flag correctly added to feature set.

The new flag properly reflects JSON Schema evolution: Draft 4 used boolean exclusiveMinimum/exclusiveMaximum as modifiers to minimum/maximum, while Draft 6+ changed them to standalone numeric values.
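
To make the Draft 4 versus Draft 6+ difference concrete, here is a minimal sketch. `exclusive_lower_bound` is a hypothetical helper and not part of datamodel-code-generator; only the `exclusive_as_number` flag name is taken from the PR.

```python
from __future__ import annotations


def exclusive_lower_bound(schema: dict, exclusive_as_number: bool) -> float | None:
    """Return the exclusive lower bound of a numeric schema, or None if it has none."""
    value = schema.get("exclusiveMinimum")
    if exclusive_as_number:
        # Draft 6+: exclusiveMinimum is itself the numeric bound.
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
        return None
    # Draft 4 / OpenAPI 3.0: a boolean that modifies `minimum`.
    if value is True and "minimum" in schema:
        return float(schema["minimum"])
    return None


draft4_schema = {"type": "number", "minimum": 0, "exclusiveMinimum": True}
draft7_schema = {"type": "number", "exclusiveMinimum": 0}
```

Both schemas describe "strictly greater than zero", but each is only read correctly when the flag matches the draft it was written for.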


108-121: OpenAPI 3.0 correctly uses Draft 4 semantics for exclusive constraints.

The comment accurately explains that OpenAPI 3.0's schema dialect inherits JSON Schema Draft 4/5 semantics where exclusiveMinimum/exclusiveMaximum are boolean modifiers, not standalone numeric values.


171-179: Heuristic detection logic is sound.

The detection priority (explicit $schema → $defs presence → definitions presence → Draft 7 fallback) is reasonable. Defaulting to Draft 2020-12 when $defs is present is a sensible choice since it's the most permissive superset.
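
The priority order described in this comment could be sketched roughly as follows. `detect_version` and the marker table are hypothetical and return plain strings rather than the project's `JsonSchemaVersion` enum. Mapping a bare `definitions` key to Draft 7 is an assumption; the comment only states the order of the checks.

```python
from __future__ import annotations

# Hypothetical table: substring of the $schema URI -> version label.
_SCHEMA_MARKERS = (
    ("draft-04", "draft-04"),
    ("draft-06", "draft-06"),
    ("draft-07", "draft-07"),
    ("2019-09", "2019-09"),
    ("2020-12", "2020-12"),
)


def detect_version(schema: dict) -> str:
    """Guess a JSON Schema version label from a schema document."""
    declared = schema.get("$schema")
    if isinstance(declared, str):
        # 1. An explicit $schema declaration wins.
        for marker, version in _SCHEMA_MARKERS:
            if marker in declared:
                return version
    if "$defs" in schema:
        # 2. $defs suggests a newer draft; pick the most permissive one.
        return "2020-12"
    if "definitions" in schema:
        # 3. definitions suggests an older draft (assumed: Draft 7).
        return "draft-07"
    # 4. Nothing to go on: fall back to Draft 7.
    return "draft-07"
```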

src/datamodel_code_generator/_types/parser_config_dicts.py (1)

172-182: TypedDict config extensions properly mirror Pydantic models.

The new version fields in JSONSchemaParserConfigDict and OpenAPIParserConfigDict correctly use NotRequired and match the types in the corresponding config.py Pydantic models.

docs/cli-reference/quick-reference.md (1)

25-26: Documentation entries added correctly.

The new options are properly documented in both the Base Options table (lines 25-26) and the alphabetical index (lines 304-305) with consistent descriptions and correct link anchors to base-options.md.

tests/parser/test_schema_version.py (3)

422-464: Schema feature detection and config override tests look solid

The JsonSchemaParser/OpenAPIParser schema_features detection and config-override tests exercise both auto-detection and explicit version settings in a minimal, focused way; behavior matches the parser snippets (auto from $schema/openapi, config taking precedence). No issues from a test-design perspective.


513-574: schema_paths coverage matches VersionMode and features design

The schema_paths_* and test_openapi_schema_paths_unchanged tests correctly cover:

  • Lenient mode: both definitions/$defs with version‑appropriate ordering.
  • Strict mode: only the version’s primary key.
  • OpenAPI: always using SCHEMA_PATHS independent of version mode.

This aligns with the JsonSchemaParser/OpenAPIParser schema_paths implementation and should guard against regressions.
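
As a rough illustration of the lenient and strict behavior these tests cover, consider the sketch below. `schema_paths_for` is a hypothetical function, and the `#/...` path format is an assumption rather than the parser's actual return value.

```python
from __future__ import annotations


def schema_paths_for(prefers_defs: bool, strict: bool) -> list[str]:
    """Return the definition containers to scan, primary key first."""
    if prefers_defs:
        primary, secondary = "$defs", "definitions"
    else:
        primary, secondary = "definitions", "$defs"
    if strict:
        # Strict mode: only the key that belongs to the selected version.
        return [f"#/{primary}"]
    # Lenient mode: accept both, with the version's own key first.
    return [f"#/{primary}", f"#/{secondary}"]
```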


838-955: CLI parameterization for schema-version/mode is appropriate

The parametrized CLI-style tests for --schema-version (JSON Schema/OpenAPI) and --schema-version-mode exercise both variants and confirm that stable output is produced (disable_timestamp=True plus snapshots where relevant). This should catch regressions in the CLI/config → parser wiring for the new options.

src/datamodel_code_generator/parser/graphql.py (1)

101-101: _config_class_name correctly hooks into dynamic config resolution

Setting _config_class_name: ClassVar[str] = "GraphQLParserConfig" is the right way to plug GraphQLParser into Parser._get_config_class() / _create_default_config(). This should make the new base logic pick up the GraphQL-specific config class without additional wiring.

src/datamodel_code_generator/parser/base.py (1)

22-23: Generic SchemaFeaturesT and abstract schema_features are well-structured

Binding SchemaFeaturesT to JsonSchemaFeatures and making Parser generic in both ParserConfigT and SchemaFeaturesT, with an abstract schema_features property, cleanly enforces at the type level that all parsers expose a consistent feature set object (or subclass). This matches how JsonSchemaParser/OpenAPIParser/GraphQLParser are now structured and should help static checkers catch misuse.

Also applies to: 93-97, 694-703
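
The typing pattern praised here can be sketched in isolation. All class names below are hypothetical simplifications; the real `Parser` is also generic over its config type and takes constructor arguments that this sketch omits.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass(frozen=True)
class Features:
    """Hypothetical base feature set."""

    prefix_items: bool = False


@dataclass(frozen=True)
class ExtendedFeatures(Features):
    """Hypothetical subclass adding one more flag."""

    nullable_keyword: bool = False


# Bound TypeVar: every parser must expose Features or a subclass of it.
FeaturesT = TypeVar("FeaturesT", bound=Features)


class BaseParser(ABC, Generic[FeaturesT]):
    @property
    @abstractmethod
    def schema_features(self) -> FeaturesT:
        """Each concrete parser reports the features it supports."""


class ExtendedParser(BaseParser[ExtendedFeatures]):
    @property
    def schema_features(self) -> ExtendedFeatures:
        return ExtendedFeatures(prefix_items=False, nullable_keyword=True)
```

A static checker then knows that `ExtendedParser().schema_features` has a `nullable_keyword` attribute, while a subclass that forgets to implement the property cannot be instantiated.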

src/datamodel_code_generator/parser/jsonschema.py (1)

2928-2940: Strict-mode array feature checks are well placed

Hooking _check_array_version_features() at the top of parse_array_fields() gives good coverage for:

  • prefixItems usage where schema_features.prefix_items is False.
  • items as an array when schema_features.prefix_items is True (Draft 2020‑12+).

The behavior (warnings only in VersionMode.Strict, no change to parsing semantics) aligns with the version-mode intent and should be safe.

No changes needed here; just noting that the approach is sound.

Also applies to: 3712-3739

Comment on lines +331 to +567
## `--schema-version` {#schema-version}

Schema version to use for parsing.

The `--schema-version` option specifies the schema version to use instead of auto-detection.
Valid values depend on input type: JsonSchema (draft-04, draft-06, draft-07, 2019-09, 2020-12)
or OpenAPI (3.0, 3.1). Default is 'auto' (detected from $schema or openapi field).

!!! tip "Usage"

```bash
datamodel-codegen --input schema.json --schema-version draft-07 # (1)!
```

1. :material-arrow-left: `--schema-version` - the option documented here

??? example "Examples"

=== "OpenAPI"

**Input Schema:**

```yaml
openapi: "3.0.0"
info:
version: 1.0.0
title: Swagger Petstore
license:
name: MIT
servers:
- url: http://petstore.swagger.io/v1
paths:
/pets:
get:
summary: List all pets
operationId: listPets
tags:
- pets
parameters:
- name: limit
in: query
description: How many items to return at one time (max 100)
required: false
schema:
type: integer
format: int32
responses:
'200':
description: A paged array of pets
headers:
x-next:
description: A link to the next page of responses
schema:
type: string
content:
application/json:
schema:
$ref: "#/components/schemas/Pets"
default:
description: unexpected error
content:
application/json:
schema:
$ref: "#/components/schemas/Error"
x-amazon-apigateway-integration:
uri:
Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PythonVersionFunction.Arn}/invocations
passthroughBehavior: when_no_templates
httpMethod: POST
type: aws_proxy
post:
summary: Create a pet
operationId: createPets
tags:
- pets
responses:
'201':
description: Null response
default:
description: unexpected error
content:
application/json:
schema:
$ref: "#/components/schemas/Error"
x-amazon-apigateway-integration:
uri:
Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PythonVersionFunction.Arn}/invocations
passthroughBehavior: when_no_templates
httpMethod: POST
type: aws_proxy
/pets/{petId}:
get:
summary: Info for a specific pet
operationId: showPetById
tags:
- pets
parameters:
- name: petId
in: path
required: true
description: The id of the pet to retrieve
schema:
type: string
responses:
'200':
description: Expected response to a valid request
content:
application/json:
schema:
$ref: "#/components/schemas/Pets"
default:
description: unexpected error
content:
application/json:
schema:
$ref: "#/components/schemas/Error"
x-amazon-apigateway-integration:
uri:
Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${PythonVersionFunction.Arn}/invocations
passthroughBehavior: when_no_templates
httpMethod: POST
type: aws_proxy
components:
schemas:
Pet:
required:
- id
- name
properties:
id:
type: integer
format: int64
default: 1
name:
type: string
tag:
type: string
Pets:
type: array
items:
$ref: "#/components/schemas/Pet"
Users:
type: array
items:
required:
- id
- name
properties:
id:
type: integer
format: int64
name:
type: string
tag:
type: string
Id:
type: string
Rules:
type: array
items:
type: string
Error:
description: error result
required:
- code
- message
properties:
code:
type: integer
format: int32
message:
type: string
apis:
type: array
items:
type: object
properties:
apiKey:
type: string
description: To be used as a dataset parameter value
apiVersionNumber:
type: string
description: To be used as a version parameter value
apiUrl:
type: string
format: uri
description: "The URL describing the dataset's fields"
apiDocumentationUrl:
type: string
format: uri
description: A URL to the API console for each API
Event:
type: object
description: Event object
properties:
name:
type: string
Result:
type: object
properties:
event:
$ref: '#/components/schemas/Event'
```

**Output:**

> **Error:** File not found: openapi/api.py

=== "JSON Schema"

**Input Schema:**

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "properties": {"s": {"type": ["string"]}},
  "required": ["s"]
}
```

**Output:**

```python
# generated by datamodel-codegen:
# filename: simple_string.json

from __future__ import annotations

from pydantic import BaseModel


class Model(BaseModel):
    s: str
```

---

⚠️ Potential issue | 🟡 Minor

Improve example output and consider simplifying the OpenAPI example.

Issues with the documentation examples:

  1. Error message as output (line 537): The OpenAPI example shows > **Error:** File not found: openapi/api.py as the output, which appears to be a placeholder or incorrect content. This will confuse users.

  2. Verbose OpenAPI example (lines 354-533): The 180-line OpenAPI specification is excessive for demonstrating the --schema-version option. A simpler example would be more effective.

  3. Examples don't demonstrate the option's effect: Neither example clearly shows what happens when you specify --schema-version draft-07 versus auto-detection or a different version. Consider adding examples that highlight version-specific behavior (e.g., using features from different JSON Schema drafts).

📋 Suggested improvements

For the OpenAPI example, either:

  • Replace the error message with actual generated output
  • Use a much simpler OpenAPI spec (e.g., a single endpoint with one schema)

For demonstrating the option's purpose, consider adding an example that shows:

  • A schema using a draft-specific feature (e.g., exclusiveMinimum as boolean in Draft 4 vs. number in Draft 6+)
  • How specifying --schema-version affects the generated output or validation behavior
🤖 Prompt for AI Agents
In @docs/cli-reference/base-options.md around lines 331 - 567, Summary: The
OpenAPI example is overly verbose and contains a bogus error output, and neither
example demonstrates how --schema-version affects generation. Fix: replace the
large OpenAPI YAML under the "OpenAPI" example with a minimal single-endpoint
spec (keep the section title "OpenAPI") and provide actual generated output
instead of `> **Error:** File not found: openapi/api.py`; shorten or remove the
180-line spec in the diff and ensure the Usage snippet still shows the
datamodel-codegen invocation with --schema-version; add a small JSON Schema
example that uses a draft-specific feature (e.g., exclusiveMinimum as boolean vs
numeric) and include two short generated outputs labeled for draft-04 and
draft-07 to illustrate the effect of the --schema-version option; update the
examples sections ("OpenAPI" and "JSON Schema") to explicitly mention
--schema-version in their captions.

Comment on lines +569 to +613
## `--schema-version-mode` {#schema-version-mode}

Schema version validation mode.

The `--schema-version-mode` option controls how schema version validation is performed.
'lenient' (default): accept all features regardless of version.
'strict': warn on features outside the declared/detected version.

!!! tip "Usage"

```bash
datamodel-codegen --input schema.json --schema-version-mode lenient # (1)!
```

1. :material-arrow-left: `--schema-version-mode` - the option documented here

??? example "Examples"

**Input Schema:**

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "properties": {"s": {"type": ["string"]}},
  "required": ["s"]
}
```

**Output:**

```python
# generated by datamodel-codegen:
# filename: simple_string.json

from __future__ import annotations

from pydantic import BaseModel


class Model(BaseModel):
    s: str
```

---

⚠️ Potential issue | 🟡 Minor

Add examples demonstrating lenient vs. strict mode behavior.

The current example doesn't illustrate the difference between lenient and strict modes. According to the description (lines 573-575), strict mode should "warn on features outside the declared/detected version," but the example shows identical output for both modes.

Consider adding examples that:

  1. Show a schema using features from a newer version than declared (e.g., prefixItems in a Draft 7 schema)
  2. Demonstrate the warnings generated in strict mode
  3. Show how lenient mode accepts these features without warnings

This will help users understand when and why to use each mode.

🤖 Prompt for AI Agents
In @docs/cli-reference/base-options.md around lines 569 - 613, Add two example
subsections under the `--schema-version-mode` section demonstrating the
difference between lenient and strict: create a JSON schema that declares
draft-07 but uses a draft-2020+ feature such as "prefixItems"; show the
datamodel-codegen invocation with `--schema-version-mode strict` and note the
expected warning output (mention the warning text or that a warning about
unsupported "prefixItems" or version mismatch appears), then show the invocation
with `--schema-version-mode lenient` and state that generation proceeds without
warnings and produces the same model; reference `--schema-version-mode`,
`strict`, `lenient`, and `prefixItems` so readers can locate the related option
and feature in the docs.

Comment on lines +746 to +767
    # Convert schema_version string to appropriate enum based on input type
    jsonschema_version: JsonSchemaVersion | None = None
    openapi_version: OpenAPIVersion | None = None
    if config.schema_version and config.schema_version != "auto":
        if input_file_type == InputFileType.OpenAPI:
            try:
                openapi_version = OpenAPIVersion(config.schema_version)
            except ValueError:
                valid = [v.value for v in OpenAPIVersion]
                msg = f"Invalid OpenAPI version: {config.schema_version}. Valid values: {valid}"
                raise Error(msg) from None
        elif input_file_type == InputFileType.GraphQL:
            msg = f"--schema-version is not supported for {input_file_type.value}"
            raise Error(msg)
        else:
            try:
                jsonschema_version = JsonSchemaVersion(config.schema_version)
            except ValueError:
                valid = [v.value for v in JsonSchemaVersion]
                msg = f"Invalid JSON Schema version: {config.schema_version}. Valid values: {valid}"
                raise Error(msg) from None


⚠️ Potential issue | 🟠 Major

Explicit jsonschema_version/openapi_version options are shadowed when schema_version is unset

The new wiring for schema_version unintentionally overrides existing per-parser version settings:

  • openapi_version and jsonschema_version are always injected into openapi_additional_options / jsonschema_additional_options, even when the derived openapi_version / jsonschema_version variables are None.
  • _create_parser_config() then prefers keys present in additional_options over fields coming from GenerateConfig, so any GenerateConfig.openapi_version / GenerateConfig.jsonschema_version values (e.g. from --openapi-version or --jsonschema-version) are discarded and replaced with None whenever schema_version is not set.

This effectively breaks the older --openapi-version and --jsonschema-version flags unless --schema-version is also provided.

To preserve backwards compatibility and still let schema_version override when explicitly set, only inject these keys into additional_options when a concrete enum was derived from config.schema_version. For example:

Proposed fix
@@
-    jsonschema_version: JsonSchemaVersion | None = None
-    openapi_version: OpenAPIVersion | None = None
+    jsonschema_version: JsonSchemaVersion | None = None
+    openapi_version: OpenAPIVersion | None = None
@@
-    if input_file_type == InputFileType.OpenAPI:
+    if input_file_type == InputFileType.OpenAPI:
@@
-        openapi_additional_options: OpenAPIParserConfigDict = {
-            "openapi_scopes": config.openapi_scopes,
-            "include_path_parameters": config.include_path_parameters,
-            "use_status_code_in_response_name": config.use_status_code_in_response_name,
-            "openapi_include_paths": config.openapi_include_paths,
-            "openapi_version": openapi_version,
-            **additional_options,
-        }
+        openapi_additional_options: OpenAPIParserConfigDict = {
+            "openapi_scopes": config.openapi_scopes,
+            "include_path_parameters": config.include_path_parameters,
+            "use_status_code_in_response_name": config.use_status_code_in_response_name,
+            "openapi_include_paths": config.openapi_include_paths,
+            **additional_options,
+        }
+        if openapi_version is not None:
+            openapi_additional_options["openapi_version"] = openapi_version
@@
-    else:
+    else:
@@
-        jsonschema_additional_options: JSONSchemaParserConfigDict = {
-            "jsonschema_version": jsonschema_version,
-            **additional_options,
-        }
+        jsonschema_additional_options: JSONSchemaParserConfigDict = {
+            **additional_options,
+        }
+        if jsonschema_version is not None:
+            jsonschema_additional_options["jsonschema_version"] = jsonschema_version

This keeps current behavior for --schema-version while restoring support for dedicated --jsonschema-version / --openapi-version flags.

Also applies to: 768-799
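
The shadowing problem described in this comment comes down to "key present with value `None`" versus "key absent". The sketch below shows it in isolation. `resolve_option` is a hypothetical stand-in for the precedence rule attributed to `_create_parser_config()`, not the project's actual code.

```python
from __future__ import annotations


def resolve_option(name: str, base: dict, additional_options: dict):
    """Prefer a key that is present in additional_options, even if its value is None."""
    if name in additional_options:
        return additional_options[name]
    return base.get(name)


# Stands in for a version that was set on GenerateConfig.
base = {"openapi_version": "3.1"}

# Always injecting the key shadows the configured value with None.
shadowed = resolve_option("openapi_version", base, {"openapi_version": None})

# Injecting the key only when a concrete version was derived keeps it intact.
derived = None  # nothing was derived because schema_version was not set
options: dict = {}
if derived is not None:
    options["openapi_version"] = derived
preserved = resolve_option("openapi_version", base, options)
```

Under these assumptions `shadowed` ends up as `None`, while `preserved` keeps the configured `"3.1"`, which is the behavior the proposed fix aims to restore.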

Comment on lines +781 to +790
        from datamodel_code_generator.parser.schema_version import (  # noqa: PLC0415
            JsonSchemaFeatures,
            detect_jsonschema_version,
        )

        config_version = getattr(self.config, "jsonschema_version", None)
        if config_version is not None and config_version != JsonSchemaVersion.Auto:
            return JsonSchemaFeatures.from_version(config_version)
        version = detect_jsonschema_version(self.raw_obj) if self.raw_obj else JsonSchemaVersion.Auto
        return JsonSchemaFeatures.from_version(version)

⚠️ Potential issue | 🟠 Major

Guard _check_version_specific_features against non-dict, non-bool raw values

In parse_raw_obj(), _check_version_specific_features(raw, path) is called for every schema fragment before _validate_schema_object(), but _check_version_specific_features() assumes:

  • raw is bool (handled by the boolean-schema branch), or
  • raw is a dict with .get.

For other YamlValue shapes allowed by the type alias (e.g. lists, strings, numbers), the function will hit:

type_value = raw.get("type")

and raise AttributeError in Strict mode, bypassing the intended SchemaParseError wrapping from _validate_schema_object().

This is a behavior change for invalid or edge-case schemas when schema_version_mode == VersionMode.Strict, and could surprise users.

You can make the check robust by only inspecting dicts and booleans:

Proposed fix
     def _check_version_specific_features(
         self,
         raw: dict[str, YamlValue] | YamlValue,
         path: list[str],
     ) -> None:
@@
-        if self.config.schema_version_mode != VersionMode.Strict:
-            return
-
-        # Check boolean schemas (Draft 6+)
-        if isinstance(raw, bool):
+        if self.config.schema_version_mode != VersionMode.Strict:
+            return
+
+        # Only booleans and dict-like schemas are relevant here; anything else
+        # should be left to _validate_schema_object to error on.
+        # Check boolean schemas (Draft 6+)
+        if isinstance(raw, bool):
             if not self.schema_features.boolean_schemas:
@@
-        # Check null in type array (Draft 2020-12 / OpenAPI 3.1+)
-        type_value = raw.get("type")
+        if not isinstance(raw, dict):
+            return
+
+        # Check null in type array (Draft 2020-12 / OpenAPI 3.1+)
+        type_value = raw.get("type")

This preserves the new Strict-mode warnings without introducing new AttributeErrors for non-object raw values.

Also applies to: 3648-3711
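
A standalone sketch of the guarded check may help show why the early return matters. `check_null_in_type_array` and its `null_in_type_array` parameter are hypothetical names, and the warning text is invented for illustration.

```python
from __future__ import annotations

import warnings


def check_null_in_type_array(raw: object, null_in_type_array: bool) -> None:
    """Warn when a schema lists "null" in a type array the version does not support."""
    if isinstance(raw, bool):
        # Boolean schemas carry no "type" keyword to inspect.
        return
    if not isinstance(raw, dict):
        # Lists, strings and numbers are left for later validation to reject.
        return
    type_value = raw.get("type")
    if isinstance(type_value, list) and "null" in type_value and not null_in_type_array:
        warnings.warn(
            "'null' in a type array is not supported by the selected schema version",
            stacklevel=2,
        )
```

Without the `isinstance(raw, dict)` guard, passing a list or a string would raise `AttributeError` on `raw.get`, which is the failure mode this comment describes.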

🧰 Tools
🪛 Ruff (0.14.10)

781-781: Unused noqa directive (non-enabled: PLC0415)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In @src/datamodel_code_generator/parser/jsonschema.py around lines 781 - 790,
_check_version_specific_features currently assumes raw is a bool or dict and
calls raw.get("type"), which raises AttributeError for other YamlValue shapes
(list, str, number); update _check_version_specific_features to first check
types (e.g., if isinstance(raw, bool): handle boolean branch; elif
isinstance(raw, dict): inspect raw.get(...); else: return/skip without accessing
attributes) so non-dict/non-bool values are safely ignored and errors remain
wrapped by _validate_schema_object (apply the same type-guard change to the
other copy of this logic around the 3648-3711 region).

@github-actions
Contributor

🎉 Released in 0.53.0

This PR is now available in the latest release. See the release notes for details.
