Add AST-based type string parsing helpers #2856

Merged
koxudaxi merged 3 commits into main from refactor/ast-type-parser-helper
Dec 30, 2025

@koxudaxi
Owner

@koxudaxi koxudaxi commented Dec 30, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added type annotation parsing utilities for extracting base type names, subscript arguments, and fully qualified names from type expressions, with enhanced support for modern union syntax and nested types.
  • Refactor

    • Improved internal type-base extraction and union type handling logic for more robust and maintainable processing of complex type expressions.
  • Tests

    • Expanded test coverage for new type parsing utilities and Python type flag interpretation across various type scenarios.

@coderabbitai

coderabbitai Bot commented Dec 30, 2025

Walkthrough

New utility functions for type parsing are added to types.py, including get_type_base_name, get_subscript_args, and extract_qualified_names. The jsonschema.py parser is refactored to use these utilities, replacing custom type-parsing logic in _get_python_type_flags and _get_python_type_base for more robust union and base-type extraction.
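
As a rough illustration of the AST-first approach with a string fallback, here is a minimal sketch. It is not the code from types.py: the name base_name_sketch, the fallback rule, and the handling of non-name expressions are assumptions made for this example.

```python
import ast


def base_name_sketch(type_str: str) -> str:
    """Return the outermost type name of an annotation string (illustrative only)."""
    try:
        node = ast.parse(type_str.strip(), mode="eval").body
    except SyntaxError:
        # Assumed fallback for strings that are not valid expressions:
        # keep the text before the first "[" and drop any dotted prefix.
        return type_str.split("[", 1)[0].strip().rsplit(".", 1)[-1]
    if isinstance(node, ast.Subscript):
        node = node.value  # List[str] -> List
    if isinstance(node, ast.Attribute):
        return node.attr  # typing.List -> List
    if isinstance(node, ast.Name):
        return node.id
    # Anything else (for example an "X | Y" union) is returned unchanged here.
    return type_str.strip()
```

With this sketch, "List[str]" and "typing.List[str]" both yield "List", and an unparseable string such as "List[" falls back to the string split.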

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Utilities (src/datamodel_code_generator/types.py) | Added three new utility functions: get_type_base_name() (extracts base type via AST with fallback parsing), get_subscript_args() (extracts type arguments from subscripted types including union-operator format), and extract_qualified_names() (finds fully qualified names in type annotations). Includes ast module import. |
| JSON Schema Parser Refactoring (src/datamodel_code_generator/parser/jsonschema.py) | Updated imports to include get_subscript_args and get_type_base_name. Refactored _get_python_type_flags() to use get_type_base_name() for base-type detection and iterate over get_subscript_args() for union handling. Simplified _get_python_type_base() to use get_type_base_name() directly. |
| Parser Test Coverage (tests/parser/test_jsonschema.py) | Added parameterized test test_get_python_type_flags() covering direct types, unions, and container scenarios to validate flag extraction (e.g., is_set, is_frozen_set, is_sequence, is_mapping). |
| Type Utilities Test Coverage (tests/test_types.py) | Added test cases for get_type_base_name(), get_subscript_args(), and extract_qualified_names() with scenarios including simple types, subscripted types, unions, and qualified-name expressions. |
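
To make "finds fully qualified names" concrete, the following sketch walks the AST and records each dotted name once, at its longest attribute chain. It is an assumption-laden illustration rather than the PR's code: qualified_names_sketch is a hypothetical name, and returning an empty list on invalid syntax is a choice made here, not something the PR states.

```python
from __future__ import annotations

import ast


def qualified_names_sketch(type_str: str) -> list[str]:
    """Collect dotted names such as 'my.pkg.Model' from an annotation string."""
    try:
        tree = ast.parse(type_str.strip(), mode="eval")
    except SyntaxError:
        return []  # assumed behaviour for invalid input

    names: list[str] = []

    def dotted(node: ast.AST) -> str | None:
        # Rebuild "a.b.C" from nested Attribute nodes ending in a Name.
        parts: list[str] = []
        while isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        if isinstance(node, ast.Name):
            parts.append(node.id)
            return ".".join(reversed(parts))
        return None

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.Attribute):
            name = dotted(node)
            if name is not None:
                names.append(name)
                return  # do not descend, so "a.b" inside "a.b.C" is not counted
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return names
```

Stopping the descent at the outermost Attribute node is what keeps intermediate chains out of the result. Repeated names are kept, so callers that need uniqueness would deduplicate separately.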

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

breaking-change-analyzed

Poem

🐰 A rabbit hops through type trees tall,
With AST parsing standing guard,
Base names and subscripts, union and all—
No more custom parsing, oh so hard!
Type extraction blooms, fresh and new, 🌿
Robust and simple, tested true!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add AST-based type string parsing helpers' accurately and specifically describes the main change: introducing new utility functions using AST for parsing type annotation strings. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

Add three helper functions to types.py for robust AST-based parsing of
Python type annotation strings:

- get_type_base_name(): Extract base type name (e.g., "List[str]" -> "List")
- get_subscript_args(): Extract type arguments (e.g., "Dict[str, int]" -> ["str", "int"])
- extract_qualified_names(): Extract fully qualified names for import handling

Refactor jsonschema.py to use these helpers:
- _get_python_type_flags now uses get_type_base_name and get_subscript_args
- _get_python_type_base now uses get_type_base_name
- Added support for union operator (|) syntax in type flag detection

This provides a solid foundation for handling x-python-type qualified
name imports in a follow-up PR.
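
The argument extraction described in this commit message, including the union operator, can be sketched roughly as follows. This is an illustration only, assuming Python 3.9 or newer (for ast.unparse and the un-wrapped Subscript.slice); subscript_args_sketch and its empty-list fallback are hypothetical, not the helper's actual signature or behaviour.

```python
from __future__ import annotations

import ast


def subscript_args_sketch(type_str: str) -> list[str]:
    """Return the type arguments of 'X[A, B]' or the members of 'A | B | C'."""
    try:
        node = ast.parse(type_str.strip(), mode="eval").body
    except SyntaxError:
        return []  # assumed fallback; the real helper may parse the string instead

    def flatten_union(n: ast.AST) -> list[ast.AST]:
        # "A | B | C" parses as BinOp(BinOp(A, B), C), so recurse on both sides.
        if isinstance(n, ast.BinOp) and isinstance(n.op, ast.BitOr):
            return flatten_union(n.left) + flatten_union(n.right)
        return [n]

    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return [ast.unparse(member) for member in flatten_union(node)]
    if isinstance(node, ast.Subscript):
        inner = node.slice
        elements = inner.elts if isinstance(inner, ast.Tuple) else [inner]
        return [ast.unparse(element) for element in elements]
    return []
```

Under these assumptions, "Dict[str, int]" gives ["str", "int"] and "int | str | None" gives ["int", "str", "None"], which is the shape a caller needs in order to scan union members one by one.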

@koxudaxi koxudaxi force-pushed the refactor/ast-type-parser-helper branch from b5d059d to 6a5249c on December 30, 2025 02:31
@github-actions
Contributor

github-actions Bot commented Dec 30, 2025

📚 Docs Preview: https://pr-2856.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Dec 30, 2025

CodSpeed Performance Report

Merging #2856 will not alter performance

Comparing refactor/ast-type-parser-helper (6763c63) with main (7a4709c)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped [1]

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@codecov

codecov Bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.52%. Comparing base (7a4709c) to head (6763c63).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2856      +/-   ##
==========================================
+ Coverage   99.50%   99.52%   +0.02%     
==========================================
  Files          90       90              
  Lines       14869    14924      +55     
  Branches     1781     1786       +5     
==========================================
+ Hits        14795    14853      +58     
  Misses         38       38              
+ Partials       36       33       -3     
| Flag | Coverage Δ |
|---|---|
| unittests | 99.52% <100.00%> (+0.02%) ⬆️ |


Add comprehensive tests for the three new helper functions:
- get_type_base_name: 14 test cases
- get_subscript_args: 18 test cases
- extract_qualified_names: 20 test cases

Achieves 100% diff coverage for the new code.

Add parametrized test with 25 cases covering:
- Direct matches for special container types (Set, FrozenSet, Mapping, etc.)
- Union types with special containers
- Union types without special containers (completes loop without match)
- Non-special container types

This ensures 100% diff coverage for jsonschema.py line 1324.

@koxudaxi koxudaxi force-pushed the refactor/ast-type-parser-helper branch from 7ac770b to 6763c63 on December 30, 2025 04:19

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/datamodel_code_generator/types.py (1)

10-217: AST-based type parsing helpers look correct; consider Python version/ast.unparse compatibility.

The three helpers (get_type_base_name, get_subscript_args, extract_qualified_names) are well-structured:

  • AST-first with sensible string fallbacks on SyntaxError.
  • Correct handling of Subscript, Attribute, Name, and | unions (including nested BitOr chains).
  • extract_qualified_names correctly prefers the longest attribute chain and avoids counting intermediate nodes twice.

Two small follow-ups:

  1. ast.unparse is used in get_subscript_args. This is only available in Python ≥ 3.9. If this project still supports 3.8, these helpers will raise AttributeError at runtime. Either:

    • Guard usage with hasattr(ast, "unparse") and fall back to a simpler string-based reconstruction, or
    • Explicitly require Python ≥ 3.9 and ensure tooling/docs are aligned.
  2. extract_qualified_names may return the same qualified name multiple times if it appears more than once in the annotation. If you ever need uniqueness, you might want to return list(dict.fromkeys(qualified_names)) to preserve order while deduplicating; current behavior is fine if duplicates are acceptable.

Also applies to: 219-255, 258-295
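
Both follow-ups can be sketched in a few lines. The helper names unparse_compat and dedupe_preserving_order are hypothetical, and using ast.get_source_segment (available since Python 3.8) is only one possible fallback, not necessarily the "simpler string-based reconstruction" the reviewer had in mind.

```python
from __future__ import annotations

import ast


def unparse_compat(node: ast.AST, source: str) -> str:
    """Render a node as text, using ast.unparse when the interpreter has it."""
    if hasattr(ast, "unparse"):  # Python >= 3.9
        return ast.unparse(node)
    # Assumed fallback: slice the original text using the node's position info.
    segment = ast.get_source_segment(source, node)
    return segment if segment is not None else ""


def dedupe_preserving_order(names: list[str]) -> list[str]:
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(names))
```

dict.fromkeys preserves insertion order, which is why it deduplicates without reordering the names.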

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a4709c and 6763c63.

📒 Files selected for processing (4)
  • src/datamodel_code_generator/parser/jsonschema.py
  • src/datamodel_code_generator/types.py
  • tests/parser/test_jsonschema.py
  • tests/test_types.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/parser/test_jsonschema.py (1)
src/datamodel_code_generator/parser/jsonschema.py (2)
  • JsonSchemaObject (207-481)
  • _get_python_type_flags (1293-1329)
tests/test_types.py (1)
src/datamodel_code_generator/types.py (5)
  • _remove_none_from_union (297-358)
  • extract_qualified_names (258-294)
  • get_optional_type (362-370)
  • get_subscript_args (219-255)
  • get_type_base_name (195-216)
src/datamodel_code_generator/parser/jsonschema.py (1)
src/datamodel_code_generator/types.py (2)
  • get_subscript_args (219-255)
  • get_type_base_name (195-216)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/jsonschema.py

1331-1331: Unused noqa directive (non-enabled: PLR6301)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: py312-black22 on Ubuntu
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks
🔇 Additional comments (4)
tests/test_types.py (1)

7-13: Comprehensive coverage for new type parsing helpers.

The new parametrized tests for get_type_base_name, get_subscript_args, and extract_qualified_names exercise a good mix of simple, subscripted, union (Union[...] and |), qualified, and invalid inputs, matching the helpers’ behavior (including fallbacks). No issues spotted.

Also applies to: 284-382

tests/parser/test_jsonschema.py (1)

1186-1225: New _get_python_type_flags tests align with parser behavior.

The parametrized test_get_python_type_flags nicely covers:

  • Direct special containers (Set, FrozenSet, Sequence, Mapping, mutable/abstract variants).
  • Union/optional forms (both Union[...] and | syntax) where a special container appears among other types.
  • Negative cases where no special container is present.

Expectations match the updated _get_python_type_flags implementation.

src/datamodel_code_generator/parser/jsonschema.py (2)

71-81: Use of shared AST helpers in _get_python_type_flags looks correct.

Switching to get_type_base_name / get_subscript_args improves robustness:

  • Direct special containers (Set, FrozenSet, Sequence, Mapping, etc.) are caught via base_type.
  • Unions/optionals (both Union[...] and " | ") correctly scan subscript/union arguments and return the first matching special container, which matches the new tests.
  • Qualified names like typing.Set[int] still resolve to the simple base name via the helper.

No behavioral issues spotted.

Also applies to: 1319-1328
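
The "direct match first, then scan union members and return the first special container" behaviour described in this comment might look roughly like the sketch below. It is not the parser's implementation: the SPECIAL mapping, the dict return value, and the restriction to Union/Optional and | are all assumptions for illustration, and Python 3.9 or newer is assumed for Subscript.slice.

```python
from __future__ import annotations

import ast

# Hypothetical mapping from container base names to flag names.
SPECIAL = {
    "Set": "is_set",
    "FrozenSet": "is_frozen_set",
    "Sequence": "is_sequence",
    "Mapping": "is_mapping",
}


def _base(node: ast.AST) -> str | None:
    """Simple base name of a node: typing.Set[int] -> 'Set'."""
    if isinstance(node, ast.Subscript):
        node = node.value
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return None


def _members(node: ast.AST) -> list[ast.AST]:
    """Flatten 'X | Y' and Union[...] / Optional[...] into their member nodes."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return _members(node.left) + _members(node.right)
    if isinstance(node, ast.Subscript) and _base(node) in {"Union", "Optional"}:
        inner = node.slice
        elements = inner.elts if isinstance(inner, ast.Tuple) else [inner]
        return [m for element in elements for m in _members(element)]
    return [node]


def type_flags_sketch(type_str: str) -> dict[str, bool]:
    """Return the flag for the first special container found, else an empty dict."""
    try:
        node = ast.parse(type_str.strip(), mode="eval").body
    except SyntaxError:
        return {}
    for member in _members(node):
        flag = SPECIAL.get(_base(member))
        if flag is not None:
            return {flag: True}
    return {}
```

A non-union type is treated as a single member, so the direct-match case and the union case share one loop. When no member matches, the loop finishes and the result is empty, which corresponds to the negative test cases.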


1331-1334: Remove now-unnecessary noqa on _get_python_type_base.

The function is still an instance method (needed because of snooper_to_methods), but the # noqa: PLR6301 comment is reported as unused (RUF100: unused noqa). You can safely drop it:

Suggested cleanup
-    def _get_python_type_base(self, python_type: str) -> str:  # noqa: PLR6301
+    def _get_python_type_base(self, python_type: str) -> str:
         """Extract base type from a Python type annotation string."""
         return get_type_base_name(python_type)
⛔ Skipped due to learnings
Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2681
File: tests/cli_doc/test_cli_doc_coverage.py:82-82
Timestamp: 2025-12-18T13:43:16.235Z
Learning: In datamodel-code-generator project, Ruff preview mode is enabled via `lint.preview = true` in pyproject.toml. This enables preview rules like PLR6301 (no-self-use), so `noqa: PLR6301` directives are necessary and should not be removed even if RUF100 suggests they are unused.
Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2799
File: src/datamodel_code_generator/model/pydantic/__init__.py:43-43
Timestamp: 2025-12-25T09:22:22.481Z
Learning: In datamodel-code-generator project, defensive `# noqa: PLC0415` directives should be kept on lazy imports (imports inside functions/methods) even when Ruff reports them as unused via RUF100, to prepare for potential future Ruff configuration changes that might enable the import-outside-top-level rule.

@koxudaxi koxudaxi merged commit b9bee27 into main Dec 30, 2025
37 checks passed
@koxudaxi koxudaxi deleted the refactor/ast-type-parser-helper branch December 30, 2025 04:25
@github-actions
Contributor

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR adds new internal helper functions for AST-based type string parsing and refactors existing internal methods to use them. The changes are:

  1. Three new helper functions added to types.py (get_type_base_name, get_subscript_args, extract_qualified_names); these are not exported from the public API.
  2. Refactored private methods _get_python_type_flags and _get_python_type_base to use the new helpers; these are underscore-prefixed internal methods.
  3. Added support for union operator (|) syntax in _get_python_type_flags; this is an additive enhancement that makes previously unhandled cases work correctly.

No code generation output changes, no CLI/API changes, no template changes required. The fallback behavior for invalid syntax is preserved. All changes are internal implementation details that don't affect the public interface.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 1, 2026

🎉 Released in 0.51.0

This PR is now available in the latest release. See the release notes for details.


1 participant