Add AST-based type string parsing helpers#2856
Conversation
📝 WalkthroughWalkthroughNew utility functions for type parsing are added to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Add three helper functions to types.py for robust AST-based parsing of Python type annotation strings: - get_type_base_name(): Extract base type name (e.g., "List[str]" -> "List") - get_subscript_args(): Extract type arguments (e.g., "Dict[str, int]" -> ["str", "int"]) - extract_qualified_names(): Extract fully qualified names for import handling Refactor jsonschema.py to use these helpers: - _get_python_type_flags now uses get_type_base_name and get_subscript_args - _get_python_type_base now uses get_type_base_name - Added support for union operator (|) syntax in type flag detection This provides a solid foundation for handling x-python-type qualified name imports in a follow-up PR.
b5d059d to
6a5249c
Compare
|
📚 Docs Preview: https://pr-2856.datamodel-code-generator.pages.dev |
CodSpeed Performance ReportMerging #2856 will not alter performanceComparing
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2856 +/- ##
==========================================
+ Coverage 99.50% 99.52% +0.02%
==========================================
Files 90 90
Lines 14869 14924 +55
Branches 1781 1786 +5
==========================================
+ Hits 14795 14853 +58
Misses 38 38
+ Partials 36 33 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add comprehensive tests for the three new helper functions: - get_type_base_name: 14 test cases - get_subscript_args: 18 test cases - extract_qualified_names: 20 test cases Achieves 100% diff coverage for the new code.
Add parametrized test with 25 cases covering: - Direct matches for special container types (Set, FrozenSet, Mapping, etc.) - Union types with special containers - Union types without special containers (completes loop without match) - Non-special container types This ensures 100% diff coverage for jsonschema.py line 1324.
7ac770b to
6763c63
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/datamodel_code_generator/types.py (1)
10-217: AST-based type parsing helpers look correct; consider Python version/ast.unparsecompatibility.The three helpers (
get_type_base_name,get_subscript_args,extract_qualified_names) are well-structured:
- AST-first with sensible string fallbacks on
SyntaxError.- Correct handling of
Subscript,Attribute,Name, and|unions (including nestedBitOrchains).extract_qualified_namescorrectly prefers the longest attribute chain and avoids counting intermediate nodes twice.Two small follow-ups:
ast.unparseis used inget_subscript_args. This is only available in Python ≥ 3.9. If this project still supports 3.8, these helpers will raiseAttributeErrorat runtime. Either:
- Guard usage with
hasattr(ast, "unparse")and fall back to a simpler string-based reconstruction, or- Explicitly require Python ≥ 3.9 and ensure tooling/docs are aligned.
extract_qualified_namesmay return the same qualified name multiple times if it appears more than once in the annotation. If you ever need uniqueness, you might want to returnlist(dict.fromkeys(qualified_names))to preserve order while deduplicating; current behavior is fine if duplicates are acceptable.Also applies to: 219-255, 258-295
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/datamodel_code_generator/parser/jsonschema.pysrc/datamodel_code_generator/types.pytests/parser/test_jsonschema.pytests/test_types.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/parser/test_jsonschema.py (1)
src/datamodel_code_generator/parser/jsonschema.py (2)
JsonSchemaObject(207-481)_get_python_type_flags(1293-1329)
tests/test_types.py (1)
src/datamodel_code_generator/types.py (5)
_remove_none_from_union(297-358)extract_qualified_names(258-294)get_optional_type(362-370)get_subscript_args(219-255)get_type_base_name(195-216)
src/datamodel_code_generator/parser/jsonschema.py (1)
src/datamodel_code_generator/types.py (2)
get_subscript_args(219-255)get_type_base_name(195-216)
🪛 Ruff (0.14.10)
src/datamodel_code_generator/parser/jsonschema.py
1331-1331: Unused noqa directive (non-enabled: PLR6301)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: py312-black22 on Ubuntu
- GitHub Check: 3.10 on Windows
- GitHub Check: 3.14 on Windows
- GitHub Check: 3.11 on Windows
- GitHub Check: 3.13 on Windows
- GitHub Check: 3.12 on Windows
- GitHub Check: Analyze (python)
- GitHub Check: benchmarks
🔇 Additional comments (4)
tests/test_types.py (1)
7-13: Comprehensive coverage for new type parsing helpers.The new parametrized tests for
get_type_base_name,get_subscript_args, andextract_qualified_namesexercise a good mix of simple, subscripted, union (Union[...]and|), qualified, and invalid inputs, matching the helpers’ behavior (including fallbacks). No issues spotted.Also applies to: 284-382
tests/parser/test_jsonschema.py (1)
1186-1225: New_get_python_type_flagstests align with parser behavior.The parametrized
test_get_python_type_flagsnicely covers:
- Direct special containers (
Set,FrozenSet,Sequence,Mapping, mutable/abstract variants).- Union/optional forms (both
Union[...]and|syntax) where a special container appears among other types.- Negative cases where no special container is present.
Expectations match the updated
_get_python_type_flagsimplementation.src/datamodel_code_generator/parser/jsonschema.py (2)
71-81: Use of shared AST helpers in_get_python_type_flagslooks correct.Switching to
get_type_base_name/get_subscript_argsimproves robustness:
- Direct special containers (
Set,FrozenSet,Sequence,Mapping, etc.) are caught viabase_type.- Unions/optionals (both
Union[...]and" | ") correctly scan subscript/union arguments and return the first matching special container, which matches the new tests.- Qualified names like
typing.Set[int]still resolve to the simple base name via the helper.No behavioral issues spotted.
Also applies to: 1319-1328
1331-1334: Remove now-unnecessarynoqaon_get_python_type_base.The function is still an instance method (needed because of
snooper_to_methods), but the# noqa: PLR6301comment is reported as unused (RUF100: unused noqa). You can safely drop it:Suggested cleanup
- def _get_python_type_base(self, python_type: str) -> str: # noqa: PLR6301 + def _get_python_type_base(self, python_type: str) -> str: """Extract base type from a Python type annotation string.""" return get_type_base_name(python_type)⛔ Skipped due to learnings
Learnt from: koxudaxi Repo: koxudaxi/datamodel-code-generator PR: 2681 File: tests/cli_doc/test_cli_doc_coverage.py:82-82 Timestamp: 2025-12-18T13:43:16.235Z Learning: In datamodel-code-generator project, Ruff preview mode is enabled via `lint.preview = true` in pyproject.toml. This enables preview rules like PLR6301 (no-self-use), so `noqa: PLR6301` directives are necessary and should not be removed even if RUF100 suggests they are unused.Learnt from: koxudaxi Repo: koxudaxi/datamodel-code-generator PR: 2799 File: src/datamodel_code_generator/model/pydantic/__init__.py:43-43 Timestamp: 2025-12-25T09:22:22.481Z Learning: In datamodel-code-generator project, defensive `# noqa: PLC0415` directives should be kept on lazy imports (imports inside functions/methods) even when Ruff reports them as unused via RUF100, to prepare for potential future Ruff configuration changes that might enable the import-outside-top-level rule.
Breaking Change AnalysisResult: No breaking changes detected Reasoning: This PR adds new internal helper functions for AST-based type string parsing and refactors existing internal methods to use them. The changes are: (1) Three new helper functions added to types.py (get_type_base_name, get_subscript_args, extract_qualified_names) - these are not exported from the public API, (2) Refactored private methods _get_python_type_flags and _get_python_type_base to use the new helpers - these are underscore-prefixed internal methods, (3) Added support for union operator (|) syntax in _get_python_type_flags - this is an additive enhancement that makes previously unhandled cases work correctly. No code generation output changes, no CLI/API changes, no template changes required. The fallback behavior for invalid syntax is preserved. All changes are internal implementation details that don't affect the public interface. This analysis was performed by Claude Code Action |
|
🎉 Released in 0.51.0 This PR is now available in the latest release. See the release notes for details. |
Summary by CodeRabbit
Release Notes
New Features
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.