
Add __hash__ to Pydantic v2 models used in sets #2918

Merged
koxudaxi merged 4 commits into main from fix/set-item-hash-1614 on Jan 4, 2026

Conversation

@koxudaxi
Owner

@koxudaxi koxudaxi commented Jan 4, 2026

Fixes: #1614

Summary by CodeRabbit

  • New Features

    • Generated models that appear in set/frozenset contexts are now made hashable so they can be used in hash-based collections.
    • Pydantic outputs include hash support enabling use with both generic and standard collection types.
  • Tests

    • Added tests for unique-items-as-set scenarios covering enum and model items to verify generated outputs.


@coderabbitai

coderabbitai Bot commented Jan 4, 2026

Walkthrough

Adds a per-model extra_template_data attribute and parser logic to detect models used as set/frozenset items, marking those models as hashable so generated classes include __hash__ = object.__hash__. Tests and expected fixtures updated accordingly.
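The Python rule behind this change can be sketched with plain classes, so the example runs without Pydantic installed. `Point` and `HashablePoint` are hypothetical names, and the sketch assumes that a non-frozen Pydantic v2 model is unhashable for the same reason shown here: a class that defines `__eq__` without `__hash__` gets `__hash__` set to `None`.

```python
class Point:
    """Defines __eq__ only, so Python implicitly sets __hash__ to None."""

    def __init__(self, x: int) -> None:
        self.x = x

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and self.x == other.x


class HashablePoint(Point):
    # The same line the generator emits: restore identity-based hashing.
    __hash__ = object.__hash__


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


a, b = HashablePoint(1), HashablePoint(1)
print(is_hashable(Point(1)))  # False
print(is_hashable(a))         # True
print(a == b, len({a, b}))    # True 2
```

Because `object.__hash__` hashes by identity, two equal but distinct instances both stay in the set; the assignment makes instances usable in sets but does not deduplicate them by value.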

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Model metadata (src/datamodel_code_generator/model/base.py) | Add public instance attribute annotation extra_template_data: dict[str, Any] on DataModel in __init__ so per-model template metadata exists on instances. |
| Parser: set-item hashability (src/datamodel_code_generator/parser/base.py) | Add private helpers __collect_set_item_references(...) and __mark_set_item_models_hashable(...) and invoke hashability marking during module finalization (_finalize_modules) so models used as set/frozenset items get flagged for __hash__ generation. |
| Generated fixtures (tests/data/expected/main/jsonschema/unique_items_enum_set.py, tests/data/expected/main/openapi/..._use_generic_container_types_set.py, tests/data/expected/main/openapi/..._use_standard_collections_set.py) | Generated model classes that are set/frozenset items now include __hash__ = object.__hash__. |
| Tests (tests/main/jsonschema/test_main_jsonschema.py) | Add test_unique_items_enum_set() to assert generation with --use-unique-items-as-set (and standard/generic collection options) produces expected hashable models. |
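The parser change summarized above can be sketched roughly as follows. This is not the project's implementation: `DataType` and `Model` are hypothetical, heavily simplified stand-ins for the real classes, the helper names drop the double-underscore prefix, and storing the flag under the `set_item_hashable` key of `extra_template_data` is an assumption based on the names used in this walkthrough.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataType:
    """Hypothetical stand-in for a node in a field's type tree."""

    reference: str | None = None  # name of a referenced model, if any
    is_set: bool = False          # True for set[...] / frozenset[...]
    children: list[DataType] = field(default_factory=list)


@dataclass
class Model:
    """Hypothetical stand-in for DataModel."""

    name: str
    is_enum: bool = False
    fields: list[DataType] = field(default_factory=list)
    extra_template_data: dict[str, Any] = field(default_factory=dict)


def collect_set_item_references(data_type: DataType, inside_set: bool = False) -> set[str]:
    """Return names of models referenced directly as set/frozenset items."""
    found: set[str] = set()
    if inside_set and data_type.reference:
        found.add(data_type.reference)
    for child in data_type.children:
        found |= collect_set_item_references(child, data_type.is_set)
    return found


def mark_set_item_models_hashable(models: list[Model]) -> None:
    """Flag every non-enum model that appears as a set item."""
    referenced: set[str] = set()
    for model in models:
        for data_type in model.fields:
            referenced |= collect_set_item_references(data_type)
    for model in models:
        if model.name in referenced and not model.is_enum:
            model.extra_template_data["set_item_hashable"] = True


pet = Model("Pet")
status = Model("Status", is_enum=True)
owner = Model(
    "Owner",
    fields=[
        DataType(is_set=True, children=[DataType(reference="Pet")]),
        DataType(is_set=True, children=[DataType(reference="Status")]),
    ],
)
mark_set_item_models_hashable([pet, status, owner])
print(pet.extra_template_data)     # {'set_item_hashable': True}
print(status.extra_template_data)  # {}
```

Splitting collection from marking keeps each function shallow, which is one way to stay under a nesting-depth lint limit.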

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Generator/CLI
    participant Parser as Parser
    participant Registry as Model Registry
    participant Renderer as Template Renderer
    participant FS as File Output

    CLI->>Parser: parse schemas (unique-items-as-set enabled)
    Parser->>Registry: build DataModel instances
    note right of Registry `#E8F5E9`: DataModel.extra_template_data\ninitialized per-instance
    Parser->>Parser: collect set/frozenset item type references
    Parser->>Registry: mark referenced models (set_item_hashable = True)
    Registry->>Renderer: render models (reads extra_template_data / flags)
    Renderer->>FS: write files (emit __hash__ where flagged)
    FS-->>CLI: generation complete
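The final two steps of the diagram (render, then emit `__hash__` where flagged) can be illustrated with a plain-Python renderer. The real project renders through Jinja2 templates; `render_model` and the `set_item_hashable` key are hypothetical and only show the conditional emission.

```python
from typing import Any


def render_model(name: str, fields: dict[str, str], extra_template_data: dict[str, Any]) -> str:
    """Render class source text; a stand-in for the real Jinja2 templates."""
    lines = [f"class {name}(BaseModel):"]
    lines += [f"    {field_name}: {type_hint}" for field_name, type_hint in fields.items()]
    if extra_template_data.get("set_item_hashable"):
        # Emitted only for models flagged as set/frozenset items.
        lines.append("    __hash__ = object.__hash__")
    if len(lines) == 1:
        # Keep the class body syntactically valid when nothing was emitted.
        lines.append("    pass")
    return "\n".join(lines)


flagged = render_model("Pet", {"id": "int"}, {"set_item_hashable": True})
unflagged = render_model("Owner", {"pets": "set[Pet]"}, {})
print(flagged)
print(unflagged)
```

Only the flagged model gains the extra line, so output for models that never appear in a set is unchanged.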

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

breaking-change-analyzed

Poem

🐰 I hop through code with careful cheer,
I tuck metadata close and bring hash near.
Sets now embrace each model I meet,
No TypeError stops our hopping feet. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding __hash__ to Pydantic v2 models used in sets, which is the core objective of the PR. |
| Linked Issues check | ✅ Passed | The PR addresses issue #1614 by implementing automatic __hash__ assignment to Pydantic v2 models used as set items, resolving the TypeError when adding model instances to sets. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to adding hash functionality for set items: DataModel attribute for tracking, parser methods for detection/marking, generated test files, and test cases verifying the feature. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5de128d and 4249d52.

⛔ Files ignored due to path filters (2)
  • src/datamodel_code_generator/model/template/pydantic_v2/BaseModel.jinja2 is excluded by none and included by none
  • src/datamodel_code_generator/model/template/pydantic_v2/RootModel.jinja2 is excluded by none and included by none
📒 Files selected for processing (1)
  • src/datamodel_code_generator/parser/base.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/datamodel_code_generator/parser/base.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.10 on macOS
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.11 on macOS
  • GitHub Check: 3.13 on macOS
  • GitHub Check: 3.14 on macOS
  • GitHub Check: 3.13 on Windows
  • GitHub Check: Analyze (python)
  • GitHub Check: benchmarks

@github-actions
Contributor

github-actions Bot commented Jan 4, 2026

📚 Docs Preview: https://pr-2918.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq Bot commented Jan 4, 2026

CodSpeed Performance Report

Merging #2918 will not alter performance

Comparing fix/set-item-hash-1614 (4249d52) with main (960f7f9)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped [1]

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/datamodel_code_generator/model/base.py (1)

681-693: Type annotation mismatch with defaultdict usage.

Line 681 declares extra_template_data: dict[str, Any], but line 693 assigns defaultdict(dict) when the parameter is None. While defaultdict is a subclass of dict, the type annotation should accurately reflect the actual type for proper type checking.

🔎 Proposed fix
-        self.extra_template_data: dict[str, Any]
+        self.extra_template_data: dict[str, Any] | defaultdict[str, dict[str, Any]]
         if extra_template_data is not None:

Alternatively, ensure consistent typing by always using dict:

         else:
-            self.extra_template_data = defaultdict(dict)
+            self.extra_template_data = {}
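The practical difference between the two proposed fixes is how a missing key behaves. The following standalone sketch uses only the standard library; the variable names `plain` and `auto` are illustrative and do not come from the codebase.

```python
from collections import defaultdict
from typing import Any

plain: dict[str, Any] = {}
auto: defaultdict[str, dict[str, Any]] = defaultdict(dict)

# defaultdict is a dict subclass, so a dict annotation is not violated at runtime.
print(isinstance(auto, dict))  # True

# Reading a missing key differs: defaultdict inserts a default value.
print(auto["missing"])         # {}
print("missing" in auto)       # True: the lookup itself created the key

# A plain dict raises instead, and .get() never creates a key.
try:
    plain["missing"]
except KeyError:
    print("KeyError")
print(plain.get("missing"))    # None
print("missing" in plain)      # False
```

Switching the fallback to `{}` is therefore only safe if no caller relies on the auto-created nested dicts; widening the annotation keeps the existing behavior.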
🧹 Nitpick comments (1)
tests/main/jsonschema/test_main_jsonschema.py (1)

7908-7923: Enum/set regression test is correctly wired; minor docstring nit only

The test is consistent with the rest of the suite: correct skip marker for Pydantic v2, expected fixture path, and CLI args that exercise --use-unique-items-as-set with standard collections. This should reliably catch regressions in the codegen for enum-in-set scenarios.

If you want to be slightly clearer, you could extend the docstring to also mention the positive case (that the generated item model in the same fixture gets __hash__) since the golden file likely asserts both aspects, not just “no __hash__ on the enum”. Otherwise this looks good as-is.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 960f7f9 and b98aae8.

⛔ Files ignored due to path filters (3)
  • src/datamodel_code_generator/model/template/pydantic_v2/BaseModel.jinja2 is excluded by none and included by none
  • src/datamodel_code_generator/model/template/pydantic_v2/RootModel.jinja2 is excluded by none and included by none
  • tests/data/jsonschema/unique_items_enum_set.json is excluded by !tests/data/**/*.json and included by none
📒 Files selected for processing (6)
  • src/datamodel_code_generator/model/base.py
  • src/datamodel_code_generator/parser/base.py
  • tests/data/expected/main/jsonschema/unique_items_enum_set.py
  • tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_generic_container_types_set.py
  • tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_standard_collections_set.py
  • tests/main/jsonschema/test_main_jsonschema.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-02T08:25:19.839Z
Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2890
File: tests/data/expected/main/jsonschema/ref_nullable_with_constraint.py:14-15
Timestamp: 2026-01-02T08:25:19.839Z
Learning: The datamodel-code-generator currently generates RootModel subclasses with an explicit `root` field annotation (e.g., `class StringType(RootModel[str]): root: str`). This is existing behavior of the code generator and should not be flagged as an issue introduced by new changes.

Applied to files:

  • src/datamodel_code_generator/model/base.py
🧬 Code graph analysis (2)
tests/data/expected/main/jsonschema/unique_items_enum_set.py (2)
src/datamodel_code_generator/model/enum.py (1)
  • Enum (39-121)
src/datamodel_code_generator/model/base.py (1)
  • name (829-831)
src/datamodel_code_generator/parser/base.py (2)
src/datamodel_code_generator/model/base.py (2)
  • all_data_types (877-881)
  • path (910-912)
src/datamodel_code_generator/types.py (1)
  • all_data_types (576-582)
🪛 GitHub Actions: Lint
src/datamodel_code_generator/parser/base.py

[error] 1551-1552: PLR6301 Method __mark_set_item_models_hashable could be a function, class method, or static method


[error] 1555-1563: PLR1702 Too many nested blocks (7 > 5)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: 3.10 on Windows
  • GitHub Check: py312-pydantic1 on Ubuntu
  • GitHub Check: py312-isort5 on Ubuntu
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on macOS
  • GitHub Check: 3.11 on Ubuntu
  • GitHub Check: 3.12 on Ubuntu
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.14 on Windows
  • GitHub Check: benchmarks
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_standard_collections_set.py (1)

10-15: LGTM! Correct implementation of hashability for set usage.

The addition of __hash__ = object.__hash__ correctly enables Pet instances to be used in sets. This matches the solution described in issue #1614 and follows Pydantic v2 best practices for making model instances hashable.

tests/data/expected/main/jsonschema/unique_items_enum_set.py (1)

1-25: LGTM! Demonstrates correct discriminated hashability handling.

This test fixture correctly shows that:

  • Enum types (Status) don't receive __hash__ since they're already hashable by default
  • Pydantic models (Item) used in sets receive __hash__ = object.__hash__

This validates that the __mark_set_item_models_hashable method properly skips Enum types while marking BaseModel types.
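The reason Enum types can be skipped is that Enum members are hashable out of the box. A short standard-library sketch; the member names and values are hypothetical, since the fixture's actual contents are not shown in this thread.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical members; the fixture's real values may differ.
    ACTIVE = "active"
    INACTIVE = "inactive"


# Members are singletons, so duplicates collapse as expected in a set.
statuses = {Status.ACTIVE, Status.INACTIVE, Status.ACTIVE}
print(len(statuses))                 # 2
print(Status("active") in statuses)  # True
```

Unlike the identity-based `__hash__` added to models, set membership for Enum members follows the member itself, so no generated code is needed.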

tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_generic_container_types_set.py (1)

12-16: LGTM! Validates frozenset support.

The addition of __hash__ = object.__hash__ correctly enables Pet instances to be used in frozensets. This confirms that the implementation handles both set and frozenset collection types appropriately.

src/datamodel_code_generator/parser/base.py (1)

2953-2954: LGTM! Correct integration in processing flow.

The call to __mark_set_item_models_hashable is correctly placed immediately after __replace_unique_list_to_set, ensuring that models are marked as hashable after their field types have been converted from list to set.

Comment thread src/datamodel_code_generator/parser/base.py Outdated
@codecov

codecov Bot commented Jan 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1b931d5) to head (4249d52).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #2918   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           93        93           
  Lines        16865     16913   +48     
  Branches      1952      1966   +14     
=========================================
+ Hits         16865     16913   +48     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 100.00% <100.00%> (ø) |

@koxudaxi koxudaxi merged commit 58e73ed into main Jan 4, 2026
38 checks passed
@koxudaxi koxudaxi deleted the fix/set-item-hash-1614 branch January 4, 2026 07:26
@github-actions
Contributor

github-actions Bot commented Jan 4, 2026

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR adds __hash__ = object.__hash__ to Pydantic v2 models that are used as set/frozenset items, enabling them to be hashable. This is purely additive functionality that: (1) doesn't affect existing code that doesn't use sets, (2) adds backward-compatible hash support using object identity, (3) doesn't change any CLI/API options, and (4) custom templates will gracefully ignore the new class_body_lines variable if not updated (Jinja2 silently handles undefined variables in for loops). The generated code remains valid Python and Pydantic models.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions Bot commented Jan 5, 2026

🎉 Released in 0.52.2

This PR is now available in the latest release. See the release notes for details.

Development

Successfully merging this pull request may close these issues.

Missing hash function for a class
