Add __hash__ to Pydantic v2 models used in sets by koxudaxi · Pull Request #2918 · koxudaxi/datamodel-code-generator

koxudaxi · 2026-01-04T06:10:33Z

Summary by CodeRabbit

New Features
- Generated models that appear in set/frozenset contexts are now made hashable so they can be used in hash-based collections.
- Pydantic outputs include hash support enabling use with both generic and standard collection types.
Tests
- Added tests for unique-items-as-set scenarios covering enum and model items to verify generated outputs.

coderabbitai · 2026-01-04T06:10:43Z

📝 Walkthrough

Adds a per-model extra_template_data attribute and parser logic to detect models used as set/frozenset items, marking those models as hashable so generated classes include __hash__ = object.__hash__. Tests and expected fixtures updated accordingly.
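
As background for why the marker is needed: a Python class that defines `__eq__` without `__hash__` has its `__hash__` set to `None`, and Pydantic v2's `BaseModel` defines `__eq__`, so its instances are unhashable unless something restores a hash. The sketch below uses a plain stand-in class (`FakeModel` and `HashableFakeModel` are hypothetical names, not Pydantic classes) so that it runs with the standard library alone.

```python
class FakeModel:
    """Hypothetical stand-in for a generated model: defines __eq__ only."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, FakeModel) and self.name == other.name


class HashableFakeModel(FakeModel):
    # The same line the generator emits for models used as set items.
    __hash__ = object.__hash__


def try_add_to_set(item: object) -> str:
    """Return 'ok' if item can be placed in a set, else the error type name."""
    try:
        {item}
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


print(FakeModel.__hash__)                      # None
print(try_add_to_set(FakeModel("a")))          # TypeError
print(try_add_to_set(HashableFakeModel("a")))  # ok
```

Assigning `object.__hash__` restores the default identity-based hash without changing how equality is computed.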

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Model metadata `src/datamodel_code_generator/model/base.py` | Add public instance attribute annotation `extra_template_data: dict[str, Any]` on `DataModel` in `__init__` so per-model template metadata exists on instances. |
| Parser: set-item hashability `src/datamodel_code_generator/parser/base.py` | Add private helpers `__collect_set_item_references(...)` and `__mark_set_item_models_hashable(...)` and invoke hashability marking during module finalization (`_finalize_modules`) so models used as set/frozenset items get flagged for `__hash__` generation. |
| Generated fixtures `tests/data/expected/main/jsonschema/unique_items_enum_set.py`, `tests/data/expected/main/openapi/..._use_generic_container_types_set.py`, `tests/data/expected/main/openapi/..._use_standard_collections_set.py` | Generated model classes that are set/frozenset items now include `__hash__ = object.__hash__`. |
| Tests `tests/main/jsonschema/test_main_jsonschema.py` | Add `test_unique_items_enum_set()` to assert generation with `--use-unique-items-as-set` (and standard/generic collection options) produces expected hashable models. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Generator/CLI
    participant Parser as Parser
    participant Registry as Model Registry
    participant Renderer as Template Renderer
    participant FS as File Output

    CLI->>Parser: parse schemas (unique-items-as-set enabled)
    Parser->>Registry: build DataModel instances
    note right of Registry `#E8F5E9`: DataModel.extra_template_data\ninitialized per-instance
    Parser->>Parser: collect set/frozenset item type references
    Parser->>Registry: mark referenced models (set_item_hashable = True)
    Registry->>Renderer: render models (reads extra_template_data / flags)
    Renderer->>FS: write files (emit __hash__ where flagged)
    FS-->>CLI: generation complete
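
The collect-then-mark flow in the diagram can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code: `TypeRef` and `Model` are hypothetical stand-ins for the real `DataType` and `DataModel` classes, and only the direct items of a set are inspected.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeRef:
    """Hypothetical, simplified type node (not the project's DataType)."""
    name: Optional[str] = None   # referenced model name, if any
    is_set: bool = False         # True for set/frozenset containers
    children: list["TypeRef"] = field(default_factory=list)


@dataclass
class Model:
    """Hypothetical, simplified model record (not the project's DataModel)."""
    name: str
    is_enum: bool = False
    field_types: list[TypeRef] = field(default_factory=list)
    class_body_lines: list[str] = field(default_factory=list)


def collect_set_item_references(type_ref: TypeRef, inside_set: bool = False) -> set[str]:
    """Return names of models that appear as direct items of a set type."""
    found: set[str] = set()
    if inside_set and type_ref.name is not None:
        found.add(type_ref.name)
    for child in type_ref.children:
        found |= collect_set_item_references(child, inside_set=type_ref.is_set)
    return found


def mark_set_item_models_hashable(models: list[Model]) -> None:
    """Append the __hash__ line to every non-enum model used as a set item."""
    referenced: set[str] = set()
    for model in models:
        for type_ref in model.field_types:
            referenced |= collect_set_item_references(type_ref)
    for model in models:
        if model.name in referenced and not model.is_enum:
            model.class_body_lines.append("__hash__ = object.__hash__")


pet = Model("Pet")
status = Model("Status", is_enum=True)
owner = Model("Owner", field_types=[
    TypeRef(is_set=True, children=[TypeRef(name="Pet")]),
    TypeRef(is_set=True, children=[TypeRef(name="Status")]),
])
mark_set_item_models_hashable([pet, status, owner])
print(pet.class_body_lines)     # ['__hash__ = object.__hash__']
print(status.class_body_lines)  # []
```

Collecting references across all models before marking any of them is what lets a model be flagged even when the set that contains it is declared on a different model.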

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Fix --use-unique-items-as-set to output set literals for default values #2672: Related changes touching set-handling and hashability detection in parser and model generation.

Suggested labels

breaking-change-analyzed

Poem

🐰 I hop through code with careful cheer,
I tuck metadata close and bring hash near.
Sets now embrace each model I meet,
No TypeError stops our hopping feet. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding `__hash__` to Pydantic v2 models used in sets, which is the core objective of the PR. |
| Linked Issues check | ✅ Passed | The PR addresses issue #1614 by implementing automatic `__hash__` assignment to Pydantic v2 models used as set items, resolving the TypeError when adding model instances to sets. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to adding hash functionality for set items: DataModel attribute for tracking, parser methods for detection/marking, generated test files, and test cases verifying the feature. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5de128d and 4249d52.

⛔ Files ignored due to path filters (2)

src/datamodel_code_generator/model/template/pydantic_v2/BaseModel.jinja2 is excluded by none and included by none
src/datamodel_code_generator/model/template/pydantic_v2/RootModel.jinja2 is excluded by none and included by none

📒 Files selected for processing (1)

src/datamodel_code_generator/parser/base.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/datamodel_code_generator/parser/base.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: 3.12 on Windows
GitHub Check: 3.14 on Windows
GitHub Check: 3.10 on macOS
GitHub Check: 3.12 on macOS
GitHub Check: 3.10 on Windows
GitHub Check: 3.11 on Windows
GitHub Check: 3.11 on macOS
GitHub Check: 3.13 on macOS
GitHub Check: 3.14 on macOS
GitHub Check: 3.13 on Windows
GitHub Check: Analyze (python)
GitHub Check: benchmarks

github-actions · 2026-01-04T06:11:40Z

📚 Docs Preview: https://pr-2918.datamodel-code-generator.pages.dev

codspeed-hq · 2026-01-04T06:13:55Z

CodSpeed Performance Report

Merging #2918 will not alter performance

Comparing fix/set-item-hash-1614 (4249d52) with main (960f7f9)

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 11 untouched
⏩ 98 skipped¹

¹ 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/datamodel_code_generator/model/base.py (1)
681-693: Type annotation mismatch with defaultdict usage.

Line 681 declares extra_template_data: dict[str, Any], but line 693 assigns defaultdict(dict) when the parameter is None. While defaultdict is a subclass of dict, the type annotation should accurately reflect the actual type for proper type checking.
🔎 Proposed fix
-        self.extra_template_data: dict[str, Any]
+        self.extra_template_data: dict[str, Any] | defaultdict[str, dict[str, Any]]
         if extra_template_data is not None:
Alternatively, ensure consistent typing by always using dict:
         else:
-            self.extra_template_data = defaultdict(dict)
+            self.extra_template_data = {}
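
Both suggestions satisfy a type checker because `defaultdict` subclasses `dict`, but they are not behaviorally equivalent: a `defaultdict(dict)` creates an entry on a missing-key lookup, while a plain `{}` raises `KeyError`. A small standard-library sketch of that difference (the key `"Pet"` and the nested `"flag"` entry are illustrative only, not taken from the codebase):

```python
from collections import defaultdict
from typing import Any

with_default: dict[str, Any] = defaultdict(dict)
plain: dict[str, Any] = {}

# defaultdict is accepted wherever dict[str, Any] is annotated.
assert isinstance(with_default, dict)

with_default["Pet"]["flag"] = True   # missing key is created on access
print(dict(with_default))            # {'Pet': {'flag': True}}

try:
    plain["Pet"]["flag"] = True      # plain dict has no default factory
except KeyError as exc:
    print("KeyError:", exc)          # KeyError: 'Pet'
```

So the second alternative is only safe if no caller relies on the auto-created nested dictionaries.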

🧹 Nitpick comments (1)

tests/main/jsonschema/test_main_jsonschema.py (1)

7908-7923: Enum/set regression test is correctly wired; minor docstring nit only

The test is consistent with the rest of the suite: correct skip marker for Pydantic v2, expected fixture path, and CLI args that exercise --use-unique-items-as-set with standard collections. This should reliably catch regressions in the codegen for enum-in-set scenarios.

If you want to be slightly clearer, you could extend the docstring to also mention the positive case (that the generated item model in the same fixture gets __hash__) since the golden file likely asserts both aspects, not just “no __hash__ on the enum”. Otherwise this looks good as-is.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 960f7f9 and b98aae8.

⛔ Files ignored due to path filters (3)

src/datamodel_code_generator/model/template/pydantic_v2/BaseModel.jinja2 is excluded by none and included by none
src/datamodel_code_generator/model/template/pydantic_v2/RootModel.jinja2 is excluded by none and included by none
tests/data/jsonschema/unique_items_enum_set.json is excluded by !tests/data/**/*.json and included by none

📒 Files selected for processing (6)

src/datamodel_code_generator/model/base.py
src/datamodel_code_generator/parser/base.py
tests/data/expected/main/jsonschema/unique_items_enum_set.py
tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_generic_container_types_set.py
tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_standard_collections_set.py
tests/main/jsonschema/test_main_jsonschema.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-01-02T08:25:19.839Z

Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2890
File: tests/data/expected/main/jsonschema/ref_nullable_with_constraint.py:14-15
Timestamp: 2026-01-02T08:25:19.839Z
Learning: The datamodel-code-generator currently generates RootModel subclasses with an explicit `root` field annotation (e.g., `class StringType(RootModel[str]): root: str`). This is existing behavior of the code generator and should not be flagged as an issue introduced by new changes.

Applied to files:

src/datamodel_code_generator/model/base.py

🧬 Code graph analysis (2)

tests/data/expected/main/jsonschema/unique_items_enum_set.py (2)

src/datamodel_code_generator/model/enum.py (1)

Enum (39-121)

src/datamodel_code_generator/model/base.py (1)

name (829-831)

src/datamodel_code_generator/parser/base.py (2)

src/datamodel_code_generator/model/base.py (2)

all_data_types (877-881)

path (910-912)

src/datamodel_code_generator/types.py (1)

all_data_types (576-582)

🪛 GitHub Actions: Lint

src/datamodel_code_generator/parser/base.py

[error] 1551-1552: PLR6301 Method __mark_set_item_models_hashable could be a function, class method, or static method

[error] 1555-1563: PLR1702 Too many nested blocks (7 > 5)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: 3.10 on Windows
GitHub Check: py312-pydantic1 on Ubuntu
GitHub Check: py312-isort5 on Ubuntu
GitHub Check: 3.12 on Windows
GitHub Check: 3.13 on Windows
GitHub Check: 3.12 on macOS
GitHub Check: 3.11 on Ubuntu
GitHub Check: 3.12 on Ubuntu
GitHub Check: 3.11 on Windows
GitHub Check: 3.14 on Windows
GitHub Check: benchmarks
GitHub Check: Analyze (python)

🔇 Additional comments (4)

tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_standard_collections_set.py (1)

10-15: LGTM! Correct implementation of hashability for set usage.

The addition of __hash__ = object.__hash__ correctly enables Pet instances to be used in sets. This matches the solution described in issue #1614 and follows Pydantic v2 best practices for making model instances hashable.

tests/data/expected/main/jsonschema/unique_items_enum_set.py (1)

1-25: LGTM! Demonstrates correct discriminated hashability handling.

This test fixture correctly shows that:

- Enum types (Status) don't receive __hash__ since they're already hashable by default
- Pydantic models (Item) used in sets receive __hash__ = object.__hash__

This validates that the __mark_set_item_models_hashable method properly skips Enum types while marking BaseModel types.
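
The enum half of that claim is easy to confirm with the standard library: `Enum` members are hashable without any extra class body line. The member names below are hypothetical and are not taken from the fixture.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical members for illustration only.
    ACTIVE = "active"
    INACTIVE = "inactive"


# Members can be placed in a set directly; duplicates collapse as usual.
statuses = {Status.ACTIVE, Status.ACTIVE, Status.INACTIVE}
print(len(statuses))  # 2
```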

tests/data/expected/main/openapi/with_field_constraints_pydantic_v2_use_generic_container_types_set.py (1)

12-16: LGTM! Validates frozenset support.

The addition of __hash__ = object.__hash__ correctly enables Pet instances to be used in frozensets. This confirms that the implementation handles both set and frozenset collection types appropriately.

src/datamodel_code_generator/parser/base.py (1)

2953-2954: LGTM! Correct integration in processing flow.

The call to __mark_set_item_models_hashable is correctly placed immediately after __replace_unique_list_to_set, ensuring that models are marked as hashable after their field types have been converted from list to set.

codecov · 2026-01-04T06:14:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1b931d5) to head (4249d52).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main     #2918   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           93        93           
  Lines        16865     16913   +48     
  Branches      1952      1966   +14     
=========================================
+ Hits         16865     16913   +48

Flag	Coverage Δ
unittests	`100.00% <100.00%> (ø)`

github-actions · 2026-01-04T07:28:54Z

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR adds __hash__ = object.__hash__ to Pydantic v2 models that are used as set/frozenset items, enabling them to be hashable. This is purely additive functionality that: (1) doesn't affect existing code that doesn't use sets, (2) adds backward-compatible hash support using object identity, (3) doesn't change any CLI/API options, and (4) custom templates will gracefully ignore the new class_body_lines variable if not updated (Jinja2 silently handles undefined variables in for loops). The generated code remains valid Python and Pydantic models.
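
One consequence of point (2) is worth illustrating: with an identity-based hash, two instances that compare equal still hash differently, so a set keeps both. The sketch uses a plain stand-in class (`Item` here is hypothetical, not a generated Pydantic model) that combines value equality with `object.__hash__`.

```python
class Item:
    """Hypothetical stand-in: value equality plus identity-based hash."""

    __hash__ = object.__hash__

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Item) and self.name == other.name


a, b = Item("x"), Item("x")
print(a == b)        # True
print(len({a, b}))   # 2
print(a in {a})      # True
```

Code that needs value-based deduplication in sets would need a different approach; the generated line only guarantees that instances can be stored in hash-based collections.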

This analysis was performed by Claude Code Action

github-actions · 2026-01-05T17:28:18Z

🎉 Released in 0.52.2

This PR is now available in the latest release. See the release notes for details.

Add __hash__ to Pydantic v2 models used in sets

b98aae8

coderabbitai Bot reviewed Jan 4, 2026

Comment thread src/datamodel_code_generator/parser/base.py Outdated

koxudaxi added 3 commits January 4, 2026 06:40

Fix lint errors: refactor to classmethods and reduce nesting

5de128d

Fix cross-module set item hash detection

8fdb9f3

Refactor: use generic class_body_lines instead of set_item_hashable

4249d52

koxudaxi merged commit 58e73ed into main Jan 4, 2026
38 checks passed

koxudaxi deleted the fix/set-item-hash-1614 branch January 4, 2026 07:26

github-actions Bot added the breaking-change-analyzed label Jan 4, 2026

github-actions Bot mentioned this pull request Jan 5, 2026

Missing hash function for a class #1614

Closed


koxudaxi merged 4 commits into main from fix/set-item-hash-1614
