docstrings that can be single line to be formatted on a single line#3107
Conversation
📝 WalkthroughWalkthroughThis PR introduces a Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Merging this PR will degrade performance by 15.03%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | test_perf_stripe_style_pydantic_v2 |
1.8 s | 2 s | -10.89% |
| ❌ | WallTime | test_perf_aws_style_openapi_pydantic_v2 |
1.7 s | 2 s | -14.07% |
| ❌ | WallTime | test_perf_complex_refs |
2 s | 2.3 s | -12.92% |
| ❌ | WallTime | test_perf_duplicate_names |
964.4 ms | 1,119.7 ms | -13.87% |
| ❌ | WallTime | test_perf_kubernetes_style_pydantic_v2 |
2.4 s | 2.7 s | -13.15% |
| ❌ | WallTime | test_perf_deep_nested |
5.6 s | 6.3 s | -10.52% |
| ❌ | WallTime | test_perf_graphql_style_pydantic_v2 |
751.9 ms | 871.3 ms | -13.7% |
| ❌ | WallTime | test_perf_large_models_pydantic_v2 |
3.3 s | 3.9 s | -14.71% |
| ❌ | WallTime | test_perf_openapi_large |
2.6 s | 3 s | -13.18% |
| ❌ | WallTime | test_perf_multiple_files_input |
3.4 s | 3.9 s | -15.03% |
| ❌ | WallTime | test_perf_all_options_enabled |
6 s | 6.9 s | -13.28% |
Comparing kevin-paulson-mindbridge-ai:keep_single_line_docstrings_on_a_single_line (6398d94) with main (92bdc27)
Footnotes
-
98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3107 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 87 87
Lines 18306 18320 +14
Branches 2091 2094 +3
=========================================
+ Hits 18306 18320 +14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/data/expected/main/graphql/simple_star_wars_extra_fields_allow.py (1)
17-17: Note: Very long single-line docstrings may exceed line length limits.The
IDscalar docstring (line 17, ~330 characters) andStringscalar docstring (line 25, ~200+ characters) significantly exceed typical line length limits (79-120 characters). While this aligns with PEP 257's guidance for single-line docstrings, it may trigger line-length linter warnings.Consider whether docstrings exceeding a certain character threshold should remain multi-line to balance PEP 257 compliance with line-length conventions.
Also applies to: 25-25
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/data/expected/main/graphql/simple_star_wars_extra_fields_allow.py` at line 17, The single-line docstrings for the `ID` and `String` scalar definitions are extremely long and will trigger line-length linters; change them from very long single-line triple-quoted strings to properly wrapped multi-line docstrings (split into multiple shorter lines within the triple quotes) for the `ID` scalar and the `String` scalar so each line stays within your project's limit while preserving the exact text and punctuation.src/datamodel_code_generator/model/base.py (1)
94-101: Consider a max-length guard before forcing single-line output.Current logic collapses any one-line input, even very long text. Adding a length threshold (or configurable limit) would better match “fits on one line” and reduce downstream lint noise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/datamodel_code_generator/model/base.py` around lines 94 - 101, The current single-line collapse logic should only apply when the one-line text is short; update the branch that checks len(value.splitlines()) to also check a max-length guard (e.g. len(escaped) <= MAX_SINGLE_LINE_LENGTH) before returning the compact form; if the text exceeds that threshold, fall through to the multi-line path that builds indent_text. Introduce a configurable MAX_SINGLE_LINE_LENGTH constant or an optional parameter and use that in the condition (referencing value, escaped, indent_spaces, and indent_text) so long strings stay multi-line and lint noise is reduced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/datamodel_code_generator/model/base.py`:
- Around line 95-97: The code mutates the docstring by appending a trailing
space when escaped.endswith('"'), which alters original content; instead, detect
the quote-edge case and return a multiline/docstring fallback without changing
the original text. Locate the logic that sets/uses the variable escaped (in the
docstring rendering routine, e.g., the function that composes docstrings) and
replace the escaped += " " mutation with code that emits a multiline/'''...'''
or triple-quoted representation of the original docstring value (or an explicit
multiline fallback path) so the original content is preserved while avoiding
quote-edge parsing issues.
In `@tests/data/expected/main/graphql/custom_formatters.py`:
- Line 18: The docstring generator format_docstring() currently emits
single-line triple-quoted docstrings regardless of length; update
format_docstring() to detect when the resulting docstring (after escaping)
exceeds a target line length (e.g., 79 or configurable max_line_length) and then
reformat it into a multi-line triple-quoted block wrapped to that width (use a
wrapper like textwrap.fill or equivalent), preserving existing indentation and
any leading/trailing blank lines and ensuring existing multi-line inputs are not
double-wrapped; locate format_docstring() in base.py and implement the length
check + wrapping logic so long single-line inputs like the ID scalar docstring
are split into readable lines inside the triple quotes.
In `@tests/data/expected/main/graphql/field_aliases.py`:
- Line 20: The format_docstring function currently treats any single-line
docstring (checked via value.splitlines()) as exempt from wrapping, which allows
very long single-line docstrings (e.g., 172 chars) to bypass style limits;
update format_docstring to enforce a maximum line length for single-line
docstrings by detecting when len(value) (or a configurable max_length) is
exceeded and then wrap/split that single-line into a short-summary first line
plus subsequent wrapped lines (using the same wrapping logic used for multi-line
docstrings), ensuring the resulting output remains PEP 257 compatible while
respecting common linter width limits; reference the format_docstring function
and its single-line branch that currently checks value.splitlines().
---
Nitpick comments:
In `@src/datamodel_code_generator/model/base.py`:
- Around line 94-101: The current single-line collapse logic should only apply
when the one-line text is short; update the branch that checks
len(value.splitlines()) to also check a max-length guard (e.g. len(escaped) <=
MAX_SINGLE_LINE_LENGTH) before returning the compact form; if the text exceeds
that threshold, fall through to the multi-line path that builds indent_text.
Introduce a configurable MAX_SINGLE_LINE_LENGTH constant or an optional
parameter and use that in the condition (referencing value, escaped,
indent_spaces, and indent_text) so long strings stay multi-line and lint noise
is reduced.
In `@tests/data/expected/main/graphql/simple_star_wars_extra_fields_allow.py`:
- Line 17: The single-line docstrings for the `ID` and `String` scalar
definitions are extremely long and will trigger line-length linters; change them
from very long single-line triple-quoted strings to properly wrapped multi-line
docstrings (split into multiple shorter lines within the triple quotes) for the
`ID` scalar and the `String` scalar so each line stays within your project's
limit while preserving the exact text and punctuation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ebe078d0-2766-4318-ae78-72e7b80ba3c9
⛔ Files ignored due to path filters (16)
src/datamodel_code_generator/model/template/Enum.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/ScalarTypeAliasAnnotation.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/ScalarTypeAliasType.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/ScalarTypeStatement.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/TypeAliasAnnotation.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/TypeAliasType.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/TypeStatement.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/TypedDictClass.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/TypedDictFunction.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/dataclass.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/msgspec.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/pydantic_v2/BaseModel.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/pydantic_v2/RootModel.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/pydantic_v2/RootModelTypeAlias.jinja2is excluded by none and included by nonesrc/datamodel_code_generator/model/template/pydantic_v2/dataclass.jinja2is excluded by none and included by nonetests/data/templates_schema_id/pydantic_v2/BaseModel.jinja2is excluded by none and included by none
📒 Files selected for processing (61)
src/datamodel_code_generator/model/base.pytests/data/expected/main/graphql/additional_imports.pytests/data/expected/main/graphql/annotated.pytests/data/expected/main/graphql/annotated_field_aliases.pytests/data/expected/main/graphql/annotated_use_default_kwarg.pytests/data/expected/main/graphql/annotated_use_standard_collections.pytests/data/expected/main/graphql/annotated_use_standard_collections_use_union_operator.pytests/data/expected/main/graphql/annotated_use_union_operator.pytests/data/expected/main/graphql/casing.pytests/data/expected/main/graphql/custom_formatters.pytests/data/expected/main/graphql/custom_scalar_types.pytests/data/expected/main/graphql/default_values_required_use_default.pytests/data/expected/main/graphql/different_types_of_fields.pytests/data/expected/main/graphql/empty_list_default.pytests/data/expected/main/graphql/enum_literals_all.pytests/data/expected/main/graphql/enum_literals_one.pytests/data/expected/main/graphql/enums.pytests/data/expected/main/graphql/enums_ignore_enum_constraints.pytests/data/expected/main/graphql/enums_no_specialized.pytests/data/expected/main/graphql/enums_specialized.pytests/data/expected/main/graphql/enums_typed_dict.pytests/data/expected/main/graphql/enums_using_subclass.pytests/data/expected/main/graphql/field_aliases.pytests/data/expected/main/graphql/graphql_multiple_aliases_pydantic_v2.pytests/data/expected/main/graphql/interface_mro.pytests/data/expected/main/graphql/no_typename.pytests/data/expected/main/graphql/pydantic_v2_empty_list_default.pytests/data/expected/main/graphql/simple_star_wars.pytests/data/expected/main/graphql/simple_star_wars_class_name_prefix.pytests/data/expected/main/graphql/simple_star_wars_dataclass.pytests/data/expected/main/graphql/simple_star_wars_dataclass_arguments.pytests/data/expected/main/graphql/simple_star_wars_dataclass_frozen_kw_only.pytests/data/expected/main/graphql/simple_star_wars_extra_fields_allow.pytests/data/expected/main/graphql/split_graphql_schemas.pytests/data/expected/main/graphql/type_alias.pytests/data/expected/main/graphql/type_alias_py312.pytests/data/expected/main/graphql/union.pytests/data/expected/main/graphql/union_class_name_prefix.pytests/data/expected/main/graphql/union_snake_case_field.pytests/data/expected/main/graphql/use_standard_collections.pytests/data/expected/main/graphql/use_union_operator.pytests/data/expected/main/jsonschema/allof_mro.pytests/data/expected/main/jsonschema/type_alias_with_field_description.pytests/data/expected/main/jsonschema/type_alias_with_field_description_py312.pytests/data/expected/main/openapi/all_of_with_relative_ref.pytests/data/expected/main/openapi/allof_array_ref_override.pytests/data/expected/main/use_attribute_docstrings.pytests/data/expected/main_kr/main_allof_with_description_only/output.pytests/data/expected/main_kr/main_docstring_special_chars/output.pytests/data/expected/main_kr/main_use_field_description/output.pytests/data/expected/main_kr/main_use_field_description_example/output.pytests/data/expected/main_kr/main_use_field_description_example_dataclass/output.pytests/data/expected/main_kr/main_use_field_description_with_example/output.pytests/data/expected/main_kr/main_use_inline_field_description_example_only/output.pytests/data/expected/main_kr/main_use_schema_description/output.pytests/data/expected/parser/graphql/field-default-enum.pytests/data/expected/parser/graphql/no_typename.pytests/data/expected/parser/graphql/union-aliased-bug.pytests/data/expected/parser/graphql/union-commented.pytests/data/expected/parser/graphql/union_with_prefix.pytests/model/test_base.py
| """ | ||
| The `ID` scalar type represents a unique identifier, often used to refetch an object or as key for a cache. The ID type appears in a JSON response as a String; however, it is not intended to be human-readable. When expected as an input type, any string (such as `"4"`) or integer (such as `4`) input value will be accepted as an ID. | ||
| """ | ||
| """The `ID` scalar type represents a unique identifier, often used to refetch an object or as key for a cache. The ID type appears in a JSON response as a String; however, it is not intended to be human-readable. When expected as an input type, any string (such as `"4"`) or integer (such as `4`) input value will be accepted as an ID.""" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the file and check line 18
cat -n tests/data/expected/main/graphql/custom_formatters.py | head -25Repository: koxudaxi/datamodel-code-generator
Length of output: 1268
🏁 Script executed:
# Get the exact character count of line 18
sed -n '18p' tests/data/expected/main/graphql/custom_formatters.py | wc -cRepository: koxudaxi/datamodel-code-generator
Length of output: 81
🏁 Script executed:
# Let's find the formatter implementation and understand this test structure
find . -name "*.py" -path "*/format*" -type f | grep -E "(format|docstring)" | head -20Repository: koxudaxi/datamodel-code-generator
Length of output: 327
🏁 Script executed:
# Let's look at the formatter implementation
cat -n ./src/datamodel_code_generator/format.py | head -100Repository: koxudaxi/datamodel-code-generator
Length of output: 3356
🏁 Script executed:
# Continue reading the format.py file to find docstring handling
wc -l ./src/datamodel_code_generator/format.pyRepository: koxudaxi/datamodel-code-generator
Length of output: 122
🏁 Script executed:
# Search for docstring formatting and line length handling in format.py
rg -n "docstring|line.length|format_docstring" ./src/datamodel_code_generator/format.py -A 3Repository: koxudaxi/datamodel-code-generator
Length of output: 342
🏁 Script executed:
# Search for any docstring-specific formatting logic
rg -n "format_docstring" . -A 5Repository: koxudaxi/datamodel-code-generator
Length of output: 17528
🏁 Script executed:
# Let's read the format_docstring function implementation
sed -n '70,120p' ./src/datamodel_code_generator/model/base.pyRepository: koxudaxi/datamodel-code-generator
Length of output: 2003
🏁 Script executed:
# Let's check the related test for field_aliases.py mentioned in the review
cat -n tests/data/expected/main/graphql/field_aliases.py | head -30Repository: koxudaxi/datamodel-code-generator
Length of output: 1152
🏁 Script executed:
# Check character count on line 20 of field_aliases.py
sed -n '20p' tests/data/expected/main/graphql/field_aliases.py | wc -cRepository: koxudaxi/datamodel-code-generator
Length of output: 81
🏁 Script executed:
# Search for tests related to custom_formatters.py to understand the input
find . -name "*custom*scalar*" -type f | grep -E "\.(graphql|json)" | head -10Repository: koxudaxi/datamodel-code-generator
Length of output: 172
🏁 Script executed:
# Check the GraphQL input file
cat -n ./tests/data/graphql/custom-scalar-types.graphqlRepository: koxudaxi/datamodel-code-generator
Length of output: 170
🏁 Script executed:
# Check if there's any configuration or mechanism for handling line length in docstring formatting
rg -n "line.length|wrap|break|docstring" ./src/datamodel_code_generator/model/base.py | head -20Repository: koxudaxi/datamodel-code-generator
Length of output: 1338
🏁 Script executed:
# Search for any line length handling or wrapping logic
rg -n "MAX_LINE_LENGTH|line_length|wrap.*line|break.*line" ./src/datamodel_code_generator/model/base.pyRepository: koxudaxi/datamodel-code-generator
Length of output: 59
🏁 Script executed:
# Check if Black formatting is applied to handle line length after code generation
rg -n "black|format.*code" ./src/datamodel_code_generator/format.py | head -15Repository: koxudaxi/datamodel-code-generator
Length of output: 888
🏁 Script executed:
# Check how tests use the expected output files and whether Black is applied
rg -n "custom_formatters\|expected.*main.*graphql" ./tests -l | head -10Repository: koxudaxi/datamodel-code-generator
Length of output: 59
Implement line length checking in format_docstring() to wrap long docstrings.
The ID scalar docstring on line 18 is 339 characters, significantly exceeding PEP 8 recommendations (79-120 characters). The format_docstring() function in src/datamodel_code_generator/model/base.py:70 currently only checks if the input is a single line, then wraps it in triple quotes without validating character length. This generates docstrings that violate line length guidelines and are difficult to read without horizontal scrolling.
The function should check the final docstring length and wrap to multiple lines when it exceeds a reasonable limit (e.g., 79-120 characters), following PEP 8 and best practices for code readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/data/expected/main/graphql/custom_formatters.py` at line 18, The
docstring generator format_docstring() currently emits single-line triple-quoted
docstrings regardless of length; update format_docstring() to detect when the
resulting docstring (after escaping) exceeds a target line length (e.g., 79 or
configurable max_line_length) and then reformat it into a multi-line
triple-quoted block wrapped to that width (use a wrapper like textwrap.fill or
equivalent), preserving existing indentation and any leading/trailing blank
lines and ensuring existing multi-line inputs are not double-wrapped; locate
format_docstring() in base.py and implement the length check + wrapping logic so
long single-line inputs like the ID scalar docstring are split into readable
lines inside the triple quotes.
There was a problem hiding this comment.
As the output didn't wrap text like this before I think this is fine. If we wanted to do this is could be done in a separate pull request.
There was a problem hiding this comment.
@kevin-paulson-mindbridge-ai, that's a fair point — this PR focuses on collapsing short docstrings to a single line per PEP 257/D200, not on introducing line-length wrapping, which was never done before. Deferring the wrapping concern to a follow-up PR makes sense. I'll drop this suggestion.
| """ | ||
| The `String` scalar type represents textual data, represented as UTF-8 character sequences. The String type is most often used by GraphQL to represent free-form human-readable text. | ||
| """ | ||
| """The `String` scalar type represents textual data, represented as UTF-8 character sequences. The String type is most often used by GraphQL to represent free-form human-readable text.""" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check format_docstring implementation for line length handling
# Search for format_docstring function and line length logic
rg -A 20 -B 2 'def format_docstring' --type=py
# Also check for any line length constants or configuration
rg -n 'max.*length|line.*length|MAX.*LINE' --type=py -g '**/model/base.py'Repository: koxudaxi/datamodel-code-generator
Length of output: 1966
🏁 Script executed:
# Read the full format_docstring implementation
sed -n '/^def format_docstring/,/^def /p' src/datamodel_code_generator/model/base.py | head -60Repository: koxudaxi/datamodel-code-generator
Length of output: 1525
🏁 Script executed:
# Check the test file at line 20
sed -n '18,22p' tests/data/expected/main/graphql/field_aliases.pyRepository: koxudaxi/datamodel-code-generator
Length of output: 306
🏁 Script executed:
# Search for any line length related constants or configuration
rg -n 'MAX_LINE|max_line|LINE_LENGTH|line_length|MAX.*CHAR|max.*char' src/datamodel_code_generator/model/base.pyRepository: koxudaxi/datamodel-code-generator
Length of output: 59
The format_docstring implementation does not include line length handling for single-line docstrings.
The function at line 20 in tests/data/expected/main/graphql/field_aliases.py contains a 172-character single-line docstring. The format_docstring implementation strictly follows PEP 257, which mandates that docstrings fitting on one line remain on a single line. The function checks only if len(value.splitlines()) == 1 and does not enforce any character-based limits. While this is PEP 257 compliant, the resulting 172-character line will likely exceed common linter configurations (typically 79–120 characters) and may trigger style warnings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/data/expected/main/graphql/field_aliases.py` at line 20, The
format_docstring function currently treats any single-line docstring (checked
via value.splitlines()) as exempt from wrapping, which allows very long
single-line docstrings (e.g., 172 chars) to bypass style limits; update
format_docstring to enforce a maximum line length for single-line docstrings by
detecting when len(value) (or a configurable max_length) is exceeded and then
wrap/split that single-line into a short-summary first line plus subsequent
wrapped lines (using the same wrapping logic used for multi-line docstrings),
ensuring the resulting output remains PEP 257 compatible while respecting common
linter width limits; reference the format_docstring function and its single-line
branch that currently checks value.splitlines().
There was a problem hiding this comment.
As the output didn't wrap text like this before I think this is fine. If we wanted to do this is could be done in a separate pull request.
(Same comment as #3107 (comment))
There was a problem hiding this comment.
@kevin-paulson-mindbridge-ai — That's a fair point. Since line wrapping wasn't applied before this PR either, maintaining the same behaviour here is consistent. If line-length enforcement for single-line docstrings is ever desired, it can be addressed in a follow-up PR. I'll close this comment.
🧠 Learnings used
Learnt from: kevin-paulson-mindbridge-ai
Repo: koxudaxi/datamodel-code-generator PR: 3107
File: src/datamodel_code_generator/model/base.py:95-97
Timestamp: 2026-04-28T21:48:20.138Z
Learning: In `src/datamodel_code_generator/model/base.py`, the `format_docstring` function intentionally adds a trailing space to single-line docstrings whose escaped content ends with `"` (e.g., `"""some text\\"\\"\\" """`). This matches Black's handling of the quote-edge case and is not a content mutation bug — it prevents a four-consecutive-quote syntax issue. Do not flag this as a defect.
Learnt from: kevin-paulson-mindbridge-ai
Repo: koxudaxi/datamodel-code-generator PR: 3107
File: src/datamodel_code_generator/model/base.py:95-97
Timestamp: 2026-04-28T21:48:20.138Z
Learning: In `src/datamodel_code_generator/model/base.py`, `format_docstring` intentionally appends a trailing space to single-line docstrings whose escaped content ends with `"` (producing e.g. `"""text\\"\\"\\" """`). This prevents a parse error from four consecutive quotes and mirrors the leading-space pattern that Black uses when a docstring begins with `"`. This is intentional and should not be flagged as a defect or content mutation.
Learnt from: koxudaxi
Repo: koxudaxi/datamodel-code-generator PR: 2890
File: tests/data/expected/main/jsonschema/ref_nullable_with_constraint.py:14-15
Timestamp: 2026-01-02T08:25:22.111Z
Learning: The datamodel-code-generator currently generates RootModel subclasses with an explicit `root` field annotation (e.g., `class StringType(RootModel[str]): root: str`). This is existing behavior of the code generator and should not be flagged as an issue introduced by new changes.
Inspired by unnecessary-multiline-docstring (D200):
I think this makes sense to change the output in this way for the project, but if not preferred I'm happy to abandon this pull request. Thank you!
Summary by CodeRabbit
New Features
Tests