
Commit f6d4cbd

butvinm and claude authored
Add --allow-remote-refs to disable HTTP fetching of $ref by default (#3051)
* Add --allow-remote-refs flag to gate HTTP fetching of $ref targets

  Remote $ref fetching over HTTP/HTTPS is now disabled by default. When a $ref resolves to an HTTP(S) URL and --allow-remote-refs is not set, a clear error message is shown instead of silently fetching remote content. file:// URLs are still allowed without the flag since they are local. --url input implicitly enables --allow-remote-refs.

* Improve error message for missing local $ref files

  When a $ref points to a local file that doesn't exist, the error now clearly states "$ref file not found: <path>" instead of raising a raw FileNotFoundError.

* Validate HTTP responses before parsing as schema content

  Check status codes and Content-Type headers when fetching remote $ref targets over HTTP. Return clear error messages for HTTP errors (4xx/5xx) and unexpected HTML responses instead of cryptic YAML parse errors.

* Remove redundant local `import json` in test functions

  The module-level import already covers these usages.

* Add @pytest.mark.cli_doc marker for --allow-remote-refs

  CLI reference docs will be auto-generated by CI from this marker.

* Set status_code and headers on httpx mock responses

  Remove isinstance/hasattr guards from http.get_body() and instead fix all test mocks to set status_code=200 and headers={}, matching real httpx.Response behavior.

* Use SchemaFetchError instead of bare Exception in HTTP responses

  Add SchemaFetchError(Error) for HTTP error status codes and unexpected HTML responses. This ensures errors are caught by the existing `except Error` handler in __main__.py and shown as clean messages instead of unhandled tracebacks.

* Wrap transport errors and normalize Content-Type check

  Catch httpx transport exceptions (DNS, timeout, connection errors) and wrap them in SchemaFetchError. Normalize Content-Type to lowercase before checking for text/html.

* Harden test assertions for blocked and missing refs

  Assert httpx.get is not called when remote refs are blocked, preventing real HTTP leaks if the gate regresses. Assert the specific file path in the missing local ref error, not just the generic prefix.

* Change --allow-remote-refs default to warn instead of block

  Per maintainer feedback, keep backward compatibility by allowing remote $ref fetching by default, but emit a FutureWarning when it happens without an explicit --allow-remote-refs. The flag becomes three-state: True (explicit opt-in, no warning), False (blocks), None (default, allows with a deprecation warning). The default will flip to False in a future major version.

* Support --no-allow-remote-refs via BooleanOptionalAction

  Use BooleanOptionalAction (like --use-annotated) so users can explicitly opt out with --no-allow-remote-refs from the CLI.

* Add --no-allow-remote-refs to CLI_OPTION_META

  BooleanOptionalAction registers both --allow-remote-refs and --no-allow-remote-refs in argparse; the sync test requires both to be in CLI_OPTION_META.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
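The three-state behaviour described in the message above can be sketched in plain Python. This is a minimal standalone approximation, not the project's code: `Error` is a local stand-in for `datamodel_code_generator.Error`, `gate_remote_ref` and `resolve` are hypothetical names, the input is assumed to already be a URL, and a `unittest.mock.Mock` plays the HTTP fetcher so the sketch can also check that nothing is fetched when remote refs are blocked.

```python
import warnings
from typing import Optional
from unittest.mock import Mock


class Error(Exception):
    """Local stand-in for datamodel_code_generator.Error."""


def gate_remote_ref(resolved_ref: str, allow_remote_refs: Optional[bool]) -> None:
    """Hypothetical gate for a URL $ref: raise (False), warn (None), or pass silently (True)."""
    if resolved_ref.startswith("file://"):
        return  # file:// URLs are local, so they are never gated
    if allow_remote_refs is False:
        raise Error(f"Fetching remote $ref is disabled: {resolved_ref}")
    if allow_remote_refs is None:
        warnings.warn(
            f"Fetching remote $ref without --allow-remote-refs: {resolved_ref}",
            FutureWarning,
            stacklevel=2,
        )


fetch = Mock(return_value="{}")  # stands in for the real HTTP fetch


def resolve(resolved_ref: str, allow_remote_refs: Optional[bool]) -> str:
    gate_remote_ref(resolved_ref, allow_remote_refs)
    return fetch(resolved_ref)


# False: the gate raises before any fetch happens.
blocked = False
try:
    resolve("https://example.com/pet.json", False)
except Error:
    blocked = True
fetch.assert_not_called()

# None (default): the fetch proceeds, but a FutureWarning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolve("https://example.com/pet.json", None)
default_categories = [w.category for w in caught]

# True, or any file:// URL: the fetch proceeds with no warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolve("https://example.com/pet.json", True)
    resolve("file:///tmp/pet.json", False)
silent_count = len(caught)
```

The `fetch.assert_not_called()` line mirrors the hardened test assertion from the message: if the gate ever regressed, the blocked case would reach the fetcher and the check would fail.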
1 parent 7e1a5c7 commit f6d4cbd

19 files changed

Lines changed: 314 additions & 18 deletions

src/datamodel_code_generator/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -371,6 +371,10 @@ def _format_message(self, message: str) -> str:
         return message
 
 
+class SchemaFetchError(Error):
+    """Raised when fetching a remote schema fails (HTTP error, unexpected content type)."""
+
+
 def get_first_file(path: Path) -> Path:  # pragma: no cover
     """Find and return the first file in a path (file or directory)."""
     if path.is_file():
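Because `SchemaFetchError` derives from the package's `Error`, any handler written as `except Error` also catches it, which is how the commit message says fetch failures reach the existing handler in `__main__.py`. The sketch below illustrates that with local stand-in classes; `run_cli`, `failing_fetch` and the exit code `1` are hypothetical and do not reflect the real `Exit` enum.

```python
class Error(Exception):
    """Local stand-in for datamodel_code_generator.Error."""


class SchemaFetchError(Error):
    """Raised when fetching a remote schema fails (HTTP error, unexpected content type)."""


def run_cli(action, report) -> int:
    """Hypothetical CLI wrapper: turn package errors into a message and a non-zero exit code."""
    try:
        action()
    except Error as e:  # SchemaFetchError is caught here because it subclasses Error
        report(f"Error: {e}")
        return 1
    return 0


def failing_fetch() -> None:
    raise SchemaFetchError("HTTP 404 error fetching https://example.com/schema.json")


messages: list = []
exit_code = run_cli(failing_fetch, messages.append)
```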

src/datamodel_code_generator/__main__.py

Lines changed: 6 additions & 0 deletions

@@ -491,6 +491,7 @@ def validate_class_name_affix_scope(cls, v: str | ClassNameAffixScope | None) ->
     use_closed_typed_dict: bool = True
     allof_merge_mode: AllOfMergeMode = AllOfMergeMode.Constraints
     allof_class_hierarchy: AllOfClassHierarchy = AllOfClassHierarchy.IfNoConflict
+    allow_remote_refs: Optional[bool] = None  # noqa: UP045
     http_headers: Optional[Sequence[tuple[str, str]]] = None  # noqa: UP045
     http_ignore_tls: bool = False
     http_timeout: Optional[float] = None  # noqa: UP045
@@ -934,6 +935,7 @@ def run_generate_from_config(  # noqa: PLR0913, PLR0917
         use_closed_typed_dict=config.use_closed_typed_dict,
         allof_merge_mode=config.allof_merge_mode,
         allof_class_hierarchy=config.allof_class_hierarchy,
+        allow_remote_refs=config.allow_remote_refs,
         http_headers=config.http_headers,
         http_ignore_tls=config.http_ignore_tls,
         http_timeout=config.http_timeout,
@@ -1080,6 +1082,10 @@ def main(args: Sequence[str] | None = None) -> Exit:  # noqa: PLR0911, PLR0912,
         )
         return Exit.ERROR
 
+    # --url implies --allow-remote-refs since the user is explicitly fetching a remote schema
+    if config.url:
+        config.allow_remote_refs = True
+
     if config.check and config.output is None:
         print(  # noqa: T201
             "Error: --check cannot be used with stdout output (no --output specified)",
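As written in this hunk, the override in `main()` is unconditional, so `--url` sets `allow_remote_refs` to `True` even when `--no-allow-remote-refs` was also passed. A minimal sketch of the resulting precedence; `effective_allow_remote_refs` is a hypothetical helper, not part of the codebase.

```python
from typing import Optional


def effective_allow_remote_refs(url: Optional[str], allow_remote_refs: Optional[bool]) -> Optional[bool]:
    """Hypothetical mirror of the override in main(): any --url forces remote fetching on."""
    if url:
        return True  # overrides even an explicit --no-allow-remote-refs (False)
    return allow_remote_refs  # otherwise keep None / True / False exactly as parsed
```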

src/datamodel_code_generator/_types/generate_config_dict.py

Lines changed: 1 addition & 0 deletions

@@ -116,6 +116,7 @@ class GenerateConfigDict(TypedDict, closed=True):
     use_closed_typed_dict: NotRequired[bool]
     allof_merge_mode: NotRequired[AllOfMergeMode]
     allof_class_hierarchy: NotRequired[AllOfClassHierarchy]
+    allow_remote_refs: NotRequired[bool | None]
     http_headers: NotRequired[Sequence[tuple[str, str]] | None]
     http_ignore_tls: NotRequired[bool]
     http_timeout: NotRequired[float | None]

src/datamodel_code_generator/_types/parser_config_dicts.py

Lines changed: 1 addition & 0 deletions

@@ -111,6 +111,7 @@ class ParserConfigDict(TypedDict):
     use_closed_typed_dict: NotRequired[bool]
     allof_merge_mode: NotRequired[AllOfMergeMode]
     allof_class_hierarchy: NotRequired[AllOfClassHierarchy]
+    allow_remote_refs: NotRequired[bool | None]
     http_headers: NotRequired[Sequence[tuple[str, str]] | None]
     http_ignore_tls: NotRequired[bool]
     http_timeout: NotRequired[float | None]

src/datamodel_code_generator/arguments.py

Lines changed: 10 additions & 0 deletions

@@ -125,6 +125,16 @@ def start_section(self, heading: str | None) -> None:
 # ======================================================================================
 # Base options for input/output
 # ======================================================================================
+base_options.add_argument(
+    "--allow-remote-refs",
+    help="Allow fetching remote $ref references over HTTP/HTTPS. "
+    "Currently remote fetching is allowed by default but emits a deprecation warning. "
+    "Pass --allow-remote-refs to opt in without warning, "
+    "or --no-allow-remote-refs to block remote fetching. "
+    "In a future version, remote fetching will be disabled by default.",
+    action=BooleanOptionalAction,
+    default=None,
+)
 base_options.add_argument(
     "--http-headers",
     nargs="+",
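With `action=BooleanOptionalAction` and `default=None`, argparse yields three distinct values, which is what lets the code tell "not specified" apart from an explicit opt-out. A standalone sketch using the standard-library `argparse.BooleanOptionalAction` (Python 3.9+); the parser here is a throwaway, not the project's `base_options` group.

```python
import argparse

parser = argparse.ArgumentParser(prog="codegen-sketch")
parser.add_argument(
    "--allow-remote-refs",
    action=argparse.BooleanOptionalAction,  # also registers --no-allow-remote-refs
    default=None,  # None means "not specified", distinct from an explicit False
)

unset = parser.parse_args([]).allow_remote_refs
opted_in = parser.parse_args(["--allow-remote-refs"]).allow_remote_refs
opted_out = parser.parse_args(["--no-allow-remote-refs"]).allow_remote_refs
print(unset, opted_in, opted_out)  # None True False
```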

src/datamodel_code_generator/cli_options.py

Lines changed: 2 additions & 0 deletions

@@ -265,6 +265,8 @@ class CLIOptionMeta:
     # General Options
     # ==========================================================================
     "--check": CLIOptionMeta(name="--check", category=OptionCategory.GENERAL),
+    "--allow-remote-refs": CLIOptionMeta(name="--allow-remote-refs", category=OptionCategory.GENERAL),
+    "--no-allow-remote-refs": CLIOptionMeta(name="--no-allow-remote-refs", category=OptionCategory.GENERAL),
     "--http-headers": CLIOptionMeta(name="--http-headers", category=OptionCategory.GENERAL),
     "--http-ignore-tls": CLIOptionMeta(name="--http-ignore-tls", category=OptionCategory.GENERAL),
     "--http-query-parameters": CLIOptionMeta(name="--http-query-parameters", category=OptionCategory.GENERAL),
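The sync test mentioned in the commit message requires every option string registered in argparse to appear in `CLI_OPTION_META`; since `BooleanOptionalAction` adds the `--no-` spelling automatically, both keys are needed. A minimal sketch of such a check, using the public `option_strings` attribute of the `Action` objects that `add_argument` returns. `documented_options` is a hypothetical stand-in for the keys of `CLI_OPTION_META`, and the project's real test may be written differently.

```python
import argparse

# Hypothetical stand-in for the keys of CLI_OPTION_META.
documented_options = {"--check", "--allow-remote-refs", "--no-allow-remote-refs"}

parser = argparse.ArgumentParser()
actions = [
    parser.add_argument("--check", action="store_true"),
    parser.add_argument("--allow-remote-refs", action=argparse.BooleanOptionalAction, default=None),
]

# BooleanOptionalAction expands "--allow-remote-refs" into both spellings.
registered = {opt for action in actions for opt in action.option_strings}
undocumented = registered - documented_options  # must be empty for the sync check to pass
```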

src/datamodel_code_generator/config.py

Lines changed: 2 additions & 0 deletions

@@ -141,6 +141,7 @@ class GenerateConfig(BaseModel):
     use_closed_typed_dict: bool = True
     allof_merge_mode: AllOfMergeMode = AllOfMergeMode.Constraints
     allof_class_hierarchy: AllOfClassHierarchy = AllOfClassHierarchy.IfNoConflict
+    allow_remote_refs: bool | None = None
     http_headers: Sequence[tuple[str, str]] | None = None
     http_ignore_tls: bool = False
     http_timeout: float | None = None
@@ -273,6 +274,7 @@ class ParserConfig(BaseModel):
     use_closed_typed_dict: bool = True
     allof_merge_mode: AllOfMergeMode = AllOfMergeMode.Constraints
     allof_class_hierarchy: AllOfClassHierarchy = AllOfClassHierarchy.IfNoConflict
+    allow_remote_refs: bool | None = None
     http_headers: Sequence[tuple[str, str]] | None = None
     http_ignore_tls: bool = False
     http_timeout: float | None = None

src/datamodel_code_generator/http.py

Lines changed: 24 additions & 8 deletions

@@ -9,6 +9,8 @@
 
 from typing import TYPE_CHECKING, Any
 
+from datamodel_code_generator import SchemaFetchError
+
 if TYPE_CHECKING:
     from collections.abc import Sequence
 
@@ -35,14 +37,28 @@ def get_body(
 ) -> str:
     """Fetch content from a URL with optional headers and query parameters."""
     httpx = _get_httpx()
-    return httpx.get(
-        url,
-        headers=headers,
-        verify=not ignore_tls,
-        follow_redirects=True,
-        params=query_parameters,  # ty: ignore
-        timeout=timeout,
-    ).text
+    try:
+        response = httpx.get(
+            url,
+            headers=headers,
+            verify=not ignore_tls,
+            follow_redirects=True,
+            params=query_parameters,  # ty: ignore
+            timeout=timeout,
+        )
+    except Exception as e:
+        msg = f"Failed to fetch {url}: {e}"
+        raise SchemaFetchError(msg) from e
+    if response.status_code >= 400:  # noqa: PLR2004
+        msg = f"HTTP {response.status_code} error fetching {url}"
+        raise SchemaFetchError(msg)
+    content_type = response.headers.get("content-type", "").lower()
+    if "text/html" in content_type:
+        msg = (
+            f"Unexpected HTML response from {url} (Content-Type: {content_type}). Expected JSON or YAML schema content."
+        )
+        raise SchemaFetchError(msg)
+    return response.text
 
 
 def join_url(url: str, ref: str = ".") -> str:  # noqa: PLR0912
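The post-response checks in `get_body()` can be exercised without a network or httpx installed by factoring them into a pure function over a response-like object. In this sketch `check_response` is a hypothetical helper, `SchemaFetchError` is a local stand-in, transport-error wrapping is left out, and `unittest.mock.Mock` objects play the responses with `status_code` and `headers` set explicitly, as the commit's test changes do.

```python
from unittest.mock import Mock


class SchemaFetchError(Exception):
    """Local stand-in for datamodel_code_generator.SchemaFetchError."""


def check_response(url: str, response) -> str:
    """Hypothetical helper applying the same status and Content-Type checks as get_body()."""
    if response.status_code >= 400:
        raise SchemaFetchError(f"HTTP {response.status_code} error fetching {url}")
    # Real httpx.Headers lookups are case-insensitive; a plain dict needs the lowercase key.
    content_type = response.headers.get("content-type", "").lower()
    if "text/html" in content_type:
        raise SchemaFetchError(
            f"Unexpected HTML response from {url} (Content-Type: {content_type}). "
            "Expected JSON or YAML schema content."
        )
    return response.text


# Mocks set status_code and headers explicitly so the comparisons above work.
ok = Mock(status_code=200, headers={}, text='{"type": "object"}')
not_found = Mock(status_code=404, headers={}, text="Not Found")
html_page = Mock(status_code=200, headers={"content-type": "Text/HTML; charset=utf-8"}, text="<html></html>")

body = check_response("https://example.com/ok.json", ok)
errors = []
for resp in (not_found, html_page):
    try:
        check_response("https://example.com/schema", resp)
    except SchemaFetchError as e:
        errors.append(str(e))
```

Lowercasing before the `"text/html"` test is what lets a mixed-case header such as `Text/HTML` still be rejected.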

src/datamodel_code_generator/parser/base.py

Lines changed: 1 addition & 0 deletions

@@ -1101,6 +1101,7 @@ def __init__(  # noqa: PLR0912, PLR0915
         )
         self.class_name: str | None = config.class_name
         self.wrap_string_literal: bool | None = config.wrap_string_literal
+        self.allow_remote_refs: bool | None = config.allow_remote_refs
         self.http_headers: Sequence[tuple[str, str]] | None = config.http_headers
         self.http_query_parameters: Sequence[tuple[str, str]] | None = config.http_query_parameters
         self.http_ignore_tls: bool = config.http_ignore_tls

src/datamodel_code_generator/parser/jsonschema.py

Lines changed: 26 additions & 4 deletions

@@ -30,6 +30,7 @@
 from datamodel_code_generator import (
     AllOfClassHierarchy,
     AllOfMergeMode,
+    Error,
     InvalidClassNameError,
     JsonSchemaVersion,
     ReadOnlyWriteOnlyModelType,
@@ -3817,6 +3818,23 @@ def create_enum(reference_: Reference) -> DataType:
     def _get_ref_body(self, resolved_ref: str) -> dict[str, YamlValue]:
         """Get the body of a reference from URL or remote file."""
         if is_url(resolved_ref):
+            if not resolved_ref.startswith("file://"):
+                if self.allow_remote_refs is False:
+                    msg = (
+                        f"Fetching remote $ref is disabled: {resolved_ref}\n"
+                        "Use --allow-remote-refs to enable HTTP fetching of remote references."
+                    )
+                    raise Error(msg)
+                if self.allow_remote_refs is None:
+                    import warnings  # noqa: PLC0415
+
+                    warnings.warn(
+                        f"Fetching remote $ref without --allow-remote-refs: {resolved_ref}\n"
+                        "In a future version, remote $ref fetching will be disabled by default. "
+                        "Pass --allow-remote-refs explicitly to silence this warning.",
+                        FutureWarning,
+                        stacklevel=2,
+                    )
             return self._get_ref_body_from_url(resolved_ref)
         return self._get_ref_body_from_remote(resolved_ref)
 
@@ -3844,10 +3862,14 @@ def _get_ref_body_from_remote(self, resolved_ref: str) -> dict[str, YamlValue]:
         """Get reference body from a remote file path."""
         full_path = self.base_path / resolved_ref
 
-        return self.remote_object_cache.get_or_put(
-            str(full_path),
-            default_factory=lambda _: load_data_from_path(full_path, self.encoding),
-        )
+        try:
+            return self.remote_object_cache.get_or_put(
+                str(full_path),
+                default_factory=lambda _: load_data_from_path(full_path, self.encoding),
+            )
+        except FileNotFoundError:
+            msg = f"$ref file not found: {full_path}"
+            raise Error(msg) from None
 
     def resolve_ref(self, object_ref: str) -> Reference:
         """Resolve a reference by loading and parsing the referenced schema."""
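The `except FileNotFoundError` / `raise Error(...) from None` pattern in `_get_ref_body_from_remote` swaps a raw traceback for one readable message that names the full path. A minimal standalone sketch under stated assumptions: `Error` is a local stand-in, a plain dict replaces `remote_object_cache`, `load_local_ref` is a hypothetical name, and `json.loads` substitutes for `load_data_from_path`, so only JSON files are handled here.

```python
import json
import tempfile
from pathlib import Path


class Error(Exception):
    """Local stand-in for datamodel_code_generator.Error."""


_cache: dict = {}  # hypothetical stand-in for remote_object_cache


def load_local_ref(base_path: Path, resolved_ref: str) -> object:
    """Load a local $ref target; report a missing file as an Error naming its full path."""
    full_path = base_path / resolved_ref
    try:
        key = str(full_path)
        if key not in _cache:
            _cache[key] = json.loads(full_path.read_text(encoding="utf-8"))
        return _cache[key]
    except FileNotFoundError:
        # "from None" hides the chained FileNotFoundError traceback from the user.
        raise Error(f"$ref file not found: {full_path}") from None


missing_msg = ""
suppressed = False
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "pet.json").write_text('{"type": "object"}', encoding="utf-8")
    pet = load_local_ref(base, "pet.json")
    try:
        load_local_ref(base, "missing.json")
    except Error as e:
        missing_msg = str(e)
        suppressed = e.__suppress_context__
```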
