Skip to content

Potential fix for code scanning alert no. 141: Full server-side request forgery#8093

Open
grantfitzsimmons wants to merge 4 commits into
mainfrom
alert-autofix-141
Open

Potential fix for code scanning alert no. 141: Full server-side request forgery#8093
grantfitzsimmons wants to merge 4 commits into
mainfrom
alert-autofix-141

Conversation

@grantfitzsimmons
Copy link
Copy Markdown
Member

@grantfitzsimmons grantfitzsimmons commented May 19, 2026

Potential fix for https://github.com/specify/specify7/security/code-scanning/141

General fix: enforce a server-side allowlist for any remotely fetched mapping URL, and reject all other remote URLs before calling requests.get. Keep existing local-path behavior unchanged.

Best fix in this code: in specifyweb/backend/trees/default_tree_files.py, add a validation helper that only permits HTTPS URLs to files.specifysoftware.org and only for known mapping paths under /treerows/ that are already handled by local mapping logic (KNOWN_REMOTE_DEFAULT_TREE_PATHS plus the existing /treerows/*.json discipline mapping pattern). Then call this validator in load_default_tree_json right before requests.get(source). If invalid, raise ValueError so existing callers’ except Exception paths return their current 404 error response without broader functional changes.

This single change covers both alert variants because both flows call load_default_tree_json(mapping_url).

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

Summary by CodeRabbit

  • Bug Fixes
    • Tightened validation for remote default-tree sources: only HTTPS on the official file host and a fixed set of known CSV/JSON and tree-row paths are accepted.
    • Disallowed local/non-remote sources now return a clear error instead of previous behavior.
    • Remote fetches use a bounded timeout to avoid hanging requests.

Review Change Stack

…st forgery

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@coderabbitai

This comment was marked as off-topic.

@grantfitzsimmons grantfitzsimmons marked this pull request as ready for review May 19, 2026 04:43
@grantfitzsimmons grantfitzsimmons added this to the 7.12.1 milestone May 19, 2026
@grantfitzsimmons grantfitzsimmons requested a review from Copilot May 19, 2026 04:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses code scanning alert #141 (full SSRF) by restricting which remote URLs can be fetched when loading default tree mapping JSON, introducing a server-side allowlist and rejecting other remote sources before issuing requests.get.

Changes:

  • Added _is_allowed_remote_mapping_url() to allow only https://files.specifysoftware.org and specific mapping-style paths.
  • Enforced the allowlist in load_default_tree_json() by rejecting disallowed sources prior to requests.get.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread specifyweb/backend/trees/default_tree_files.py
Comment thread specifyweb/backend/trees/default_tree_files.py
Comment on lines 132 to 133
response = requests.get(source)
response.raise_for_status()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explicit timeout to this request call to avoid hanging workers (requests.get(source, timeout=(5, 30))) in commit cc978b4.

Comment thread specifyweb/backend/trees/default_tree_files.py
Copilot AI and others added 2 commits May 20, 2026 20:53
Agent-Logs-Url: https://github.com/specify/specify7/sessions/00412ec3-6e06-4895-86a3-9a68e24bfcca

Co-authored-by: grantfitzsimmons <37256050+grantfitzsimmons@users.noreply.github.com>
Agent-Logs-Url: https://github.com/specify/specify7/sessions/a3063a59-24c3-4060-b750-f79ae732ba54

Co-authored-by: grantfitzsimmons <37256050+grantfitzsimmons@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/trees/default_tree_files.py (1)

197-207: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Add allowlist validation before remote CSV fetch in stream_default_tree_csv.

The url parameter is user-controlled (from request body in views.py) and flows directly to _stream_remote_default_tree_csv, which makes a requests.get() call without the allowlist validation. The load_default_tree_json function already validates URLs with _is_allowed_remote_mapping_url() before fetching—apply the same check in stream_default_tree_csv before calling _stream_remote_default_tree_csv(source).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/trees/default_tree_files.py` around lines 197 - 207, In
stream_default_tree_csv, add the same allowlist check used in
load_default_tree_json before delegating to _stream_remote_default_tree_csv:
call _is_allowed_remote_mapping_url(source) after the local-file attempt fails
and if it returns False raise a ValueError (or the same exception type used by
load_default_tree_json) instead of proceeding to
_stream_remote_default_tree_csv; this ensures unallowed user-controlled URLs are
rejected prior to any requests.get() in _stream_remote_default_tree_csv.
🧹 Nitpick comments (1)
specifyweb/backend/trees/default_tree_files.py (1)

135-137: ⚡ Quick win

Consider disabling redirect following for defense in depth.

requests.get() follows HTTP redirects by default. If the allowed host ever served an open redirect, an attacker could bypass the allowlist validation. Disabling redirects would strengthen this SSRF mitigation.

Suggested change
-    response = requests.get(source)
+    response = requests.get(source, allow_redirects=False)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/trees/default_tree_files.py` around lines 135 - 137, The
code uses requests.get(source) which follows redirects by default; change the
fetch to disable redirect following (call requests.get with
allow_redirects=False) in the same function where response and
response.raise_for_status() are used, and after the request explicitly detect
3xx responses (e.g., inspect response.status_code or
response.is_redirect/response.history) and treat them as errors (raise or return
a clear failure) so an open redirect cannot bypass the allowlist validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 197-207: In stream_default_tree_csv, add the same allowlist check
used in load_default_tree_json before delegating to
_stream_remote_default_tree_csv: call _is_allowed_remote_mapping_url(source)
after the local-file attempt fails and if it returns False raise a ValueError
(or the same exception type used by load_default_tree_json) instead of
proceeding to _stream_remote_default_tree_csv; this ensures unallowed
user-controlled URLs are rejected prior to any requests.get() in
_stream_remote_default_tree_csv.

---

Nitpick comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 135-137: The code uses requests.get(source) which follows
redirects by default; change the fetch to disable redirect following (call
requests.get with allow_redirects=False) in the same function where response and
response.raise_for_status() are used, and after the request explicitly detect
3xx responses (e.g., inspect response.status_code or
response.is_redirect/response.history) and treat them as errors (raise or return
a clear failure) so an open redirect cannot bypass the allowlist validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9c9c7e7e-91bf-4525-926c-d6c6efbc7824

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac2370 and ecfcf4b.

📒 Files selected for processing (1)
  • specifyweb/backend/trees/default_tree_files.py

if not _is_allowed_remote_mapping_url(source):
raise ValueError('Remote mapping URL is not allowed.')

response = requests.get(source, timeout=(5, 30))
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/trees/default_tree_files.py (1)

48-54: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten /treerows/ path matching.

This branch only validates the decoded stem, so paths like /treerows/%2e%2e/secret.txt or /treerows/foo.csv still pass if the final stem is alphanumeric. That weakens the allowlist from /treerows/<discipline>.json to “some filename on the host.” Require an exact decoded path shape and .json suffix before alias normalization.

Suggested fix
     if parsed.path.startswith('/treerows/'):
-        stem = Path(unquote(parsed.path)).stem.lower()
+        decoded_path = unquote(parsed.path)
+        if not re.fullmatch(r'/treerows/(?:col\d+_)?[a-z0-9_]+\.json', decoded_path.lower()):
+            return False
+        stem = Path(decoded_path).stem.lower()
         match = re.fullmatch(r'col\d+_(.+)', stem)
         if match is not None:
             stem = match.group(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/trees/default_tree_files.py` around lines 48 - 54, The
current `/treerows/` branch only validates the decoded filename stem which
allows malicious paths like `/treerows/%2e%2e/secret.txt`; change the logic to
first decode the full path and require an exact shape with a .json suffix (e.g.
match decoded_path against a regex that enforces
^/treerows/(col\d+_)?([a-z0-9_]+)\.json$), capture the discipline stem from that
match, then apply TREE_ROW_DISCIPLINE_ALIASES lookup and finally validate the
normalized stem with the existing [a-z0-9_]+ check; use the captured groups from
re.fullmatch on unquote(parsed.path) rather than Path(...).stem to avoid
accepting directory traversal or other unexpected filenames.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Line 135: The requests.get call currently follows redirects which allows SSRF
via redirect; update the calls in load_default_tree_json and
_stream_remote_default_tree_csv (the requests.get(...) call that sets
timeout=(5, 30)) to disable automatic redirects (set allow_redirects=False) and
then explicitly handle redirect responses by rejecting any 3xx status or
validating the final destination (response.headers["Location"] or response.url)
against your allowlist before proceeding; if you must follow a redirect, perform
a secondary validation of the resolved URL/domain and only allow it if it passes
the same allowlist check, otherwise return an error.

---

Outside diff comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 48-54: The current `/treerows/` branch only validates the decoded
filename stem which allows malicious paths like `/treerows/%2e%2e/secret.txt`;
change the logic to first decode the full path and require an exact shape with a
.json suffix (e.g. match decoded_path against a regex that enforces
^/treerows/(col\d+_)?([a-z0-9_]+)\.json$), capture the discipline stem from that
match, then apply TREE_ROW_DISCIPLINE_ALIASES lookup and finally validate the
normalized stem with the existing [a-z0-9_]+ check; use the captured groups from
re.fullmatch on unquote(parsed.path) rather than Path(...).stem to avoid
accepting directory traversal or other unexpected filenames.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 85118d45-ae4c-4cf7-ac8d-80333e072092

📥 Commits

Reviewing files that changed from the base of the PR and between ecfcf4b and 837970e.

📒 Files selected for processing (1)
  • specifyweb/backend/trees/default_tree_files.py

Comment thread specifyweb/backend/trees/default_tree_files.py
@github-project-automation github-project-automation Bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Dev Attention Needed

Development

Successfully merging this pull request may close these issues.

4 participants