Potential fix for code scanning alert no. 141: Full server-side request forgery#8093
Potential fix for code scanning alert no. 141: Full server-side request forgery#8093grantfitzsimmons wants to merge 4 commits into
Conversation
…st forgery Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Pull request overview
This PR addresses code scanning alert #141 (full SSRF) by restricting which remote URLs can be fetched when loading default tree mapping JSON, introducing a server-side allowlist and rejecting other remote sources before issuing requests.get.
Changes:
- Added
_is_allowed_remote_mapping_url()to allow onlyhttps://files.specifysoftware.organd specific mapping-style paths. - Enforced the allowlist in
load_default_tree_json()by rejecting disallowed sources prior torequests.get.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| response = requests.get(source) | ||
| response.raise_for_status() |
There was a problem hiding this comment.
Added an explicit timeout to this request call to avoid hanging workers (requests.get(source, timeout=(5, 30))) in commit cc978b4.
Agent-Logs-Url: https://github.com/specify/specify7/sessions/00412ec3-6e06-4895-86a3-9a68e24bfcca Co-authored-by: grantfitzsimmons <37256050+grantfitzsimmons@users.noreply.github.com>
Agent-Logs-Url: https://github.com/specify/specify7/sessions/a3063a59-24c3-4060-b750-f79ae732ba54 Co-authored-by: grantfitzsimmons <37256050+grantfitzsimmons@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
specifyweb/backend/trees/default_tree_files.py (1)
197-207:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftAdd allowlist validation before remote CSV fetch in
stream_default_tree_csv.The
urlparameter is user-controlled (from request body in views.py) and flows directly to_stream_remote_default_tree_csv, which makes arequests.get()call without the allowlist validation. Theload_default_tree_jsonfunction already validates URLs with_is_allowed_remote_mapping_url()before fetching—apply the same check instream_default_tree_csvbefore calling_stream_remote_default_tree_csv(source).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specifyweb/backend/trees/default_tree_files.py` around lines 197 - 207, In stream_default_tree_csv, add the same allowlist check used in load_default_tree_json before delegating to _stream_remote_default_tree_csv: call _is_allowed_remote_mapping_url(source) after the local-file attempt fails and if it returns False raise a ValueError (or the same exception type used by load_default_tree_json) instead of proceeding to _stream_remote_default_tree_csv; this ensures unallowed user-controlled URLs are rejected prior to any requests.get() in _stream_remote_default_tree_csv.
🧹 Nitpick comments (1)
specifyweb/backend/trees/default_tree_files.py (1)
135-137: ⚡ Quick winConsider disabling redirect following for defense in depth.
requests.get()follows HTTP redirects by default. If the allowed host ever served an open redirect, an attacker could bypass the allowlist validation. Disabling redirects would strengthen this SSRF mitigation.Suggested change
- response = requests.get(source) + response = requests.get(source, allow_redirects=False)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specifyweb/backend/trees/default_tree_files.py` around lines 135 - 137, The code uses requests.get(source) which follows redirects by default; change the fetch to disable redirect following (call requests.get with allow_redirects=False) in the same function where response and response.raise_for_status() are used, and after the request explicitly detect 3xx responses (e.g., inspect response.status_code or response.is_redirect/response.history) and treat them as errors (raise or return a clear failure) so an open redirect cannot bypass the allowlist validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 197-207: In stream_default_tree_csv, add the same allowlist check
used in load_default_tree_json before delegating to
_stream_remote_default_tree_csv: call _is_allowed_remote_mapping_url(source)
after the local-file attempt fails and if it returns False raise a ValueError
(or the same exception type used by load_default_tree_json) instead of
proceeding to _stream_remote_default_tree_csv; this ensures unallowed
user-controlled URLs are rejected prior to any requests.get() in
_stream_remote_default_tree_csv.
---
Nitpick comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 135-137: The code uses requests.get(source) which follows
redirects by default; change the fetch to disable redirect following (call
requests.get with allow_redirects=False) in the same function where response and
response.raise_for_status() are used, and after the request explicitly detect
3xx responses (e.g., inspect response.status_code or
response.is_redirect/response.history) and treat them as errors (raise or return
a clear failure) so an open redirect cannot bypass the allowlist validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9c9c7e7e-91bf-4525-926c-d6c6efbc7824
📒 Files selected for processing (1)
specifyweb/backend/trees/default_tree_files.py
| if not _is_allowed_remote_mapping_url(source): | ||
| raise ValueError('Remote mapping URL is not allowed.') | ||
|
|
||
| response = requests.get(source, timeout=(5, 30)) |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
specifyweb/backend/trees/default_tree_files.py (1)
48-54:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTighten
/treerows/path matching.This branch only validates the decoded stem, so paths like
/treerows/%2e%2e/secret.txtor/treerows/foo.csvstill pass if the final stem is alphanumeric. That weakens the allowlist from/treerows/<discipline>.jsonto “some filename on the host.” Require an exact decoded path shape and.jsonsuffix before alias normalization.Suggested fix
if parsed.path.startswith('/treerows/'): - stem = Path(unquote(parsed.path)).stem.lower() + decoded_path = unquote(parsed.path) + if not re.fullmatch(r'/treerows/(?:col\d+_)?[a-z0-9_]+\.json', decoded_path.lower()): + return False + stem = Path(decoded_path).stem.lower() match = re.fullmatch(r'col\d+_(.+)', stem) if match is not None: stem = match.group(1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specifyweb/backend/trees/default_tree_files.py` around lines 48 - 54, The current `/treerows/` branch only validates the decoded filename stem which allows malicious paths like `/treerows/%2e%2e/secret.txt`; change the logic to first decode the full path and require an exact shape with a .json suffix (e.g. match decoded_path against a regex that enforces ^/treerows/(col\d+_)?([a-z0-9_]+)\.json$), capture the discipline stem from that match, then apply TREE_ROW_DISCIPLINE_ALIASES lookup and finally validate the normalized stem with the existing [a-z0-9_]+ check; use the captured groups from re.fullmatch on unquote(parsed.path) rather than Path(...).stem to avoid accepting directory traversal or other unexpected filenames.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Line 135: The requests.get call currently follows redirects which allows SSRF
via redirect; update the calls in load_default_tree_json and
_stream_remote_default_tree_csv (the requests.get(...) call that sets
timeout=(5, 30)) to disable automatic redirects (set allow_redirects=False) and
then explicitly handle redirect responses by rejecting any 3xx status or
validating the final destination (response.headers["Location"] or response.url)
against your allowlist before proceeding; if you must follow a redirect, perform
a secondary validation of the resolved URL/domain and only allow it if it passes
the same allowlist check, otherwise return an error.
---
Outside diff comments:
In `@specifyweb/backend/trees/default_tree_files.py`:
- Around line 48-54: The current `/treerows/` branch only validates the decoded
filename stem which allows malicious paths like `/treerows/%2e%2e/secret.txt`;
change the logic to first decode the full path and require an exact shape with a
.json suffix (e.g. match decoded_path against a regex that enforces
^/treerows/(col\d+_)?([a-z0-9_]+)\.json$), capture the discipline stem from that
match, then apply TREE_ROW_DISCIPLINE_ALIASES lookup and finally validate the
normalized stem with the existing [a-z0-9_]+ check; use the captured groups from
re.fullmatch on unquote(parsed.path) rather than Path(...).stem to avoid
accepting directory traversal or other unexpected filenames.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 85118d45-ae4c-4cf7-ac8d-80333e072092
📒 Files selected for processing (1)
specifyweb/backend/trees/default_tree_files.py
Potential fix for https://github.com/specify/specify7/security/code-scanning/141
General fix: enforce a server-side allowlist for any remotely fetched mapping URL, and reject all other remote URLs before calling
requests.get. Keep existing local-path behavior unchanged.Best fix in this code: in
specifyweb/backend/trees/default_tree_files.py, add a validation helper that only permits HTTPS URLs tofiles.specifysoftware.organd only for known mapping paths under/treerows/that are already handled by local mapping logic (KNOWN_REMOTE_DEFAULT_TREE_PATHSplus the existing/treerows/*.jsondiscipline mapping pattern). Then call this validator inload_default_tree_jsonright beforerequests.get(source). If invalid, raiseValueErrorso existing callers’except Exceptionpaths return their current 404 error response without broader functional changes.This single change covers both alert variants because both flows call
load_default_tree_json(mapping_url).Suggested fixes powered by Copilot Autofix. Review carefully before merging.
Summary by CodeRabbit