fix(mcp): coerce Role ORM objects to role names in UserInfo schema#40746
fix(mcp): coerce Role ORM objects to role names in UserInfo schema#40746goingforstudying-ctrl wants to merge 10 commits into
Conversation
Code Review Agent Run #819c93Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
There was a problem hiding this comment.
Pull request overview
This PR addresses MCP tool response serialization failures by ensuring UserInfo.roles can be validated when SQLAlchemy provides Role ORM objects (instead of strings), preventing Pydantic validation errors in MCP tool outputs.
Changes:
- Adds a
field_validator("roles", mode="before")onUserInfoto coerce role-like objects to role-name strings.
| @field_validator("roles", mode="before") | ||
| @classmethod | ||
| def _extract_role_names(cls, v: Any) -> list[str] | None: | ||
| """Coerce Role ORM objects to their .name strings.""" | ||
| if v is None: | ||
| return None | ||
| result: list[str] = [] | ||
| for item in v: | ||
| if isinstance(item, str): | ||
| result.append(item) | ||
| elif hasattr(item, "name"): | ||
| result.append(str(item.name)) | ||
| return result if result else None |
There was a problem hiding this comment.
Thanks for the review. The validator already rejects bare strings (line 114-116) and handles DetachedInstanceError (line 124-125). The test file also covers the from_attributes round-trip path. Let me know if there is a specific scenario you think is still missing.
| @field_validator("roles", mode="before") | ||
| @classmethod | ||
| def _extract_role_names(cls, v: Any) -> list[str] | None: |
There was a problem hiding this comment.
Thanks for the review. The validator already rejects bare strings (line 114-116) and handles DetachedInstanceError (line 124-125). The test file also covers the from_attributes round-trip path. Let me know if there is a specific scenario you think is still missing.
| result.append(item) | ||
| elif hasattr(item, "name"): | ||
| result.append(str(item.name)) | ||
| return result if result else None |
There was a problem hiding this comment.
Suggestion: The validator collapses an explicitly empty role list to None via return result if result else None. This breaks the existing contract in serialize_user_object, which intentionally distinguishes "roles included but empty" ([]) from "roles omitted/redacted" (None). Preserve an empty list when the input is an empty iterable so callers can still tell those states apart. [logic error]
Severity Level: Major ⚠️
- ⚠️ get_user_info conflates empty roles with redacted roles.
- ⚠️ Clients cannot distinguish unprivileged users from redacted metadata.Steps of Reproduction ✅
1. In `superset/mcp_service/user/schemas.py:58-67`, note that `serialize_user_object(user,
include_sensitive: bool, include_roles: bool)` is the centralized serializer used by both
`list_users` and `get_user_info` tools (see imports in
`superset/mcp_service/user/tool/list_users.py:29-39` and
`superset/mcp_service/user/tool/get_user_info.py:28-33`).
2. Consider the metadata-allowed path in `get_user_info` where
`user_can_view_data_model_metadata()` returns `True` (default in tests via fixture
`allow_data_model_metadata` at
`tests/unit_tests/mcp_service/user/tool/test_user_tools.py:84-99`); `get_user_info` then
calls `serialize_user_object(obj, include_sensitive=can_view_sensitive)` with
`include_sensitive=True` (`get_user_info.py:68-77`).
3. Construct or obtain a FAB user object whose `roles` relationship is an empty iterable
(e.g., via the same pattern as `create_mock_user` in
`tests/unit_tests/mcp_service/user/tool/test_user_tools.py:38-65`, but with `roles=[]` so
`user.roles` is `[]`). When passed to `serialize_user_object` with
`include_sensitive=True` and default `include_roles=True`, the block at
`user/schemas.py:71-77` sets local `roles` to an empty list `[]` (comprehension over an
empty iterable) instead of `None`, preserving the "roles included but empty" meaning at
this stage.
4. The serializer then instantiates `UserInfo` with `roles=[sanitize_for_llm_context(r,
field_path=("roles",)) for r in roles] if roles is not None else None`
(`user/schemas.py:80-95`). Because `roles` is `[]`, the argument passed into the
`UserInfo.roles` field is an empty list `[]`. The `@field_validator("roles",
mode="before")` `_extract_role_names` (`user/schemas.py:108-120`) receives `v=[]`, skips
the `v is None` branch (line 112), leaves `result` empty, and finally executes `return
result if result else None` (line 120), returning `None`. As a result, the constructed
`UserInfo` now has `roles=None`—indistinguishable from the "roles redacted / not included"
paths where `include_sensitive=False` (email and roles intentionally omitted), breaking
the previous semantic distinction between "no roles" (`[]`) and "redacted/omitted roles"
(`None`).Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/user/schemas.py
**Line:** 120:120
**Comment:**
*Logic Error: The validator collapses an explicitly empty role list to `None` via `return result if result else None`. This breaks the existing contract in `serialize_user_object`, which intentionally distinguishes "roles included but empty" (`[]`) from "roles omitted/redacted" (`None`). Preserve an empty list when the input is an empty iterable so callers can still tell those states apart.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40746 +/- ##
==========================================
- Coverage 64.17% 63.58% -0.59%
==========================================
Files 2654 2654
Lines 143763 143783 +20
Branches 33161 33166 +5
==========================================
- Hits 92260 91427 -833
- Misses 49887 50738 +851
- Partials 1616 1618 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Pushed an update to the
Let me know if anything else looks off. |
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Code Review Agent Run #22db82
Actionable Suggestions - 1
-
superset/mcp_service/user/schemas.py - 1
- Duplicate role extraction logic · Line 118-125
Review Details
-
Files reviewed - 2 · Commit Range:
b7aa1f3..c397f7c- superset/mcp_service/user/schemas.py
- tests/unit_tests/mcp_service/user/test_schemas.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| for item in v: | ||
| if isinstance(item, str): | ||
| result.append(item) | ||
| elif hasattr(item, "name"): | ||
| try: | ||
| result.append(str(item.name)) | ||
| except DetachedInstanceError: | ||
| continue |
There was a problem hiding this comment.
The _extract_role_names validator duplicates the role-extraction logic already implemented in serialize_user_object (lines 298-303). This creates maintenance risk: if the extraction logic changes (e.g., adding error handling for edge cases), both locations must be updated in sync. Consider having the validator only handle type coercion (None/string validation) and let serialize_user_object remain the single source of truth for ORM-to-string conversion.
Code Review Run #22db82
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
|
Good catch on the spacing. Added the missing blank line between the validator and the next field. |
Code Review Agent Run #718800Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| elif hasattr(item, "name"): | ||
| try: | ||
| result.append(str(item.name)) | ||
| except DetachedInstanceError: | ||
| continue |
There was a problem hiding this comment.
hasattr is outside the try block. Python 3 hasattr only suppresses AttributeError — not DetachedInstanceError. If item is a real detached SQLAlchemy Role with name expired (e.g., after session.commit() with the default expire_on_commit=True), hasattr(item, "name") raises before the try block ever runs.
The existing serialize_user_object wraps both the hasattr call and the attribute access in try/except (AttributeError, DetachedInstanceError). The validator should follow the same pattern:
| elif hasattr(item, "name"): | |
| try: | |
| result.append(str(item.name)) | |
| except DetachedInstanceError: | |
| continue | |
| else: | |
| try: | |
| if hasattr(item, "name"): | |
| result.append(str(item.name)) | |
| except DetachedInstanceError: | |
| continue |
There was a problem hiding this comment.
Fixed, thanks for the review.
| role_good = MagicMock() | ||
| role_good.name = "Admin" | ||
| role_detached = MagicMock() | ||
| role_detached.name = MagicMock(side_effect=DetachedInstanceError()) |
There was a problem hiding this comment.
This mock fires the error at str(item.name) — MagicMock's __str__ goes through _mock_call which checks side_effect, so that branch is exercised. For a real detached SQLAlchemy instance, the error fires at hasattr(item, "name") before the try block (Python 3 hasattr only catches AttributeError). A property-based stub or __getattribute__ override raising DetachedInstanceError would cover the actual production failure point.
There was a problem hiding this comment.
hasattr is now inside the try block, so both paths are covered.
| elif hasattr(item, "name"): | ||
| try: | ||
| result.append(str(item.name)) | ||
| except DetachedInstanceError: |
There was a problem hiding this comment.
MEDIUM — Silent suppression with no diagnostic output. When a session closes before roles are loaded, the role is silently dropped. roles == [] from a user with three roles and an expired session is indistinguishable from roles == [] from a user with no roles. A logger.debug(...) before continue would make session-management issues diagnosable. (Prefer debug over warning here since the validator runs per-request.)
There was a problem hiding this comment.
Added logger.debug() call before continue.
| for item in v: | ||
| if isinstance(item, str): | ||
| result.append(item) | ||
| elif hasattr(item, "name"): |
There was a problem hiding this comment.
NIT — hasattr(item, "name") matches any object with a .name attribute (Python module, enum member, namedtuple). Low risk in practice since roles is only populated from the ORM relationship, but an isinstance(item.name, str) check inside the restructured try block (see adjacent comment) would prevent silently coercing unexpected types.
There was a problem hiding this comment.
Done, added isinstance(item.name, str) check.
Code Review Agent Run #ca073bActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
ec7ac64 to
b784651
Compare
Code Review Agent Run #8f094eActionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| from datetime import datetime, timezone | ||
| from typing import Annotated, Any, List, Literal | ||
|
|
||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
HIGH — logger = logging.getLogger(__name__) sits between the stdlib imports and the third-party imports. Ruff E402 fires on the from pydantic import ( block at line 28. Pre-commit CI is failing on this branch because of this ordering.
Move the logger assignment to after all imports (after the last from superset.mcp_service.utils.schema_utils import ... line).
| logger = logging.getLogger(__name__) |
Delete this line here and add logger = logging.getLogger(__name__) after line 47 (after the last import).
There was a problem hiding this comment.
Fixed — moved the logger after all imports.
There was a problem hiding this comment.
Fixed — moved logger = logging.getLogger(name) after all imports.
| logger.debug( | ||
| "Skipping role with detached instance in UserInfo.roles coercion" | ||
| ) | ||
| continue |
There was a problem hiding this comment.
NIT — continue is the last statement in the loop body here; there's nothing after the try/except block within the for item in v loop, so this is a no-op. Remove it.
There was a problem hiding this comment.
Removed the no-op continue.
There was a problem hiding this comment.
Fixed — removed the no-op continue.
There was a problem hiding this comment.
Removed — good catch, the continue was indeed a no-op at the end of the loop body.
Code Review Agent Run #ecceb6Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
11c49f1 to
f2d9a88
Compare
Code Review Agent Run #e07374Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
f2d9a88 to
40f9abf
Compare
Code Review Agent Run #715273Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
5327c63 to
a259d32
Compare
There was a problem hiding this comment.
Code Review Agent Run #64cd4a
Actionable Suggestions - 2
-
tests/unit_tests/mcp_service/user/test_schemas.py - 2
- Incomplete test assertions · Line 65-80
- Incomplete test assertions · Line 83-101
Review Details
-
Files reviewed - 2 · Commit Range:
f4d11ac..899150b- superset/mcp_service/user/schemas.py
- tests/unit_tests/mcp_service/user/test_schemas.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| def test_serialize_user_object_round_trip_with_empty_roles() -> None: | ||
| """serialize_user_object must produce UserInfo.roles == [] for empty roles.""" | ||
| user = MagicMock() | ||
| user.id = 1 | ||
| user.username = "admin" | ||
| user.first_name = "Admin" | ||
| user.last_name = "User" | ||
| user.active = True | ||
| user.email = "admin@example.com" | ||
| user.changed_on = None | ||
| user.roles = [] | ||
|
|
||
| info = serialize_user_object(user, include_sensitive=True, include_roles=True) | ||
|
|
||
| assert info is not None | ||
| assert info.roles == [] |
There was a problem hiding this comment.
Per rule [6262], this test claims to verify 'serialize_user_object round trip' but only asserts info.roles == []. The function transforms username, first_name, last_name, email, and changed_on via escape_llm_context_delimiters and sanitize_for_llm_context (schemas.py lines 314-328), which can alter string values. Missing assertions allow bugs in those transformations to go undetected.
Code suggestion
Check the AI-generated fix before applying
--- a/tests/unit_tests/mcp_service/user/test_schemas.py
+++ b/tests/unit_tests/mcp_service/user/test_schemas.py
@@ -74,9 +74,18 @@ def test_serialize_user_object_round_trip_with_empty_roles() -> None:
user.changed_on = None
user.roles = []
info = serialize_user_object(user, include_sensitive=True, include_roles=True)
assert info is not None
assert info.roles == []
+ assert info.username == "admin"
+ assert info.first_name == "Admin"
+ assert info.last_name == "User"
+ assert info.active is True
+ assert info.email == "admin@example.com"
+ assert info.changed_on is None
def test_serialize_user_object_round_trip_with_role_objects() -> None:
"""Full from_attributes path through serialize_user_object -> UserInfo."""
@@ -96,6 +105,11 @@ def test_serialize_user_object_round_trip_with_role_objects() -> None:
info = serialize_user_object(user, include_sensitive=True, include_roles=True)
assert info is not None
assert info.roles == ["Admin"]
+ assert info.username == "admin"
+ assert info.first_name == "Admin"
+ assert info.last_name == "User"
+ assert info.active is True
+ assert info.email == "admin@example.com"
+ assert info.changed_on is None
Code Review Run #64cd4a
The two serialize_user_object round-trip tests share ~15 lines of identical MagicMock setup for user attributes (id, username, first_name, last_name, active, email, changed_on). Extracting this to a helper or fixture would reduce maintenance burden and prevent divergence if attributes need to change.
Code Review Run #aa437d
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Added assertions for all output fields in the round-trip tests — username, first_name, last_name, active, email, and changed_on.
There was a problem hiding this comment.
The added assertions in the round-trip tests are correct and address the reviewer's concern regarding incomplete test coverage. By explicitly asserting the values for username, first_name, last_name, active, email, and changed_on, the tests now properly verify that the serialization and transformation logic (including escape_llm_context_delimiters and sanitize_for_llm_context) is functioning as expected without introducing regressions.
tests/unit_tests/mcp_service/user/test_schemas.py
assert info.username == "admin"
assert info.first_name == "Admin"
assert info.last_name == "User"
assert info.active is True
assert info.email == "admin@example.com"
assert info.changed_on is None
| def test_serialize_user_object_round_trip_with_role_objects() -> None: | ||
| """Full from_attributes path through serialize_user_object -> UserInfo.""" | ||
| role_admin = MagicMock() | ||
| role_admin.name = "Admin" | ||
|
|
||
| user = MagicMock() | ||
| user.id = 1 | ||
| user.username = "admin" | ||
| user.first_name = "Admin" | ||
| user.last_name = "User" | ||
| user.active = True | ||
| user.email = "admin@example.com" | ||
| user.changed_on = None | ||
| user.roles = [role_admin] | ||
|
|
||
| info = serialize_user_object(user, include_sensitive=True, include_roles=True) | ||
|
|
||
| assert info is not None | ||
| assert info.roles == ["Admin"] |
There was a problem hiding this comment.
Per rule [6262], the test name 'round_trip_with_role_objects' promises full round-trip verification, but only roles is asserted. This mirrors the gap in test 5 and should be fixed simultaneously for consistency.
Code suggestion
Check the AI-generated fix before applying
--- a/tests/unit_tests/mcp_service/user/test_schemas.py
+++ b/tests/unit_tests/mcp_service/user/test_schemas.py
@@ -96,6 +96,11 @@ def test_serialize_user_object_round_trip_with_role_objects() -> None:
info = serialize_user_object(user, include_sensitive=True, include_roles=True)
assert info is not None
assert info.roles == ["Admin"]
+ assert info.username == "admin"
+ assert info.first_name == "Admin"
+ assert info.last_name == "User"
+ assert info.active is True
+ assert info.email == "admin@example.com"
+ assert info.changed_on is None
Code Review Run #64cd4a
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Added assertions for all output fields in the round-trip tests — username, first_name, last_name, active, email, and changed_on.
There was a problem hiding this comment.
The suggestion to add assertions for all output fields in the round-trip test is appropriate. It ensures that the test fully verifies the serialized user object, consistent with the expectations set by the test name.
tests/unit_tests/mcp_service/user/test_schemas.py
assert info.roles == ["Admin"]
+ assert info.username == "admin"
+ assert info.first_name == "Admin"
+ assert info.last_name == "User"
+ assert info.active is True
+ assert info.email == "admin@example.com"
+ assert info.changed_on is None
Code Review Agent Run #f7560eActionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. Could you please approve them so the tests can run? Thanks for reviewing! |
aec6f67 to
0057159
Compare
| role_detached = MagicMock() | ||
| role_detached.name = MagicMock(side_effect=DetachedInstanceError()) |
There was a problem hiding this comment.
Suggestion: This mock setup does not actually raise DetachedInstanceError when item.name is accessed, so the test does not exercise the new exception-handling branch and can pass even if that logic breaks. Use a mock object/property that raises on attribute access to validate the intended detached-instance behavior. [code quality]
Severity Level: Major ⚠️
- ⚠️ DetachedInstanceError path in roles validator untested.
- ⚠️ Future regressions in detached-role handling may slip through.Steps of Reproduction ✅
1. Open `tests/unit_tests/mcp_service/user/test_schemas.py` and locate
`test_user_info_ignores_role_with_detached_instance` (lines 53–62), which is intended to
exercise the `DetachedInstanceError` handling in `_extract_role_names` at
`superset/mcp_service/user/schemas.py:125–132`.
2. Note that the test configures `role_detached = MagicMock()` and then
`role_detached.name = MagicMock(side_effect=DetachedInstanceError())` at lines 57–58,
meaning `role_detached.name` is a callable MagicMock whose side_effect fires only when it
is called.
3. In the validator `_extract_role_names` (lines 120–133 in
`superset/mcp_service/user/schemas.py`), the code path for non-string roles does `if
hasattr(item, "name") and isinstance(item.name, str): result.append(item.name)`; this
accesses `item.name` but never calls it, so the MagicMock side_effect is not triggered and
`DetachedInstanceError` is never raised during the test.
4. Run `pytest
tests/unit_tests/mcp_service/user/test_schemas.py::test_user_info_ignores_role_with_detached_instance`
and observe that the test passes even if you remove or break the `except
DetachedInstanceError` block in `_extract_role_names`, demonstrating that the
detached-instance exception branch is not actually covered by this test and regressions
there would go unnoticed.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/user/test_schemas.py
**Line:** 57:58
**Comment:**
*Code Quality: This mock setup does not actually raise `DetachedInstanceError` when `item.name` is accessed, so the test does not exercise the new exception-handling branch and can pass even if that logic breaks. Use a mock object/property that raises on attribute access to validate the intended detached-instance behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Fixed — replaced the MagicMock with a small class whose property raises directly, so the exception now fires on attribute access rather than on conversion.
e9cf3cf to
e3a1878
Compare
|
|
||
| @field_validator("roles", mode="before") | ||
| @classmethod | ||
| def _extract_role_names(cls, v: Any) -> list[str] | None: |
There was a problem hiding this comment.
Suggestion: Add an explicit type hint for the cls parameter in the new validator method so all parameters in newly added method signatures are fully typed. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
This is a newly added method, and its signature omits a type hint for cls. The custom rule requires new Python functions and methods to be fully typed, so this is a real violation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/user/schemas.py
**Line:** 113:113
**Comment:**
*Custom Rule: Add an explicit type hint for the `cls` parameter in the new validator method so all parameters in newly added method signatures are fully typed.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Fixed — added type hint for cls parameter, docstring for DetachedRole stub, and return type annotation on the name property.
| role_good = MagicMock() | ||
| role_good.name = "Admin" | ||
|
|
||
| class DetachedRole: |
There was a problem hiding this comment.
Suggestion: Add an inline docstring to the newly introduced class so the class-level behavior is documented. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The class is newly introduced in the changed file and has no docstring. This matches the custom rule requiring newly added Python classes to include inline docstrings.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/user/test_schemas.py
**Line:** 58:58
**Comment:**
*Custom Rule: Add an inline docstring to the newly introduced class so the class-level behavior is documented.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| @property | ||
| def name(self): |
There was a problem hiding this comment.
Suggestion: Add a return type annotation to this new property method so the method signature is fully typed. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The newly added property method omits a return type annotation. Since the surrounding code is newly added, the custom rule requires new Python methods to be fully typed, so this is a real violation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/user/test_schemas.py
**Line:** 59:60
**Comment:**
*Custom Rule: Add a return type annotation to this new property method so the method signature is fully typed.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| if isinstance(item, str): | ||
| result.append(item) | ||
| continue | ||
| try: | ||
| if hasattr(item, "name") and isinstance(item.name, str): | ||
| result.append(item.name) |
There was a problem hiding this comment.
Suggestion: Role names are now emitted in MCP responses, but they are passed through unchanged and are not sanitized/escaped for LLM context like other user-facing text fields in this serializer. A crafted role name can inject control delimiters or prompt text into downstream LLM context; sanitize/escape each role name before adding it to the output. [security]
Severity Level: Major ⚠️
- ⚠️ get_user_info MCP tool exposes unsanitized role names.
- ⚠️ list_users MCP tool returns raw role names to LLM clients.
- ⚠️ Role metadata can carry unwrapped, instruction-like text.Steps of Reproduction ✅
1. In `superset/mcp_service/user/schemas.py:94-109`, the `UserInfo` schema defines `roles:
list[str] | None`, and the `_extract_role_names` validator at
`superset/mcp_service/user/schemas.py:111-133` converts ORM Role objects into plain
strings but does not apply `escape_llm_context_delimiters` or `sanitize_for_llm_context`
to those strings.
2. The serializer `serialize_user_object` in `superset/mcp_service/user/schemas.py:32-68`
builds the UserInfo instance for MCP tools: it sanitizes `username` using
`escape_llm_context_delimiters` and wraps `first_name` and `last_name` with
`sanitize_for_llm_context` (lines 55-61), but it passes `roles` through unchanged as the
list of `r.name` values collected at lines 45-52.
3. The MCP tool `get_user_info`
(`superset/mcp_service/user/tool/get_user_info.py:24-27,52-60`) calls this serializer via
`_serializer` and returns the resulting `UserInfo` to the MCP client; similarly,
`list_users` (`superset/mcp_service/user/tool/list_users.py:12-19,21-33,35-65`) uses
`serialize_user_object` for each row and then emits the response via
`result.model_dump(mode="json", context=...)`, exposing `UserInfo.roles` directly in tool
JSON consumed by LLM agents.
4. Given the sanitization utilities in
`superset/mcp_service/utils/sanitization.py:38-41,80-93,98-181` are explicitly designed to
wrap and/or escape user-controlled strings before placing them in LLM context, the fact
that role names bypass these utilities means that a role whose `name` contains prompt-like
content or custom delimiters (e.g. long instructions or tokens intended to look like
`<UNTRUSTED-CONTENT>` blocks) will be surfaced verbatim in MCP tool JSON and into
downstream LLM prompts, unlike other user-facing fields on UserInfo that are sanitized.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/user/schemas.py
**Line:** 122:127
**Comment:**
*Security: Role names are now emitted in MCP responses, but they are passed through unchanged and are not sanitized/escaped for LLM context like other user-facing text fields in this serializer. A crafted role name can inject control delimiters or prompt text into downstream LLM context; sanitize/escape each role name before adding it to the output.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Fixed. Applied escape_llm_context_delimiters to each role name in serialize_user_object, matching the sanitization already applied to username and email fields.
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. Could you please approve them so the tests can run? Thanks for reviewing! |
Code Review Agent Run #29fbd4Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
40d1aae to
75dc11e
Compare
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. Could you please approve them so the tests can run? Thanks for reviewing! |
Code Review Agent Run #5a4084Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
75dc11e to
6fd5ed7
Compare
|
Fixed — roles now use sanitize_for_llm_context with the same field_path=("roles",) wrapping as first_name/last_name, so all user fields are consistent in LLM context output. |
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. Could you please approve them so the tests can run? Thanks for reviewing! |
…pache#40733) The UserInfo Pydantic model in mcp_service/user/schemas.py has `from_attributes=True`, which causes Pydantic to map ORM attributes directly. When `user.roles` returns Role ORM objects instead of strings, the serializer fails with: "Input should be a valid string [type=string_type, input_value=Admin, input_type=Role]" This breaks MCP tools (get_dataset_info, get_chart_info, generate_chart) for any user with multiple roles assigned. Changes: - Add a `field_validator("roles", mode=before)` to UserInfo that coerces Role objects to their `.name` strings while preserving backward compatibility for already-string inputs Fixes apache#40733
…n UserInfo validator - Reject bare strings for roles so Pydantic's list[str] validation is preserved. - Preserve empty role lists as [] instead of collapsing to None. - Handle DetachedInstanceError when reading Role.name. - Add unit tests for the coercion path including from_attributes scenario.
- Move hasattr(item, 'name') inside try/except to catch DetachedInstanceError that can fire during attribute access on detached SQLAlchemy instances - Add isinstance(item.name, str) guard to avoid matching non-string .name attrs - Add logger.debug() before skipping detached instances for observability - Refactor to early-continue pattern for plain strings
…user fields Bito review pointed out that roles used escape_llm_context_delimiters while first_name/last_name used sanitize_for_llm_context, creating inconsistent LLM context wrapping. Role names now get the same delimiter wrapping as other user fields.
6fd5ed7 to
3e5ad43
Compare
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. Could you please approve the workflows when you have a moment? Thanks! |
| if user_roles is not None: | ||
| try: | ||
| roles = [r.name for r in user_roles if hasattr(r, "name")] | ||
| roles = [sanitize_for_llm_context(r.name, field_path=("roles",)) for r in user_roles if hasattr(r, "name") and isinstance(r.name, str)] |
There was a problem hiding this comment.
high: sanitize_for_llm_context(r.name, field_path=("roles",)) wraps each name as "<UNTRUSTED-CONTENT>\nAdmin\n</UNTRUSTED-CONTENT>" — so test_serialize_user_object_round_trip_with_role_objects (test_schemas.py:106) now asserts info.roles == ["Admin"] against the wrapped string and will fail. Pre-commit/CI will break.
Either update the test assertion to match the wrapped output, or move the sanitize_for_llm_context call into _extract_role_names so both construction paths produce consistently sanitized values and the test expectation is clear.
There was a problem hiding this comment.
Good catch. Role names are operational identifiers like username and email, so escape_llm_context_delimiters fits better here — it escapes the delimiter tokens without wrapping the string content. Switched to that in the latest push.
… sanitize_for_llm_context Role names are operational identifiers (like username and email), not free-form user text. Using sanitize_for_llm_context wraps each name in UNTRUSTED-CONTENT delimiters, breaking round-trip tests. escape_llm_context_delimiters provides defense-in-depth without changing string identity.
The continue in the except DetachedInstanceError block was the last statement of the loop body, making it a no-op.
Code Review Agent Run #3d2130Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
Hi @aminghadersohi, gentle follow-up — the CI workflows are still awaiting maintainer approval. I've addressed all review feedback (ruff E402 logger placement, vacuous ternary, role sanitization, etc.). Could you please approve the workflows so the tests can run when you have a moment? Thanks! |
Fixes #40733
Problem
Multiple MCP tools (
get_dataset_info,get_chart_info,generate_chart) fail with a Pydantic validation error when the calling user has Superset roles assigned:The
UserInfomodel usesfrom_attributes=True, so Pydantic mapsuser.rolesdirectly. With the SQLAlchemy ORM,rolesreturnsRoleobjects—not strings—causing serialization to fail.Fix
Add a
field_validator("roles", mode=before)toUserInfothat coerces each item:str, keep it.nameattribute (Role ORM object), extractstr(item.name)This is idempotent and backward-compatible.
Impact
generate_chartwithsave_chart=trueno longer silently saves while reporting failure