Skip to content

Hide reasoning traces from IORails output#2088

Open
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:developfrom
fallintoplace:fix/iorails-hide-reasoning-content
Open

Hide reasoning traces from IORails output#2088
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:developfrom
fallintoplace:fix/iorails-hide-reasoning-content

Conversation

@fallintoplace

Copy link
Copy Markdown

Summary

This closes a safety gap in non-streaming IORails where reasoning traces could be reattached to the returned assistant message after output rails had already run.

Root cause

IORails.generate_async() extracted reasoning, ran output rails on the stripped answer text, and then prepended the reasoning back into the returned content as <think>...</think>. That allowed reasoning traces to bypass output rails and still be returned to the caller.

What changed

  • stop reattaching reasoning traces into non-streaming IORails response content
  • always strip leaked <think>...</think> blocks from returned content before output rails run
  • add focused regressions for both provider-side reasoning fields and embedded think tags

Why this approach

This is the smallest safety fix with the lowest review surface. It also aligns non-streaming IORails with the existing streaming behavior, which already drops reasoning from caller-visible output.

Validation

  • ./.venv/bin/python -m pytest tests/guardrails/test_iorails.py tests/guardrails/test_iorails_streaming.py

@github-actions github-actions Bot added size: S status: needs triage New issues that have not yet been reviewed or categorized. labels Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: S status: needs triage New issues that have not yet been reviewed or categorized.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant