fix(analyzer): reduce instructional-prose false positives in static scans (#103) #232

Open
rodboev wants to merge 14 commits into NVIDIA:main from rodboev:pr/static-prose-false-positive-103
rodboev commented Jun 29, 2026

Contributor

Summary

--no-llm static scans currently over-fire on benign documentation and layout-only content. This narrows anti-refusal emission to executable or ambiguous instructions, and filters MP2 layout-only spans that carry no semantic stuffing content.

Closes #103

Attribution: an issue follow-up from @M8seven on 2026-06-25 sharpened the surviving scope with the whitespace and box-drawing MP2 repro and the "Never skip the corpus check" warning-prose case.

Root cause

static_patterns_anti_refusal.py emits AR findings after only a generic example penalty, so deny-lists, anti-examples, protective warnings, and defensive fixtures can look like active jailbreak instructions. static_patterns_memory_poisoning.py filters only one narrow repeated-capture case, so whitespace and box-drawing layout can still emit Context Window Stuffing.
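To make the MP2 half of this concrete, here is a minimal sketch of how a repetition heuristic can fire on pure layout, and what a layout-only check might look like. Everything in it is an assumption for illustration: `REPEATED_RUN`, `naive_stuffing_spans`, `is_layout_only`, and the 40-character threshold are hypothetical and are not taken from static_patterns_memory_poisoning.py.

```python
import re

# Hypothetical stand-in for a repetition heuristic: a run of 40 or more
# identical characters. The real MP2 pattern may look quite different.
REPEATED_RUN = re.compile(r"(.)\1{39,}", re.DOTALL)


def naive_stuffing_spans(text: str) -> list[str]:
    """Return every repeated-character run, with no layout awareness."""
    return [m.group(0) for m in REPEATED_RUN.finditer(text)]


def is_layout_only(span: str) -> bool:
    """True when the span holds only whitespace or box-drawing characters.

    The Unicode Box Drawing block is U+2500 through U+257F.
    """
    return all(ch.isspace() or "\u2500" <= ch <= "\u257f" for ch in span)


def filtered_stuffing_spans(text: str) -> list[str]:
    """Drop spans that carry no semantic content (hypothetical post-filter)."""
    return [s for s in naive_stuffing_spans(text) if not is_layout_only(s)]
```

Under these assumptions, a 60-character table rule or an 80-space pad is reported by the naive pass and dropped by the filtered one, while a run of ordinary letters survives both.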

Diff Notes

  • Add a private AR post-filter for benign documentation, deny-lists, anti-examples, tool declarations, and protective warning contexts.
  • Add a private MP2 post-filter for whitespace-only and box-drawing layout spans.
  • Convert existing anti-refusal false-positive xfails into passing regression tests and add focused MP2 layout coverage.
  • Preserve true positives for direct malicious instructions and semantic stuffing commands.
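The AR post-filter idea in the notes above could be sketched roughly as follows. This is not the PR's implementation: the `Finding` dataclass, the `BENIGN_CONTEXT` cue list, and the `post_filter` function are hypothetical names chosen for illustration, and the real filter may use different signals.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record: rule id, matched line, nearby context."""
    rule_id: str
    line: str
    context: str


# Hypothetical cues that the surrounding text is documentation about an
# instruction (deny-list, anti-example, warning) rather than an instruction.
BENIGN_CONTEXT = re.compile(
    r"deny[- ]?list|anti[- ]example|bad example|"
    r"do not (?:use|write|accept)|should be (?:blocked|rejected)|warning:",
    re.IGNORECASE,
)


def keep_finding(finding: Finding) -> bool:
    """Keep a finding only when neither its line nor its context looks benign."""
    return (
        BENIGN_CONTEXT.search(finding.line) is None
        and BENIGN_CONTEXT.search(finding.context) is None
    )


def post_filter(findings: list[Finding]) -> list[Finding]:
    """Apply the benign-context check after pattern matching has run."""
    return [f for f in findings if keep_finding(f)]
```

In this sketch a direct instruction with no benign cue in its line or context is kept, while the same phrase quoted under a deny-list heading is dropped. Running the check as a post-filter leaves the matching patterns themselves untouched, which is one way to preserve true positives.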

Scope

This stays in the analyzer layer. It does not change prompt-injection logic, CLI behavior, graph orchestration, report or SARIF schemas, provider code, or LLM-side mitigation.

Verification

  • ./.venv/Scripts/python.exe -m pytest tests/nodes/analyzers/test_static_patterns_anti_refusal.py tests/nodes/analyzers/test_static_patterns.py
  • uv run ruff check src/ tests/
  • uv run ruff format --check src/ tests/

rodboev added 14 commits June 29, 2026 11:43
…irectives (NVIDIA#103)

Signed-off-by: Rod Boev <rod.boev@gmail.com>
…ives (NVIDIA#103)

Signed-off-by: Rod Boev <rod.boev@gmail.com>
Development

Successfully merging this pull request may close these issues.

False positives: --no-llm static pass flags documentation/teaching content as vulnerabilities (polarity-blind matching)