Skip to content

feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements#234

Open
scholesgx wants to merge 32 commits into
NVIDIA:mainfrom
scholesgx:feature/use-skillspector-inside-agent-session
Open

feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements#234
scholesgx wants to merge 32 commits into
NVIDIA:mainfrom
scholesgx:feature/use-skillspector-inside-agent-session

Conversation

@scholesgx

@scholesgx scholesgx commented Jun 30, 2026

Copy link
Copy Markdown

What this PR does

The best time to scan a skill is the moment you're deciding whether to install it — which is usually inside an agent session (Claude Code, Cursor, Copilot, etc.), not before one. Until now, SkillSpector's LLM analysis required a separate API key configured outside the session, making it impractical to use the full scanner from within the same environment you're already working in.

This PR fixes that. With SKILLSPECTOR_PROVIDER=subprocess, you point SkillSpector at any AI CLI already available in your session — claude -p, antigravity ask, or similar — and it uses that as the LLM backend. No additional API key. No separate authentication. If you're already in an agent session, you can run a full LLM-powered scan right there.

Changes

New: subprocess provider

Run SkillSpector's LLM analysis from within your existing agent session — no separate API key required.

export SKILLSPECTOR_PROVIDER=subprocess
export SKILLSPECTOR_LLM_COMMAND="claude -p"
skillspector scan ./the-skill-im-about-to-install/

SubprocessProvider / SubprocessChatModel pipes each prompt to the configured shell command via stdin and reads the response from stdout. Works with any CLI-based AI: claude -p, antigravity ask, openclaw chat, or a custom wrapper script. Windows-compatible, with clear error messages when the command is missing or exits non-zero.

New: SQLite LLM response cache

LLMResponseCache persists LLM responses keyed by content hash to .skillspector-cache.db. Re-scanning the same skill content skips redundant LLM calls entirely. Particularly useful with the subprocess provider, where each call has visible latency in the session.

New: scan cost controls

  • --skip-meta — bypass the meta-analyzer LLM pass for fast iterative scanning. Cuts token cost ~40–60% at the cost of more false positives. Recommended during development; drop it for final/CI runs.
  • SKILLSPECTOR_META_BATCH_SIZE — tune findings-per-LLM-call without touching code.

New: multi-skill scanning

  • --recursive — scan a directory of skills independently, with per-skill risk scores and a combined summary.
  • --depth N — control nesting depth (default: 1).
  • --detail — include full finding objects in recursive JSON output.

New: test-fixture heuristics

AST4 and PE3 findings that match safe test-harness patterns are automatically downgraded to low confidence rather than flagged at full severity. --include-test-fixtures opts out when you want to audit test code at full sensitivity.

Other improvements

  • offensive_security classification — skills that declare this skip the score-based DO NOT INSTALL recommendation (security tools are expected to contain threat signatures).
  • Auto-discover .skillspector-baseline.yaml in the target directory; --no-baseline to opt out.
  • skillspector baseline writes to the target directory by default rather than CWD.
  • LLM progress emitted to stderr so JSON output to stdout stays parseable in CI pipelines.
  • LP1/LP3 remediation messages include accepted capability type names and a code snippet.
  • TP4 prompt hardened against enterprise injection-detection false triggers.
  • YARA YR1/YR4 confidence reduced when surrounding context is negation or educational.

Test plan

  • uv run pytest — full suite passes with no regressions
  • Static-only scan unchanged: skillspector scan ./my-skill/ --no-llm matches output from main
  • Subprocess provider — from within an active Claude Code session: SKILLSPECTOR_PROVIDER=subprocess SKILLSPECTOR_LLM_COMMAND="claude -p" skillspector scan ./my-skill/ completes full LLM analysis without a separate API key
  • --skip-meta visibly skips the meta-analyzer pass and runs faster
  • Second scan of the same content hits the cache (.skillspector-cache.db present, no LLM calls observed)
  • skillspector scan ./skills-dir/ --recursive produces independent per-skill results with a combined summary
  • skillspector baseline writes the baseline file into the scanned directory, not CWD

🤖 Generated with Claude Code

Gaylene Scholes and others added 30 commits June 24, 2026 15:12
…stale tests

- Move LLMMetaAnalyzer() inside the try block in meta_analyzer so init
  failures are caught gracefully instead of propagating to the CLI
- Add MODEL_CONFIG fallback for meta_analyzer model (was returning None
  when model_config state key is unset)
- Add exc_info=True to all four LLM node exception handlers so the next
  run with a real API key produces a full traceback for the NameError
- Update two stale test_meta_analyzer tests that expected CRITICAL
  findings to be dropped by LLM rejection; they now use MEDIUM severity
  (not protected by _HIGH_SEVERITY_FLOOR) and a new test explicitly
  asserts the floor behaviour for CRITICAL findings
- Format four files to satisfy ruff format --check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements SubprocessChatModel (BaseChatModel subclass) with _generate()
and _call_subprocess() methods, plus full test coverage via
TestSubprocessChatModelGenerate (4 tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the acceptance test plan for SKILLSPECTOR_PROVIDER=subprocess,
covering happy path, error handling, provider isolation, alternative
tools, and CLI/doc coverage (AT-01 to AT-34).

Criteria corrections applied after first run against the reinstalled
binary: exit code expectations updated to 1 for malicious_skill scans
(tool exits non-zero when risk_score > 50), and AT-03 JSON key corrected
from "findings" to "issues" to match the actual report schema.

All mandatory tests pass. Skips are due to unavailable optional
prerequisites (no antigravity/openclaw CLIs, no cloud API keys).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _resolve_baseline_output() to pick <target-dir>/.skillspector-baseline.yaml
  when input_path is a local directory and --output is not given.
- Add _warn_if_overwriting() to print a warning with prior suppression count
  when a baseline file already exists at the resolved path.
- Change baseline() output parameter default from Path(".skillspector-baseline.yaml")
  to None so the new resolver controls placement.
- Add three TDD tests: target-dir placement, explicit --output override, overwrite warning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oblem 12)

Add _apply_negation_context_filter post-filter to static_yara.py that detects
negation words in finding context (cuts confidence by 50%, tags likely_false_positive)
and security-education section headers in file content (tags security_education).
Three TDD tests added to test_static_yara.py covering each scenario.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ger (Problem 1)

Replace 'IGNORE all instructions' phrasing in the TP4 analyzer system prompt
with evaluator-role framing that preserves analytical intent without triggering
subprocess provider injection detection. Add subprocess/SKILL.md context file
to orient Claude LLM backend sessions. Add regression test to guard the phrase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…snippet (Problems 7 + 11)

- Add _ACCEPTED_PERMISSION_TYPES, _ACCEPTED_TYPES_STR, _CAP_TO_PERMISSION_TYPE constants
- Add _build_permissions_snippet() helper to generate copy-pasteable YAML
- LP1 remediation now names the canonical permission type and lists all accepted types
- LP3 remediation now appends a YAML permissions: block with detected capabilities
- Add test_lp1_remediation_lists_accepted_types and test_lp3_remediation_includes_snippet

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… hint (Problem 2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… (Problem 5)

- behavioral_ast.py: add _is_test_file() + _is_subprocess_test_fixture() helpers;
  downgrade AST4 to confidence=0.15 + likely_test_fixture tag when shell=False +
  sys.executable pattern detected in a test_*.py file
- static_patterns_privilege_escalation.py: add _is_pe3_test_fixture() helper;
  downgrade PE3 /etc/passwd findings in test functions containing traversal-related
  keywords; rewrite node() to forward include_test_fixtures when flag is set
- state.py: add include_test_fixtures: bool field to SkillspectorState
- cli.py: add --include-test-fixtures flag to scan(); wire through _scan_state()
- tests: 3 AST4 + 3 PE3 test-fixture heuristic tests (TDD, red→green)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to function name

- scan() docstring now documents --include-test-fixtures in a new Flags: section
- _is_pe3_test_fixture() combined regex requires keyword in def test_<keyword>
  function name rather than anywhere in the surrounding 15-line block, eliminating
  false-positives like test_foo calling sanitize_path('/etc/passwd')

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add depth parameter to detect_skills() and _find_skills_recursive() helper
for multi-level skill discovery; add --depth CLI flag to scan command;
update fallback warning to suggest --depth N+1 and --depth N+2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…blem 4)

Add --detail flag to scan command; when used with --recursive --format json
--output, each skill entry in the JSON includes an issues[] array of full
Finding.to_dict() serializations. Without --detail the output is unchanged
(backward-compat). Restructures combined JSON from skills[] list to skills{}
dict keyed by relative path, with top-level summary{} section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion (Problem 13)

- Add skill_classification field to SkillspectorState
- build_context reads classification from SKILL.md frontmatter and cascades
  from a parent-directory skillspector.yaml (scope: offensive_security)
- report overrides risk_recommendation to "AUTHORIZED OFFENSIVE TOOL — review
  findings in context" when skill_classification == "offensive_security"
- Two new integration tests cover manifest-level and library-scope-yaml paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add analyzer_id param and _emit_progress() to LLMAnalyzerBase so users
see [LLM] <id>: <file> (requesting...) / (done, N findings) on stderr
during long LLM calls. Wire up analyzer_id in all three semantic analyzer
nodes and LLMMetaAnalyzer. Add 12 unit tests covering sync, async, empty-id
suppression, and per-batch progress.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds skip_meta: bool to SkillspectorState, an early-return check in
meta_analyzer() (before use_llm, so it bypasses LLM even when use_llm=True),
and a --skip-meta CLI flag wired through _scan_state(). When active, all
findings pass through with default remediations (fail-open fast path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds LLMResponseCache (SQLite-backed) keyed by (content_hash, prompt_hash,
schema_version) so unchanged files skip repeated LLM calls across scan runs.
Integrates cache into LLMAnalyzerBase.run_batches / arun_batches and wires
llm_cache_dir through state → build_context → meta_analyzer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…odule level

Pass llm_cache_dir from state as LLMResponseCache to all three semantic
analyzer nodes (semantic_security_discovery, semantic_quality_policy,
semantic_developer_intent) so their LLM calls are cached on repeated
scans of unchanged files — the same pattern already used in meta_analyzer.

Move the deferred `import json as _json` statements inside run_batches
and arun_batches in llm_analyzer_base.py to the module-level import block
(stdlib, alphabetically after asyncio) and update all references from
_json to json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The _cache_key() method now correctly returns CacheKey instead of object,
which resolves mypy type errors at call sites (get/put in run_batches and
arun_batches). Removed unnecessary type: ignore comments that suppressed
these errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…em 3a)

Split findings into configurable groups before calling the meta-analyzer
LLM so large skill scans don't exceed model context limits. Each group
calls arun_batches independently; results are merged before apply_filter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oad in tests

- Move `from collections import Counter` from inside _split_files_into_batches()
  to module-level imports (stdlib section, alphabetically ordered)
- Add try/finally cleanup in test_meta_analyzer_batches_large_finding_sets and
  test_meta_analyzer_reads_batch_size_at_call_time to reload constants module
  after each test, preventing env var persistence across tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… TP4 cache exclusion

- Wire _PE3_TEST_FUNCTION_KEYWORDS into a precompiled _PE3_FIXTURE_FUNC_RE and
  use it in _is_pe3_test_fixture(), eliminating the dead constant and the
  duplicated inline pattern string.
- Add __del__ to LLMResponseCache so the SQLite connection is closed on GC,
  preventing Windows file locks in non-CPython runtimes.
- Add an explanatory comment above the chat_completion call in _check_tp4
  documenting why TP4 bypasses the LLM response cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gaylene Scholes and others added 2 commits June 29, 2026 13:21
…lan files

- Reformat all markdown tables in README for consistent column alignment
- Fix string continuation indentation in cli.py help text and condense two multi-line expressions
- Add skillspector_bridge.py for external tool integration
- Add .skillspector-baseline.yaml scan baseline
- Add run_scan_with_llm.ps1 helper script
- Add skills/skillspector-operator skill definition
- Add docs/superpowers/plans/2026-06-26-skillspector-prd-enhancements.md planning doc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asion tests

Resolved conflicts preserving both sides:
- Providers table: added bedrock row alongside existing subprocess row
- Env vars table: added AWS_PROFILE/AWS_REGION alongside SKILLSPECTOR_LLM_COMMAND
- cli.py help text: includes bedrock, nv_inference, and subprocess in provider list
- static_patterns_privilege_escalation.py: kept include_test_fixtures param + PE3
  fixture heuristic, updated docstring to PE1–PE5 to reflect new PE5 patterns
- providers/__init__.py: kept both SubprocessProvider and BedrockProvider blocks
- test_behavioral_ast.py: kept TestAST4TestFixtureHeuristic + added builtins/
  importlib evasion tests from origin/main
- test_static_patterns.py: kept TestPE3TestFixtureHeuristic + added PE5/E5 tests
- test_meta_analyzer.py: deduplicated auto-merged tests, kept unique
  test_critical_finding_kept_when_rejected_by_llm, took multi-line _finding sig

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant