feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements by scholesgx · Pull Request #234 · NVIDIA/SkillSpector

scholesgx · 2026-06-30T00:10:05Z

What this PR does

The best time to scan a skill is the moment you're deciding whether to install it — which is usually inside an agent session (Claude Code, Cursor, Copilot, etc.), not before one. Until now, SkillSpector's LLM analysis required a separate API key configured outside the session, making it impractical to use the full scanner from within the same environment you're already working in.

This PR fixes that. With SKILLSPECTOR_PROVIDER=subprocess, you point SkillSpector at any AI CLI already available in your session — claude -p, antigravity ask, or similar — and it uses that as the LLM backend. No additional API key. No separate authentication. If you're already in an agent session, you can run a full LLM-powered scan right there.

Changes

New: subprocess provider

Run SkillSpector's LLM analysis from within your existing agent session — no separate API key required.

export SKILLSPECTOR_PROVIDER=subprocess
export SKILLSPECTOR_LLM_COMMAND="claude -p"
skillspector scan ./the-skill-im-about-to-install/

SubprocessProvider / SubprocessChatModel pipes each prompt to the configured shell command via stdin and reads the response from stdout. Works with any CLI-based AI: claude -p, antigravity ask, openclaw chat, or a custom wrapper script. Windows-compatible, with clear error messages when the command is missing or exits non-zero.

New: SQLite LLM response cache

LLMResponseCache persists LLM responses keyed by content hash to .skillspector-cache.db. Re-scanning the same skill content skips redundant LLM calls entirely. Particularly useful with the subprocess provider, where each call has visible latency in the session.

New: scan cost controls

--skip-meta — bypass the meta-analyzer LLM pass for fast iterative scanning. Cuts token cost ~40–60% at the cost of more false positives. Recommended during development; drop it for final/CI runs.
SKILLSPECTOR_META_BATCH_SIZE — tune findings-per-LLM-call without touching code.

New: multi-skill scanning

--recursive — scan a directory of skills independently, with per-skill risk scores and a combined summary.
--depth N — control nesting depth (default: 1).
--detail — include full finding objects in recursive JSON output.

New: test-fixture heuristics

AST4 and PE3 findings that match safe test-harness patterns are automatically downgraded to low confidence rather than flagged at full severity. --include-test-fixtures opts out when you want to audit test code at full sensitivity.

Other improvements

offensive_security classification — skills that declare this skip the score-based DO NOT INSTALL recommendation (security tools are expected to contain threat signatures).
Auto-discover .skillspector-baseline.yaml in the target directory; --no-baseline to opt out.
skillspector baseline writes to the target directory by default rather than CWD.
LLM progress emitted to stderr so JSON output to stdout stays parseable in CI pipelines.
LP1/LP3 remediation messages include accepted capability type names and a code snippet.
TP4 prompt hardened against enterprise injection-detection false triggers.
YARA YR1/YR4 confidence reduced when surrounding context is negation or educational.

Test plan

uv run pytest — full suite passes with no regressions
Static-only scan unchanged: skillspector scan ./my-skill/ --no-llm matches output from main
Subprocess provider — from within an active Claude Code session: SKILLSPECTOR_PROVIDER=subprocess SKILLSPECTOR_LLM_COMMAND="claude -p" skillspector scan ./my-skill/ completes full LLM analysis without a separate API key
--skip-meta visibly skips the meta-analyzer pass and runs faster
Second scan of the same content hits the cache (.skillspector-cache.db present, no LLM calls observed)
skillspector scan ./skills-dir/ --recursive produces independent per-skill results with a combined summary
skillspector baseline writes the baseline file into the scanned directory, not CWD

🤖 Generated with Claude Code

…stale tests - Move LLMMetaAnalyzer() inside the try block in meta_analyzer so init failures are caught gracefully instead of propagating to the CLI - Add MODEL_CONFIG fallback for meta_analyzer model (was returning None when model_config state key is unset) - Add exc_info=True to all four LLM node exception handlers so the next run with a real API key produces a full traceback for the NameError - Update two stale test_meta_analyzer tests that expected CRITICAL findings to be dropped by LLM rejection; they now use MEDIUM severity (not protected by _HIGH_SEVERITY_FLOOR) and a new test explicitly asserts the floor behaviour for CRITICAL findings - Format four files to satisfy ruff format --check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements SubprocessChatModel (BaseChatModel subclass) with _generate() and _call_subprocess() methods, plus full test coverage via TestSubprocessChatModelGenerate (4 tests). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…, single-strip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nv.example

…t, timeout handling

…vider

…cstrings, bandit nosec, 99% coverage

….md, and CLI help

Adds the acceptance test plan for SKILLSPECTOR_PROVIDER=subprocess, covering happy path, error handling, provider isolation, alternative tools, and CLI/doc coverage (AT-01 to AT-34). Criteria corrections applied after first run against the reinstalled binary: exit code expectations updated to 1 for malicious_skill scans (tool exits non-zero when risk_score > 50), and AT-03 JSON key corrected from "findings" to "issues" to match the actual report schema. All mandatory tests pass. Skips are due to unavailable optional prerequisites (no antigravity/openclaw CLIs, no cloud API keys). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add _resolve_baseline_output() to pick <target-dir>/.skillspector-baseline.yaml when input_path is a local directory and --output is not given. - Add _warn_if_overwriting() to print a warning with prior suppression count when a baseline file already exists at the resolved path. - Change baseline() output parameter default from Path(".skillspector-baseline.yaml") to None so the new resolver controls placement. - Add three TDD tests: target-dir placement, explicit --output override, overwrite warning. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…oblem 12) Add _apply_negation_context_filter post-filter to static_yara.py that detects negation words in finding context (cuts confidence by 50%, tags likely_false_positive) and security-education section headers in file content (tags security_education). Three TDD tests added to test_static_yara.py covering each scenario. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ger (Problem 1) Replace 'IGNORE all instructions' phrasing in the TP4 analyzer system prompt with evaluator-role framing that preserves analytical intent without triggering subprocess provider injection detection. Add subprocess/SKILL.md context file to orient Claude LLM backend sessions. Add regression test to guard the phrase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…snippet (Problems 7 + 11) - Add _ACCEPTED_PERMISSION_TYPES, _ACCEPTED_TYPES_STR, _CAP_TO_PERMISSION_TYPE constants - Add _build_permissions_snippet() helper to generate copy-pasteable YAML - LP1 remediation now names the canonical permission type and lists all accepted types - LP3 remediation now appends a YAML permissions: block with detected capabilities - Add test_lp1_remediation_lists_accepted_types and test_lp3_remediation_includes_snippet Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… hint (Problem 2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… (Problem 5) - behavioral_ast.py: add _is_test_file() + _is_subprocess_test_fixture() helpers; downgrade AST4 to confidence=0.15 + likely_test_fixture tag when shell=False + sys.executable pattern detected in a test_*.py file - static_patterns_privilege_escalation.py: add _is_pe3_test_fixture() helper; downgrade PE3 /etc/passwd findings in test functions containing traversal-related keywords; rewrite node() to forward include_test_fixtures when flag is set - state.py: add include_test_fixtures: bool field to SkillspectorState - cli.py: add --include-test-fixtures flag to scan(); wire through _scan_state() - tests: 3 AST4 + 3 PE3 test-fixture heuristic tests (TDD, red→green) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… to function name - scan() docstring now documents --include-test-fixtures in a new Flags: section - _is_pe3_test_fixture() combined regex requires keyword in def test_<keyword> function name rather than anywhere in the surrounding 15-line block, eliminating false-positives like test_foo calling sanitize_path('/etc/passwd') Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…(Problem 10)

Add depth parameter to detect_skills() and _find_skills_recursive() helper for multi-level skill discovery; add --depth CLI flag to scan command; update fallback warning to suggest --depth N+1 and --depth N+2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…blem 4) Add --detail flag to scan command; when used with --recursive --format json --output, each skill entry in the JSON includes an issues[] array of full Finding.to_dict() serializations. Without --detail the output is unchanged (backward-compat). Restructures combined JSON from skills[] list to skills{} dict keyed by relative path, with top-level summary{} section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ion (Problem 13) - Add skill_classification field to SkillspectorState - build_context reads classification from SKILL.md frontmatter and cascades from a parent-directory skillspector.yaml (scope: offensive_security) - report overrides risk_recommendation to "AUTHORIZED OFFENSIVE TOOL — review findings in context" when skill_classification == "offensive_security" - Two new integration tests cover manifest-level and library-scope-yaml paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add analyzer_id param and _emit_progress() to LLMAnalyzerBase so users see [LLM] <id>: <file> (requesting...) / (done, N findings) on stderr during long LLM calls. Wire up analyzer_id in all three semantic analyzer nodes and LLMMetaAnalyzer. Add 12 unit tests covering sync, async, empty-id suppression, and per-batch progress. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds skip_meta: bool to SkillspectorState, an early-return check in meta_analyzer() (before use_llm, so it bypasses LLM even when use_llm=True), and a --skip-meta CLI flag wired through _scan_state(). When active, all findings pass through with default remediations (fail-open fast path). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds LLMResponseCache (SQLite-backed) keyed by (content_hash, prompt_hash, schema_version) so unchanged files skip repeated LLM calls across scan runs. Integrates cache into LLMAnalyzerBase.run_batches / arun_batches and wires llm_cache_dir through state → build_context → meta_analyzer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…odule level Pass llm_cache_dir from state as LLMResponseCache to all three semantic analyzer nodes (semantic_security_discovery, semantic_quality_policy, semantic_developer_intent) so their LLM calls are cached on repeated scans of unchanged files — the same pattern already used in meta_analyzer. Move the deferred `import json as _json` statements inside run_batches and arun_batches in llm_analyzer_base.py to the module-level import block (stdlib, alphabetically after asyncio) and update all references from _json to json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The _cache_key() method now correctly returns CacheKey instead of object, which resolves mypy type errors at call sites (get/put in run_batches and arun_batches). Removed unnecessary type: ignore comments that suppressed these errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…em 3a) Split findings into configurable groups before calling the meta-analyzer LLM so large skill scans don't exceed model context limits. Each group calls arun_batches independently; results are merged before apply_filter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…oad in tests - Move `from collections import Counter` from inside _split_files_into_batches() to module-level imports (stdlib section, alphabetically ordered) - Add try/finally cleanup in test_meta_analyzer_batches_large_finding_sets and test_meta_analyzer_reads_batch_size_at_call_time to reload constants module after each test, preventing env var persistence across tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… TP4 cache exclusion - Wire _PE3_TEST_FUNCTION_KEYWORDS into a precompiled _PE3_FIXTURE_FUNC_RE and use it in _is_pe3_test_fixture(), eliminating the dead constant and the duplicated inline pattern string. - Add __del__ to LLMResponseCache so the SQLite connection is closed on GC, preventing Windows file locks in non-CPython runtimes. - Add an explanatory comment above the chat_completion call in _check_tp4 documenting why TP4 bypasses the LLM response cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lan files - Reformat all markdown tables in README for consistent column alignment - Fix string continuation indentation in cli.py help text and condense two multi-line expressions - Add skillspector_bridge.py for external tool integration - Add .skillspector-baseline.yaml scan baseline - Add run_scan_with_llm.ps1 helper script - Add skills/skillspector-operator skill definition - Add docs/superpowers/plans/2026-06-26-skillspector-prd-enhancements.md planning doc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…asion tests Resolved conflicts preserving both sides: - Providers table: added bedrock row alongside existing subprocess row - Env vars table: added AWS_PROFILE/AWS_REGION alongside SKILLSPECTOR_LLM_COMMAND - cli.py help text: includes bedrock, nv_inference, and subprocess in provider list - static_patterns_privilege_escalation.py: kept include_test_fixtures param + PE3 fixture heuristic, updated docstring to PE1–PE5 to reflect new PE5 patterns - providers/__init__.py: kept both SubprocessProvider and BedrockProvider blocks - test_behavioral_ast.py: kept TestAST4TestFixtureHeuristic + added builtins/ importlib evasion tests from origin/main - test_static_patterns.py: kept TestPE3TestFixtureHeuristic + added PE5/E5 tests - test_meta_analyzer.py: deduplicated auto-merged tests, kept unique test_critical_finding_kept_when_rejected_by_llm, took multi-line _finding sig Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Gaylene Scholes and others added 30 commits June 24, 2026 15:12

fix: widen with_structured_output signature, fix multi-modal fallback…

202b7f6

…, single-strip

feat: add SubprocessProvider implementing LLMProvider protocol

952477d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: register subprocess provider in provider selector

4cf507e

docs: document subprocess provider and SKILLSPECTOR_LLM_COMMAND in .e…

288735d

…nv.example

fix: Windows shlex, ValueError on missing command, dict schema suppor…

eb49c59

…t, timeout handling

fix: add DEFAULT_MODEL and SLOT_DEFAULTS class attrs to SubprocessPro…

e23b624

…vider

fix: standards compliance — ruff B904/F401, mypy types, pydocstyle do…

0369fca

…cstrings, bandit nosec, 99% coverage

docs: add subprocess provider to README, DEVELOPMENT.md, PI_EXTENSION…

2a166ef

….md, and CLI help

fix: subprocess exit-code-1 enterprise diagnostic + --no-llm fallback…

74d5a90

… hint (Problem 2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: auto-discover .skillspector-baseline.yaml + --no-baseline flag …

7a61253

…(Problem 10)

Gaylene Scholes and others added 2 commits June 29, 2026 13:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements#234

feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements#234
scholesgx wants to merge 32 commits into
NVIDIA:mainfrom
scholesgx:feature/use-skillspector-inside-agent-session

scholesgx commented Jun 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

scholesgx commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Changes

New: subprocess provider

New: SQLite LLM response cache

New: scan cost controls

New: multi-skill scanning

New: test-fixture heuristics

Other improvements

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

scholesgx commented Jun 30, 2026 •

edited

Loading