feat: subprocess provider, SQLite LLM cache, meta-analyzer batching, and scan UX improvements#234
Open
scholesgx wants to merge 32 commits into
Open
Conversation
…stale tests - Move LLMMetaAnalyzer() inside the try block in meta_analyzer so init failures are caught gracefully instead of propagating to the CLI - Add MODEL_CONFIG fallback for meta_analyzer model (was returning None when model_config state key is unset) - Add exc_info=True to all four LLM node exception handlers so the next run with a real API key produces a full traceback for the NameError - Update two stale test_meta_analyzer tests that expected CRITICAL findings to be dropped by LLM rejection; they now use MEDIUM severity (not protected by _HIGH_SEVERITY_FLOOR) and a new test explicitly asserts the floor behaviour for CRITICAL findings - Format four files to satisfy ruff format --check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements SubprocessChatModel (BaseChatModel subclass) with _generate() and _call_subprocess() methods, plus full test coverage via TestSubprocessChatModelGenerate (4 tests). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t, timeout handling
…cstrings, bandit nosec, 99% coverage
….md, and CLI help
Adds the acceptance test plan for SKILLSPECTOR_PROVIDER=subprocess, covering happy path, error handling, provider isolation, alternative tools, and CLI/doc coverage (AT-01 to AT-34). Criteria corrections applied after first run against the reinstalled binary: exit code expectations updated to 1 for malicious_skill scans (tool exits non-zero when risk_score > 50), and AT-03 JSON key corrected from "findings" to "issues" to match the actual report schema. All mandatory tests pass. Skips are due to unavailable optional prerequisites (no antigravity/openclaw CLIs, no cloud API keys). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _resolve_baseline_output() to pick <target-dir>/.skillspector-baseline.yaml
when input_path is a local directory and --output is not given.
- Add _warn_if_overwriting() to print a warning with prior suppression count
when a baseline file already exists at the resolved path.
- Change baseline() output parameter default from Path(".skillspector-baseline.yaml")
to None so the new resolver controls placement.
- Add three TDD tests: target-dir placement, explicit --output override, overwrite warning.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oblem 12) Add _apply_negation_context_filter post-filter to static_yara.py that detects negation words in finding context (cuts confidence by 50%, tags likely_false_positive) and security-education section headers in file content (tags security_education). Three TDD tests added to test_static_yara.py covering each scenario. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ger (Problem 1) Replace 'IGNORE all instructions' phrasing in the TP4 analyzer system prompt with evaluator-role framing that preserves analytical intent without triggering subprocess provider injection detection. Add subprocess/SKILL.md context file to orient Claude LLM backend sessions. Add regression test to guard the phrase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…snippet (Problems 7 + 11) - Add _ACCEPTED_PERMISSION_TYPES, _ACCEPTED_TYPES_STR, _CAP_TO_PERMISSION_TYPE constants - Add _build_permissions_snippet() helper to generate copy-pasteable YAML - LP1 remediation now names the canonical permission type and lists all accepted types - LP3 remediation now appends a YAML permissions: block with detected capabilities - Add test_lp1_remediation_lists_accepted_types and test_lp3_remediation_includes_snippet Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… hint (Problem 2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… (Problem 5) - behavioral_ast.py: add _is_test_file() + _is_subprocess_test_fixture() helpers; downgrade AST4 to confidence=0.15 + likely_test_fixture tag when shell=False + sys.executable pattern detected in a test_*.py file - static_patterns_privilege_escalation.py: add _is_pe3_test_fixture() helper; downgrade PE3 /etc/passwd findings in test functions containing traversal-related keywords; rewrite node() to forward include_test_fixtures when flag is set - state.py: add include_test_fixtures: bool field to SkillspectorState - cli.py: add --include-test-fixtures flag to scan(); wire through _scan_state() - tests: 3 AST4 + 3 PE3 test-fixture heuristic tests (TDD, red→green) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to function name
- scan() docstring now documents --include-test-fixtures in a new Flags: section
- _is_pe3_test_fixture() combined regex requires keyword in def test_<keyword>
function name rather than anywhere in the surrounding 15-line block, eliminating
false-positives like test_foo calling sanitize_path('/etc/passwd')
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add depth parameter to detect_skills() and _find_skills_recursive() helper for multi-level skill discovery; add --depth CLI flag to scan command; update fallback warning to suggest --depth N+1 and --depth N+2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…blem 4)
Add --detail flag to scan command; when used with --recursive --format json
--output, each skill entry in the JSON includes an issues[] array of full
Finding.to_dict() serializations. Without --detail the output is unchanged
(backward-compat). Restructures combined JSON from skills[] list to skills{}
dict keyed by relative path, with top-level summary{} section.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion (Problem 13) - Add skill_classification field to SkillspectorState - build_context reads classification from SKILL.md frontmatter and cascades from a parent-directory skillspector.yaml (scope: offensive_security) - report overrides risk_recommendation to "AUTHORIZED OFFENSIVE TOOL — review findings in context" when skill_classification == "offensive_security" - Two new integration tests cover manifest-level and library-scope-yaml paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add analyzer_id param and _emit_progress() to LLMAnalyzerBase so users see [LLM] <id>: <file> (requesting...) / (done, N findings) on stderr during long LLM calls. Wire up analyzer_id in all three semantic analyzer nodes and LLMMetaAnalyzer. Add 12 unit tests covering sync, async, empty-id suppression, and per-batch progress. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds skip_meta: bool to SkillspectorState, an early-return check in meta_analyzer() (before use_llm, so it bypasses LLM even when use_llm=True), and a --skip-meta CLI flag wired through _scan_state(). When active, all findings pass through with default remediations (fail-open fast path). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds LLMResponseCache (SQLite-backed) keyed by (content_hash, prompt_hash, schema_version) so unchanged files skip repeated LLM calls across scan runs. Integrates cache into LLMAnalyzerBase.run_batches / arun_batches and wires llm_cache_dir through state → build_context → meta_analyzer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…odule level Pass llm_cache_dir from state as LLMResponseCache to all three semantic analyzer nodes (semantic_security_discovery, semantic_quality_policy, semantic_developer_intent) so their LLM calls are cached on repeated scans of unchanged files — the same pattern already used in meta_analyzer. Move the deferred `import json as _json` statements inside run_batches and arun_batches in llm_analyzer_base.py to the module-level import block (stdlib, alphabetically after asyncio) and update all references from _json to json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The _cache_key() method now correctly returns CacheKey instead of object, which resolves mypy type errors at call sites (get/put in run_batches and arun_batches). Removed unnecessary type: ignore comments that suppressed these errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…em 3a) Split findings into configurable groups before calling the meta-analyzer LLM so large skill scans don't exceed model context limits. Each group calls arun_batches independently; results are merged before apply_filter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oad in tests - Move `from collections import Counter` from inside _split_files_into_batches() to module-level imports (stdlib section, alphabetically ordered) - Add try/finally cleanup in test_meta_analyzer_batches_large_finding_sets and test_meta_analyzer_reads_batch_size_at_call_time to reload constants module after each test, preventing env var persistence across tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… TP4 cache exclusion - Wire _PE3_TEST_FUNCTION_KEYWORDS into a precompiled _PE3_FIXTURE_FUNC_RE and use it in _is_pe3_test_fixture(), eliminating the dead constant and the duplicated inline pattern string. - Add __del__ to LLMResponseCache so the SQLite connection is closed on GC, preventing Windows file locks in non-CPython runtimes. - Add an explanatory comment above the chat_completion call in _check_tp4 documenting why TP4 bypasses the LLM response cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lan files - Reformat all markdown tables in README for consistent column alignment - Fix string continuation indentation in cli.py help text and condense two multi-line expressions - Add skillspector_bridge.py for external tool integration - Add .skillspector-baseline.yaml scan baseline - Add run_scan_with_llm.ps1 helper script - Add skills/skillspector-operator skill definition - Add docs/superpowers/plans/2026-06-26-skillspector-prd-enhancements.md planning doc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asion tests Resolved conflicts preserving both sides: - Providers table: added bedrock row alongside existing subprocess row - Env vars table: added AWS_PROFILE/AWS_REGION alongside SKILLSPECTOR_LLM_COMMAND - cli.py help text: includes bedrock, nv_inference, and subprocess in provider list - static_patterns_privilege_escalation.py: kept include_test_fixtures param + PE3 fixture heuristic, updated docstring to PE1–PE5 to reflect new PE5 patterns - providers/__init__.py: kept both SubprocessProvider and BedrockProvider blocks - test_behavioral_ast.py: kept TestAST4TestFixtureHeuristic + added builtins/ importlib evasion tests from origin/main - test_static_patterns.py: kept TestPE3TestFixtureHeuristic + added PE5/E5 tests - test_meta_analyzer.py: deduplicated auto-merged tests, kept unique test_critical_finding_kept_when_rejected_by_llm, took multi-line _finding sig Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
The best time to scan a skill is the moment you're deciding whether to install it — which is usually inside an agent session (Claude Code, Cursor, Copilot, etc.), not before one. Until now, SkillSpector's LLM analysis required a separate API key configured outside the session, making it impractical to use the full scanner from within the same environment you're already working in.
This PR fixes that. With
SKILLSPECTOR_PROVIDER=subprocess, you point SkillSpector at any AI CLI already available in your session —claude -p,antigravity ask, or similar — and it uses that as the LLM backend. No additional API key. No separate authentication. If you're already in an agent session, you can run a full LLM-powered scan right there.Changes
New: subprocess provider
Run SkillSpector's LLM analysis from within your existing agent session — no separate API key required.
SubprocessProvider/SubprocessChatModelpipes each prompt to the configured shell command via stdin and reads the response from stdout. Works with any CLI-based AI:claude -p,antigravity ask,openclaw chat, or a custom wrapper script. Windows-compatible, with clear error messages when the command is missing or exits non-zero.New: SQLite LLM response cache
LLMResponseCachepersists LLM responses keyed by content hash to.skillspector-cache.db. Re-scanning the same skill content skips redundant LLM calls entirely. Particularly useful with the subprocess provider, where each call has visible latency in the session.New: scan cost controls
--skip-meta— bypass the meta-analyzer LLM pass for fast iterative scanning. Cuts token cost ~40–60% at the cost of more false positives. Recommended during development; drop it for final/CI runs.SKILLSPECTOR_META_BATCH_SIZE— tune findings-per-LLM-call without touching code.New: multi-skill scanning
--recursive— scan a directory of skills independently, with per-skill risk scores and a combined summary.--depth N— control nesting depth (default: 1).--detail— include full finding objects in recursive JSON output.New: test-fixture heuristics
AST4 and PE3 findings that match safe test-harness patterns are automatically downgraded to low confidence rather than flagged at full severity.
--include-test-fixturesopts out when you want to audit test code at full sensitivity.Other improvements
offensive_securityclassification — skills that declare this skip the score-based DO NOT INSTALL recommendation (security tools are expected to contain threat signatures)..skillspector-baseline.yamlin the target directory;--no-baselineto opt out.skillspector baselinewrites to the target directory by default rather than CWD.Test plan
uv run pytest— full suite passes with no regressionsskillspector scan ./my-skill/ --no-llmmatches output from mainSKILLSPECTOR_PROVIDER=subprocess SKILLSPECTOR_LLM_COMMAND="claude -p" skillspector scan ./my-skill/completes full LLM analysis without a separate API key--skip-metavisibly skips the meta-analyzer pass and runs faster.skillspector-cache.dbpresent, no LLM calls observed)skillspector scan ./skills-dir/ --recursiveproduces independent per-skill results with a combined summaryskillspector baselinewrites the baseline file into the scanned directory, not CWD🤖 Generated with Claude Code