feat(scan): add opt-in transitive reference scanning #225
Open
rodboev wants to merge 5 commits into
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Summary
Adds opt-in transitive scanning for source references inside scanned skill content, with a depth limit and allow/deny prefix controls for follow-up traversal.
Closes #97
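The traversal the summary describes can be pictured as a breadth-first walk with a visited set and a depth budget. This is a minimal sketch under stated assumptions, not the PR's helper: references are taken to be plain http(s) URLs in the scanned text, allow and deny controls are string prefixes matched against the canonical URL with deny taking precedence, the requested root sits at depth 0, and a caller-supplied fetch function stands in for InputHandler's download, clone, and host checks. The names traverse, canonicalize, is_allowed, and fetch are hypothetical.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable
from urllib.parse import urlsplit, urlunsplit

# Hypothetical reference pattern: any http(s) URL up to whitespace or common delimiters.
URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+")


def canonicalize(url: str) -> str:
    """Normalize a URL so trivially different spellings share one identity."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Lowercase scheme and host, keep the query, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_allowed(url: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Deny prefixes win; a non-empty allow list must also match."""
    if any(url.startswith(prefix) for prefix in deny):
        return False
    allow = list(allow)
    return not allow or any(url.startswith(prefix) for prefix in allow)


def traverse(
    root: str,
    fetch: Callable[[str], str],
    max_depth: int,
    allow: Iterable[str] = (),
    deny: Iterable[str] = (),
) -> Dict[str, int]:
    """Hypothetical sketch: map every source to scan to the depth it was first reached at."""
    allow, deny = list(allow), list(deny)
    start = canonicalize(root)
    visited = {start: 0}  # the root is user-requested, so it skips allow/deny checks
    queue = deque([start])
    while queue:
        source = queue.popleft()
        depth = visited[source]
        if depth >= max_depth:
            continue  # budget spent: this source is scanned, its references are not followed
        for match in URL_RE.findall(fetch(source)):
            ref = canonicalize(match.rstrip(".,;:"))  # drop sentence punctuation
            if ref in visited or not is_allowed(ref, allow, deny):
                continue
            visited[ref] = depth + 1  # mark before enqueueing so cycles terminate
            queue.append(ref)
    return visited
```

With max_depth=0 the result contains only the root, matching a zero-depth run; the depth recorded for each other source is what a report could attach as transitive_depth.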
Root cause
The current scan pipeline resolves one input into one local directory, builds context from that tree, and writes the graph result from that single invocation. Recursive mode only repeats that one-hop scan for immediate local child skill directories, and report emitters assume every finding belongs to the directly requested source.
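Because every finding is implicitly tied to the directly requested source, a baseline keyed only on rule, file, and line span cannot tell a direct finding from an identical one in an external reference. A minimal sketch of provenance-aware fingerprinting, assuming a simplified Finding: transitive_depth and source_url are the field names this PR adds, while rule_id, file, start_line, end_line, the fingerprint function, and the SHA-256-over-JSON key are hypothetical and may not match skillspector's baseline format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Trimmed-down, hypothetical finding; real findings carry more fields."""

    rule_id: str
    file: str
    start_line: int
    end_line: int
    # Provenance fields named after this PR; None / 0 mean "directly requested source".
    source_url: Optional[str] = None
    transitive_depth: int = 0


def fingerprint(finding: Finding) -> str:
    """Hash rule, file, and line span, plus the source URL for transitive findings."""
    key: list = [finding.rule_id, finding.file, finding.start_line, finding.end_line]
    if finding.transitive_depth > 0:
        # Depth is left out so reaching the same source by a shorter path
        # does not change the fingerprint.
        key.append(finding.source_url)
    payload = json.dumps(key).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Leaving the key unchanged for direct findings is a choice made in this sketch: existing baselines keep matching direct findings, while an external finding with the same rule, file, and line span gets a distinct fingerprint and is not suppressed.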
Diff Notes
- … file_cache, filters out adjacent non-source URLs, canonicalizes source identities, enforces the depth budget, and owns visited-set mutation.
- Add --transitive, --transitive-depth, and repeated allow or deny prefix controls to skillspector scan, then route both single-skill and recursive multi-skill entrypoints through the shared traversal helper.
- … InputHandler for approved external targets so existing host allowlists, SSRF checks, clone or download handling, and archive protections stay authoritative, including archive URLs on allowed Git hosts.
- Add transitive_depth and source_url to Finding, preserve the root cleanup path through both the merged and zero-depth transitive paths, and include transitive provenance in baseline fingerprints so direct baselines do not suppress external findings with the same rule, file, and line span.
Scope
This change is limited to source types already supported by InputHandler, visited-set safety, traversal depth, allow or deny prefix controls, and provenance in reports. It does not add a web crawler, new allowed hosts, or MCP behavior, and it does not change default behavior when --transitive is absent.
Verification
.\.venv\Scripts\python.exe -m pytest tests\unit\test_transitive.py tests\unit\test_cli.py tests\nodes\test_report.py tests\nodes\test_sarif_rules_and_empty_findings.py -v (via the required headless command block; 89 passed)
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/