[REVIEW] agent-security: bind approvals to exact tool artifacts and MCP provenance

## Skill Being Reviewed
**Skill name:** agent-security
**Skill path:** `skills/ai-security/agent-security/`

## False Positive Analysis
**Benign code that triggers a false positive:**
```yaml
agent_audit:
  store_raw_prompts: false
  store_chain_of_thought: false
  store_prompt_hash: true
  store_policy_trace: true
  redact_tool_parameters:
    - access_token
    - customer_email
    - secret_value
  immutable_sink: cloud-logging-worm
  correlation_id_required: true
```
**Why this is a false positive:**
The skill asks whether the agent's reasoning and prompt/context are logged. That can over-flag privacy-preserving designs that intentionally avoid raw prompt or chain-of-thought retention. For regulated systems, storing hashes, policy decisions, redacted tool parameters, tool results, user/agent IDs, and immutable correlation IDs can be stronger and safer than storing sensitive prompts verbatim.

## Coverage Gaps
**Missed variant 1:**
```ts
const proposed = await agent.planToolCall(userRequest);
const summary = `${proposed.tool}: ${proposed.naturalLanguageSummary}`;

const approved = await approvals.request({ summary });
if (approved) {
  const finalArgs = await agent.rewriteArgumentsAfterApproval();
  await tools[proposed.tool].run(finalArgs);
}
```
**Why it should be caught:**
The skill reviews HITL gate placement and context quality, but does not require the approval decision to be cryptographically or canonically bound to the exact tool name, arguments, resource IDs, risk tier, and nonce that will execute. This creates a TOCTOU approval bypass: a human approves a harmless summary, then the agent changes parameters before execution.

**Missed variant 2:**
```json
{
  "mcpServers": {
    "crm": {
      "command": "npx",
      "args": ["@vendor/crm-mcp-server@latest"],
      "env": { "CRM_TOKEN": "${CRM_TOKEN}" }
    }
  }
}
```
**Why it should be caught:**
The skill covers tool breadth and dynamic tool sets, but should explicitly check tool-provider provenance for MCP/plugin servers: pinned package versions, signed manifests, allowed server identities, per-server secret scopes, and review of tool schemas. An unpinned tool server with a broad token can become the agent's effective permission boundary.

## Edge Cases
Approval summaries are often generated by the same model being constrained. If the approval UI displays only a model-written summary, the human may approve a different action than the one executed. The safer design is a deterministic approval artifact: canonical JSON of tool, parameters, resource identifiers, risk tier, expiry, nonce, and policy decision hash. Execution should fail unless the exact artifact was approved.

Another edge case is audit minimization. "No raw prompt logs" should not automatically be a finding when the design has immutable hashes, redacted parameters, policy traces, and replayable event IDs.

## Remediation Quality
- [x] Fix resolves the vulnerability
- [x] Fix doesn't introduce new security issues
- [x] Fix doesn't break functionality
- **Issues found:** Add explicit checks for approval binding/TOCTOU, deterministic approval artifacts, MCP/plugin server provenance, and privacy-preserving audit alternatives.

Suggested detection additions:
```text
Grep: "approve|approval|human_in_the_loop|require_approval|risk_tier|nonce|canonical" in **/*.{ts,js,py}
Grep: "mcpServers|ModelContextProtocol|FunctionTool|register_tool|tool_schema" in **/*.{json,ts,js,py,yaml,yml}
Grep: "npx .*@latest|pipx|uvx|command.*mcp|env.*TOKEN" in **/*.{json,yaml,yml}
```

## Comparison to Other Tools
| Tool | Catches this? | Notes |
|------|:---:|-------|
| Semgrep | Partial | Can catch unpinned MCP/server packages and obvious approval bypass patterns with custom rules. |
| CodeQL | Partial | Can model TOCTOU data flow if queries are written for approval artifacts and tool execution sinks. |
| OPA / Cedar / Zanzibar-style policy | Partial | Strong for deterministic policy checks, but only if the exact approved artifact is enforced at execution time. |

## Overall Assessment
**Strengths:** Strong coverage of least privilege, HITL gates, blast radius, auditability, rollback, and multi-agent risk.

**Needs improvement:** Add approval artifact binding and MCP/plugin server supply-chain checks, and avoid requiring raw prompt/CoT logs where safer audit primitives exist.

**Priority recommendations:**
1. Require approvals to bind to canonical executable parameters, not summaries.
2. Treat MCP/plugin/tool servers as privileged dependencies with pinned versions, provenance, and scoped secrets.
3. Document privacy-preserving audit patterns as acceptable alternatives to raw prompt/CoT retention.

## Bounty Info
- [x] I have read and agree to the [CONTRIBUTING.md](../../CONTRIBUTING.md) bounty terms
- **Preferred payment method:** Crypto after maintainer acceptance


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[REVIEW] agent-security: bind approvals to exact tool artifacts and MCP provenance #1595

Skill Being Reviewed

False Positive Analysis

Coverage Gaps

Edge Cases

Remediation Quality

Comparison to Other Tools

Overall Assessment

Bounty Info

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Tool	Catches this?	Notes
Semgrep	Partial	Can catch unpinned MCP/server packages and obvious approval bypass patterns with custom rules.
CodeQL	Partial	Can model TOCTOU data flow if queries are written for approval artifacts and tool execution sinks.
OPA / Cedar / Zanzibar-style policy	Partial	Strong for deterministic policy checks, but only if the exact approved artifact is enforced at execution time.

[REVIEW] agent-security: bind approvals to exact tool artifacts and MCP provenance #1595

Description

Skill Being Reviewed

False Positive Analysis

Coverage Gaps

Edge Cases

Remediation Quality

Comparison to Other Tools

Overall Assessment

Bounty Info

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions