Skip to content

[REVIEW] agent-security: bind approvals to exact tool artifacts and MCP provenance #1595

@sato820

Description

@sato820

Skill Being Reviewed

Skill name: agent-security
Skill path: skills/ai-security/agent-security/

False Positive Analysis

Benign code that triggers a false positive:

agent_audit:
  store_raw_prompts: false
  store_chain_of_thought: false
  store_prompt_hash: true
  store_policy_trace: true
  redact_tool_parameters:
    - access_token
    - customer_email
    - secret_value
  immutable_sink: cloud-logging-worm
  correlation_id_required: true

Why this is a false positive:
The skill asks whether the agent's reasoning and prompt/context are logged. That can over-flag privacy-preserving designs that intentionally avoid raw prompt or chain-of-thought retention. For regulated systems, storing hashes, policy decisions, redacted tool parameters, tool results, user/agent IDs, and immutable correlation IDs can be stronger and safer than storing sensitive prompts verbatim.

Coverage Gaps

Missed variant 1:

const proposed = await agent.planToolCall(userRequest);
const summary = `${proposed.tool}: ${proposed.naturalLanguageSummary}`;

const approved = await approvals.request({ summary });
if (approved) {
  const finalArgs = await agent.rewriteArgumentsAfterApproval();
  await tools[proposed.tool].run(finalArgs);
}

Why it should be caught:
The skill reviews HITL gate placement and context quality, but does not require the approval decision to be cryptographically or canonically bound to the exact tool name, arguments, resource IDs, risk tier, and nonce that will execute. This creates a TOCTOU approval bypass: a human approves a harmless summary, then the agent changes parameters before execution.

Missed variant 2:

{
  "mcpServers": {
    "crm": {
      "command": "npx",
      "args": ["@vendor/crm-mcp-server@latest"],
      "env": { "CRM_TOKEN": "${CRM_TOKEN}" }
    }
  }
}

Why it should be caught:
The skill covers tool breadth and dynamic tool sets, but should explicitly check tool-provider provenance for MCP/plugin servers: pinned package versions, signed manifests, allowed server identities, per-server secret scopes, and review of tool schemas. An unpinned tool server with a broad token can become the agent's effective permission boundary.

Edge Cases

Approval summaries are often generated by the same model being constrained. If the approval UI displays only a model-written summary, the human may approve a different action than the one executed. The safer design is a deterministic approval artifact: canonical JSON of tool, parameters, resource identifiers, risk tier, expiry, nonce, and policy decision hash. Execution should fail unless the exact artifact was approved.

Another edge case is audit minimization. "No raw prompt logs" should not automatically be a finding when the design has immutable hashes, redacted parameters, policy traces, and replayable event IDs.

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: Add explicit checks for approval binding/TOCTOU, deterministic approval artifacts, MCP/plugin server provenance, and privacy-preserving audit alternatives.

Suggested detection additions:

Grep: "approve|approval|human_in_the_loop|require_approval|risk_tier|nonce|canonical" in **/*.{ts,js,py}
Grep: "mcpServers|ModelContextProtocol|FunctionTool|register_tool|tool_schema" in **/*.{json,ts,js,py,yaml,yml}
Grep: "npx .*@latest|pipx|uvx|command.*mcp|env.*TOKEN" in **/*.{json,yaml,yml}

Comparison to Other Tools

Tool Catches this? Notes
Semgrep Partial Can catch unpinned MCP/server packages and obvious approval bypass patterns with custom rules.
CodeQL Partial Can model TOCTOU data flow if queries are written for approval artifacts and tool execution sinks.
OPA / Cedar / Zanzibar-style policy Partial Strong for deterministic policy checks, but only if the exact approved artifact is enforced at execution time.

Overall Assessment

Strengths: Strong coverage of least privilege, HITL gates, blast radius, auditability, rollback, and multi-agent risk.

Needs improvement: Add approval artifact binding and MCP/plugin server supply-chain checks, and avoid requiring raw prompt/CoT logs where safer audit primitives exist.

Priority recommendations:

  1. Require approvals to bind to canonical executable parameters, not summaries.
  2. Treat MCP/plugin/tool servers as privileged dependencies with pinned versions, provenance, and scoped secrets.
  3. Document privacy-preserving audit patterns as acceptable alternatives to raw prompt/CoT retention.

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: Crypto after maintainer acceptance

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions