Azure · placerda · Jun 9, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,26 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 
 ## [Unreleased]
 
+### Added
+- **`agentops assert run` orchestrates the open-source ASSERT framework.**
+  AgentOps now invokes the `assert-ai` CLI as an active CI step instead of only
+  consuming pre-generated artifacts via `assert_path:`. A new `assert:` block in
+  `agentops.yaml` (`config`, `results_dir`, `suite`, `run_id`,
+  `fail_on_violations`) configures the run. AgentOps invokes the subprocess,
+  locates the output under `<results_dir>/<suite>/<run>/`, parses `metrics.json`
+  and `scores.jsonl`, and writes a normalized summary at
+  `.agentops/assert/latest.json` that the release evidence pack ingests
+  automatically. Exits with code 2 when any policy dimension reports violations.
+- **`agentops redteam run` orchestrates Foundry's AI Red Teaming agent (PyRIT).**
+  AgentOps now invokes `azure.ai.evaluation.red_team.RedTeam` against the
+  configured target (Azure OpenAI deployment, Foundry prompt agent, or HTTP
+  endpoint) and normalizes the per-category and per-strategy attack outcomes.
+  A new `redteam:` block in `agentops.yaml` (`target`, `risk_categories`,
+  `attack_strategies`, `num_objectives`, `fail_on_attack_success_rate`)
+  controls the scan; results land at `.agentops/redteam/latest.json` so the
+  evidence pack picks them up via `redteam_path:` automatically. Exits with
+  code 2 when the attack-success-rate exceeds the configured threshold.
+
 ## [0.3.13] - 2026-06-09
 
 ### Fixed

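A minimal sketch of the ASSERT gate described in the changelog entry above: it counts violations per policy dimension from `scores.jsonl`-style records and returns exit code 2 when any dimension reports one. The record fields (`dimension`, `violation`) and both function names are hypothetical; the diff shows neither ASSERT's real output schema nor the AgentOps implementation.

```python
import json
from collections import Counter
from typing import Iterable


def count_violations(jsonl_lines: Iterable[str]) -> Counter:
    """Count violations per dimension from JSONL text lines.

    Assumes each non-empty line is a JSON object with a string `dimension`
    and a boolean `violation` field (hypothetical schema).
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        dimension = record["dimension"]
        counts[dimension] += 0  # register the dimension even with no violations
        if record.get("violation", False):
            counts[dimension] += 1
    return counts


def assert_exit_code(counts: Counter, fail_on_violations: bool = True) -> int:
    """Return 2 when gating is on and any dimension has violations, else 0."""
    if fail_on_violations and any(n > 0 for n in counts.values()):
        return 2
    return 0


sample = [
    '{"dimension": "prompt_injection", "violation": false}',
    '{"dimension": "pii_leak", "violation": true}',
    '{"dimension": "pii_leak", "violation": false}',
]
sample_counts = count_violations(sample)
print(dict(sample_counts), assert_exit_code(sample_counts))
```

Passing `fail_on_violations=False` mirrors the documented `assert.fail_on_violations: false` setting: the counts are still produced, but the gate returns 0.
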
diff --git a/README.md b/README.md
@@ -1,7 +1,9 @@
 <h1 align="center">AgentOps Accelerator</h1>
 
 <p align="center">
-Answer the release question for Microsoft Foundry agents: can we ship it, and where is the proof?
+<b>Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.</b>
+<br/>
+Can we ship it, and where is the proof?
 </p>
 
 <p align="center">
@@ -19,25 +21,52 @@ Answer the release question for Microsoft Foundry agents: can we ship it, and wh
 
 ## Overview
 
-AgentOps Accelerator helps teams turn Foundry agent work into a clear release
-decision. Foundry is the agent control plane; AgentOps turns Foundry signals and
-repo checks into repeatable gates, Doctor readiness, release evidence, and
-trace-driven regression loops.
-
-The project enables:
-
-- Local and CI execution for release gates
-- Foundry prompt agent, Foundry hosted endpoint, HTTP/JSON agent, and raw model targets
-- Auto-selected evaluators for RAG, tools, and model quality
-- Stable `results.json` for automation
-- PR-friendly `report.md`
-- Baseline comparison for regression detection
-- Doctor checks for repo, CI/CD, telemetry, landing zones, and Foundry setup
-- Release evidence packs for promotion review
-- Optional `azd ai agent eval` execution with Rubric/custom metric binding
-- ASSERT, ACS, and red-team governance evidence references
-- Trace promotion into regression datasets
-- Cockpit navigation for AgentOps, Foundry, and Azure Monitor
+**AgentOps Accelerator is an open-source framework and CLI that standardizes
+continuous evaluation, safety testing, and release readiness for enterprise AI
+agents — with Microsoft Foundry as the agent runtime.**
+
+It is an *orchestrator*, not a reimplementation. AgentOps wires together the
+tools you already use — Foundry Evaluations, `azd ai agent eval`, the
+open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
+Monitor / Application Insights, and your CI/CD platform — into a single
+repeatable release loop:
+
+1. **Evaluate** the agent against datasets, rubrics, and policies — locally or
+   in the cloud — using auto-selected evaluators for RAG, tool use, model
+   quality, and safety.
+2. **Probe** the agent with adversarial inputs by orchestrating ASSERT
+   (`agentops assert run`) and the Foundry/PyRIT Red Teaming agent
+   (`agentops redteam run`) as active CI steps.
+3. **Diagnose** repo, telemetry, landing zone, and Foundry readiness with
+   `agentops doctor`.
+4. **Gate** the release with a deterministic exit-code contract that PRs and
+   pipelines can rely on.
+5. **Prove** the release with a stable evidence pack (`evidence.json` +
+   `evidence.md`) that bundles eval results, ASSERT verdicts, red-team
+   findings, telemetry readiness, and Doctor findings for promotion review.
+6. **Learn from production** by promoting reviewed traces into regression
+   datasets that feed the next eval cycle.
+
+The output is a clear answer to two questions reviewers actually ask:
+**can we ship it, and where is the proof?**
+
+### Core outputs
+
+| Artifact | Produced by | Audience |
+|---|---|---|
+| `results.json` | `agentops eval run` | CI / automation |
+| `report.md` | `agentops eval run` | PR reviewers |
+| `.agentops/assert/latest.json` | `agentops assert run` | Evidence pack, CI gate |
+| `.agentops/redteam/latest.json` | `agentops redteam run` | Evidence pack, CI gate |
+| `evidence.json` / `evidence.md` | `agentops doctor --evidence-pack` | Release approver |
+| Cockpit (localhost) | `agentops cockpit` | Engineer reviewing readiness |
+
+### Exit-code contract
+
+- `0` — execution succeeded and all gates passed
+- `2` — execution succeeded but a threshold, ASSERT violation, red-team rate,
+  or Doctor severity gate failed
+- `1` — runtime or configuration error
 
 ## AgentOps and Microsoft Foundry
 
@@ -50,26 +79,15 @@ ship/no-ship workflow.
 |---|---|---|
 | Build and version | Foundry portal, Foundry SDK/Toolkit, `microsoft-foundry` skill, azd | Pins the exact candidate in `agentops.yaml` and generates the PR/release gate around it |
 | Evaluate and compare | Foundry Evaluations, `azd ai agent eval`, Rubric evaluator, and official CI actions/extensions | Keeps datasets and thresholds in the repo, records evidence, normalizes azd/Rubric outputs, and provides local/fallback runs for non-prompt targets |
+| Probe safety | ASSERT framework, PyRIT-backed AI Red Teaming agent | Runs both as active CI steps via `agentops assert run` and `agentops redteam run`, normalizes verdicts, and gates the pipeline |
 | Observe and investigate | Foundry Monitor, Traces, Azure Monitor, App Insights | Surfaces deep links, telemetry readiness, Doctor findings, and Cockpit navigation |
 | Decide release | Branch protection, environments, approvals | Packages `evidence.json` / `evidence.md` for promotion review |
-| Govern controls | ASSERT, ACS, Foundry Guardrails, Foundry red-team scans | References reviewed artifacts by path/hash/status without executing or applying the external controls |
+| Govern controls | ACS, Foundry Guardrails | References reviewed artifacts by path/hash/status without executing or applying the external controls |
 | Improve from production | Production traces and Foundry datasets | Promotes reviewed trace learnings into regression candidates |
 
 The rhythm is simple: build and operate the agent in Foundry, keep the release
 contract in the repo, and let AgentOps connect the two into a clean review loop.
 
-Core outputs:
-
-- `results.json` (machine-readable)
-- `report.md` (human-readable)
-- `evidence.json` / `evidence.md` (from `agentops doctor --evidence-pack`)
-
-Exit code contract:
-
-- `0` execution succeeded and all thresholds passed
-- `2` execution succeeded but one or more thresholds failed
-- `1` runtime or configuration error
-
 ## Quickstart
 
 ### 1) Install

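The exit-code contract in the README diff above (`0`, `2`, `1`) could be consumed by a pipeline wrapper along these lines. This is a sketch only: `GateOutcome`, `classify_exit_code`, and `should_block_merge` are hypothetical names, and treating every code other than `0` and `2` as an error is an assumption beyond what the README states.

```python
from enum import Enum


class GateOutcome(Enum):
    PASSED = "passed"            # exit code 0
    GATE_FAILED = "gate_failed"  # exit code 2
    ERROR = "error"              # exit code 1 (and, by assumption, anything else)


def classify_exit_code(code: int) -> GateOutcome:
    """Map an AgentOps process exit code to a coarse outcome."""
    if code == 0:
        return GateOutcome.PASSED
    if code == 2:
        return GateOutcome.GATE_FAILED
    # The README documents 1 for runtime/configuration errors; lumping all
    # other codes in with it is an assumption of this sketch.
    return GateOutcome.ERROR


def should_block_merge(code: int) -> bool:
    """Block the PR unless every gate passed."""
    return classify_exit_code(code) is not GateOutcome.PASSED


for code in (0, 2, 1):
    print(code, classify_exit_code(code).value, should_block_merge(code))
```

Keeping `GATE_FAILED` separate from `ERROR` lets a pipeline report "the agent regressed" differently from "the run itself broke", which is the distinction the contract draws between codes `2` and `1`.
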
diff --git a/docs/tutorial-prompt-agent-quickstart.md b/docs/tutorial-prompt-agent-quickstart.md
@@ -1019,47 +1019,127 @@ bind to an emitted metric. Open `.agentops/results/latest/results.json` to see
 which rubric metric names actually appeared in the azd output; that is the
 authoritative list of values you can put under `thresholds:`.
 
-### Add ASSERT evidence to the release proof
+### Add ASSERT and Red Team to the release gate
 
 The normal AgentOps flow proves the release with evaluation results, Doctor
-findings, workflow runs, and release evidence. ASSERT fits into that same release
-proof as a governed artifact: run ASSERT in the tool or process your team uses
-for policy checks, keep the reviewed policy or result summary in the repo or CI
-artifact store, and point AgentOps at it.
+findings, workflow runs, and release evidence. Two release-readiness signals
+deserve to run inside the same loop:
 
-AgentOps does not execute ASSERT. It records the artifact path, status, and
-SHA-256 hash so Doctor and the evidence pack can show reviewers which ASSERT
-evidence was used for the release. Store only approved metadata in the repo; keep
-raw adversarial prompts, secrets, customer data, and detailed scan payloads in
-the approved secure system.
+- **ASSERT** (open-source `assert-ai`) — turns natural-language policies into
+  executable behavior tests (prompt injection, jailbreak, hallucination, PII
+  leak, unauthorized tool use). Repo: <https://github.com/responsibleai/ASSERT>.
+- **AI Red Teaming** (Foundry agent, PyRIT-backed) — generates adversarial
+  prompts across risk categories (violence, hate and unfairness, self-harm,
+  sexual) and applies attack strategies (base64, rot13, morse) to surface
+  safety regressions before users do. Docs:
+  <https://learn.microsoft.com/azure/ai-foundry/concepts/ai-red-teaming-agent>.
+
+AgentOps does NOT reimplement either. It orchestrates them as active CI steps,
+gates the pipeline on their results, and writes normalized JSON summaries that
+the evidence pack ingests automatically.
+
+#### 10a. Run ASSERT against the Travel Agent
+
+Install ASSERT and scaffold a minimal eval config:
 
 ```powershell
-New-Item -ItemType Directory -Force .agentops\governance | Out-Null
+pip install assert-ai
+
+New-Item -ItemType Directory -Force .\assert | Out-Null
 @'
-# ASSERT evidence
-
-Status: reviewed
-Source: <link-to-approved-assert-run-or-policy>
-Scope: Travel Agent release readiness
-Notes: ASSERT execution remains in the owning ASSERT workflow; AgentOps records
-this artifact as release evidence only.
-'@ | Set-Content -Encoding utf8 .agentops\governance\assert-evidence.md
+suite_id: travel-agent-v1
+run_id: ci-tutorial
+target:
+  type: azure_openai
+  deployment: gpt-4o-mini
+dimensions:
+  - prompt_injection
+  - pii_leak
+  - jailbreak
+num_cases_per_dimension: 5
+'@ | Set-Content -Encoding utf8 .\assert\eval_config.yaml
 ```
 
-Then reference it from `agentops.yaml`:
+Add the `assert:` block to `agentops.yaml`:
 
 ```yaml
-assert_path: .agentops/governance/assert-evidence.md
+assert:
+  config: ./assert/eval_config.yaml
+  fail_on_violations: true
+```
+
+Run it through AgentOps:
+
+```powershell
+agentops assert run
 ```
 
-When you later run:
+What AgentOps does for you:
+
+1. Verifies `assert-ai` is installed.
+2. Invokes `assert-ai run --config ./assert/eval_config.yaml`.
+3. Locates the run output under `artifacts/results/<suite>/<run>/`.
+4. Parses `metrics.json` and `scores.jsonl` for per-dimension verdicts.
+5. Writes a normalized summary at `.agentops/assert/latest.json`.
+6. Exits non-zero (code 2) when ASSERT reports any policy violation, unless
+   you pass `--no-gate` or set `assert.fail_on_violations: false`.
+
+#### 10b. Run the AI Red Teaming agent
+
+Install Foundry's Red Team SDK (it ships as the `redteam` extra of `azure-ai-evaluation`):
+
+```powershell
+pip install "azure-ai-evaluation[redteam]"
+```
+
+Add the `redteam:` block to `agentops.yaml`:
+
+```yaml
+redteam:
+  target:
+    model_deployment: gpt-4o-mini
+  risk_categories: [violence, hate_unfairness, self_harm, sexual]
+  attack_strategies: [base64, rot13, morse]
+  num_objectives: 5
+  fail_on_attack_success_rate: 0.2  # fail if >20% of attacks succeed
+```
+
+Run it:
+
+```powershell
+agentops redteam run
+```
+
+What AgentOps does for you:
+
+1. Verifies the `RedTeam` Python API is importable.
+2. Resolves the target (deployment / agent / endpoint) from the YAML.
+3. Calls `RedTeam.scan(...)` with the configured risk categories, strategies,
+   and objective count.
+4. Aggregates per-category and per-strategy attack-success-rate.
+5. Writes a normalized summary at `.agentops/redteam/latest.json` plus the
+   raw SDK payload at `.agentops/redteam/raw_summary.json`.
+6. Exits non-zero (code 2) when overall attack-success-rate exceeds
+   `fail_on_attack_success_rate`, unless you pass `--no-gate`.
+
+> **Heads-up.** Both commands hit live Azure services. Run them against a
+> non-production deployment and budget for the cost of the configured
+> objective count.
+
+#### 10c. Pull both into the release evidence pack
+
+Both runners write to well-known paths the evidence pack already auto-discovers
+(via `assert_path` and `redteam_path` resolution). When you produce the
+evidence pack:
 
 ```powershell
 agentops doctor --workspace . --evidence-pack
 ```
 
-the release evidence includes the ASSERT path, status, and SHA-256 hash without
-claiming that AgentOps executed ASSERT.
+`evidence.json` and `evidence.md` now include the suite/run id, total cases,
+violation counts, attack-success-rate, and SHA-256 hashes for both artifacts,
+without claiming that AgentOps produced the verdicts. The verdicts come from
+ASSERT and PyRIT; AgentOps owns orchestration, normalization, and gating.
 
 ## 11. Generate the PR + dev deploy workflows
 
@@ -1646,10 +1726,12 @@ You are done when:
 - `agentops doctor --evidence-pack` writes
   `.agentops/release/latest/evidence.md`, and the GitHub run summary
   shows its Doctor finding summary.
-- Optional governance artifacts are either absent (no Doctor noise) or wired as
-  evidence-only paths in `agentops.yaml` (`assert_path`, `acs_path`,
-  `redteam_path`) so the evidence pack can cite their hash/status without
-  claiming AgentOps executed ASSERT, applied ACS, or ran red-team scans.
+- Optional safety runners are either skipped (no Doctor noise) or wired in:
+  `assert:` to run `agentops assert run`, and `redteam:` to run
+  `agentops redteam run`. Both write normalized JSON under `.agentops/` that
+  the evidence pack ingests automatically. Existing evidence-only
+  `assert_path`, `acs_path`, and `redteam_path` references (hash/status) are
+  still honored.
 - Cockpit opens and links the repo-side readiness view back to Foundry
   for both sandbox and dev.
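
The red-team gate in step 10b of the tutorial diff (aggregate per-category and per-strategy attack-success-rate, then exit 2 when the overall rate exceeds `fail_on_attack_success_rate`) can be sketched as follows. The `(risk_category, attack_strategy, succeeded)` tuple shape and all function names are hypothetical; the real `RedTeam.scan(...)` result format is not shown in this diff.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record shape: (risk_category, attack_strategy, succeeded)
Attack = Tuple[str, str, bool]


def success_rates(attacks: List[Attack], key_index: int) -> Dict[str, float]:
    """Attack-success-rate grouped by category (key_index=0) or strategy (1)."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for attack in attacks:
        key = attack[key_index]
        totals[key] += 1
        if attack[2]:
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}


def overall_rate(attacks: List[Attack]) -> float:
    """Fraction of all attacks that succeeded; 0.0 for an empty scan."""
    if not attacks:
        return 0.0
    return sum(1 for attack in attacks if attack[2]) / len(attacks)


def redteam_exit_code(rate: float, threshold: float) -> int:
    """Return 2 only when the rate strictly exceeds the threshold."""
    return 2 if rate > threshold else 0


attacks = [
    ("violence", "base64", False),
    ("violence", "rot13", True),
    ("hate_unfairness", "base64", False),
    ("hate_unfairness", "morse", False),
    ("self_harm", "rot13", False),
]
print(success_rates(attacks, 0))  # per risk category
print(success_rates(attacks, 1))  # per attack strategy
print(redteam_exit_code(overall_rate(attacks), threshold=0.2))
```

The strict `>` comparison follows the tutorial's wording ("exceeds") and its YAML comment ("fail if >20% of attacks succeed"), so a scan that lands exactly on the threshold still passes in this sketch.
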