
feat: agentops assert run + agentops redteam run as active CI gates #283

Merged
placerda merged 2 commits into develop from feature/assert-runner-integration
Jun 9, 2026
@placerda (Contributor) commented Jun 9, 2026

What

Turns ASSERT (the open-source assert-ai framework) and the Foundry/PyRIT AI Red Teaming agent into active, gated CI steps that AgentOps orchestrates end-to-end, instead of only consuming pre-generated artifacts via assert_path: / redteam_path:.

Also rewrites the README Overview to position AgentOps as an open-source orchestration framework + CLI rather than a feature bullet list.

New commands

  • agentops assert run — subprocess wrapper around the assert-ai CLI. Parses metrics.json + scores.jsonl, writes a normalized summary at .agentops/assert/latest.json, and exits 2 on any policy violation (unless --no-gate).
  • agentops redteam run — invokes azure.ai.evaluation.red_team.RedTeam against an Azure OpenAI deployment, Foundry agent, or HTTP endpoint. Aggregates per-category and per-strategy attack-success-rate into .agentops/redteam/latest.json and exits 2 when ASR exceeds the configured threshold (unless --no-gate).

Both runners produce artifacts that agentops doctor --evidence-pack ingests automatically.
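The red-team gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in redteam_runner.py: the record shape (`category`, `strategy`, `success`), the function names, and the choice to gate on the overall ASR rather than per category are all assumptions made for the example.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Return overall, per-category, and per-strategy attack-success-rate (ASR).

    Each record is a hypothetical dict with 'category', 'strategy', 'success'.
    """
    def rate(items):
        # Fraction of attacks that succeeded; 0.0 for an empty group.
        return sum(1 for r in items if r["success"]) / len(items) if items else 0.0

    by_category = defaultdict(list)
    by_strategy = defaultdict(list)
    for record in records:
        by_category[record["category"]].append(record)
        by_strategy[record["strategy"]].append(record)

    return {
        "overall": rate(records),
        "per_category": {name: rate(group) for name, group in by_category.items()},
        "per_strategy": {name: rate(group) for name, group in by_strategy.items()},
    }


def gate_exit_code(asr, threshold, no_gate=False):
    """Return 2 only when gating is enabled and ASR strictly exceeds the threshold."""
    return 2 if (not no_gate and asr > threshold) else 0


# Illustrative data only; not output from a real run.
records = [
    {"category": "violence", "strategy": "base64", "success": True},
    {"category": "violence", "strategy": "rot13", "success": False},
    {"category": "self_harm", "strategy": "base64", "success": False},
    {"category": "self_harm", "strategy": "rot13", "success": False},
]
summary = attack_success_rates(records)
exit_code = gate_exit_code(summary["overall"], threshold=0.2)
```

With one successful attack out of four, the overall ASR is 0.25, which exceeds a 0.2 threshold, so the sketch returns exit code 2. Passing `no_gate=True` returns 0 regardless of the ASR.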

Schema additions (agentops.yaml)

assert:
  cli: assert-ai
  config: assert/eval_config.yaml
  fail_on_violations: true

redteam:
  target:
    type: model_deployment
    deployment: gpt-4o-mini
  risk_categories: [violence, hate_unfairness, sexual, self_harm]
  attack_strategies: [base64, rot13]
  num_objectives: 5
  fail_on_attack_success_rate: 0.2

assert / redteam are YAML aliases for the Python fields assert_run / redteam_run (the keyword assert cannot be a Python identifier).
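To illustrate why the alias is needed, here is a small stdlib-only sketch of the renaming idea. The PR itself uses Pydantic aliases (the commit notes mention populate_by_name); the dataclass, the `YAML_ALIASES` table, and `load_config` below are hypothetical stand-ins, not the project's models.

```python
import keyword
from dataclasses import dataclass
from typing import Optional

# 'assert' is a reserved word, so 'config.assert' would be a syntax error.
ASSERT_IS_RESERVED = keyword.iskeyword("assert")

# YAML key -> Python attribute name, mirroring the aliases described above.
YAML_ALIASES = {"assert": "assert_run", "redteam": "redteam_run"}


@dataclass
class Config:
    """Hypothetical stand-in for the root config model."""
    assert_run: Optional[dict] = None
    redteam_run: Optional[dict] = None


def load_config(raw: dict) -> Config:
    """Accept either the YAML alias or the Python field name for each block."""
    kwargs = {YAML_ALIASES.get(key, key): value for key, value in raw.items()}
    return Config(**kwargs)


# One block uses the YAML alias, the other the Python field name.
raw = {"assert": {"cli": "assert-ai"}, "redteam_run": {"num_objectives": 5}}
config = load_config(raw)
```

Both spellings end up on the same attribute, which is the behavior the alias plus populate-by-name combination is meant to provide.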

Docs

  • Tutorial: docs/tutorial-prompt-agent-quickstart.md step 10 rewritten as 10a (ASSERT) / 10b (Red Team) / 10c (evidence pack pickup), with install steps for assert-ai and azure-ai-evaluation[redteam].
  • README: new tagline, new Overview as a six-step orchestration narrative (Evaluate → Probe → Diagnose → Gate → Prove → Learn-from-production), updated Foundry boundary table with a new "Probe safety" row.

Tests

  • New file tests/unit/test_assert_and_redteam_runners.py — 24 tests covering schema aliases, run-output discovery, dimension summarization, totals aggregation, target-callback shapes, normalized JSON writing, gating, and CLI smoke (missing config / missing dependency / explain manuals).
  • Full suite: 921 passed, 1 skipped.

Exit-code contract preserved

  • 0 — all gates passed
  • 2 — threshold, ASSERT violation, or red-team ASR gate failed
  • 1 — runtime / configuration error
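The contract above could be encoded along these lines. This is a sketch under assumptions: the `ExitCode` enum and `resolve_exit_code` helper are hypothetical names, and the rule that a runtime error takes precedence over a gate failure is an assumption, not something the PR states.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0             # all gates passed
    RUNTIME_ERROR = 1  # runtime / configuration error
    GATE_FAILED = 2    # threshold, ASSERT violation, or red-team ASR gate failed


def resolve_exit_code(gate_failures, error=None):
    """Map a run's outcome to the documented exit codes.

    Assumption: an error wins over gate failures, since gate results
    may be incomplete when the run itself failed.
    """
    if error is not None:
        return ExitCode.RUNTIME_ERROR
    return ExitCode.GATE_FAILED if gate_failures else ExitCode.OK
```

A CI step can then branch on the numeric value, for example treating 2 as "block the release" and 1 as "fix the pipeline".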

Scope notes

  • Active runners are additive. Existing assert_path: / redteam_path: evidence-only paths still work.
  • AgentOps does not reimplement ASSERT or PyRIT — it shells out to / imports the upstream tools and normalizes their output.
  • No changes to agentops doctor, cockpit, or eval runner internals beyond the new normalized JSON inputs.
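The "shells out to / imports" pattern in the scope notes might look something like the sketch below. `RunnerError`, `run_cli`, and `lazy_import` are hypothetical names for illustration; the actual services may structure this differently.

```python
import importlib
import shutil
import subprocess
import sys


class RunnerError(RuntimeError):
    """Hypothetical error type for a missing tool or optional dependency."""


def run_cli(cli, args):
    """Run an upstream CLI as a subprocess and return the completed process."""
    if shutil.which(cli) is None:
        raise RunnerError(f"'{cli}' not found on PATH; install it first")
    # check=False: the caller inspects returncode and decides how to gate.
    return subprocess.run([cli, *args], capture_output=True, text=True, check=False)


def lazy_import(module_name, install_hint):
    """Import an optional dependency only when a command actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RunnerError(
            f"{module_name} is not installed; try: {install_hint}"
        ) from exc


# Demonstration with the current interpreter standing in for an upstream CLI.
result = run_cli(sys.executable, ["-c", "print('ok')"])
```

Deferring the import keeps the base install light: users who never run the red-team command do not need the optional SDK, and a missing dependency surfaces as a clear error instead of an import-time crash.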

Follow-ups (out of scope)

  • Cockpit cards for ASSERT / Red Team latest runs.
  • Patch release to PyPI (currently v0.3.13; user tutorial workspace will need v0.3.14 to consume).

placerda and others added 2 commits June 9, 2026 14:06
…ates

Turn ASSERT (open-source assert-ai framework) and the Foundry/PyRIT AI Red
Teaming agent from passive evidence-only references into active, gated CI
steps that AgentOps orchestrates end-to-end.

New commands

- agentops assert run invokes the assert-ai CLI as a subprocess, locates
  the run output, parses metrics.json and scores.jsonl, and writes a
  normalized summary at .agentops/assert/latest.json. Exits 2 on any
  policy violation unless --no-gate or assert.fail_on_violations: false.
- agentops redteam run invokes azure.ai.evaluation.red_team.RedTeam
  against an Azure OpenAI deployment, Foundry agent, or HTTP endpoint,
  then aggregates per-category and per-strategy attack-success-rate into
  .agentops/redteam/latest.json. Exits 2 when ASR exceeds
  redteam.fail_on_attack_success_rate unless --no-gate.

Schema

- Adds AssertRunConfig and RedTeamRunConfig Pydantic models.
- Adds assert_run / redteam_run fields on AgentOpsConfig with aliases
  assert / redteam so YAML stays natural while Python avoids the
  reserved keyword. Enables populate_by_name on the root model.

Services

- src/agentops/services/assert_runner.py: subprocess wrapper, run-output
  locator with suite/run/most-recent fallback, dimension summarizer,
  normalized JSON writer.
- src/agentops/services/redteam_runner.py: lazy import of the Foundry
  Red Team SDK, target callback builder for deployment/agent/endpoint
  shapes, per-category and per-strategy aggregation, normalized JSON
  writer.

CLI

- New assert_app and redteam_app Typer groups with run and explain
  subcommands.
- Long-form manuals added to EXPLAIN_PAGES for both groups and surfaced
  via agentops explain.
- Fixes a stale loaded.config access in the new command handlers.

Tutorial

- docs/tutorial-prompt-agent-quickstart.md replaces the passive
  assert_path evidence section with active 10a/10b/10c subsections that
  install assert-ai and azure-ai-evaluation[redteam], scaffold
  assert/eval_config.yaml and the redteam block, and pull both runners
  into the evidence pack.
- Success criteria updated accordingly.

README

- Repositions the accelerator as an open-source framework + CLI that
  orchestrates continuous evaluation, safety testing, and release
  readiness (rather than reinventing them).
- Tagline, six-step release loop, core-outputs table, and exit-code
  contract reworked. Foundry boundary table now lists ASSERT and the
  AI Red Teaming agent under "Probe safety" with active commands.

Tests

- tests/unit/test_assert_and_redteam_runners.py covers schema aliases,
  run-output discovery, dimension summarization, totals aggregation,
  target callback resolution, normalized JSON writing, gating, and CLI
  smoke (missing config block, missing dependency, explain manuals).
- Full suite: 921 passed, 1 skipped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- assert_runner._aggregate_totals: narrow Optional dict from metrics.get
  totals before subscripting, by binding the result to a typed local.
- redteam_runner.run_redteam: validate azure_ai_project is not None
  before passing it to the RedTeam SDK (raises RedTeamRunnerError with
  a clear hint when project metadata is missing).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@placerda placerda merged commit f3647d3 into develop Jun 9, 2026
12 checks passed
@placerda placerda deleted the feature/assert-runner-integration branch June 9, 2026 17:33