feat: agentops assert run + agentops redteam run as active CI gates #283
Merged
…ates

Turn ASSERT (open-source assert-ai framework) and the Foundry/PyRIT AI Red Teaming agent from passive evidence-only references into active, gated CI steps that AgentOps orchestrates end-to-end.

New commands
- agentops assert run invokes the assert-ai CLI as a subprocess, locates the run output, parses metrics.json and scores.jsonl, and writes a normalized summary at .agentops/assert/latest.json. Exits 2 on any policy violation unless --no-gate or assert.fail_on_violations: false.
- agentops redteam run invokes azure.ai.evaluation.red_team.RedTeam against an Azure OpenAI deployment, Foundry agent, or HTTP endpoint, then aggregates per-category and per-strategy attack-success-rate into .agentops/redteam/latest.json. Exits 2 when ASR exceeds redteam.fail_on_attack_success_rate unless --no-gate.

Schema
- Adds AssertRunConfig and RedTeamRunConfig Pydantic models.
- Adds assert_run / redteam_run fields on AgentOpsConfig with aliases assert / redteam so YAML stays natural while Python avoids the reserved keyword. Enables populate_by_name on the root model.

Services
- src/agentops/services/assert_runner.py: subprocess wrapper, run-output locator with suite/run/most-recent fallback, dimension summarizer, normalized JSON writer.
- src/agentops/services/redteam_runner.py: lazy import of the Foundry Red Team SDK, target callback builder for deployment/agent/endpoint shapes, per-category and per-strategy aggregation, normalized JSON writer.

CLI
- New assert_app and redteam_app Typer groups with run and explain subcommands.
- Long-form manuals added to EXPLAIN_PAGES for both groups and surfaced via agentops explain.
- Fixes a stale loaded.config access in the new command handlers.

Tutorial
- docs/tutorial-prompt-agent-quickstart.md replaces the passive assert_path evidence section with active 10a/10b/10c subsections that install assert-ai and azure-ai-evaluation[redteam], scaffold assert/eval_config.yaml and the redteam block, and pull both runners into the evidence pack.
- Success criteria updated accordingly.

README
- Repositions the accelerator as an open-source framework + CLI that orchestrates continuous evaluation, safety testing, and release readiness (rather than reinventing them).
- Tagline, six-step release loop, core-outputs table, and exit-code contract reworked. Foundry boundary table now lists ASSERT and the AI Red Teaming agent under "Probe safety" with active commands.

Tests
- tests/unit/test_assert_and_redteam_runners.py covers schema aliases, run-output discovery, dimension summarization, totals aggregation, target callback resolution, normalized JSON writing, gating, and CLI smoke (missing config block, missing dependency, explain manuals).
- Full suite: 921 passed, 1 skipped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
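The run-output locator with its suite/run/most-recent fallback could look roughly like the sketch below. This is not the code in assert_runner.py: the directory layout (<root>/<suite>/<run>/metrics.json), the function name locate_run_dir, and its parameters are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Optional


def locate_run_dir(root: Path, suite: Optional[str] = None,
                   run: Optional[str] = None) -> Optional[Path]:
    """Return a run directory that contains metrics.json, or None.

    Assumed (hypothetical) layout: <root>/<suite>/<run>/metrics.json.
    """
    # 1. An explicit suite + run is used as-is when it holds a metrics file.
    if suite and run:
        candidate = root / suite / run
        return candidate if (candidate / "metrics.json").is_file() else None

    # 2. With only a suite, restrict the search to that suite's directory.
    search_root = root / suite if suite else root
    if not search_root.is_dir():
        return None

    # 3. Fall back to the most recently modified metrics.json under the root.
    found = sorted(search_root.rglob("metrics.json"),
                   key=lambda p: p.stat().st_mtime)
    return found[-1].parent if found else None
```

Returning None rather than raising leaves it to the caller to decide whether a missing run is a configuration error.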
- assert_runner._aggregate_totals: narrow Optional dict from metrics.get totals before subscripting, by binding the result to a typed local.
- redteam_runner.run_redteam: validate azure_ai_project is not None before passing it to the RedTeam SDK (raises RedTeamRunnerError with a clear hint when project metadata is missing).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
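The _aggregate_totals fix follows a common type-narrowing pattern: bind the result of dict.get to a typed local and check it for None before reading keys from it. A minimal sketch of that pattern, in which the function name aggregate_totals and the passed/failed keys are hypothetical and not taken from the repository:

```python
from typing import Any, Dict, Optional


def aggregate_totals(metrics: Dict[str, Any]) -> Dict[str, int]:
    """Return pass/fail counts, tolerating a missing or null 'totals' entry."""
    # Binding to a typed local lets a type checker narrow Optional[...] below.
    totals: Optional[Dict[str, Any]] = metrics.get("totals")
    if totals is None:
        return {"passed": 0, "failed": 0}
    # totals is known to be a dict from here on, so key access type-checks.
    return {
        "passed": int(totals.get("passed", 0)),
        "failed": int(totals.get("failed", 0)),
    }
```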
What
Turns ASSERT (the open-source assert-ai framework) and the Foundry/PyRIT AI Red Teaming agent into active, gated CI steps that AgentOps orchestrates end-to-end, instead of just consuming pre-generated artifacts via assert_path: / redteam_path:.

Also rewrites the README Overview to position AgentOps as an open-source orchestration framework + CLI rather than a feature bullet list.

New commands
- agentops assert run: subprocess wrapper around the assert-ai CLI. Parses metrics.json + scores.jsonl, writes a normalized summary at .agentops/assert/latest.json, and exits 2 on any policy violation (unless --no-gate).
- agentops redteam run: invokes azure.ai.evaluation.red_team.RedTeam against an Azure OpenAI deployment, Foundry agent, or HTTP endpoint. Aggregates per-category and per-strategy attack-success-rate into .agentops/redteam/latest.json and exits 2 when ASR exceeds the configured threshold (unless --no-gate).

Both runners produce artifacts that agentops doctor --evidence-pack ingests automatically.

Schema additions (agentops.yaml)
assert / redteam are YAML aliases for the Python fields assert_run / redteam_run (the keyword assert cannot be a Python identifier).

Docs
- docs/tutorial-prompt-agent-quickstart.md step 10 rewritten as 10a (ASSERT) / 10b (Red Team) / 10c (evidence pack pickup), with install steps for assert-ai and azure-ai-evaluation[redteam].

Tests
- tests/unit/test_assert_and_redteam_runners.py: 24 tests covering schema aliases, run-output discovery, dimension summarization, totals aggregation, target-callback shapes, normalized JSON writing, gating, and CLI smoke (missing config / missing dependency / explain manuals).

Exit-code contract preserved
- 0: all gates passed
- 2: threshold, ASSERT violation, or red-team ASR gate failed
- 1: runtime / configuration error

Scope notes
- assert_path: / redteam_path: evidence-only paths still work.
- No changes to agentops doctor, cockpit, or eval runner internals beyond the new normalized JSON inputs.

Follow-ups (out of scope)
- v0.3.13; user tutorial workspace will need v0.3.14 to consume).
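
The red-team gate and the exit-code contract above can be illustrated with a short sketch. It is not the code in redteam_runner.py: the record shape (category, attack_succeeded), the function names, and the choice to gate on the overall rate are assumptions. Only the exit codes and the --no-gate behaviour come from this description.

```python
from collections import defaultdict
from typing import Dict, List, Mapping

EXIT_OK = 0           # all gates passed
EXIT_ERROR = 1        # runtime / configuration error (raised elsewhere)
EXIT_GATE_FAILED = 2  # threshold, ASSERT violation, or red-team ASR gate failed


def asr_by_category(results: List[Mapping[str, object]]) -> Dict[str, float]:
    """Attack-success-rate per category: successes / attempts."""
    attempts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for record in results:
        category = str(record["category"])  # hypothetical field name
        attempts[category] += 1
        if record["attack_succeeded"]:      # hypothetical field name
            successes[category] += 1
    return {c: successes[c] / attempts[c] for c in attempts}


def gate_exit_code(results: List[Mapping[str, object]],
                   threshold: float, no_gate: bool = False) -> int:
    """Return 2 when the overall ASR exceeds the threshold, else 0."""
    if no_gate or not results:
        return EXIT_OK
    overall = sum(1 for r in results if r["attack_succeeded"]) / len(results)
    return EXIT_GATE_FAILED if overall > threshold else EXIT_OK
```

In this sketch the comparison is strict, so an ASR exactly equal to the threshold passes. Whether the real gate is strict or inclusive is not stated here.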