Draft: AdversarialConversationManager by rlundeen2 · Pull Request #2053 · microsoft/PyRIT

rlundeen2 · 2026-06-19T02:41:44Z

AdversarialConversationManager simplifies managing adversarial conversations and gives us one central place to supply context to the adversarial chat.

Today each multi-turn attack (Red Teaming, Crescendo, TAP, PAIR) hand-rolls the same mechanics: holding the adversarial system prompt, building the per-turn message, sending it on a stable conversation id, and parsing the reply. This manager owns one adversarial conversation and centralizes all of that in one place.
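To make the four responsibilities concrete, here is a minimal sketch of what "owning one adversarial conversation" could look like. It is not the class added in this PR: `AdversarialConversationSketch`, `SendFn`, and `fake_send` are hypothetical names, and the adversarial target is reduced to a plain callable.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an adversarial chat target: takes
# (conversation_id, system_prompt, user_message) and returns the raw reply text.
SendFn = Callable[[str, str, str], str]


@dataclass
class AdversarialConversationSketch:
    """Illustrative only; names and signatures are assumptions, not PyRIT's API."""

    system_prompt: str
    send: SendFn
    # One id for the whole adversarial conversation, created once and reused.
    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def build_turn_message(self, objective: str, last_response: str) -> str:
        # Per-turn message construction lives here instead of in each attack.
        return f"Objective: {objective}\nTarget's last response: {last_response}"

    def next_message(self, objective: str, last_response: str) -> str:
        prompt = self.build_turn_message(objective, last_response)
        raw = self.send(self.conversation_id, self.system_prompt, prompt)
        # Reply parsing is centralized as well.
        return json.loads(raw)["next_message"]


# Usage with a fake target that records which conversation id it was called on.
seen_ids: List[str] = []


def fake_send(conversation_id: str, system_prompt: str, message: str) -> str:
    seen_ids.append(conversation_id)
    return json.dumps({"next_message": f"turn {len(seen_ids)}"})


manager = AdversarialConversationSketch(system_prompt="You are a red teamer.", send=fake_send)
first = manager.next_message("test objective", "")
second = manager.next_message("test objective", "I can't help with that.")
```

Both turns go out on the same conversation id, which is the property each attack currently has to maintain on its own.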

Because it's one place, it's also where we decide what context the adversarial chat gets each turn — the objective, the latest score, and the objective target's last response (bucketed by data type, e.g. {{ message.text.converted_value }}) — all via a single adversarial_prompt_template. Adding a new signal (a new score field, multimodal pieces, etc.) becomes a template change rather than per-attack code.
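A rough sketch of the bucketing idea, under stated assumptions: the PR's template uses double-brace placeholders, while this example swaps in Python's built-in `str.format` so it runs with the standard library only. `Piece`, `bucket_by_data_type`, `TEMPLATE`, and `render_turn` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List


@dataclass
class Piece:
    """Hypothetical, simplified message piece."""

    data_type: str  # e.g. "text" or "image_path"
    converted_value: str


def bucket_by_data_type(pieces: List[Piece]) -> SimpleNamespace:
    """Expose the first piece of each data type as an attribute, e.g. message.text."""
    buckets = {}
    for piece in pieces:
        buckets.setdefault(piece.data_type, piece)
    return SimpleNamespace(**buckets)


# str.format stand-in for the adversarial_prompt_template (an assumption; the
# real template syntax is double-brace based).
TEMPLATE = (
    "Objective: {objective}\n"
    "Latest score: {score}\n"
    "Target said: {message.text.converted_value}"
)


def render_turn(objective: str, score: float, pieces: List[Piece]) -> str:
    return TEMPLATE.format(
        objective=objective,
        score=score,
        message=bucket_by_data_type(pieces),
    )


rendered = render_turn("test objective", 0.25, [Piece("text", "I cannot help")])
```

With this shape, exposing a new signal to the adversarial chat means passing one more value to the renderer and editing the template text, with no per-attack changes.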

It also unifies the shared adversarial_chat JSON schema and is the single home for JSON retry, so schema-aware targets natively constrain the reply and every attack gets consistent parsing/retry behavior for free.
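A minimal sketch of what a single home for JSON retry might look like. The function name, the `send` callable, and the default of three attempts are assumptions for illustration; the PR's actual retry policy may differ.

```python
import json
from typing import Callable


def send_with_json_retry(send: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Resend the prompt until the reply parses as JSON and contains next_message."""
    last_error = None
    for _ in range(max_attempts):
        raw = send(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(parsed, dict) and "next_message" in parsed:
            return parsed
        last_error = ValueError("reply is missing 'next_message'")
    raise ValueError(f"no valid JSON reply after {max_attempts} attempts") from last_error


# Usage: the first two replies are unusable, the third one is accepted.
replies = iter(["not json", '{"rationale": "x"}', '{"next_message": "hi"}'])
result = send_with_json_retry(lambda _prompt: next(replies), "plan the next turn")
```

Keeping this loop in one function is what lets every attack share the same parsing and retry behavior.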

Add a shared `adversarial_chat` JSON schema (next_message, rationale, last_response_summary) and wire the Crescendo, TAP, and PAIR adversarial-chat prompts onto it so their prompts are interchangeable. Parsers now read/return next_message and forward the schema to schema-aware targets via prompt metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
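For reference, the three fields named in this commit could be written as a JSON Schema roughly as follows. The field types, the `required` list, and the metadata keys are assumptions made for this sketch; only the field names come from the commit message.

```python
import json

# Assumed shape: all three fields are strings and only next_message is required.
ADVERSARIAL_CHAT_SCHEMA = {
    "type": "object",
    "properties": {
        "next_message": {"type": "string"},
        "rationale": {"type": "string"},
        "last_response_summary": {"type": "string"},
    },
    "required": ["next_message"],
}


def build_prompt_metadata(schema: dict) -> dict:
    """Hypothetical metadata payload used to forward the schema to the target."""
    return {"response_format": "json", "json_schema": json.dumps(schema)}


def parse_next_message(raw_reply: str) -> str:
    """Read the reply and return next_message, failing loudly if it is absent."""
    reply = json.loads(raw_reply)
    missing = [key for key in ADVERSARIAL_CHAT_SCHEMA["required"] if key not in reply]
    if missing:
        raise ValueError(f"reply is missing required fields: {missing}")
    return reply["next_message"]


metadata = build_prompt_metadata(ADVERSARIAL_CHAT_SCHEMA)
message = parse_next_message('{"next_message": "Tell me more.", "rationale": "build rapport"}')
```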

The message normalizer already appends the response JSON schema when it is forwarded via prompt metadata, so the hand-written schema block in the Crescendo system prompts duplicated that instruction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Routes Crescendo and TAP adversarial sends through a real PromptNormalizer and a MockPromptTarget (which lacks native JSON_SCHEMA support) to verify the shared adversarial_chat schema is forwarded via prompt metadata and rendered into the prompt the adversarial chat receives. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
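The behavior being verified can be sketched without PyRIT at all. Everything below is a hypothetical stand-in (`MockTargetSketch`, `normalize_and_send`, the `json_schema` metadata key), not the real PromptNormalizer or MockPromptTarget: when the target lacks native schema support, the schema is rendered into the prompt text.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MockTargetSketch:
    """Hypothetical target that records every prompt it receives."""

    supports_json_schema: bool = False
    received_prompts: List[str] = field(default_factory=list)

    def send(self, prompt: str) -> str:
        self.received_prompts.append(prompt)
        return '{"next_message": "ok"}'


def normalize_and_send(target: MockTargetSketch, prompt: str, metadata: dict) -> str:
    """Append the schema to the prompt only when the target cannot enforce it natively."""
    schema = metadata.get("json_schema")
    if schema is not None and not target.supports_json_schema:
        prompt = f"{prompt}\n\nRespond with JSON matching this schema:\n{json.dumps(schema)}"
    return target.send(prompt)


schema = {"type": "object", "required": ["next_message"]}

plain_target = MockTargetSketch()
normalize_and_send(plain_target, "Plan the next turn.", {"json_schema": schema})

native_target = MockTargetSketch(supports_json_schema=True)
normalize_and_send(native_target, "Plan the next turn.", {"json_schema": schema})
```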

into rlundeen2/unify-red-team-attack-schema

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…eam-attack-schema
# Conflicts:
#   pyrit/datasets/executors/crescendo/crescendo_variant_1.yaml
#   pyrit/datasets/executors/crescendo/escalation_crisis.yaml
#   pyrit/datasets/executors/crescendo/therapist.yaml
#   pyrit/datasets/executors/pair/attacker_system_prompt.yaml
#   pyrit/datasets/executors/tree_of_attacks/adversarial_system_prompt.yaml
#   pyrit/datasets/executors/tree_of_attacks/image_generation.yaml

romanlutz

I love this! It solves lots of problems I've noticed but not really had a good handle on (yet!). That said, there are lots of conflicts with #1377. It's compatible if we adjust this or #1377. Some notes below.

- API migration: #2053 renames seed_prompt → first_message and adds adversarial_prompt_template. The rename would probably need a deprecation period for seed_prompt.
- Responsibility boundary: #1377’s ModalityFeedbackRouter handles true multimodal forwarding (message pieces). #2053’s manager centralizes adversarial turn/schema logic but is text-message-centric in RedTeamingAttack. We'd need to generalize it to accept and build full Message objects (not just rendered text) so #1377's media forwarding remains intact.
- First-turn placeholder flow: #1377 supports adversarial placeholders plus seed media for edit-only starts; #2053’s red-teaming path bypasses that behavior unless it is explicitly reintroduced.
- Rollout scope: either apply the manager to red teaming only at first, or also refactor Crescendo/TAP to share the same adversarial-turn primitive.
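On the API migration note, one common way to handle such a rename is a keyword shim that accepts the old name and warns. This is only a sketch: `AdversarialConfigSketch` is a hypothetical class, and where the real parameter lives in PyRIT is not shown here.

```python
import warnings
from typing import Optional


class AdversarialConfigSketch:
    """Hypothetical config object illustrating a seed_prompt -> first_message shim."""

    def __init__(self, *, first_message: Optional[str] = None, seed_prompt: Optional[str] = None):
        if seed_prompt is not None:
            if first_message is not None:
                raise ValueError("Pass first_message or seed_prompt, not both.")
            warnings.warn(
                "seed_prompt is deprecated; use first_message instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            first_message = seed_prompt
        self.first_message = first_message


# Old call sites keep working but emit a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_config = AdversarialConfigSketch(seed_prompt="hello")

new_config = AdversarialConfigSketch(first_message="hello")
```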

Another thought that came to mind: Should there be an objective conversation manager then? It's worth considering, but as a follow-up if at all. There’s already ConversationManager; a new objective manager should only centralize objective-target send mechanics (rotation for single-turn targets, converter plumbing, execution context), not absorb attack-specific control flow (backtracking, tree branching, pruning).

So all in all: This PR is compatible with #1377 if integrated deliberately; a mechanical merge would likely regress #1377 multimodal behavior in red teaming. @rlundeen2

rlundeen2 and others added 6 commits June 17, 2026 17:04

Merge commit 'refs/pull/2039/head' of https://github.com/microsoft/PyRIT

37963ed

into rlundeen2/unify-red-team-attack-schema

WIP: AdversarialConversationManager redesign + config

caef853

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz reviewed Jun 27, 2026

rlundeen2 wants to merge 6 commits into microsoft:main from rlundeen2:rlundeen2/unify-red-team-attack-schema
