DOC: Architecture Responsibilities #2089

Open
rlundeen2 wants to merge 6 commits into microsoft:main from rlundeen2:rlundeen2-framework-architecture-doc

Conversation

rlundeen2 commented Jun 26, 2026

Contributor

Restructures the Architecture portion of doc/code/framework.md to clearly define each component's responsibilities.

rlundeen2 and others added 6 commits June 26, 2026 16:31
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restructure framework.md to clearly define each component's responsibilities using an Owns / Does NOT own template, fix structural inconsistencies, and correct typos.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reorder and rename the landing-page cards to match the Core Components / Core library section order, add an Attack Techniques card, and drop the Attacks-and-Executors / Setup-and-Configuration labels in favor of the section names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a Framework Documentation card and links the section header to the notebooks contributing guide, treating it as a Core library item.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rters

Clarify that an attack may use defaults but should always accept its scorers, datasets/seeds (prepended_conversation and next_message), objective/adversarial targets, and converters as parameters so it can be packaged as an attack technique.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
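
The commit message above says an attack may use defaults but should accept its scorers, seeds, targets, and converters as parameters so it can be packaged as an attack technique. A minimal sketch of that idea follows. It is a toy illustration, not PyRIT code: `ToyAttack`, `default_scorer`, `shouting_technique`, and the callable-based `Target`/`Scorer`/`Converter` types are hypothetical names invented here; only the parameter names `prepended_conversation` and `next_message` come from the commit message.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-ins: real targets, scorers, and converters are richer objects.
Conversation = List[str]
Target = Callable[[Conversation], str]
Scorer = Callable[[str], float]
Converter = Callable[[str], str]


def default_scorer(response: str) -> float:
    """Placeholder default: 1.0 if the reply is non-empty, else 0.0."""
    return 1.0 if response.strip() else 0.0


class ToyAttack:
    """Every collaborator is a constructor parameter; some have defaults."""

    def __init__(
        self,
        objective_target: Target,
        adversarial_target: Optional[Target] = None,  # unused in this single-turn toy
        scorer: Scorer = default_scorer,
        converters: Sequence[Converter] = (),
        prepended_conversation: Sequence[str] = (),
        next_message: Optional[str] = None,
    ) -> None:
        self.objective_target = objective_target
        self.adversarial_target = adversarial_target
        self.scorer = scorer
        self.converters = list(converters)
        self.prepended_conversation = list(prepended_conversation)
        self.next_message = next_message

    def run(self, objective: str) -> float:
        # Use the supplied next message if there is one, else the objective itself.
        message = self.next_message if self.next_message is not None else objective
        for convert in self.converters:
            message = convert(message)
        response = self.objective_target(self.prepended_conversation + [message])
        return self.scorer(response)


def shouting_technique(objective_target: Target) -> ToyAttack:
    """A 'technique' here is just a named, pre-configured attack."""
    return ToyAttack(
        objective_target,
        converters=[str.upper],
        prepended_conversation=["You are a helpful assistant."],
    )
```

Because nothing is hard-coded inside `run`, the same class can be used directly with its defaults or wrapped in a named factory such as `shouting_technique`.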
Add a Core library Backend section (presentation-specific REST API; reuse pyrit.models and the registry). Add a Source path line to every component section so the doc can be pointed at for code reviews.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread doc/code/framework.md
::::

::::{card} ⚔️ Attacks & Executors
::::{card} ⚔️ Attacks

Contributor

I'm not sure how I feel about removing executors altogether from here. I suspect you removed them because they don't plug into scenarios as of right now. I don't see them becoming irrelevant, though. Quite the opposite.

Comment thread doc/code/framework.md
Run single-turn and multi-turn attacks — Crescendo, TAP, Skeleton Key, and more.
::::

::::{card} 🔌 Targets

Contributor

Should reordering be reflected in the myst.yml as well?

Comment thread doc/code/framework.md
Connect to OpenAI, Azure, Anthropic, HuggingFace, HTTP endpoints, and custom targets.
::::{card} 🧩 Attack Techniques
:link: ./scenarios/0_attack_techniques
Package a configured attack — role-play, many-shot, crescendo, a jailbreak template — as a reusable, named recipe.

Contributor

Only tangentially related and feel free to resolve, but I finally managed to put into words what I dislike about attack techniques 😆 My example is single-turn Crescendo (STCA). It doesn't exist as an "attack executor" but it does as an attack technique. So you can't run it by itself in an intuitive, straightforward way unless you use scenarios. Whenever I use GHCP for orchestrating my red teaming, it just defaults to attack executors, meaning I miss out on attack techniques (like STCA).

Is there some way we can tell GHCP to use attack techniques/scenarios as opposed to attack executors directly? I suspect a skill is the place to do that.

Contributor Author

We can execute attack techniques without a scenario; good idea to document it and/or make it easier.

I suspect GHCP struggles with it because we have so many attack examples and so few going this route. But I want to make an effort to move more things to attack techniques because it's so much easier to bundle all the pieces.

Contributor

YES

Comment thread doc/code/framework.md
:link: ./output/0_output
Render attack results, scenario results, conversations, and scores to terminal, files, or Jupyter.
::::{card} 📓 Framework Documentation
:link: ../contributing/7_notebooks

Contributor

everything else links to 0_* but this one to 7_notebooks (?)

Comment thread doc/code/framework.md
@@ -1,3 +1,3 @@
# Framework

Learn how to use PyRIT's components to build red teaming workflows.

Contributor

What this page needs is a diagram that shows how these are put together with an example showing what an instance of each of these would be. Note: not code, something visual!

Comment thread doc/code/framework.md
**Responsibility**: Provide a single place to define and manage the inputs to an attack — prompts, jailbreak templates, source images, attack strategies, and similar seeds.

- New datasets can be added in the dataset module.
- Dataset providers load seeds into memory; components then retrieve them from memory. Providers are not queried directly at attack time.

Contributor

Do we want to talk about seed groups here? My hunch is YES. GHCP is terribly confused whenever I ask it to add a dataset because it can't figure out when something should be an objective vs. a prompt.

Comment thread doc/code/framework.md

**Contributing (difficulty: easy)**: Are there more prompts and jailbreak templates you can add for scenarios you're testing for? It is easy to add new dataset providers.

## [Attacks](./executor/0_executor)

Contributor

Executor (?)

Comment thread doc/code/framework.md
- Other executors, like benchmarks, need better end-to-end support; potentially including an `ExpectedResult` seed and associated scorers.
- More flexible compound attacks should continue to be added.

**Contributing (difficulty: hard)**: The best way to contribute is likely opening issues if you run into limitations.

Contributor

Suggested change
**Contributing (difficulty: hard)**: The best way to contribute is likely opening issues if you run into limitations.
**Contributing (difficulty: hard)**: The best way to contribute is always opening issues if you run into limitations. Even if you can contribute a PR, it requires input from maintainers.

Comment thread doc/code/framework.md

**Source**: `pyrit/executor/attack/`.

**Responsibility**: Own the *algorithm and control flow* of achieving a single objective — managing the conversation between objective and adversarial targets, and using datasets, converters, and scorers along the way.

Contributor

We've never explored this too much but there could be more than one adversarial target (think: multi-agent setup). I guess this can be tabled for later.

Contributor Author

This is how my hackathon project works too! It uses a more powerful model to orchestrate techniques, but that model is safety-aligned, so it uses a separate adversarial model per attack.

Comment thread doc/code/framework.md

**Framework Plans**:

- We need to move some older attacks that don't belong here. Many (e.g. FlipAttack) should just be attack techniques.

Contributor

This would be worth prioritizing as deprecation (?)

Comment thread doc/code/framework.md
**Framework Plans**:

- We need to move some older attacks that don't belong here. Many (e.g. FlipAttack) should just be attack techniques.
- There are potential ways we could combine different algorithms. Are Crescendo and TAP ultimately the same?

Contributor

Surely, they are not the same 😆 But the underlying idea is perhaps the same, i.e., there could be branching, pruning, and backtracking based on specific criteria. With that,

  • TAP = branch every iteration with set factor, prune every iteration if > width branches
  • Crescendo = backtrack if refusal

But one could just as well imagine tree of crescendos which branches in every iteration, runs another crescendo step (including potentially backtracking), and then we prune the ones that didn't work well. Not sure if there's any promise in this but they feel like generic concepts.
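
The shared idea in the comment above (branch, prune, backtrack) can be sketched as one generic loop. This is a toy illustration, not PyRIT code: `Path`, `search`, `step`, and `refused` are hypothetical names, and the scores are placeholders rather than real scorer output.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Path:
    """One conversation path; `turns` holds hypothetical prompt strings."""
    turns: List[str] = field(default_factory=list)
    score: float = 0.0


def search(
    step: Callable[[Path], Path],      # returns a NEW path with one more turn
    refused: Callable[[Path], bool],   # True if the latest turn was refused
    *,
    iterations: int,
    branching_factor: int = 1,
    width: int = 1,
    backtrack_on_refusal: bool = False,
) -> List[Path]:
    frontier = [Path()]
    for _ in range(iterations):
        candidates: List[Path] = []
        for path in frontier:
            # Branch: expand each surviving path `branching_factor` times.
            for _ in range(branching_factor):
                child = step(path)
                if backtrack_on_refusal and refused(child):
                    # Backtrack: discard the refused turn and keep the parent.
                    child = path
                candidates.append(child)
        # Prune: keep only the `width` best-scoring paths.
        candidates.sort(key=lambda p: p.score, reverse=True)
        frontier = candidates[:width]
    return frontier
```

Under these assumptions, `branching_factor=2, width=3` behaves like the TAP description, `branching_factor=1, width=1, backtrack_on_refusal=True` like the Crescendo one, and enabling both at once gives the "tree of crescendos" variant.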

Comment thread doc/code/framework.md

- We need to move some older attacks that don't belong here. Many (e.g. FlipAttack) should just be attack techniques.
- There are potential ways we could combine different algorithms. Are Crescendo and TAP ultimately the same?
- We need to support target capabilities more implicitly.

Contributor

What does this mean?

Comment thread doc/code/framework.md

- Interpreting a raw target response — that is Scoring.
- The specific configuration of prompts, converters, and strategy used — that is an Attack Technique.
- Choosing which attacks or techniques to run, or running them at scale — that is a Scenario.

Contributor

Does this limit us in some way? I have always thought it might be fun to have an LLM analyze all the things that have been tried so far. E.g., after 100 crescendos with the same objective you have a pretty decent overview of what worked/how far each approach got and what didn't go anywhere so there's no need to repeat the exact same thing. At the same time, perhaps it's worth taking the most promising 10 "checkpoints" (conv 3 after 5 turns, conv 17 after 3 turns, conv 36 after 8 turns, etc.) and aggressively continue from there (perhaps with TAP!) by prepending. With the isolation that exists right now, I don't know how this would be possible. Food for thought.
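
The checkpoint selection described above could look roughly like this. It is a sketch under stated assumptions: `Checkpoint`, `top_checkpoints`, and the `history` mapping of per-turn scores are hypothetical and do not correspond to existing PyRIT types. How the chosen prefix would then be prepended to a new attack is left out.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Checkpoint:
    conversation_id: int
    turn: int       # 1-based turn at which the score was observed
    score: float


def top_checkpoints(history: Dict[int, List[float]], k: int) -> List[Checkpoint]:
    """Pick the best turn of each conversation, then the k best of those.

    `history` maps a conversation id to its per-turn scores (turn 1 first).
    """
    best_per_conversation = [
        max(
            (Checkpoint(cid, turn, s) for turn, s in enumerate(scores, start=1)),
            key=lambda c: c.score,
        )
        for cid, scores in history.items()
        if scores  # skip conversations with no scored turns
    ]
    return heapq.nlargest(k, best_per_conversation, key=lambda c: c.score)


# Made-up scores that mirror the example in the comment.
history = {
    3: [0.1, 0.2, 0.4, 0.5, 0.9],
    17: [0.2, 0.3, 0.8, 0.1],
    36: [0.1] * 7 + [0.7],
    5: [0.0, 0.05],
}
promising = top_checkpoints(history, k=3)
```

With these made-up scores, `promising` holds conversation 3 at turn 5, conversation 17 at turn 3, and conversation 36 at turn 8, in that order.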

Comment thread doc/code/framework.md
**Responsibility**: A lightweight module where core types are defined — the **description** side of the framework. These types should be used wherever possible to prevent drift.

Attacks are responsible for putting all the other pieces together. They make use of all other components in PyRIT to execute an attack technique end-to-end.
PyRIT supports single-turn (e.g. Many Shot Jailbreaks [@anthropic2024manyshot], Role Play, Skeleton Key [@microsoft2024skeletonkey]) and multi-turn attack strategies (e.g. Tree of Attacks [@mehrotra2023tap], Crescendo [@russinovich2024crescendo]), and compound strategies (e.g. `SequentialAttack`) for chaining several techniques against a single objective.

Contributor

May be best not to lose the citations?

Comment thread doc/code/framework.md
**Does NOT own**:

## Target
- Live, in-run progress printing — that belongs to the scenario's own printer.

Contributor

Why?

Comment thread doc/code/framework.md
**Responsibility**: The canonical store that components read from and write to — seeds, conversations, scores, and attack results. When a component needs more than what is passed in, it goes through memory.

One important thing to remember about this architecture is its swappable nature. Prompts and targets and converters and attacks and scorers should all be swappable. But sometimes one of these components needs additional information. If the target is an LLM, we need a way to look up previous messages sent to that session so we can properly construct the new message. If the target is a blob store, we need to know the URL to use for a future attack.
One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives.

Contributor

Suggested change
One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives.
One important thing to remember about this architecture is its swappable nature. Seeds, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives.
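
The point about memory holding the shared state that swappable components need can be illustrated with a small sketch. It assumes a plain in-process store: `ToyMemory` and `EchoTarget` are hypothetical names and are not PyRIT's memory or target interfaces.

```python
from collections import defaultdict
from typing import Dict, List


class ToyMemory:
    """Shared store of conversations, keyed by a conversation id."""

    def __init__(self) -> None:
        self._conversations: Dict[str, List[str]] = defaultdict(list)

    def add_message(self, conversation_id: str, message: str) -> None:
        self._conversations[conversation_id].append(message)

    def get_conversation(self, conversation_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._conversations[conversation_id])


class EchoTarget:
    """Stand-in for an LLM target: it needs the history to build each request."""

    def __init__(self, memory: ToyMemory) -> None:
        self._memory = memory

    def send(self, conversation_id: str, prompt: str) -> str:
        # Look up what was already sent in this session, then append the new prompt.
        request = self._memory.get_conversation(conversation_id) + [prompt]
        response = f"reply to {len(request)} message(s)"
        # Record both sides so any other component can read them later.
        self._memory.add_message(conversation_id, prompt)
        self._memory.add_message(conversation_id, response)
        return response
```

Because the history lives in `ToyMemory` rather than in the target, a second `EchoTarget` built on the same memory can continue the same conversation, which is the swappability the paragraph describes.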

Comment thread doc/code/framework.md

For all their power, attacks should still be generic. A lot of our front-end code and operators use Notebooks to interact with PyRIT. This is fantastic, but most new logic should not be notebooks. Notebooks should mostly be used for attack setup and documentation. For example, configuring the components and putting them together is a good use of a notebook, but new logic for an attack should be moved to one or more components.
- Notebooks that contain code should be executable.
- Notebooks should execute quickly.

Contributor

Are we missing a few modules? analytics at least, auth maybe?

romanlutz left a comment

Contributor

I am 100% in favor of these changes. The comments may improve/add things here and there but even as is it's a huge improvement.

2 participants