DOC: Architecture Responsibilities #2089
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restructure framework.md to clearly define each component's responsibilities using an Owns / Does NOT own template, fix structural inconsistencies, and correct typos.
Reorder and rename the landing-page cards to match the Core Components / Core library section order, add an Attack Techniques card, and drop the Attacks-and-Executors / Setup-and-Configuration labels in favor of the section names.
Adds a Framework Documentation card and links the section header to the notebooks contributing guide, treating it as a Core library item.
…rters Clarify that an attack may use defaults but should always accept its scorers, datasets/seeds (prepended_conversation and next_message), objective/adversarial targets, and converters as parameters so it can be packaged as an attack technique.
Add a Core library Backend section (presentation-specific REST API; reuse pyrit.models and the registry). Add a Source path line to every component section so the doc can be pointed at for code reviews.
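To illustrate the "may use defaults but should always accept its components as parameters" rule from the commit message above, here is a minimal sketch. Every name in it (`SketchAttack`, `default_scorer`, the callable type aliases) is a hypothetical stand-in and not a PyRIT class; only the parameter names `prepended_conversation` and `next_message` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-ins for real components; not PyRIT types.
Converter = Callable[[str], str]
Scorer = Callable[[str], float]
Target = Callable[[str], str]


def default_scorer(response: str) -> float:
    """Placeholder default: 0.0 if the response looks like a refusal, else 1.0."""
    return 0.0 if "cannot" in response.lower() else 1.0


@dataclass
class SketchAttack:
    """Every collaborator is a parameter; some have defaults."""

    objective_target: Target
    adversarial_target: Optional[Target] = None
    scorer: Scorer = default_scorer
    converters: List[Converter] = field(default_factory=list)
    prepended_conversation: List[str] = field(default_factory=list)
    next_message: Optional[str] = None

    def run(self, objective: str) -> float:
        # Use the supplied next message if there is one, else the objective.
        prompt = self.next_message or objective
        for convert in self.converters:
            prompt = convert(prompt)
        history = "\n".join([*self.prepended_conversation, prompt])
        return self.scorer(self.objective_target(history))
```

Because nothing is hard-coded inside `run`, a named technique in this sketch is just a particular set of constructor arguments, which is what makes the attack packageable.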
| :::: | ||
| ::::{card} ⚔️ Attacks & Executors | ||
| ::::{card} ⚔️ Attacks |
I'm not sure how I feel about removing executors altogether from here. I suspect you removed them because they don't plug into scenarios as of right now. I don't see them becoming irrelevant, though. Quite the opposite.
| Run single-turn and multi-turn attacks — Crescendo, TAP, Skeleton Key, and more. | ||
| :::: | ||
| ::::{card} 🔌 Targets |
Should reordering be reflected in the myst.yml as well?
| Connect to OpenAI, Azure, Anthropic, HuggingFace, HTTP endpoints, and custom targets. | ||
| ::::{card} 🧩 Attack Techniques | ||
| :link: ./scenarios/0_attack_techniques | ||
| Package a configured attack — role-play, many-shot, crescendo, a jailbreak template — as a reusable, named recipe. |
Only tangentially related and feel free to resolve, but I finally managed to put into words what I dislike about attack techniques 😆 My example is single turn crescendo (STCA). It doesn't exist as an "attack executor" but it does as an attack technique. So you can't run it by itself in an intuitive, straightforward way unless you use scenarios. Whenever I use GHCP for orchestrating my red teaming it just defaults to attack executors, meaning I miss out on attack techniques (like STCA).
Is there some way we can tell GHCP to use attack techniques/scenarios as opposed to attack executors directly? I suspect a skill is the place to do that.
There was a problem hiding this comment.
We can execute attack techniques without a scenario; good idea to document it and/or make it easier.
I suspect GHCP struggles with it because we have so many attack examples and so few going this route. But I want to make an effort to move more things to attack techniques because it's so much easier to bundle all the pieces.
| :link: ./output/0_output | ||
| Render attack results, scenario results, conversations, and scores to terminal, files, or Jupyter. | ||
| ::::{card} 📓 Framework Documentation | ||
| :link: ../contributing/7_notebooks |
everything else links to 0_* but this one to 7_notebooks (?)
| @@ -1,3 +1,3 @@ | |||
| # Framework | |||
| Learn how to use PyRIT's components to build red teaming workflows. | |||
What this page needs is a diagram that shows how these are put together with an example showing what an instance of each of these would be. Note: not code, something visual!
| **Responsibility**: Provide a single place to define and manage the inputs to an attack — prompts, jailbreak templates, source images, attack strategies, and similar seeds. | ||
| - New datasets can be added in the dataset module. | ||
| - Dataset providers load seeds into memory; components then retrieve them from memory. Providers are not queried directly at attack time. |
do we want to talk about seed groups here? My hunch is YES. GHCP is terribly confused whenever I ask it to add a dataset because it can't figure out when something should be objective vs prompt
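As a rough illustration of the "providers load seeds into memory; components then retrieve them from memory" flow in the diff line above, here is a minimal sketch. All names (`SketchSeed`, `SketchMemory`, `load_example_dataset`) and the `kind` tag with its `"objective"` / `"prompt"` values are assumptions made for this example and are not PyRIT's actual seed or memory API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SketchSeed:
    value: str
    dataset_name: str
    kind: str  # hypothetical tag: "objective" or "prompt"


class SketchMemory:
    """Hypothetical in-process store that components read from."""

    def __init__(self) -> None:
        self._seeds: List[SketchSeed] = []

    def add_seeds(self, seeds: Iterable[SketchSeed]) -> None:
        self._seeds.extend(seeds)

    def get_seeds(self, dataset_name: str, kind: str) -> List[SketchSeed]:
        return [
            s for s in self._seeds
            if s.dataset_name == dataset_name and s.kind == kind
        ]


def load_example_dataset() -> List[SketchSeed]:
    """Hypothetical provider: called once at setup, never at attack time."""
    return [
        SketchSeed("Explain how to pick a lock", "example", "objective"),
        SketchSeed("You are a helpful locksmith.", "example", "prompt"),
    ]


# Setup time: the provider writes into memory.
memory = SketchMemory()
memory.add_seeds(load_example_dataset())

# Attack time: the component only talks to memory.
objectives = memory.get_seeds("example", kind="objective")
```

Tagging each seed explicitly is one possible way to make the objective-versus-prompt distinction visible to tooling; whether seed groups should carry that information is the open question in the comment above.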
| **Contributing (difficulty: easy)**: Are there more prompts and jailbreak templates you can add for scenarios you're testing for? It is easy to add new dataset providers. | ||
| ## [Attacks](./executor/0_executor) |
| - Other executors, like benchmarks, need better end-to-end support; potentially including an `ExpectedResult` seed and associated scorers. | ||
| - More flexible compound attacks should continue to be added. | ||
| **Contributing (difficulty: hard)**: The best way to contribute is likely opening issues if you run into limitations. |
| **Contributing (difficulty: hard)**: The best way to contribute is likely opening issues if you run into limitations. | |
| **Contributing (difficulty: hard)**: The best way to contribute is always opening issues if you run into limitations. Even if you can contribute a PR it requires input from maintainers. |
| **Source**: `pyrit/executor/attack/`. | ||
| **Responsibility**: Own the *algorithm and control flow* of achieving a single objective — managing the conversation between objective and adversarial targets, and using datasets, converters, and scorers along the way. |
We've never explored this too much but there could be more than one adversarial target (think: multi-agent setup). I guess this can be tabled for later.
There was a problem hiding this comment.
This is how my hackathon project works also! It uses a more powerful model to orchestrate techniques, but it's safety aligned, so I used an adversarial model per attack.
| **Framework Plans**: | ||
| - We need to move some older attacks that don't belong here. Many (e.g. FlipAttack) should just be attack techniques. |
This would be worth prioritizing as deprecation (?)
| - There are potential ways we could combine different algorithms. Are Crescendo and TAP ultimately the same? |
Surely, they are not the same 😆 But the underlying idea is perhaps the same, i.e., there could be branching, pruning, and backtracking based on specific criteria. With that,
- TAP = branch every iteration with set factor, prune every iteration if > width branches
- Crescendo = backtrack if refusal
But one could just as well imagine tree of crescendos which branches in every iteration, runs another crescendo step (including potentially backtracking), and then we prune the ones that didn't work well. Not sure if there's any promise in this but they feel like generic concepts.
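The branch / prune / backtrack framing in the comment above can be sketched as one generic loop with three knobs. This is only an illustration of that framing: `Node`, `Step`, and `search` are hypothetical names, and the loop is a deliberate simplification, not the published TAP or Crescendo algorithms and not PyRIT code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """One partial conversation in the search (hypothetical type)."""

    turns: Tuple[str, ...] = ()
    score: float = 0.0
    refused: bool = False


# A step extends a node by one turn; the int is the branch index.
Step = Callable[[Node, int], Node]


def search(root: Node, step: Step, *, branching: int, width: int,
           depth: int, backtrack_on_refusal: bool) -> Node:
    frontier: List[Node] = [root]
    for _ in range(depth):
        candidates: List[Node] = []
        for node in frontier:
            # Branch: try `branching` continuations of every surviving node.
            for branch in range(branching):
                child = step(node, branch)
                if child.refused and backtrack_on_refusal:
                    # Backtrack: drop the refused turn and keep the parent.
                    child = node
                candidates.append(child)
        # Prune: keep only the `width` best-scoring candidates.
        candidates.sort(key=lambda n: n.score, reverse=True)
        frontier = candidates[:width]
    return frontier[0]
```

Under this sketch, a TAP-like run would set `branching > 1` with a `width` cap and no backtracking, a Crescendo-like run would set `branching=1, width=1, backtrack_on_refusal=True`, and the "tree of crescendos" idea turns on all three at once.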
| - We need to support target capabilities more implicitly. |
| - Interpreting a raw target response — that is Scoring. | ||
| - The specific configuration of prompts, converters, and strategy used — that is an Attack Technique. | ||
| - Choosing which attacks or techniques to run, or running them at scale — that is a Scenario. |
Does this limit us in some way? I have always thought it might be fun to have an LLM analyze all the things that have been tried so far. E.g., after 100 crescendos with the same objective you have a pretty decent overview of what worked/how far each approach got and what didn't go anywhere so there's no need to repeat the exact same thing. At the same time, perhaps it's worth taking the most promising 10 "checkpoints" (conv 3 after 5 turns, conv 17 after 3 turns, conv 36 after 8 turns, etc.) and aggressively continue from there (perhaps with TAP!) by prepending. With the isolation that exists right now, I don't know how this would be possible. Food for thought.
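The checkpoint idea in the comment above could be prototyped outside any single attack, roughly like this. `Checkpoint`, `most_promising`, and `as_prepended_conversation` are hypothetical names for this sketch, and the scores are assumed to come from whatever scorer already ran on the earlier conversations.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """A scored prefix of an earlier conversation (hypothetical type)."""

    conversation_id: int
    turn: int  # number of messages to keep from the conversation
    score: float
    messages: Tuple[str, ...]


def most_promising(checkpoints: List[Checkpoint], k: int) -> List[Checkpoint]:
    """Return the k highest-scoring checkpoints, best first."""
    return heapq.nlargest(k, checkpoints, key=lambda c: c.score)


def as_prepended_conversation(checkpoint: Checkpoint) -> List[str]:
    """Cut the conversation at the checkpoint turn so a new attack can continue from it."""
    return list(checkpoint.messages[: checkpoint.turn])
```

In this sketch the selection step lives above the individual attacks and only hands each new attack a prepended conversation, which is one way the idea might fit within the ownership boundaries listed in the diff.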
| **Responsibility**: A lightweight module where core types are defined — the **description** side of the framework. These types should be used wherever possible to prevent drift. | ||
| Attacks are responsible for putting all the other pieces together. They make use of all other components in PyRIT to execute an attack technique end-to-end. | ||
| PyRIT supports single-turn (e.g. Many Shot Jailbreaks [@anthropic2024manyshot], Role Play, Skeleton Key [@microsoft2024skeletonkey]) and multi-turn attack strategies (e.g. Tree of Attacks [@mehrotra2023tap], Crescendo [@russinovich2024crescendo]), and compound strategies (e.g. `SequentialAttack`) for chaining several techniques against a single objective. |
May be best not to lose the citations?
| **Does NOT own**: | ||
| ## Target | ||
| - Live, in-run progress printing — that belongs to the scenario's own printer. |
| **Responsibility**: The canonical store that components read from and write to — seeds, conversations, scores, and attack results. When a component needs more than what is passed in, it goes through memory. | ||
| One important thing to remember about this architecture is its swappable nature. Prompts and targets and converters and attacks and scorers should all be swappable. But sometimes one of these components needs additional information. If the target is an LLM, we need a way to look up previous messages sent to that session so we can properly construct the new message. If the target is a blob store, we need to know the URL to use for a future attack. | ||
| One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives. |
| One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives. | |
| One important thing to remember about this architecture is its swappable nature. Seeds, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives. |
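The "memory is where shared state lives" point can be made concrete with a small sketch: a target that keeps no history of its own and rebuilds each request from memory, so targets stay swappable. `SketchConversationMemory` and `SketchChatTarget` are hypothetical names, and the `echo:` reply is a stand-in for a real model call; none of this is PyRIT's memory interface.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Message = Tuple[str, str]  # (role, text)


class SketchConversationMemory:
    """Hypothetical store keyed by conversation id."""

    def __init__(self) -> None:
        self._by_conversation: DefaultDict[str, List[Message]] = defaultdict(list)

    def add(self, conversation_id: str, role: str, text: str) -> None:
        self._by_conversation[conversation_id].append((role, text))

    def get_conversation(self, conversation_id: str) -> List[Message]:
        # Return a copy so callers cannot mutate stored history.
        return list(self._by_conversation[conversation_id])


class SketchChatTarget:
    """Holds no history itself; looks it up in memory on every call."""

    def __init__(self, memory: SketchConversationMemory) -> None:
        self._memory = memory

    def send(self, conversation_id: str, text: str) -> List[Message]:
        history = self._memory.get_conversation(conversation_id)
        request = history + [("user", text)]
        reply = f"echo:{text}"  # stand-in for the real model call
        self._memory.add(conversation_id, "user", text)
        self._memory.add(conversation_id, "assistant", reply)
        return request  # what would have been sent to the model
```

Because the history lives in memory and not in the target object, a second target instance can pick up the same conversation id and see the earlier turns.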
| For all their power, attacks should still be generic. A lot of our front-end code and operators use Notebooks to interact with PyRIT. This is fantastic, but most new logic should not be notebooks. Notebooks should mostly be used for attack setup and documentation. For example, configuring the components and putting them together is a good use of a notebook, but new logic for an attack should be moved to one or more components. | ||
| - Notebooks that contain code should be executable. | ||
| - Notebooks should execute quickly. |
Are we missing a few modules? analytics at least, auth maybe?
romanlutz left a comment
I am 100% in favor of these changes. The comments may improve/add things here and there but even as is it's a huge improvement.
Restructures the Architecture portion of `doc/code/framework.md` to clearly define each component's responsibilities