Skip to content

Add a code_attack technique variant that applies the code-completion system prompt #2088

Description

@u7k4rs6

Follow-up from #1960, which added CodeAttackConverter and registered code_attack as a converter-only technique in scenario_techniques.py. The converter wraps the objective in a code-completion template, the core mechanism from Ren et al. (CodeAttack, Findings of ACL 2024, arXiv:2403.07865).

Gap: the paper also frames the session with a code-completion system prompt. The converter-only technique drops that framing. Per discussion on #1960 this was non-blocking and deferred (rlundeen2: "we can add techniques that also do the system prompt/framing").

Proposal: a variant technique that also applies the framing system prompt, ideally via a generic way to attach an objective-target system prompt to a technique rather than a bespoke attack class. On targets without system-prompt support it would lean on GenericSystemSquashNormalizer and the capability-handling ADAPT path to fold the framing into the user turn.

The framing seed (pyrit/datasets/executors/code_attack.yaml, removed in #1960) can be restored from git history.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions