[DRAFT][Feat] Beam Search for OpenAIResponseTarget #1346
riedgar-ms wants to merge 49 commits into Azure:main
Conversation
riedgar-ms
left a comment
This is ready for preliminary review; there aren't any docs or (proper) tests yet. I'd like to make sure that I'm manipulating the database correctly before delving into those.
@@ -0,0 +1,72 @@
import asyncio
This file will ultimately be converted into tests and/or a notebook. For now, it's the easiest way for me to test.
    **deepcopy(kwargs),
}

def fresh_instance(self) -> "OpenAIResponseTarget":
These changes are required because the OpenAI API takes a grammar as a tool, and PyRIT makes the tool list part of the target object rather than part of the send_prompt_async() API. Since multiple beams are managed asynchronously, each task needs its own copy of the OpenAIResponseTarget.
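A minimal sketch of the pattern (`GrammarTarget` is a stand-in class; the real `OpenAIResponseTarget` constructor takes different arguments):

```python
import copy

class GrammarTarget:
    """Stand-in for OpenAIResponseTarget: the tool list (including the
    grammar tool) lives on the object, so concurrent beams need copies."""

    def __init__(self, **kwargs):
        # Keep the constructor kwargs so an identical instance can be rebuilt.
        self._init_kwargs = copy.deepcopy(kwargs)
        self.tools = kwargs.get("tools", [])

    def fresh_instance(self) -> "GrammarTarget":
        # Each asynchronously managed beam gets its own target, so mutating
        # one beam's tool list cannot affect the others.
        return GrammarTarget(**copy.deepcopy(self._init_kwargs))

target = GrammarTarget(tools=[{"type": "grammar", "lark": "start: WORD"}])
clone = target.fresh_instance()
clone.tools.append({"type": "other_tool"})
assert len(target.tools) == 1  # original target is unaffected
```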
logger = logging.getLogger(__name__)

def _print_message(message: Message) -> None:
Will be deleted; this is for my debugging convenience.
return new_beams

class BeamSearchAttack(SingleTurnAttackStrategy):
This is largely copied from the PromptSendingAttack
target = self._get_target_for_beam(beam)

current_context = copy.deepcopy(self._start_context)
await self._setup_async(context=current_context)
I'm not certain I'm handling the context correctly here. I end up making lots of copies, which is going to fill the database with fragmentary responses: each time a beam is extended, its context is cloned and a new conversation started.
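A rough sketch of the clone-on-extension pattern in question (hypothetical names; the real attack context carries much more state):

```python
import copy
from dataclasses import dataclass, field

@dataclass
class BeamContext:
    """Stand-in for the attack context owned by one beam."""
    conversation_id: int
    messages: list = field(default_factory=list)

def extend_beam(ctx: BeamContext, new_text: str, next_id: int) -> BeamContext:
    # Cloning on every extension means each partial response is stored as
    # a separate conversation -- the behaviour the comment is asking about.
    new_ctx = copy.deepcopy(ctx)
    new_ctx.conversation_id = next_id
    new_ctx.messages.append(new_text)
    return new_ctx

start = BeamContext(conversation_id=0, messages=["objective"])
step1 = extend_beam(start, "partial response", next_id=1)
assert start.messages == ["objective"]  # the original context is untouched
```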
    objective=context.objective,
)

aux_scores = scoring_results["auxiliary_scores"]
So the auxiliary scorer is required; it is used to assess the beams as they develop.
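As a toy illustration of the role the auxiliary scorer plays (`score_beams` and `length_scorer` are hypothetical; real PyRIT scorers return Score objects, not raw numbers):

```python
def score_beams(beams, aux_scorer):
    """Attach an auxiliary score to each beam so a pruner can rank them."""
    return [(beam, aux_scorer(beam)) for beam in beams]

def length_scorer(beam: str) -> int:
    # Hypothetical auxiliary scorer: longer partial responses score higher.
    return len(beam)

scored = score_beams(["short", "a longer beam"], length_scorer)
```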
pass

class TopKBeamPruner(BeamPruner):
This is simple, bordering on simplistic, meant mainly as a concrete example of what is required.
new_beams = list(reversed(sorted_beams[: self.k]))
for i in range(len(beams) - len(new_beams)):
    nxt = copy.deepcopy(new_beams[i % self.k])
Rather than just duplicating the highest-scoring beam, we cycle through all of the survivors, in the hope of maintaining some variety.
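The refill strategy can be sketched as follows (`prune_top_k` is a simplified stand-in; the real pruner works with beam objects and Score results, not bare strings and floats):

```python
import copy

def prune_top_k(beams, scores, k):
    """Keep the k best beams, then refill to the original count by cycling
    copies of the survivors rather than duplicating only the best one."""
    sorted_beams = [b for _, b in sorted(zip(scores, beams), reverse=True)]
    new_beams = sorted_beams[:k]
    for i in range(len(beams) - len(new_beams)):
        # Cycle through all k survivors to preserve some variety.
        new_beams.append(copy.deepcopy(new_beams[i % k]))
    return new_beams

beams = ["a", "b", "c", "d"]
scores = [0.1, 0.9, 0.4, 0.7]
assert prune_top_k(beams, scores, 2) == ["b", "d", "b", "d"]
```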
Args:
    context (SingleTurnAttackContext): The attack context containing attack parameters.
"""
self._start_context = copy.deepcopy(context)
See note below. I duplicate the context and the message for each beam on each iteration. I'm not certain that this is the best way to use the database.
Description
Use the Lark grammar feature of the OpenAIResponseTarget to create a beam search for PyRIT. This is a single-turn attack, where a collection of candidate responses (the beams) is maintained. On each iteration, the model's response is allowed to extend a little for each beam. The beams are then scored, with the worst-performing ones discarded and replaced with copies of higher-scoring beams.

Tests and Documentation

TBD. Want to get the actual attack code checked.
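The overall loop described above can be sketched in a few lines (purely illustrative: a toy deterministic "model" stands in for the grammar-constrained OpenAIResponseTarget, and a bare function stands in for the auxiliary scorer):

```python
import copy

def beam_search(objective, extend, score, num_beams=3, steps=4, k=2):
    """Toy beam search: extend each beam a little, score all beams,
    prune to the top k, then refill with copies of the survivors."""
    beams = [objective for _ in range(num_beams)]
    for _ in range(steps):
        beams = [extend(b) for b in beams]          # model extends each beam
        scores = [score(b) for b in beams]          # auxiliary scoring
        ranked = [b for _, b in sorted(zip(scores, beams), reverse=True)]
        survivors = ranked[:k]
        # Refill by cycling copies of survivors, as the pruner does.
        beams = [copy.deepcopy(survivors[i % k]) for i in range(num_beams)]
    return beams

# Toy "model" appends one character; toy scorer counts occurrences of "a".
result = beam_search("seed-", lambda b: b + "a", lambda b: b.count("a"), steps=2)
```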