FEAT: Anthropic Model-Written Evals Dataset #1170
Conversation
[pull] main from Azure:main
- Implement fetch_sorry_bench_dataset() for sorry-bench/sorry-bench-202503 on Hugging Face
- Support filtering by 44 categories and 21 prompt style linguistic mutations
- Default to base prompts only
- Add unit tests (10 tests)
- Add integration test
- Update API documentation
…dation
- Change prompt_styles (list) to prompt_style (string) for simpler API (usage sketch below)
- Add input validation for categories and prompt_style
- Add VALID_CATEGORIES and VALID_PROMPT_STYLES for input validation
- Fix exception handling to preserve stack traces
- Update tests to match new API (12 tests, all passing)
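A minimal usage sketch for the reworked SORRY-Bench API described in the commit above. The parameter names (categories, prompt_style) follow the commit message, but the exact merged signature, the category value shown, and the returned object's attributes are assumptions:

    from pyrit.datasets import fetch_sorry_bench_dataset

    # Illustrative call only: valid category and style names come from
    # VALID_CATEGORIES / VALID_PROMPT_STYLES mentioned in the commit above.
    dataset = fetch_sorry_bench_dataset(
        categories=["fraud"],    # hypothetical category name
        prompt_style="base",     # default: base prompts only
    )
    print(len(dataset.prompts))  # assumes the returned dataset exposes a prompts list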
Implements support for Anthropic's model-written-evals dataset with 154 evaluation datasets across 4 categories. Changes:
- Add fetch_anthropic_evals_dataset() function with category filtering
- Support for persona, sycophancy, advanced-ai-risk, and winogenerated categories
- Fetches directly from GitHub repository using API for file discovery (see the sketch below)
- Add unit tests (12 tests)
- Add integration test
- Update API docs
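As a rough illustration of the file-discovery step mentioned above, a fetcher like this could list a category directory of the anthropics/evals repository through the GitHub contents API and keep only the .jsonl files. The helper name and structure below are hypothetical, not the PR's actual code:

    import requests

    GITHUB_API_BASE = "https://api.github.com/repos/anthropics/evals/contents"

    def list_category_files(category: str) -> list[str]:
        """Return raw download URLs for every .jsonl file in one category directory."""
        response = requests.get(f"{GITHUB_API_BASE}/{category}", timeout=30)
        response.raise_for_status()
        return [
            entry["download_url"]
            for entry in response.json()
            if entry["name"].endswith(".jsonl")
        ]

For example, list_category_files("sycophancy") would return the download URLs of the sycophancy JSONL files, which the fetcher can then download and parse.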
Thanks for adding these! Made a few stylistic suggestions!
Co-authored-by: hannahwestra25 <hannahwestra@microsoft.com>
romanlutz left a comment
Nice work! Looking forward to having this dataset fetcher in PyRIT 🙂
…rated loading has empty prompts winogenerated_occupations.jsonl???
@0xm00n btw I just saw Anthropic drop a new dataset: https://www.anthropic.com/news/political-even-handedness Any chance you're interested in contributing a similar fetcher for it as well? It has two prompts per row so it would require a tiny bit of custom handling, but otherwise (more?) straightforward than this one.
Yup, seems pretty easy. Will work on it soon.
…cture
Changed fetch_anthropic_evals_dataset() to return QuestionAnsweringDataset instead of SeedDataset, following the QA structure established in PR Azure#894.
- Added _parse_answer_choices() to convert answer behaviors into structured QuestionChoice objects with proper indexing (sketched below)
- Preserved letter prefixes in choice text
- Used answer_matching_behavior as correct_answer for consistency
- Special handling for winogenerated datasets with pronoun_options
- Removed redundant tests, added critical tests for choice parsing edge cases
- Fixed docstrings to comply with ruff linting rules
This makes the dataset compatible with the QuestionAnsweringBenchmark orchestrator and follows the pattern where choices are structured objects rather than metadata strings.
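A simplified sketch of the choice-parsing idea described in this commit. The PR's actual _parse_answer_choices() may differ, and the QuestionChoice fields shown are assumed to match PyRIT's QA models:

    from pyrit.models import QuestionChoice

    def _parse_answer_choices_sketch(answer_behaviors, answer_matching_behavior):
        """Turn answer behavior strings into indexed choices; the matching one is correct."""
        choices = []
        correct_index = 0
        for index, behavior in enumerate(answer_behaviors):
            # Letter prefixes such as "(A)" stay in the choice text.
            choices.append(QuestionChoice(index=index, text=behavior.strip()))
            if behavior == answer_matching_behavior:
                correct_index = index
        return choices, correct_index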
@romanlutz @AdrGav941 I've refactored into the QA structure in the latest commit. Let me know :)
The latest iteration looks good to me! If the added unit tests pass, this should be good. Thanks for refactoring, @0xm00n!
Yes, all tests pass!
@romanlutz following up here so we can close the PR.
Resolve conflicts by adopting main's new SeedDatasetProvider architecture while keeping fetch_anthropic_evals_dataset as a standalone function (returns QuestionAnsweringDataset, not SeedDataset).
@romanlutz refactored for conflicts given the new v0.10.0! Should save you some work :)
Thank you! Will take a look today. |
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
…emove dataset_helper
return choices, correct_index
def _fetch_jsonl_from_url(url: str, cache: bool = True) -> List[Dict]:
This functionality already exists in the RemoteDatasetLoader, but the other QA dataset in this folder isn't implemented as an RDL yet either. @rlundeen2 any insights as to why? Should they become RemoteDatasetLoaders?
I think what would be good is to put the shared fetching/caching logic into a common utility module that both _RemoteDatasetLoader and a new _RemoteQADatasetLoader use. Thoughts, @romanlutz?
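A rough sketch of what such a shared utility could look like: one cached fetch helper that both _RemoteDatasetLoader and a hypothetical _RemoteQADatasetLoader call. The names, cache location, and signatures below are assumptions for illustration, not existing PyRIT code:

    import hashlib
    import json
    from pathlib import Path
    from typing import Dict, List

    import requests

    CACHE_DIR = Path.home() / ".pyrit" / "dataset_cache"  # hypothetical cache location

    def fetch_text_cached(url: str, *, cache: bool = True, timeout: int = 30) -> str:
        """Fetch a remote file as text, reusing an on-disk copy when available."""
        cache_file = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
        if cache and cache_file.exists():
            return cache_file.read_text(encoding="utf-8")
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        if cache:
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(response.text, encoding="utf-8")
        return response.text

    def fetch_jsonl_cached(url: str, *, cache: bool = True) -> List[Dict]:
        """JSONL convenience wrapper that both loaders could share."""
        text = fetch_text_cached(url, cache=cache)
        return [json.loads(line) for line in text.splitlines() if line.strip()]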
Description
This PR adds support for the Anthropic model-written-evals dataset to PyRIT. The model-written-evals dataset contains 154 evaluation datasets designed to test LLM behaviors across 4 main categories: persona traits, sycophancy, advanced AI risks, and gender bias. The evaluations use language models to automatically generate test cases across multiple behavioral dimensions.
Dataset: https://github.com/anthropics/evals
Associated Paper: https://arxiv.org/abs/2212.09251
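An illustrative call, assuming a category keyword argument and the QuestionAnsweringDataset structure described in this PR (the exact parameter names and attributes may differ in the merged code):

    from pyrit.datasets import fetch_anthropic_evals_dataset

    # Fetch only the sycophancy evaluations and inspect the first entry.
    dataset = fetch_anthropic_evals_dataset(category="sycophancy")
    entry = dataset.questions[0]
    print(entry.question)
    for choice in entry.choices:
        print(f"  [{choice.index}] {choice.text}")
    print("correct answer:", entry.correct_answer)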
Work Completed
- fetch_anthropic_evals_dataset() function in anthropic_evals_dataset.py
- Unit tests in test_anthropic_evals_dataset.py (12 test cases)
- Integration test in test_fetch_datasets.py
- API documentation in api.rst
- Export in pyrit/datasets/__init__.py

Related Issue
Contributes to issue #450