FEAT: Anthropic Model-Written Evals Dataset #1170
Conversation
[pull] main from Azure:main
- Implement fetch_sorry_bench_dataset() for sorry-bench/sorry-bench-202503 on Hugging Face
- Support filtering by 44 categories and 21 prompt style linguistic mutations
- Default to base prompts only
- Add unit tests (10 tests)
- Add integration test
- Update API documentation
…dation
- Change prompt_styles (list) to prompt_style (string) for simpler API (usage sketch below)
- Add input validation for categories and prompt_style
- Add VALID_CATEGORIES and VALID_PROMPT_STYLES for input validation
- Fix exception handling to preserve stack traces
- Update tests to match new API (12 tests, all passing)
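A minimal usage sketch for the reworked SORRY-Bench API described in the commit above. The parameter names (categories, prompt_style) follow the commit message, but the exact merged signature, the category value shown, and the returned object's attributes are assumptions:

    from pyrit.datasets import fetch_sorry_bench_dataset

    # Illustrative call only: valid category and style names come from
    # VALID_CATEGORIES / VALID_PROMPT_STYLES mentioned in the commit above.
    dataset = fetch_sorry_bench_dataset(
        categories=["fraud"],    # hypothetical category name
        prompt_style="base",     # default: base prompts only
    )
    print(len(dataset.prompts))  # assumes the returned dataset exposes a prompts list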
Implements support for Anthropic's model-written-evals dataset with 154 evaluation datasets across 4 categories. Changes:
- Add fetch_anthropic_evals_dataset() function with category filtering
- Support for persona, sycophancy, advanced-ai-risk, and winogenerated categories
- Fetches directly from GitHub repository using API for file discovery (see the sketch below)
- Add unit tests (12 tests)
- Add integration test
- Update API docs
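As a rough illustration of the file-discovery step mentioned above, a fetcher like this could list a category directory of the anthropics/evals repository through the GitHub contents API and keep only the .jsonl files. The helper name and structure below are hypothetical, not the PR's actual code:

    import requests

    GITHUB_API_BASE = "https://api.github.com/repos/anthropics/evals/contents"

    def list_category_files(category: str) -> list[str]:
        """Return raw download URLs for every .jsonl file in one category directory."""
        response = requests.get(f"{GITHUB_API_BASE}/{category}", timeout=30)
        response.raise_for_status()
        return [
            entry["download_url"]
            for entry in response.json()
            if entry["name"].endswith(".jsonl")
        ]

For example, list_category_files("sycophancy") would return the download URLs of the sycophancy JSONL files, which the fetcher can then download and parse.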
Thanks for adding these! Made a few stylistic suggestions!
Co-authored-by: hannahwestra25 <hannahwestra@microsoft.com>
romanlutz left a comment
Nice work! Looking forward to having this dataset fetcher in PyRIT 🙂
…rated loading has empty prompts winogenerated_occupations.jsonl???
@0xm00n btw I just saw Anthropic drop a new dataset: https://www.anthropic.com/news/political-even-handedness Any chance you're interested in contributing a similar fetcher for it as well? It has two prompts per row so it would require a tiny bit of custom handling, but otherwise (more?) straightforward than this one.
Yup, seems pretty easy. Will work on it soon.
…cture
Changed fetch_anthropic_evals_dataset() to return QuestionAnsweringDataset instead of SeedDataset, following the QA structure established in PR Azure#894.
- Added _parse_answer_choices() to convert answer behaviors into structured QuestionChoice objects with proper indexing (sketched below)
- Preserved letter prefixes in choice text
- Used answer_matching_behavior as correct_answer for consistency
- Special handling for winogenerated datasets with pronoun_options
- Removed redundant tests, added critical tests for choice parsing edge cases
- Fixed docstrings to comply with ruff linting rules
This makes the dataset compatible with the QuestionAnsweringBenchmark orchestrator and follows the pattern where choices are structured objects rather than metadata strings.
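A simplified sketch of the choice-parsing idea described in this commit. The PR's actual _parse_answer_choices() may differ, and the QuestionChoice fields shown are assumed to match PyRIT's QA models:

    from pyrit.models import QuestionChoice

    def _parse_answer_choices_sketch(answer_behaviors, answer_matching_behavior):
        """Turn answer behavior strings into indexed choices; the matching one is correct."""
        choices = []
        correct_index = 0
        for index, behavior in enumerate(answer_behaviors):
            # Letter prefixes such as "(A)" stay in the choice text.
            choices.append(QuestionChoice(index=index, text=behavior.strip()))
            if behavior == answer_matching_behavior:
                correct_index = index
        return choices, correct_index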
@romanlutz @AdrGav941 I've refactored into the QA structure in the latest commit. Let me know :)
The latest iteration looks good to me! If the added unit tests pass, this should be good. Thanks for refactoring, @0xm00n!
Yes, all tests pass!
@romanlutz following up here so we can close the PR.
Resolve conflicts by adopting main's new SeedDatasetProvider architecture while keeping fetch_anthropic_evals_dataset as a standalone function (returns QuestionAnsweringDataset, not SeedDataset).
@romanlutz refactored for conflicts given the new v0.10.0! Should save you some work :)
Thank you! Will take a look today. |
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
…emove dataset_helper
return choices, correct_index
def _fetch_jsonl_from_url(url: str, cache: bool = True) -> List[Dict]:
This functionality already exists in the RemoteDatasetLoader, but the other QA dataset in this folder isn't implemented as an RDL yet either. @rlundeen2 any insights as to why? Should they become RemoteDatasetLoaders?
I think what would be good is to put the shared fetching/caching logic into a common utility module that both _RemoteDatasetLoader and a new _RemoteQADatasetLoader use. Thoughts, @romanlutz?
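A rough sketch of what such a shared utility could look like: one cached fetch helper that both _RemoteDatasetLoader and a hypothetical _RemoteQADatasetLoader call. The names, cache location, and signatures below are assumptions for illustration, not existing PyRIT code:

    import hashlib
    import json
    from pathlib import Path
    from typing import Dict, List

    import requests

    CACHE_DIR = Path.home() / ".pyrit" / "dataset_cache"  # hypothetical cache location

    def fetch_text_cached(url: str, *, cache: bool = True, timeout: int = 30) -> str:
        """Fetch a remote file as text, reusing an on-disk copy when available."""
        cache_file = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
        if cache and cache_file.exists():
            return cache_file.read_text(encoding="utf-8")
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        if cache:
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(response.text, encoding="utf-8")
        return response.text

    def fetch_jsonl_cached(url: str, *, cache: bool = True) -> List[Dict]:
        """JSONL convenience wrapper that both loaders could share."""
        text = fetch_text_cached(url, cache=cache)
        return [json.loads(line) for line in text.splitlines() if line.strip()]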
Description
This PR adds support for the Anthropic model-written-evals dataset to PyRIT. The model-written-evals dataset contains 154 evaluation datasets designed to test LLM behaviors across 4 main categories: persona traits, sycophancy, advanced AI risks, and gender bias. The evaluations use language models to automatically generate test cases across multiple behavioral dimensions.
Dataset: https://github.com/anthropics/evals
Associated Paper: https://arxiv.org/abs/2212.09251
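An illustrative call, assuming a category keyword argument and the QuestionAnsweringDataset structure described in this PR (the exact parameter names and attributes may differ in the merged code):

    from pyrit.datasets import fetch_anthropic_evals_dataset

    # Fetch only the sycophancy evaluations and inspect the first entry.
    dataset = fetch_anthropic_evals_dataset(category="sycophancy")
    entry = dataset.questions[0]
    print(entry.question)
    for choice in entry.choices:
        print(f"  [{choice.index}] {choice.text}")
    print("correct answer:", entry.correct_answer)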
Work Completed
- fetch_anthropic_evals_dataset() function in anthropic_evals_dataset.py
- Unit tests in test_anthropic_evals_dataset.py (12 test cases)
- Integration test in test_fetch_datasets.py
- API documentation in api.rst
- Export in pyrit/datasets/__init__.py

Related Issue
Contributes to issue #450