enable evaluation script to also evaluate remote models#294
enable evaluation script to also evaluate remote models#294
Conversation
Reviewer's GuideThis PR refactors the evaluate_best_checkpoint script to support evaluating remote Hugging Face models by introducing an optional hf_model parameter, overhauling the argument parsing and validation logic, and updating usage examples accordingly. Sequence diagram for evaluation logic with local and remote modelssequenceDiagram
actor User
participant Script as evaluate_best_checkpoint.py
participant Typer
participant Evaluator as LeaderboardV2Evaluator
User->>Script: Run evaluate command with --input-dir or --hf-model
Script->>Typer: Parse arguments
alt Only --input-dir provided
Script->>Script: Validate input_dir exists and is a directory
Script->>Evaluator: Instantiate with model_path = input_dir
else Only --hf-model provided
Script->>Evaluator: Instantiate with model_path = hf_model
else Both or neither provided
Script->>Typer: Print error and exit
end
Evaluator-->>Script: Evaluation results
Script-->>User: Output results
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey @RobotSail - I've reviewed your changes - here's some feedback:
- Consider refactoring the input validation logic into a helper function to reduce clutter inside the
evaluatecommand. - Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
- You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce
--input-dirvs--hf-modelexclusivity more declaratively rather than manual checks.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the input validation logic into a helper function to reduce clutter inside the `evaluate` command.
- Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
- You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce `--input-dir` vs `--hf-model` exclusivity more declaratively rather than manual checks.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| typer.echo(f"Error: '{input_dir}' is not a directory") | ||
| raise typer.Exit(1) | ||
|
|
||
| model_path = hf_model if hf_model else str(input_dir) |
There was a problem hiding this comment.
suggestion (code-quality): Replace if-expression with or (or-if-exp-identity)
| model_path = hf_model if hf_model else str(input_dir) | |
| model_path = hf_model or str(input_dir) |
Explanation
Here we find ourselves setting a value if it evaluates toTrue, and otherwiseusing a default.
The 'After' case is a bit easier to read and avoids the duplication of
input_currency.
It works because the left-hand side is evaluated first. If it evaluates to
true then currency will be set to this and the right-hand side will not be
evaluated. If it evaluates to false the right-hand side will be evaluated and
currency will be set to DEFAULT_CURRENCY.
Summary by Sourcery
Enable the evaluation script to accept either a local checkpoint directory or a remote HuggingFace model by introducing a --hf-model option, enforcing mutual exclusivity, and updating usage instructions
New Features:
Enhancements: