Yinan Chen 1★
·
Jiangning Zhang 1,2★
·
Teng Hu 3
·
Yuxiang Zeng 4
·
Zhucun Xue 1
·
Qingdong He 2
·
Chengjie Wang 2,3
·
Yong Liu 1†
·
Xiaobin Hu 2
·
Shuicheng Yan 5
1Zhejiang University
2YouTu Lab, Tencent
3Shanghai Jiao Tong University
4University of Auckland
5National University of Singapore
This repository is a comprehensive collection of resources for IVEBench. If you find any work missing or have any suggestions, feel free to open a pull request or contact us; we will promptly add the missing papers to this repository.
🔥 More up-to-date instruction-guided video editing methods will be added continuously.
📝 Update:
- [2026-01-27] IVEBench has been accepted by ICLR 2026. 🎉🎉🎉
- [2025-11-27] Supports adjusting weights for each dimension.
- [2025-11-26] Update evaluation results: Ditto.
- [2025-10-23] Update evaluation results: Lucy-Edit-Dev, Omni-Video, ICVE.
- [2025-10-16] Update evaluation results: InsV2V, StableV2V, AnyV2V, VACE.
🤓 You can view the scores and comparisons of each method on the IVEBench LeaderBoard.
Compared with existing video editing benchmarks, our proposed IVEBench offers the following key advantages:
- Comprehensive support for IVE methods: IVEBench is specifically designed to evaluate instruction-guided video editing (IVE) models while remaining compatible with traditional source-target prompt-based methods, ensuring broad applicability across editing paradigms;
- Diverse and semantically rich video corpus: The benchmark contains 600 high-quality source videos spanning seven semantic dimensions and thirty topics, with frame lengths ranging from 32 to 1,024, providing wide coverage of real-world scenarios;
- Comprehensive editing taxonomy: IVEBench includes eight major editing categories and thirty-five subcategories, encompassing diverse editing types such as style, attribute, subject motion, camera motion, and visual effect editing, to fully represent instruction-guided behaviors;
- Integration of MLLM-based and traditional metrics: The evaluation protocol combines conventional objective indicators with multimodal large language model (MLLM)-based assessments across three dimensions (video quality, instruction compliance, and video fidelity) for more human-aligned and holistic evaluation;
- Extensive benchmarking of state-of-the-art models: We conduct a thorough quantitative and qualitative evaluation of leading IVE models, including InsV2V, AnyV2V, and StableV2V, as well as the multi-conditional video editing framework VACE, establishing a unified and fair standard for future research.
- Introduction
- Highlight
- Data Pipeline
- Benchmark Statistics
- Installation
- Usage
- Experiments
- Citation
- Contact
Data acquisition and processing pipeline of IVEBench: 1) a curation process yielding 600 high-quality, diverse videos; 2) a well-designed pipeline for generating comprehensive editing prompts.
The source videos can be played back on the IVEBench website.
Statistical distributions of IVEBench DB
git clone git@github.com:RyanChenYN/IVEBench.git
cd IVEBench
conda create -n ivebench python=3.12
conda activate ivebench
pip install -r requirements.txt
Grounding DINO requires additional installation steps, which can be found in the Install section of the Grounding DINO repository.
All checkpoints utilized in this project are listed in matrics/path.yml.
Additionally, you may download the following pretrained models as referenced below:
- Qwen/Qwen2.5-VL-72B-Instruct
- Koala-36M/Training_Suitability_Assessment
- alibaba-pai/VideoCLIP-XL-v2
- baseline_offline.pth from facebook/cotracker3
- groundingdino_swinb_cogcoor.pth from Grounding DINO
After downloading the required checkpoints, you should replace the corresponding loading paths in matrics/path.yml with the local directories where the checkpoints are stored.
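If you prefer to fetch the Hugging Face checkpoints programmatically, the minimal sketch below uses huggingface_hub with the repository IDs listed above; the local target directories are placeholders of our own, and you still need to point the corresponding entries in matrics/path.yml to wherever the files actually land. The Grounding DINO weight file is released by the Grounding DINO project and is downloaded separately.

```python
# Hedged sketch: download the listed Hugging Face checkpoints to a local folder.
# The "./checkpoints/..." directories are placeholders; update matrics/path.yml to
# match wherever you store them.
from huggingface_hub import snapshot_download, hf_hub_download

MODEL_REPOS = [
    "Qwen/Qwen2.5-VL-72B-Instruct",
    "Koala-36M/Training_Suitability_Assessment",
    "alibaba-pai/VideoCLIP-XL-v2",
]

for repo_id in MODEL_REPOS:
    # Download the full model repository into a folder named after the repo.
    snapshot_download(repo_id=repo_id,
                      local_dir=f"./checkpoints/{repo_id.split('/')[-1]}")

# Single-file CoTracker3 checkpoint (file name taken from the list above).
hf_hub_download(repo_id="facebook/cotracker3",
                filename="baseline_offline.pth",
                local_dir="./checkpoints/cotracker3")

# groundingdino_swinb_cogcoor.pth is obtained from the Grounding DINO repository.
```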
This section provides access to the IVEBench Database, which contains the complete .mp4 video data of IVEBench together with a .csv file listing the original URL of each video in the IVEBench Database (videos from the OpenHumanVid subset have no corresponding URLs).
🥰You can download IVEBench DB to your local path using the following command:
huggingface-cli download --repo-type dataset --resume-download Coraxor/IVEBench --local-dir $YOUR_LOCAL_PATH
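After the download finishes, a quick sanity check like the one below can confirm the copy is complete. The exact file layout of the release (folder structure, .csv filename) is not fixed here, so treat the paths as placeholders.

```python
# Quick sanity check of a local IVEBench DB copy (paths are placeholders; adjust
# them to match the actual layout of the downloaded dataset).
from pathlib import Path
import csv

db_root = Path("./IVEBench_DB")                 # $YOUR_LOCAL_PATH from the command above
mp4_files = sorted(db_root.rglob("*.mp4"))
print(f"Found {len(mp4_files)} source videos")  # IVEBench DB contains 600 in total

csv_files = list(db_root.rglob("*.csv"))
if csv_files:
    with open(csv_files[0], newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    print(f"URL table: {len(rows)} rows, columns = {list(rows[0].keys())}")
```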
- You first need to run your own video editing model on the IVEBench DB to generate the corresponding Target Video dataset.
- For each source video, the associated source prompt, edit prompt, target prompt, target phrase, and target span are stored in the .json file provided within the IVEBench DB.
- The filenames of the videos in your generated Target Video dataset must match the corresponding source video names exactly.
- The metric computation of IVEBench requires both the source and target videos to be provided as folders of video frames. You therefore need to convert the .mp4 videos downloaded from the IVEBench DB into frame folders; if the target videos you generate are in .mp4 format, they need to be converted as well (see the sketch after this list).
  python data_process/mp42frames_batch.py --input_path $INPUT_PATH --output_path $OUTPUT_PATH
- The IVEBench DB contains videos ranging from 720P to 8K resolution, with frame counts between 32 and 1024. If your method has limitations on resolution or frame count, you can use data_process/resize_batch.py to downscale and subsample the frame folders converted from the IVEBench DB. This produces a source video dataset at the maximum resolution and frame count your method supports, making subsequent editing and evaluation more convenient.
  python data_process/resize_batch.py --input_path $INPUT_PATH --output_path $OUTPUT_PATH --size $WIDTH $HEIGHT --max_frame $MAX_FRAME
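For reference, the sketch below shows the kind of frame extraction that data_process/mp42frames_batch.py performs. It is not the repository's implementation; the output naming scheme (one sub-folder per video, zero-padded PNG frame indices) and the example input paths are assumptions.

```python
# Illustrative mp4 -> frame-folder conversion (assumptions: OpenCV is available,
# frames are written as zero-padded PNGs into one sub-folder per video; the actual
# data_process/mp42frames_batch.py may differ).
from pathlib import Path
import cv2

def mp4_to_frames(video_path: str, output_root: str) -> int:
    video_path = Path(video_path)
    out_dir = Path(output_root) / video_path.stem   # frame folder named after the video
    out_dir.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(str(video_path))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(str(out_dir / f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()
    return idx  # number of frames written

# Example usage: convert every .mp4 under a (placeholder) directory.
for mp4 in Path("./IVEBench_DB/videos").glob("*.mp4"):
    n = mp4_to_frames(str(mp4), "./IVEBench_DB/frames")
    print(mp4.name, n, "frames")
```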
- After you have set up the environment, loaded the model weights, prepared the IVEBench DB, and generated the Target Video dataset with your editing method on the IVEBench DB, you can use the evaluation script below to compute the scores of every video in your Target Video dataset on all metrics. The evaluation results are exported as a CSV file.
  cd metrics
  python evaluate.py \
      --output_path $YOUR_TARGET_VIDEOS_DIR \
      --source_videos_path $IVEBENCHDB_SOURCE_VIDEOS_DIR \
      --target_videos_path $YOUR_TARGET_VIDEOS_DIR \
      --info_json_path $PROMPT_JSON_PATH \
      --metric $LIST_OF_METRICS_YOU_NEED
- After obtaining the per-video evaluation results, you can use metrics/get_average_score.py to compute the total score of your method on the IVEBench DB, as well as the average scores across the three dimensions and all individual metrics (an aggregation sketch is shown after this list).
  python get_average_score.py -i $INPUT_CSV -o $OUTPUT_CSV
- Note that IVEBench is divided into two subsets: the IVEBench DB Short subset and the IVEBench DB Long subset. The Short subset contains videos with 32–128 frames, while the Long subset contains videos with 129–1024 frames and represents a higher level of difficulty. To evaluate your method on the full IVEBench DB, you need to generate the Target Video dataset for both subsets separately and evaluate each subset independently.
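As a rough illustration of the averaging step (and of the per-dimension weighting mentioned in the updates), the sketch below aggregates a per-video results CSV into dimension-level and overall scores. The column names, the metric-to-dimension mapping, and the input filename here are hypothetical; metrics/get_average_score.py defines the actual grouping and weighting.

```python
# Hedged sketch of score aggregation: average each metric over all videos, group the
# metric averages into the three IVEBench dimensions (video quality, instruction
# compliance, video fidelity), then combine the dimensions with adjustable weights.
# Column names and the metric -> dimension mapping are hypothetical.
import csv
from statistics import mean

DIMENSIONS = {
    "video_quality":          ["aesthetic_quality", "temporal_consistency"],
    "instruction_compliance": ["edit_accuracy"],
    "video_fidelity":         ["background_preservation"],
}
WEIGHTS = {"video_quality": 1.0, "instruction_compliance": 1.0, "video_fidelity": 1.0}

with open("results.csv", newline="", encoding="utf-8") as f:   # per-video metric CSV
    rows = list(csv.DictReader(f))

# Per-metric average over all evaluated videos.
metric_avg = {
    m: mean(float(r[m]) for r in rows)
    for metric_list in DIMENSIONS.values() for m in metric_list
}

# Per-dimension average, then a weighted total score.
dim_avg = {d: mean(metric_avg[m] for m in ms) for d, ms in DIMENSIONS.items()}
total = sum(WEIGHTS[d] * dim_avg[d] for d in dim_avg) / sum(WEIGHTS.values())

print(dim_avg)
print("weighted total:", total)
```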
A continuously updated, sortable table of the latest IVE methods is available on the IVEBench website.
IVEBench Evaluation Results of Video Editing Models. We visualize the evaluation results of four IVE models on 12 IVEBench metrics. The results are normalized per dimension for clearer comparison.
Comparative demonstrations of the source videos and the target videos generated by different methods can be viewed on the IVEBench website.
If you find IVEBench useful for your research, please consider giving a star ⭐ and a citation 📝 :)
@inproceedings{chen2026ivebench,
title={IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment},
author={Chen, Yinan and Zhang, Jiangning and Hu, Teng and Zeng, Yuxiang and Xue, Zhucun and He, Qingdong and Wang, Chengjie and Liu, Yong and Hu, Xiaobin and Yan, Shuicheng},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
yinanchencs@outlook.com
186368@zju.edu.cn





