## VLM

We follow InternVL2 to evaluate performance on MME, MMBench, MMMU, MMVet, MathVista, and MMVP.

### Data preparation

Please follow InternVL2 to prepare the corresponding data, and link the data under the `vlm` directory.
The final directory structure is:

```
data
├── MathVista
├── mmbench
├── mme
├── MMMU
├── mm-vet
└── MMVP
```
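For reference, one way to produce this layout is with symlinks. This is a minimal sketch: the source path `/path/to/benchmarks` and the `eval/vlm/data` root are assumptions, so adjust both to where your InternVL2-style benchmark data and checkout actually live:

```bash
# Minimal sketch: symlink pre-downloaded benchmark data into the expected layout.
# /path/to/benchmarks and eval/vlm/data are assumed paths; adjust as needed.
mkdir -p eval/vlm/data
for d in MathVista mmbench mme MMMU mm-vet MMVP; do
    ln -sfn /path/to/benchmarks/"$d" eval/vlm/data/"$d"
done
```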
### Evaluation

Run `scripts/eval/run_eval_vlm.sh` directly to evaluate the different benchmarks. The output will be saved in `$output_path`.
- Set `$model_path` and `$output_path` to the checkpoint and log paths (see the sample invocation after this list).
- Increase `GPUS` if you want to run faster.
- For MMBench, please use the official evaluation server.
- For MMVet, please use the official evaluation server.
- For MathVista, set `$openai_api_key` in `scripts/eval/run_eval_vlm.sh` and `your_api_url` in `eval/vlm/eval/mathvista/utilities.py`. The default GPT version is `gpt-4o-2024-11-20`.
- For MMMU, we use CoT in the report, which improves accuracy by about 2%. For evaluating open-ended answers, we use GPT-4o as the judge.
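As a concrete example, a run might look like the sketch below. The paths are placeholders, and passing `model_path`, `output_path`, and `GPUS` as environment variables is an assumption about the script; if it defines them at the top of the file instead, edit them there:

```bash
# Hypothetical invocation; placeholder paths, and environment-variable
# passing is an assumption about how run_eval_vlm.sh reads its settings.
export model_path=/path/to/checkpoint
export output_path=/path/to/vlm_logs
export GPUS=8                   # more GPUs -> faster evaluation
export openai_api_key=sk-...    # only needed for MathVista judging
bash scripts/eval/run_eval_vlm.sh
```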
## GenEval

We modify the code in GenEval for faster evaluation.

### Setup
Install the following dependencies:
```bash
pip install open-clip-torch
pip install clip-benchmark
pip install --upgrade setuptools
sudo pip install -U openmim
sudo mim install mmengine mmcv-full==1.7.2
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection; git checkout 2.x
pip install -v -e .
```
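A quick sanity check after installation, assuming the pins above, is to confirm that the MMCV and MMDetection imports resolve:

```bash
# Both imports should succeed and print the installed versions
# (mmcv-full 1.7.2 and an mmdet 2.x release, per the pins above).
python -c "import mmcv; print(mmcv.__version__)"
python -c "import mmdet; print(mmdet.__version__)"
```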
Download the detector:

```bash
cd ./eval/gen/geneval
mkdir model
bash ./evaluation/download_models.sh ./model
```
### Evaluation

Run `scripts/eval/run_geneval.sh` directly to evaluate GenEval. The output will be saved in `$output_path`.
- Set `$model_path` and `$output_path` to the checkpoint and log paths (see the sample invocation after this list).
- Set `metadata_file` to `./eval/gen/geneval/prompts/evaluation_metadata.jsonl` to use the original GenEval prompts.
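As with the VLM script, here is a hypothetical invocation; the paths are placeholders and environment-variable passing is an assumption about the script:

```bash
# Hypothetical invocation; adjust placeholder paths to your setup.
export model_path=/path/to/checkpoint
export output_path=/path/to/geneval_logs
export metadata_file=./eval/gen/geneval/prompts/evaluation_metadata.jsonl
bash scripts/eval/run_geneval.sh
```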
## WISE

We modify the code in WISE for faster evaluation.

### Evaluation

Run `scripts/eval/run_wise.sh` directly to evaluate WISE. The output will be saved in `$output_path`.
- Set `$model_path` and `$output_path` to the checkpoint and log paths (see the sample invocation after this list).
- Set `$openai_api_key` in `scripts/eval/run_wise.sh` and `your_api_url` in `eval/gen/wise/gpt_eval_mp.py`. The default GPT version is `gpt-4o-2024-11-20`.
- Use `think` for thinking mode.
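A hypothetical run might look as follows. How `think` is actually passed (script argument vs. a variable inside the script) is an assumption here, so check `run_wise.sh` before relying on it:

```bash
# Hypothetical invocation; placeholder paths, and the trailing `think`
# argument is an assumption about how the script enables thinking mode.
export model_path=/path/to/checkpoint
export output_path=/path/to/wise_logs
export openai_api_key=sk-...    # needed for GPT-based judging
bash scripts/eval/run_wise.sh think
```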
## GEdit-Bench

Please follow GEdit-Bench for evaluation.

## IntelligentBench

TBD