【Evaluation】Best practices for evaluating Qwen3!

#2
by wangxingjun778 - opened

For more details, please refer to: https://evalscope.readthedocs.io/en/latest/best_practice/qwen3.html
Powered by: EvalScope https://github.com/modelscope/evalscope

  1. Speed Benchmark


  2. Benchmark collection (for evaluating abilities such as code, understanding, instruction following, math, ...)

    NOTE: The results are based on a sample of each original benchmark, selected with the eval argument --limit, so they are estimates rather than full-benchmark scores.
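
To make the effect of --limit concrete, here is a minimal sketch of scoring a model on a capped sample of a benchmark. It is not EvalScope's implementation: the helper names `take_limit` and `accuracy` are hypothetical, and taking the first N items in order is an assumption about how the cap is applied.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def take_limit(samples: Sequence[T], limit: Optional[int]) -> List[T]:
    """Return at most `limit` samples in their original order.

    Assumption: the cap keeps the first N items. `None` means "use everything".
    """
    if limit is None:
        return list(samples)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(samples[:limit])


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of exact matches; returns 0.0 for an empty sample."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


# Toy data standing in for a benchmark (hypothetical, not real results).
references = ["4", "9", "16", "25", "36"]
predictions = ["4", "9", "15", "25", "36"]

limit = 3
score = accuracy(take_limit(predictions, limit), take_limit(references, limit))
```

A score computed this way only covers the sampled items, so it can differ from the score on the full benchmark, especially when the limit is small.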


  3. Thinking efficiency of Qwen3
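
As a rough illustration of what "thinking efficiency" can look at, the sketch below splits a Qwen3-style response into its `<think>...</think>` part and its final answer, then reports what share of the output was spent thinking. This is not the metric EvalScope computes: the function names are hypothetical, and whitespace-separated words are used as a crude stand-in for tokenizer tokens.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> Tuple[str, str]:
    """Return (thinking, answer); thinking is empty if no think block exists."""
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    thinking = match.group(1).strip()
    answer = (output[: match.start()] + output[match.end():]).strip()
    return thinking, answer


def thinking_share(output: str) -> float:
    """Share of words spent inside the think block (words approximate tokens)."""
    thinking, answer = split_thinking(output)
    think_words = len(thinking.split())
    answer_words = len(answer.split())
    total = think_words + answer_words
    return think_words / total if total else 0.0


sample = "<think>2 plus 2 is 4</think>The answer is 4."
thinking, answer = split_thinking(sample)
share = thinking_share(sample)
```

For a real measurement, the word counts would be replaced with counts from the model's own tokenizer.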


  4. Run Gradio visualization

    evalscope app


Get started and have fun! :)
