Best practice for QwQ-32B evaluation

#55
by wangxingjun778 - opened

Best practice: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html
EvalScope LLM Evaluation Framework: https://github.com/modelscope/evalscope

  1. Supports "overthinking" and "underthinking" evaluation
  2. Supports performance evaluation by math level
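
As a rough illustration of the second point, here is a minimal sketch of grouping per-problem results by level, assuming "math level" means the difficulty level attached to each problem. This is not EvalScope's API: the function `summarize_by_level`, the record keys (`level`, `correct`, `tokens`), and the sample records are all hypothetical. Reporting the average output length next to accuracy is one simple way to look at how much the model generates at each level.

```python
from collections import defaultdict


def summarize_by_level(records):
    """Group per-problem results by difficulty level.

    Each record is a dict with hypothetical keys (assumptions, not EvalScope fields):
      'level'   - difficulty level of the problem
      'correct' - whether the final answer was judged correct
      'tokens'  - number of tokens in the model's output
    """
    buckets = defaultdict(list)
    for record in records:
        buckets[record["level"]].append(record)

    summary = {}
    for level in sorted(buckets):
        rows = buckets[level]
        n = len(rows)
        summary[level] = {
            "n": n,
            "accuracy": sum(1 for r in rows if r["correct"]) / n,
            "avg_tokens": sum(r["tokens"] for r in rows) / n,
        }
    return summary


# Made-up records for illustration only; these are not real QwQ-32B results.
records = [
    {"level": 1, "correct": True, "tokens": 400},
    {"level": 1, "correct": True, "tokens": 600},
    {"level": 5, "correct": False, "tokens": 3000},
    {"level": 5, "correct": True, "tokens": 5000},
]

summary = summarize_by_level(records)
for level, stats in summary.items():
    print(level, stats)
```

For the actual evaluation setup and metrics, see the best-practice link above.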

image.png

image.png

image.png

Some conclusions are as follows:

image.png
