Best practice for QwQ-32B evaluation
#55, opened by wangxingjun778
Best practice: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html
EvalScope LLM Evaluation Framework: https://github.com/modelscope/evalscope
- Supports "overthinking" and "underthinking" evaluation
- Supports performance evaluation broken down by math difficulty level
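
To illustrate what a per-level breakdown might look like, here is a minimal sketch in plain Python. It does not use the EvalScope API; the record format (`level`, `correct`) and the function name `accuracy_by_level` are assumptions made for illustration only. See the best-practice link above for the actual workflow.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Group graded answers by difficulty level and return accuracy per level."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        totals[rec["level"]] += 1
        correct[rec["level"]] += int(rec["correct"])
    # Sort levels so the report reads from easiest to hardest.
    return {level: correct[level] / totals[level] for level in sorted(totals)}


# Hypothetical graded results; real records would come from an evaluation run.
sample_records = [
    {"level": 1, "correct": True},
    {"level": 1, "correct": True},
    {"level": 3, "correct": True},
    {"level": 3, "correct": False},
    {"level": 5, "correct": False},
]

per_level = accuracy_by_level(sample_records)
for level, acc in per_level.items():
    print(f"Level {level}: {acc:.2f}")
```

Reporting accuracy per level rather than as one aggregate makes it easier to see whether a model's errors concentrate on the harder problems.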