Update README.md
README.md CHANGED
@@ -20,45 +20,45 @@
 | Model 5-shot | STEM | Humanities | Social Science | Other | China-specific | Average |
 | --- | --- | --- | --- | --- | --- | --- |
 | Multilingual-oriented | | | | | | |
-| https://openai.com/gpt4 | 65.23 | 72.11 | 72.06 | 74.79 | 66.12 | 70.95 |
-| https://openai.com/chatgpt | 47.81 | 55.68 | 56.50 | 62.66 | 50.69 | 55.51 |
-| https://huggingface.co/tiiuae/falcon-40b | 33.33 | 43.46 | 44.28 | 44.75 | 39.46 | 41.45 |
-| https://github.com/facebookresearch/llama | 34.47 | 40.24 | 41.55 | 42.88 | 37.00 | 39.80 |
-| https://github.com/bigscience-workshop/xmtf | 30.56 | 39.10 | 38.59 | 40.32 | 37.15 | 37.04 |
-| https://github.com/mbzuai-nlp/bactrian-x | 27.52 | 32.47 | 32.27 | 35.77 | 31.56 | 31.88 |
+| [GPT4](https://openai.com/gpt4) | 65.23 | 72.11 | 72.06 | 74.79 | 66.12 | 70.95 |
+| [ChatGPT](https://openai.com/chatgpt) | 47.81 | 55.68 | 56.50 | 62.66 | 50.69 | 55.51 |
+| [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) | 33.33 | 43.46 | 44.28 | 44.75 | 39.46 | 41.45 |
+| [LLaMA-65B](https://github.com/facebookresearch/llama) | 34.47 | 40.24 | 41.55 | 42.88 | 37.00 | 39.80 |
+| [BLOOMZ-7B](https://github.com/bigscience-workshop/xmtf) | 30.56 | 39.10 | 38.59 | 40.32 | 37.15 | 37.04 |
+| [Bactrian-LLaMA-13B](https://github.com/mbzuai-nlp/bactrian-x) | 27.52 | 32.47 | 32.27 | 35.77 | 31.56 | 31.88 |
 | Chinese-oriented | | | | | | |
-| Zhuzhi-6B | 40.30 | 48.08 | 46.72 | 47.41 | 45.51 | 45.60 |
-| Zhuhai-13B | 42.39 | 61.57 | 60.48 | 58.57 | 55.68 | 55.74 |
-| https://github.com/baichuan-inc/Baichuan-13B | 42.38 | 61.61 | 60.44 | 59.26 | 56.62 | 55.82 |
-| https://huggingface.co/THUDM/chatglm2-6b | 42.55 | 50.98 | 50.99 | 50.80 | 48.37 | 48.80 |
-| https://github.com/baichuan-inc/baichuan-7B | 35.25 | 48.07 | 47.88 | 46.61 | 44.14 | 44.43 |
-| https://github.com/THUDM/GLM-130B | 32.35 | 39.22 | 39.65 | 38.62 | 37.70 | 37.48 |
-| https://github.com/haonan-li/CMMLU/blob/master | 34.96 | 35.45 | 36.31 | 42.14 | 37.89 | 37.16 |
-| https://github.com/ymcui/Chinese-LLaMA-Alpaca | 27.12 | 33.18 | 34.87 | 35.10 | 32.97 | 32.63 |
-| https://github.com/OpenLMLab/MOSS | 27.23 | 30.41 | 28.84 | 32.56 | 28.68 | 29.57 |
-| https://github.com/THUDM/GLM | 25.49 | 27.05 | 27.42 | 29.21 | 28.05 | 27.26 |
+| [Zhuzhi-6B](https://github.com/emotibot-inc/Zhuzhi-6B) | 40.30 | 48.08 | 46.72 | 47.41 | 45.51 | 45.60 |
+| [Zhuhai-13B](https://github.com/emotibot-inc/Zhuhai-13B) | 42.39 | 61.57 | 60.48 | 58.57 | 55.68 | 55.74 |
+| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) | 42.38 | 61.61 | 60.44 | 59.26 | 56.62 | 55.82 |
+| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b) | 42.55 | 50.98 | 50.99 | 50.80 | 48.37 | 48.80 |
+| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) | 35.25 | 48.07 | 47.88 | 46.61 | 44.14 | 44.43 |
+| [ChatGLM-6B](https://github.com/THUDM/GLM-130B) | 32.35 | 39.22 | 39.65 | 38.62 | 37.70 | 37.48 |
+| [BatGPT-15B](https://github.com/haonan-li/CMMLU/blob/master) | 34.96 | 35.45 | 36.31 | 42.14 | 37.89 | 37.16 |
+| [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 27.12 | 33.18 | 34.87 | 35.10 | 32.97 | 32.63 |
+| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 27.23 | 30.41 | 28.84 | 32.56 | 28.68 | 29.57 |
+| [Chinese-GLM-10B](https://github.com/THUDM/GLM) | 25.49 | 27.05 | 27.42 | 29.21 | 28.05 | 27.26 |
 | Random | 25.00 | 25.00 | 25.00 | 25.00 | 25.00 | 25.00 |
 
 | Model 0-shot | STEM | Humanities | Social Science | Other | China-specific | Average |
 | --- | --- | --- | --- | --- | --- | --- |
 | Multilingual-oriented | | | | | | |
-| https://openai.com/gpt4 | 63.16 | 69.19 | 70.26 | 73.16 | 63.47 | 68.9 |
-| https://openai.com/chatgpt | 44.8 | 53.61 | 54.22 | 59.95 | 49.74 | 53.22 |
-| https://github.com/bigscience-workshop/xmtf | 33.03 | 45.74 | 45.74 | 46.25 | 41.58 | 42.8 |
-| https://huggingface.co/tiiuae/falcon-40b | 31.11 | 41.3 | 40.87 | 40.61 | 36.05 | 38.5 |
-| https://github.com/facebookresearch/llama | 31.09 | 34.45 | 36.05 | 37.94 | 32.89 | 34.88 |
-| https://github.com/mbzuai-nlp/bactrian-x | 26.46 | 29.36 | 31.81 | 31.55 | 29.17 | 30.06 |
+| [GPT4](https://openai.com/gpt4) | 63.16 | 69.19 | 70.26 | 73.16 | 63.47 | 68.9 |
+| [ChatGPT](https://openai.com/chatgpt) | 44.8 | 53.61 | 54.22 | 59.95 | 49.74 | 53.22 |
+| [BLOOMZ-7B](https://github.com/bigscience-workshop/xmtf) | 33.03 | 45.74 | 45.74 | 46.25 | 41.58 | 42.8 |
+| [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) | 31.11 | 41.3 | 40.87 | 40.61 | 36.05 | 38.5 |
+| [LLaMA-65B](https://github.com/facebookresearch/llama) | 31.09 | 34.45 | 36.05 | 37.94 | 32.89 | 34.88 |
+| [Bactrian-LLaMA-13B](https://github.com/mbzuai-nlp/bactrian-x) | 26.46 | 29.36 | 31.81 | 31.55 | 29.17 | 30.06 |
 | Chinese-oriented | | | | | | |
-| Zhuzhi-6B | 42.51 | 48.91 | 48.85 | 50.25 | 47.57 | 47.62 |
-| Zhuhai-13B | 42.37 | 60.97 | 59.71 | 56.35 | 54.81 | 54.84 |
-| https://github.com/baichuan-inc/Baichuan-13B | 42.04 | 60.49 | 59.55 | 56.6 | 55.72 | 54.63 |
-| https://huggingface.co/THUDM/chatglm2-6b | 41.28 | 52.85 | 53.37 | 52.24 | 50.58 | 49.95 |
-| https://github.com/baichuan-inc/baichuan-7B | 32.79 | 44.43 | 46.78 | 44.79 | 43.11 | 42.33 |
-| https://github.com/THUDM/GLM-130B | 32.22 | 42.91 | 44.81 | 42.6 | 41.93 | 40.79 |
-| https://github.com/haonan-li/CMMLU/blob/master | 33.72 | 36.53 | 38.07 | 46.94 | 38.32 | 38.51 |
-| https://github.com/ymcui/Chinese-LLaMA-Alpaca | 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
-| https://github.com/OpenLMLab/MOSS | 25.68 | 26.35 | 27.21 | 27.92 | 26.7 | 26.88 |
-| https://github.com/THUDM/GLM | 25.57 | 25.01 | 26.33 | 25.94 | 25.81 | 25.8 |
+| [Zhuzhi-6B](https://github.com/emotibot-inc/Zhuzhi-6B) | 42.51 | 48.91 | 48.85 | 50.25 | 47.57 | 47.62 |
+| [Zhuhai-13B](https://github.com/emotibot-inc/Zhuhai-13B) | 42.37 | 60.97 | 59.71 | 56.35 | 54.81 | 54.84 |
+| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) | 42.04 | 60.49 | 59.55 | 56.6 | 55.72 | 54.63 |
+| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b) | 41.28 | 52.85 | 53.37 | 52.24 | 50.58 | 49.95 |
+| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) | 32.79 | 44.43 | 46.78 | 44.79 | 43.11 | 42.33 |
+| [ChatGLM-6B](https://github.com/THUDM/GLM-130B) | 32.22 | 42.91 | 44.81 | 42.6 | 41.93 | 40.79 |
+| [BatGPT-15B](https://github.com/haonan-li/CMMLU/blob/master) | 33.72 | 36.53 | 38.07 | 46.94 | 38.32 | 38.51 |
+| [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
+| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 25.68 | 26.35 | 27.21 | 27.92 | 26.7 | 26.88 |
+| [Chinese-GLM-10B](https://github.com/THUDM/GLM) | 25.57 | 25.01 | 26.33 | 25.94 | 25.81 | 25.8 |
 | Random | 25 | 25 | 25 | 25 | 25 | 25 |
 
 # **Inference and Dialogue**