Update README.md
README.md
@@ -94,13 +94,16 @@ The OLMo-2 models have limited safety training, but are not deployed automatical
## Performance

| Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8K | IFEval | MATH | MMLU | Safety | PopQA | TruthfulQA |
|-------|---------|-----------------|-----|------|-------|--------|------|------|--------|-------|------------|
| **OLMo 1B 0724** | 24.4 | 2.4 | 29.9 | 27.9 | 10.8 | 25.3 | 2.2 | 36.6 | 52.0 | 12.1 | 44.3 |
| **SmolLM2 1.7B** | 34.2 | 5.8 | 39.8 | 30.9 | 45.3 | 51.6 | 20.3 | 34.3 | 52.4 | 16.4 | 45.3 |
| **Gemma 3 1B** | 38.3 | 20.4 | 39.4 | 25.1 | 35.0 | 60.6 | 40.3 | 38.9 | 70.2 | 9.6 | 43.8 |
| **Llama 3.2 1B** | 39.3 | 10.1 | 40.2 | 32.2 | 45.4 | 54.0 | 21.6 | 46.7 | 87.2 | 13.8 | 41.5 |
| **Qwen 2.5 1.5B** | 41.7 | 7.4 | 45.8 | 13.4 | 66.2 | 44.2 | 40.6 | 59.7 | 77.6 | 15.5 | 46.5 |
| **---** | | | | | | | | | | | |
| **OLMo 2 1B SFT** | 36.9 | 2.4 | 32.8 | 33.8 | 52.1 | 50.5 | 13.2 | 36.4 | 93.2 | 12.7 | 42.1 |
| **OLMo 2 1B DPO** | 40.6 | 9.5 | 33.0 | 34.5 | 59.0 | 67.1 | 14.1 | 39.9 | 89.9 | 12.3 | 46.4 |
| **OLMo 2 1B** | 42.7 | 9.1 | 35.0 | 34.6 | 68.3 | 70.1 | 20.7 | 40.0 | 87.6 | 12.9 | 48.7 |
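The README does not say how the Average column is computed. Recomputing it as the unweighted mean of the ten benchmark columns, rounded to one decimal place, reproduces every reported value in the table, so the Python sketch below assumes that aggregation (equal weights, half-up rounding). `ROWS`, `BENCHMARKS`, and `unweighted_average` are hypothetical names for illustration only, not part of any OLMo tooling or evaluation suite.

```python
from decimal import Decimal, ROUND_HALF_UP

# Benchmark columns in table order; "Average" is excluded because it is derived.
BENCHMARKS = ["AlpacaEval 2 LC", "BBH", "DROP", "GSM8K", "IFEval",
              "MATH", "MMLU", "Safety", "PopQA", "TruthfulQA"]

# (reported Average, per-benchmark scores) copied from the Performance table.
# Scores are kept as strings so Decimal parses them exactly.
ROWS = {
    "OLMo 1B 0724": ("24.4", ["2.4", "29.9", "27.9", "10.8", "25.3", "2.2", "36.6", "52.0", "12.1", "44.3"]),
    "SmolLM2 1.7B": ("34.2", ["5.8", "39.8", "30.9", "45.3", "51.6", "20.3", "34.3", "52.4", "16.4", "45.3"]),
    "Gemma 3 1B": ("38.3", ["20.4", "39.4", "25.1", "35.0", "60.6", "40.3", "38.9", "70.2", "9.6", "43.8"]),
    "Llama 3.2 1B": ("39.3", ["10.1", "40.2", "32.2", "45.4", "54.0", "21.6", "46.7", "87.2", "13.8", "41.5"]),
    "Qwen 2.5 1.5B": ("41.7", ["7.4", "45.8", "13.4", "66.2", "44.2", "40.6", "59.7", "77.6", "15.5", "46.5"]),
    "OLMo 2 1B SFT": ("36.9", ["2.4", "32.8", "33.8", "52.1", "50.5", "13.2", "36.4", "93.2", "12.7", "42.1"]),
    "OLMo 2 1B DPO": ("40.6", ["9.5", "33.0", "34.5", "59.0", "67.1", "14.1", "39.9", "89.9", "12.3", "46.4"]),
    "OLMo 2 1B": ("42.7", ["9.1", "35.0", "34.6", "68.3", "70.1", "20.7", "40.0", "87.6", "12.9", "48.7"]),
}


def unweighted_average(scores):
    """Mean of the benchmark scores, rounded half-up to one decimal place.

    Assumption: every benchmark carries equal weight. The README does not
    state how the Average column is aggregated.
    """
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for model, (reported, scores) in ROWS.items():
        computed = unweighted_average(scores)
        status = "ok" if computed == Decimal(reported) else "MISMATCH"
        print(f"{model:<15} reported={reported:>5} computed={computed} {status}")
```

Parsing the table's strings with `Decimal` keeps the sums exact, so a tie such as 24.35 (the OLMo 1B 0724 row) rounds deterministically instead of depending on binary floating-point representation.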