ranarag committed on
Commit
53d028a
·
verified ·
1 Parent(s): 0276d99

Added thinking ablation evaluation results

Files changed (1)
  1. README.md +48 -4
README.md CHANGED
@@ -191,7 +191,7 @@ So, you need to add 10 liters of a 70% acid solution to the 10 liters of a 30% a
191
 
192
  **Evaluation Results:**
193
  <table>
194
-
195
  <thead>
196
  <tr>
197
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
@@ -309,7 +309,7 @@ So, you need to add 10 liters of a 70% acid solution to the 10 liters of a 30% a
309
 
310
  <tr>
311
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
312
- <td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
313
  <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
314
  <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
315
  <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
@@ -340,10 +340,54 @@ So, you need to add 10 liters of a 70% acid solution to the 10 liters of a 30% a
340
 
341
  </tr>
342
 
343
-
344
-
345
  </tbody></table>
346
 
347
  **Training Data:**
348
  Overall, our training data comes largely from two key sources: (1) publicly available datasets with permissive licenses, and (2) internally generated synthetic data targeted at enhancing reasoning capabilities.
349
  <!-- A detailed attribution of datasets can be found in [Granite 3.2 Technical Report (coming soon)](#), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
 
191
 
192
  **Evaluation Results:**
193
  <table>
194
+ <caption><b> Comparison with Other Models</b></caption>
195
  <thead>
196
  <tr>
197
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
 
309
 
310
  <tr>
311
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
312
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
313
  <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
314
  <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
315
  <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
 
340
 
341
  </tr>
342
 
 
 
343
  </tbody></table>
344
 
345
+ <table>
346
+ <caption><b>Thinking Ablation</b></caption>
347
+ <thead>
348
+ <tr>
349
+ <th rowspan="2" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
350
+ <th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
351
+ <th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
352
+ </tr>
353
+ <tr>
354
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
355
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
356
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
357
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
358
+ </tr></thead>
359
+ <tbody>
360
+ <tr>
361
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
362
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
363
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
364
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
365
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
366
+ </tr>
367
+ <tr>
368
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
369
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
370
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
371
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
372
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
373
+ </tr>
374
+ <tr>
375
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
376
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
377
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
378
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
379
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
380
+ </tr>
381
+ <tr>
382
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-8B-Instruct</b></td>
383
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
384
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
385
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
386
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
387
+ </tr>
388
+ </tbody>
389
+ </table>
390
+
391
  **Training Data:**
392
  Overall, our training data comes largely from two key sources: (1) publicly available datasets with permissive licenses, and (2) internally generated synthetic data targeted at enhancing reasoning capabilities.
393
  <!-- A detailed attribution of datasets can be found in [Granite 3.2 Technical Report (coming soon)](#), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
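
The Thinking Ablation table added in this commit can be read more easily as per-model differences between `Thinking=True` and `Thinking=False`. The following is a minimal sketch that computes those differences; the scores are copied from the table, while the dictionary layout and the `thinking_deltas` helper are illustrative names that do not appear in the README.

```python
# Scores copied from the "Thinking Ablation" table as (ArenaHard, Alpaca-Eval-2).
# None marks rows where the table reports "-" for Thinking=True.
# The structure and names below are hypothetical, chosen only for illustration.
scores = {
    "Granite-3.1-8B-Instruct": {"off": (37.58, 30.34), "on": None},
    "Granite-3.1-2B-Instruct": {"off": (23.3, 27.17), "on": None},
    "Granite-3.2-2B-Instruct": {"off": (30.42, 31.65), "on": (26.6, 34.51)},
    "Granite-3.2-8B-Instruct": {"off": (40.54, 36.89), "on": (55.25, 61.19)},
}


def thinking_deltas(table):
    """Return {model: (ArenaHard delta, Alpaca-Eval-2 delta)} for rows with both settings."""
    deltas = {}
    for model, runs in table.items():
        if runs["on"] is None:
            continue  # no Thinking=True score reported for this model
        deltas[model] = tuple(
            round(on - off, 2) for off, on in zip(runs["off"], runs["on"])
        )
    return deltas


deltas = thinking_deltas(scores)
for model, (arena, alpaca) in deltas.items():
    print(f"{model}: ArenaHard {arena:+.2f}, Alpaca-Eval-2 {alpaca:+.2f}")
```

On these numbers, enabling thinking raises both scores for Granite-3.2-8B-Instruct, while for Granite-3.2-2B-Instruct it raises Alpaca-Eval-2 and lowers ArenaHard.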