HPAI-BSC
/

Meta-Llama-3.1-8B-Instruct-Egida-DPO

Model card Files Files and versions Community

danihinjos commited on 17 days ago

Commit

f408278

·

verified ·

1 Parent(s): b6bd0c6

Update README.md

Files changed (1) hide show

README.md +11 -2

README.md CHANGED Viewed

@@ -23,14 +23,23 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Meta-Llama-3.1-8B-Instruct   |     0.347      |  0.160   |    0.446     |    0.039    |
-| Meta-Llama-3.1-8B-Egida-DPO  |     0.038      |  0.025   |    0.038     |    0.014    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Meta-Llama-3.1-8B-Instruct   |         0.453         |      0.646      |
-| Meta-Llama-3.1-8B-Egida-DPO  |         0.453         |      0.643      |
 ## Training Details

 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Meta-Llama-3.1-8B-Instruct   |     0.347      |  0.160   |    0.446     |    0.039    |
+| Meta-Llama-3.1-8B-Instruct-Egida-DPO  |     0.038      |  0.025   |    0.038     |    0.014    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Meta-Llama-3.1-8B-Instruct   |         0.453         |      0.646      |
+| Meta-Llama-3.1-8B-Instruct-Egida-DPO  |         0.453         |      0.643      |
+### Refusal Ratio
+|                              | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+|------------------------------|:---------------------:|:---------------:|
+| Meta-Llama-3.1-8B-Instruct         |          0.035           |           0.324           |
+| Meta-Llama-3.1-8B-Instruct-Egida-DPO        |          0.037           |           0.319           |
+Note that this refusal ratio is computed as keyword matching with a curated list of kewords. For more information, check the paper.
 ## Training Details