SELM-Llama
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of ZhangShenao/SELM-Llama-3-8B-Instruct-iter-1, trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Llama-3-8B-Instruct-iter-3 | 33.47 | 8.29 |
| SELM-Llama-3-8B-Instruct-iter-2 | 35.65 | 8.09 |
| SELM-Llama-3-8B-Instruct-iter-1 | 32.02 | 7.92 |
| Meta-Llama-3-8B-Instruct | 24.31 | 7.93 |
Base model: meta-llama/Meta-Llama-3-8B-Instruct

The following hyperparameters were used during training:
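Since the card does not include a usage snippet, here is a minimal inference sketch using the `transformers` chat-template API. The repo id `ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2` is an assumption (this card describes a fine-tune of iter-1); substitute the actual repo id for this checkpoint.

```python
# Sketch: chat inference with a SELM-Llama checkpoint via transformers.
# NOTE: the repo id below is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B model in memory
    device_map="auto",
)

# Llama-3-Instruct models expect the chat template applied to a message list.
messages = [
    {"role": "user", "content": "Explain self-exploring language models in one sentence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Sampling parameters here (`temperature=0.7`, `max_new_tokens=256`) are illustrative defaults, not values from the SELM training or evaluation setup.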