# DPO Fine-Tuned Adapter - LLM Judge Dataset
## 🧠 Model

- **Base:** `meta-llama/Llama-3.2-1B-Instruct`
- Fine-tuned using TRL's `DPOTrainer` with the LLM Judge preference dataset (50 pairs)
## ⚙️ Training Parameters

| Parameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size | 4 |
| Epochs | 3 |
| Beta (DPO regularizer) | 0.1 |
| Max Input Length | 1024 tokens |
| Max Prompt Length | 512 tokens |
| Padding Token | `eos_token` |
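
The setup above maps onto TRL roughly as follows. This is a minimal sketch, assuming a recent TRL version with the `DPOConfig`/`processing_class` API, a local copy of the CSV, and illustrative LoRA settings (rank/alpha are placeholders, not taken from this card):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # padding token = eos_token

model = AutoModelForCausalLM.from_pretrained(BASE)

# Hyperparameters from the table above.
config = DPOConfig(
    output_dir="dpo-llmjudge-lora-adapter",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    beta=0.1,           # DPO regularizer
    max_length=1024,    # max input length
    max_prompt_length=512,
)

# Illustrative LoRA settings; the actual rank/alpha are not stated on this card.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

train_dataset = load_dataset(
    "csv", data_files="llm_judge_preferences.csv", split="train"
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()  # writes the LoRA adapter to output_dir
```

With `peft_config` supplied, `DPOTrainer` wraps the base model in a LoRA adapter itself, so no separate frozen reference model needs to be loaded.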
## 📦 Dataset

- **Source:** `llm_judge_preferences.csv`
- **Size:** 50 human-labeled pairs with `prompt`, `chosen`, and `rejected` columns
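
These three text columns are exactly the schema `DPOTrainer` expects. A quick sanity check on the file (assuming the CSV sits next to the training script) might look like:

```python
from datasets import load_dataset

ds = load_dataset("csv", data_files="llm_judge_preferences.csv", split="train")

# Verify the preference-pair schema before training.
assert set(ds.column_names) >= {"prompt", "chosen", "rejected"}
print(ds.num_rows)      # expected: 50
print(ds[0]["prompt"])  # inspect one preference pair
```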
## 📤 Output

- Adapter saved and uploaded as `Likhith003/dpo-llmjudge-lora-adapter`
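
The adapter can then be attached to the base model with PEFT for inference. A minimal sketch, assuming the repo is a standard PEFT LoRA adapter (the prompt text is illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "Likhith003/dpo-llmjudge-lora-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the DPO-tuned LoRA weights

inputs = tokenizer("Which response is better, and why?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```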