---
license: other
library_name: peft
tags:
- llama-factory
- lora
- generated_from_trainer
base_model: Qwen/Qwen2.5-32B-Instruct
model-index:
- name: Qwen2.5-32B-simpo-LoRA
  results: []
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
datasets:
- IlyaGusev/saiga_preferences
- 40umov/dostoevsky
- Vikhrmodels/gutenpromax
---

# radm_Qwen2.5-32B-simpo-LoRA

This model is a LoRA fine-tune of [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) on a custom preference dataset.

Full model (FP8): [radm/Qwen2.5-32B-simpo-FP8](https://huggingface.co/radm/Qwen2.5-32B-simpo-FP8)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- num_epochs: 1.0

### Training results

![image/png](https://huggingface.co/radm/Qwen2.5-32B-simpo-LoRA/resolve/main/training_rewards_accuracies.png)

![image/png](https://huggingface.co/radm/Qwen2.5-32B-simpo-LoRA/resolve/main/training_loss.png)

### Framework versions

- PEFT 0.11.1
- Transformers 4.43.4
- Pytorch 2.4.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
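
## Usage

A minimal loading sketch with Transformers and PEFT, assuming the adapter repo id `radm/Qwen2.5-32B-simpo-LoRA` (as in the image URLs above) and the base model `Qwen/Qwen2.5-32B-Instruct`; dtype and prompt are illustrative choices, not settings prescribed by this card.

```python
# Sketch: attach the LoRA adapter to the base Qwen2.5-32B-Instruct model.
# Repo ids are taken from this card's metadata; other settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-32B-Instruct"
adapter_id = "radm/Qwen2.5-32B-simpo-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; use what fits your hardware
    device_map="auto",
)
# Load the LoRA weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello! Tell me about yourself."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Calling `model.merge_and_unload()` after loading the adapter folds the LoRA weights into the base model, which avoids the adapter overhead at inference time.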