htlou
/

mm-interp-AA_preference_cocour_new_step10_0_100

Image-Text-to-Text

Generated from Trainer

text-generation-inference

Model card Files Files and versions Community

mm-interp-AA_preference_cocour_new_step10_0_100 / README.md

htlou's picture

Upload folder using huggingface_hub

8ce541d verified 4 months ago

|

history blame contribute delete

3.42 kB

	---
	library_name: transformers
	license: other
	base_model: llava-hf/llava-v1.6-mistral-7b-hf
	tags:
	- llama-factory
	- full
	- generated_from_trainer
	model-index:
	- name: AA_preference_cocour_new_step10_0_100
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# AA_preference_cocour_new_step10_0_100

	This model is a fine-tuned version of [llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) on the AA_preference_cocour_new_step10_0_100 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4957
	- Rewards/chosen: -0.4320
	- Rewards/rejected: -3.0552
	- Rewards/accuracies: 0.7917
	- Rewards/margins: 2.6232
	- Logps/rejected: -248.9210
	- Logps/chosen: -252.8571
	- Logits/rejected: -2.2740
	- Logits/chosen: -2.3049

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 256
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 10
	- num_epochs: 3.0

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \| Rewards/chosen \| Rewards/rejected \| Rewards/accuracies \| Rewards/margins \| Logps/rejected \| Logps/chosen \| Logits/rejected \| Logits/chosen \|
	\|:-------------:\|:------:\|:----:\|:---------------:\|:--------------:\|:----------------:\|:------------------:\|:---------------:\|:--------------:\|:------------:\|:---------------:\|:-------------:\|
	\| 0.577 \| 0.3738 \| 50 \| 0.5782 \| 0.7960 \| -0.1800 \| 0.7250 \| 0.9759 \| -220.1687 \| -240.5771 \| -2.1546 \| -2.1689 \|
	\| 0.5388 \| 0.7477 \| 100 \| 0.5391 \| -0.4398 \| -2.0133 \| 0.7479 \| 1.5735 \| -238.5014 \| -252.9343 \| -2.1740 \| -2.1991 \|
	\| 0.2653 \| 1.1215 \| 150 \| 0.5247 \| 0.2862 \| -1.6846 \| 0.7646 \| 1.9708 \| -235.2147 \| -245.6745 \| -2.3266 \| -2.3485 \|
	\| 0.2571 \| 1.4953 \| 200 \| 0.5108 \| -0.5979 \| -3.0808 \| 0.7792 \| 2.4828 \| -249.1766 \| -254.5160 \| -2.4752 \| -2.5016 \|
	\| 0.2803 \| 1.8692 \| 250 \| 0.4817 \| -0.2909 \| -2.6866 \| 0.7854 \| 2.3957 \| -245.2348 \| -251.4460 \| -2.3853 \| -2.4107 \|
	\| 0.1739 \| 2.2430 \| 300 \| 0.4912 \| -0.3815 \| -2.8477 \| 0.7917 \| 2.4662 \| -246.8459 \| -252.3520 \| -2.3281 \| -2.3560 \|
	\| 0.1631 \| 2.6168 \| 350 \| 0.4965 \| -0.4101 \| -3.0083 \| 0.7896 \| 2.5982 \| -248.4518 \| -252.6378 \| -2.2784 \| -2.3092 \|


	### Framework versions

	- Transformers 4.45.2
	- Pytorch 2.4.0+cu121
	- Datasets 2.21.0
	- Tokenizers 0.20.3