---
library_name: transformers
license: other
base_model: llava-hf/llava-v1.6-mistral-7b-hf
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: RLAIF-V-Dataset
  results: []
---

# RLAIF-V-Dataset

This model is a fine-tuned version of [llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) trained on the RLAIF-V dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4513
- Rewards/chosen: -3.2808
- Rewards/rejected: -6.0928
- Rewards/accuracies: 0.8212
- Rewards/margins: 2.8121
- Logps/rejected: -219.8085
- Logps/chosen: -191.2850
- Logits/rejected: -2.2605
- Logits/chosen: -2.2964

## Model description

More information needed

## Intended uses & limitations

More information needed
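
No usage guidance was provided with the card, but since this checkpoint shares the LLaVA-NeXT (llava-v1.6) architecture of its base model, it should load with the standard `LlavaNextProcessor` / `LlavaNextForConditionalGeneration` classes in `transformers`. A minimal, hedged inference sketch follows; the repository ID, example image URL, and prompt are placeholders rather than details taken from this card:

```python
import torch
import requests
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Placeholder: substitute the actual Hub repository ID or local path of this checkpoint.
model_id = "path-or-repo-of-this-checkpoint"

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-image chat prompt using the model's chat template.
url = "https://llava-vl.github.io/static/images/view.jpg"  # example image from the LLaVA docs
image = Image.open(requests.get(url, stream=True).raw)
conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is shown in this image?"}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```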

## Training and evaluation data

More information needed
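
The preference data is not described here beyond its name. Assuming it refers to the publicly released RLAIF-V preference dataset on the Hugging Face Hub (the hub ID below is an assumption, not stated in this card), it can be inspected with the `datasets` library:

```python
from datasets import load_dataset

# Assumed hub ID for the RLAIF-V preference data; verify before relying on it.
ds = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

print(ds)            # dataset size and column names
print(ds[0].keys())  # expected preference-style fields, e.g. a question with chosen/rejected responses
```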

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3.0
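
The reward and log-probability metrics reported above are characteristic of DPO-style preference optimization, although the card does not name the training objective. As a reference point only, the listed hyperparameters map onto standard `transformers` `TrainingArguments` as sketched below (8 devices × per-device batch 8 × 4 accumulation steps gives the total train batch size of 256); this is an illustrative reconstruction, not the exact LLaMA-Factory command used:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="rlaif-v-llava-next-mistral-7b",
    learning_rate=1e-6,
    per_device_train_batch_size=8,   # x 8 GPUs x 4 accumulation steps = 256 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=4,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```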

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5989        | 0.1368 | 40   | 0.6069          | -0.3887        | -0.8615          | 0.6365             | 0.4728          | -167.4954      | -162.3644    | -2.4012         | -2.4102       |
| 0.5452        | 0.2735 | 80   | 0.5331          | -0.8812        | -1.8338          | 0.7135             | 0.9526          | -177.2182      | -167.2896    | -2.5177         | -2.5334       |
| 0.5026        | 0.4103 | 120  | 0.4925          | -1.4411        | -2.6703          | 0.7442             | 1.2292          | -185.5836      | -172.8887    | -1.9765         | -2.0268       |
| 0.4511        | 0.5470 | 160  | 0.4683          | -1.3283        | -3.0284          | 0.7625             | 1.7001          | -189.1644      | -171.7603    | -2.0280         | -2.0709       |
| 0.4562        | 0.6838 | 200  | 0.4528          | -1.4943        | -3.2675          | 0.7567             | 1.7732          | -191.5553      | -173.4200    | -2.1029         | -2.1462       |
| 0.4189        | 0.8205 | 240  | 0.4494          | -1.9309        | -3.8899          | 0.7663             | 1.9589          | -197.7792      | -177.7867    | -2.4165         | -2.4472       |
| 0.4484        | 0.9573 | 280  | 0.4432          | -1.7397        | -3.8238          | 0.7635             | 2.0841          | -197.1187      | -175.8746    | -2.1586         | -2.2000       |
| 0.222         | 1.0940 | 320  | 0.4504          | -1.2207        | -2.9698          | 0.7760             | 1.7491          | -188.5780      | -170.6839    | -2.4060         | -2.4397       |
| 0.2018        | 1.2308 | 360  | 0.4438          | -2.0855        | -4.4746          | 0.7885             | 2.3891          | -203.6262      | -179.3325    | -2.3445         | -2.3790       |
| 0.2017        | 1.3675 | 400  | 0.4350          | -1.9109        | -4.1414          | 0.7981             | 2.2305          | -200.2943      | -177.5862    | -2.3022         | -2.3351       |
| 0.1999        | 1.5043 | 440  | 0.4288          | -2.1056        | -4.4641          | 0.8048             | 2.3585          | -203.5214      | -179.5331    | -2.1361         | -2.1716       |
| 0.1837        | 1.6410 | 480  | 0.4262          | -2.2318        | -4.7056          | 0.8125             | 2.4738          | -205.9359      | -180.7949    | -2.2127         | -2.2452       |
| 0.1942        | 1.7778 | 520  | 0.4163          | -2.3806        | -5.0283          | 0.8115             | 2.6478          | -209.1637      | -182.2829    | -2.3333         | -2.3675       |
| 0.1821        | 1.9145 | 560  | 0.4165          | -2.2038        | -4.6709          | 0.8173             | 2.4671          | -205.5893      | -180.5155    | -2.3238         | -2.3543       |
| 0.0858        | 2.0513 | 600  | 0.4415          | -2.7029        | -5.1979          | 0.8144             | 2.4950          | -210.8597      | -185.5066    | -2.2872         | -2.3220       |
| 0.0832        | 2.1880 | 640  | 0.4414          | -2.8951        | -5.6554          | 0.8173             | 2.7603          | -215.4344      | -187.4282    | -2.2892         | -2.3247       |
| 0.0817        | 2.3248 | 680  | 0.4521          | -3.2403        | -6.0014          | 0.8154             | 2.7611          | -218.8945      | -190.8804    | -2.2697         | -2.3056       |
| 0.0858        | 2.4615 | 720  | 0.4479          | -3.3847        | -6.3012          | 0.8221             | 2.9165          | -221.8926      | -192.3248    | -2.2708         | -2.3072       |
| 0.0723        | 2.5983 | 760  | 0.4574          | -3.3436        | -6.1113          | 0.8173             | 2.7677          | -219.9932      | -191.9133    | -2.2754         | -2.3103       |
| 0.0717        | 2.7350 | 800  | 0.4532          | -3.3171        | -6.1289          | 0.8192             | 2.8118          | -220.1688      | -191.6483    | -2.2610         | -2.2973       |
| 0.0691        | 2.8718 | 840  | 0.4514          | -3.2739        | -6.0855          | 0.8212             | 2.8116          | -219.7354      | -191.2166    | -2.2604         | -2.2964       |


### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.3