lbourdois committed (verified)
Commit f9fbb34 · 1 Parent(s): 9259cb4

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
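For context, the `language:` entries in the card's YAML metadata are what the Hub's language filter indexes, so adding them lets the model surface in language-scoped searches. Below is a minimal sketch of such a query using `huggingface_hub`; the `list_models(language=...)` call is standard, but the choice of `fra` and the result handling are purely illustrative, and whether the Hub treats three-letter codes (`fra`) exactly like two-letter ones (`fr`) is not verified here.

```python
from huggingface_hub import HfApi

api = HfApi()

# Find models whose card metadata declares French among their languages.
# "fra" is one of the 13 tags added by this PR; swap in any other code.
for model in api.list_models(language="fra", limit=5):
    print(model.id)
```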

Files changed (1)
  1. README.md +185 -171
README.md CHANGED
@@ -1,172 +1,186 @@
- ---
- library_name: peft
- license: apache-2.0
- base_model: Qwen/Qwen2.5-7B
- tags:
- - axolotl
- - generated_from_trainer
- model-index:
- - name: c504561c-9cd9-4a50-8f9c-b0b80246538c
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- adapter: lora
- base_model: Qwen/Qwen2.5-7B
- bf16: auto
- chat_template: llama3
- cosine_min_lr_ratio: 0.1
- data_processes: 4
- dataset_prepared_path: null
- datasets:
- - data_files:
-   - a991a262c7e64ac2_train_data.json
-   ds_type: json
-   format: custom
-   num_proc: 4
-   path: /workspace/input_data/a991a262c7e64ac2_train_data.json
-   streaming: true
-   type:
-     field_input: premise
-     field_instruction: question
-     field_output: choice1
-     format: '{instruction} {input}'
-     no_input_format: '{instruction}'
-     system_format: '{system}'
-     system_prompt: ''
- debug: null
- deepspeed: null
- device_map: balanced
- do_eval: true
- early_stopping_patience: 1
- eval_batch_size: 1
- eval_sample_packing: false
- eval_steps: 25
- evaluation_strategy: steps
- flash_attention: false
- fp16: null
- fsdp: null
- fsdp_config: null
- gradient_accumulation_steps: 16
- gradient_checkpointing: true
- group_by_length: true
- hub_model_id: eeeebbb2/c504561c-9cd9-4a50-8f9c-b0b80246538c
- hub_strategy: checkpoint
- hub_token: null
- learning_rate: 0.0001
- load_in_4bit: false
- load_in_8bit: false
- local_rank: null
- logging_steps: 1
- lora_alpha: 64
- lora_dropout: 0.05
- lora_fan_in_fan_out: null
- lora_model_dir: null
- lora_r: 32
- lora_target_linear: true
- lora_target_modules:
- - q_proj
- - v_proj
- lr_scheduler: cosine
- max_grad_norm: 1.0
- max_memory:
-   0: 75GB
-   1: 75GB
-   2: 75GB
-   3: 75GB
- max_steps: 50
- micro_batch_size: 2
- mixed_precision: bf16
- mlflow_experiment_name: /tmp/a991a262c7e64ac2_train_data.json
- model_type: AutoModelForCausalLM
- num_epochs: 3
- optim_args:
-   adam_beta1: 0.9
-   adam_beta2: 0.95
-   adam_epsilon: 1e-5
- optimizer: adamw_torch
- output_dir: miner_id_24
- pad_to_sequence_len: true
- resume_from_checkpoint: null
- s2_attention: null
- sample_packing: false
- save_steps: 25
- save_strategy: steps
- sequence_len: 2048
- strict: false
- tf32: false
- tokenizer_type: AutoTokenizer
- torch_compile: false
- train_on_inputs: false
- trust_remote_code: true
- val_set_size: 50
- wandb_entity: null
- wandb_mode: online
- wandb_name: c504561c-9cd9-4a50-8f9c-b0b80246538c
- wandb_project: Public_TuningSN
- wandb_runid: c504561c-9cd9-4a50-8f9c-b0b80246538c
- warmup_ratio: 0.04
- weight_decay: 0.01
- xformers_attention: null
-
- ```
-
- </details><br>
-
- # c504561c-9cd9-4a50-8f9c-b0b80246538c
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 4.5850
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 2
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 128
- - total_eval_batch_size: 4
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=adam_beta1=0.9,adam_beta2=0.95,adam_epsilon=1e-5
- - lr_scheduler_type: cosine
- - training_steps: 12
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 4.2511 | 0.2581 | 1 | 4.5850 |
-
-
- ### Framework versions
-
- - PEFT 0.13.2
- - Transformers 4.46.0
- - Pytorch 2.5.0+cu124
- - Datasets 3.0.1
 
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B
+ tags:
+ - axolotl
+ - generated_from_trainer
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: c504561c-9cd9-4a50-8f9c-b0b80246538c
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ adapter: lora
+ base_model: Qwen/Qwen2.5-7B
+ bf16: auto
+ chat_template: llama3
+ cosine_min_lr_ratio: 0.1
+ data_processes: 4
+ dataset_prepared_path: null
+ datasets:
+ - data_files:
+   - a991a262c7e64ac2_train_data.json
+   ds_type: json
+   format: custom
+   num_proc: 4
+   path: /workspace/input_data/a991a262c7e64ac2_train_data.json
+   streaming: true
+   type:
+     field_input: premise
+     field_instruction: question
+     field_output: choice1
+     format: '{instruction} {input}'
+     no_input_format: '{instruction}'
+     system_format: '{system}'
+     system_prompt: ''
+ debug: null
+ deepspeed: null
+ device_map: balanced
+ do_eval: true
+ early_stopping_patience: 1
+ eval_batch_size: 1
+ eval_sample_packing: false
+ eval_steps: 25
+ evaluation_strategy: steps
+ flash_attention: false
+ fp16: null
+ fsdp: null
+ fsdp_config: null
+ gradient_accumulation_steps: 16
+ gradient_checkpointing: true
+ group_by_length: true
+ hub_model_id: eeeebbb2/c504561c-9cd9-4a50-8f9c-b0b80246538c
+ hub_strategy: checkpoint
+ hub_token: null
+ learning_rate: 0.0001
+ load_in_4bit: false
+ load_in_8bit: false
+ local_rank: null
+ logging_steps: 1
+ lora_alpha: 64
+ lora_dropout: 0.05
+ lora_fan_in_fan_out: null
+ lora_model_dir: null
+ lora_r: 32
+ lora_target_linear: true
+ lora_target_modules:
+ - q_proj
+ - v_proj
+ lr_scheduler: cosine
+ max_grad_norm: 1.0
+ max_memory:
+   0: 75GB
+   1: 75GB
+   2: 75GB
+   3: 75GB
+ max_steps: 50
+ micro_batch_size: 2
+ mixed_precision: bf16
+ mlflow_experiment_name: /tmp/a991a262c7e64ac2_train_data.json
+ model_type: AutoModelForCausalLM
+ num_epochs: 3
+ optim_args:
+   adam_beta1: 0.9
+   adam_beta2: 0.95
+   adam_epsilon: 1e-5
+ optimizer: adamw_torch
+ output_dir: miner_id_24
+ pad_to_sequence_len: true
+ resume_from_checkpoint: null
+ s2_attention: null
+ sample_packing: false
+ save_steps: 25
+ save_strategy: steps
+ sequence_len: 2048
+ strict: false
+ tf32: false
+ tokenizer_type: AutoTokenizer
+ torch_compile: false
+ train_on_inputs: false
+ trust_remote_code: true
+ val_set_size: 50
+ wandb_entity: null
+ wandb_mode: online
+ wandb_name: c504561c-9cd9-4a50-8f9c-b0b80246538c
+ wandb_project: Public_TuningSN
+ wandb_runid: c504561c-9cd9-4a50-8f9c-b0b80246538c
+ warmup_ratio: 0.04
+ weight_decay: 0.01
+ xformers_attention: null
+
+ ```
+
+ </details><br>
+
+ # c504561c-9cd9-4a50-8f9c-b0b80246538c
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 4.5850
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 2
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 4
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=adam_beta1=0.9,adam_beta2=0.95,adam_epsilon=1e-5
+ - lr_scheduler_type: cosine
+ - training_steps: 12
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 4.2511 | 0.2581 | 1 | 4.5850 |
+
+
+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.46.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.1
  - Tokenizers 0.20.1
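
For readers who want to try the model this card describes: it is a LoRA adapter (PEFT) trained on top of Qwen/Qwen2.5-7B. The sketch below is a generic PEFT loading pattern, not something stated in the card itself; the adapter repo id is taken from `hub_model_id` in the axolotl config above, and the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B"
adapter_id = "eeeebbb2/c504561c-9cd9-4a50-8f9c-b0b80246538c"  # hub_model_id from the config above

# Load the base model in bf16, matching the bf16/mixed-precision training setup.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the LoRA adapter produced by this training run.
model = PeftModel.from_pretrained(base, adapter_id)

# Illustrative prompt only; the training format was '{instruction} {input}'.
inputs = tokenizer(
    "What happens next? It started to rain.", return_tensors="pt"
).to(base.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```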