---
license: apache-2.0
datasets:
- Mielikki/Erebus-87k
- FourOhFour/Instruct_Phase
- FourOhFour/RP_Phase
- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
language:
- en
base_model:
- IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
---
---
### These are EXL2 quants for Aura-4B. The measurement file is in the main branch; check the revisions for different BPW.
---
## Aura-4B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/jT4LeWC0ioarPieWtNZkE.png)

## Introduction

**Aura-4B** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

This finetune has seen several hundred million tokens of completion, instruction, and roleplaying data. Kahneman-Tversky Optimization (KTO) was applied to give this model a unique output style.
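
For reference, KTO is a prospect-theoretic alignment objective that learns from unpaired desirable/undesirable examples rather than preference pairs. A sketch of the loss from Ethayarajh et al. (2024), where $\sigma$ is the logistic function, $\pi_{\text{ref}}$ is the frozen reference policy, and $z_0$ is a batch-level KL estimate; on a plausible reading, the `rl_beta` and `kto_desirable_weight` values in the KTO config below correspond to $\beta$ and $\lambda_D$:

$$
r_\theta(x,y)=\log\frac{\pi_\theta(y\mid x)}{\pi_{\text{ref}}(y\mid x)},\qquad
v(x,y)=
\begin{cases}
\lambda_D\,\sigma\big(\beta\,(r_\theta(x,y)-z_0)\big) & y\ \text{desirable}\\
\lambda_U\,\sigma\big(\beta\,(z_0-r_\theta(x,y))\big) & y\ \text{undesirable}
\end{cases}
$$

$$
\mathcal{L}_{\text{KTO}}=\mathbb{E}_{(x,y)\sim D}\big[\lambda_y - v(x,y)\big]
$$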

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-4B
- **Base Model**: [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml)
- **Model Type**: Chat Completions
- **Prompt Format**: ChatML
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192+ tokens

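Since the model expects ChatML, here is a minimal sketch of the prompt layout using `transformers`. This assumes the tokenizer ships a ChatML chat template, as the base repo's `-chatml` suffix suggests; swap in the Aura-4B repo id you actually use:

```python
from transformers import AutoTokenizer

# Tokenizer from the ChatML-ified base repo (illustrative; use your Aura-4B repo id).
tokenizer = AutoTokenizer.from_pretrained(
    "IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml"
)

messages = [
    {"role": "system", "content": "You are Aura, an expressive roleplaying partner."},
    {"role": "user", "content": "Describe the tavern as I step inside."},
]

# Renders the ChatML layout:
# <|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
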
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

[Static GGUF](https://huggingface.co/mradermacher/Aura-4B-GGUF)

[Imatrix GGUF](https://huggingface.co/mradermacher/Aura-4B-i1-GGUF)

EXL2: this repository (see the revisions for different BPW)

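For the GGUF quants, a quick-start sketch with `llama-cpp-python`; the `filename` glob is illustrative, so check the quant repo for the exact file names:

```python
from llama_cpp import Llama

# Pulls a quant straight from the Hub; the glob picks the Q4_K_M file if present.
llm = Llama.from_pretrained(
    repo_id="mradermacher/Aura-4B-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,  # matches the model card's max context
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Aura, an expressive roleplaying partner."},
        {"role": "user", "content": "Describe the tavern as I step inside."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
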
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Coming soon...

| Metric              | Value |
|---------------------|------:|
| Avg.                |   N/A |
| IFEval (0-Shot)     |   N/A |
| BBH (3-Shot)        |   N/A |
| MATH Lvl 5 (4-Shot) |   N/A |
| GPQA (0-shot)       |   N/A |
| MuSR (0-shot)       |   N/A |
| MMLU-PRO (5-shot)   |   N/A |

## Training Configuration

Aura-4B was trained in four sequential stages: completion SFT, instruct SFT, roleplaying SFT, and finally KTO, with each SFT stage initializing from the checkpoint pushed by the previous one. Each config can be launched with Axolotl, e.g. `accelerate launch -m axolotl.cli.train <config>.yaml`.

<details><summary>Click here for Axolotl configs</summary>

Completion SFT

```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/completion4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: Mielikki/Erebus-87k
    type: completion
    field: body

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

Instruct SFT

```yaml
base_model: jeiku/completion4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/instructered4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: FourOhFour/Instruct_Phase
    type: sharegpt
    conversation: chatml

chat_template: chatml

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

Roleplaying SFT

```yaml
base_model: jeiku/instructered4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/TheBest4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: FourOhFour/RP_Phase
    type: sharegpt
    conversation: chatml

chat_template: chatml

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

KTO

```yaml
base_model: FourOhFour/Crispy_Crab_4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/aura4bkto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: chatml

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: chatml.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: Aura-4B
wandb_entity:
wandb_watch:
wandb_name: Aura-4B
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
</details><br>
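
For quick testing of the full-precision weights, a minimal generation sketch with `transformers`; the repo id below is a placeholder, so point it at wherever the Aura-4B weights live:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Aura-4B weights repo.
repo = "jeiku/Aura-4B"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Aura, an expressive roleplaying partner."},
    {"role": "user", "content": "Open the scene in a rain-soaked port city."},
]
# Build the ChatML prompt and generate a sampled continuation.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```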