Built with Axolotl

See axolotl config

axolotl version: 0.8.0

base_model: Dans-DiscountModels/7b-m-dans-personalityengine-v1.2.1-rc-2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code:

# wandb configuration
wandb_project: 7b-m-dans-optimizersweeps
wandb_watch:

wandb_run_id: repremover-1-1-ademamix-b1_0.9-b2_0.999-b3_0.999-a15
wandb_log_model:

# push checkpoints to hub
hub_model_id: Dans-DiscountModels/7b-m-dans-optimizersweeps-repremover-1-ademamix-b1_0.9-b2_0.999-b3_0.999-a15
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true

# where to save the finished model to
output_dir: ./7b-m-dans-optimizersweeps

# where to save the dataset to
dataset_prepared_path: ./7b-m-dans-optimizersweeps-data

save_safetensors: true

# dataset settings (local or huggingface repo)
datasets:
  - path: Dans-DiscountModels/pretokenization-test-3
    ds_type: parquet
    type:

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

adapter:
lora_model_dir:

val_set_size: 0.01
sequence_len: 8192

sample_packing: false
eval_sample_packing: false

pad_to_sequence_len: true

gradient_checkpointing: true
# gradient_checkpointing_kwargs:
# use_reentrant: false

gradient_accumulation_steps: 1
micro_batch_size: 4

num_epochs: 3

optimizer: ademamix
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=15"

lr_scheduler: rex
learning_rate: 0.0000001
cosine_min_lr_ratio:

# weight_decay: 0.03
max_grad_norm: 0.001

train_on_inputs: false
group_by_length: true

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: false

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1

evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:

saves_per_epoch: 8
save_total_limit: 2

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json

fsdp:
fsdp_config:

special_tokens:

7b-m-dans-optimizersweeps-repremover-1-ademamix-b1_0.9-b2_0.999-b3_0.999-a15

This model is a fine-tuned version of Dans-DiscountModels/7b-m-dans-personalityengine-v1.2.1-rc-2 on the Dans-DiscountModels/pretokenization-test-3 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0850
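
For quick reference, here is a minimal sketch of loading this checkpoint with the Transformers auto classes named in the config (AutoModelForCausalLM / AutoTokenizer) in bf16, the dtype used for training. The prompt and generation settings below are illustrative placeholders, not values from this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/7b-m-dans-optimizersweeps-repremover-1-ademamix-b1_0.9-b2_0.999-b3_0.999-a15"

# Load the checkpoint the way it was trained: AutoModelForCausalLM /
# AutoTokenizer in bfloat16 (bf16: true in the axolotl config).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Hello"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```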

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
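
The axolotl config above trains on the Dans-DiscountModels/pretokenization-test-3 parquet dataset with val_set_size: 0.01 held out for evaluation. A minimal sketch of inspecting that dataset with the datasets library follows; since the config sets hf_use_auth_token: true, the repo may be private and an access token may be required.

```python
from datasets import load_dataset

# Dataset referenced in the axolotl config. If the repo is private,
# authenticate first (e.g. `huggingface-cli login`) or pass token="hf_...".
ds = load_dataset("Dans-DiscountModels/pretokenization-test-3", split="train")

print(ds)            # number of rows and column names
print(ds[0].keys())  # fields of a single example
```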

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: ademamix with args: beta1=0.9, beta2=0.999, beta3=0.999, alpha=15
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 41
  • num_epochs: 3.0
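
The total train batch size of 32 follows from micro_batch_size 4 × 8 GPUs × gradient_accumulation_steps 1. Since the distinguishing choice of this run is the ademamix optimizer, below is a minimal NumPy sketch of the core AdEMAMix update with the beta1/beta2/beta3/alpha values used here. It follows the published AdEMAMix update rule (fast and slow first-moment EMAs plus an Adam-style second moment) and omits the alpha/beta3 warmup scheduling, so treat it as an illustration rather than the exact implementation used in training.

```python
import numpy as np

# Hyperparameters taken from this run's optim_args; eps is an assumed default.
beta1, beta2, beta3, alpha = 0.9, 0.999, 0.999, 15.0
lr, eps = 1e-7, 1e-8

def ademamix_step(theta, grad, m1, m2, nu, t):
    """One AdEMAMix step: fast EMA (m1), slow EMA (m2), second moment (nu)."""
    m1 = beta1 * m1 + (1 - beta1) * grad      # fast first moment
    m2 = beta3 * m2 + (1 - beta3) * grad      # slow first moment (no bias correction)
    nu = beta2 * nu + (1 - beta2) * grad**2   # second moment
    m1_hat = m1 / (1 - beta1**t)              # bias correction
    nu_hat = nu / (1 - beta2**t)
    theta = theta - lr * (m1_hat + alpha * m2) / (np.sqrt(nu_hat) + eps)
    return theta, m1, m2, nu

# Toy usage: a single 4-dimensional parameter vector and constant gradients.
theta = np.zeros(4)
m1 = m2 = nu = np.zeros(4)
for t in range(1, 4):
    grad = np.ones(4)
    theta, m1, m2, nu = ademamix_step(theta, grad, m1, m2, nu, t)
print(theta)
```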

Training results

Training Loss | Epoch | Step | Validation Loss
2.0376 0.0072 1 2.1457
2.2662 0.0432 6 2.1210
2.3077 0.0863 12 2.1574
2.1864 0.1295 18 2.1194
2.2386 0.1727 24 2.1311
2.0602 0.2158 30 2.1502
2.1397 0.2590 36 2.1246
2.038 0.3022 42 2.1153
2.0877 0.3453 48 2.1311
2.1585 0.3885 54 2.1273
2.0513 0.4317 60 2.1105
2.0461 0.4748 66 2.1311
2.2131 0.5180 72 2.1174
2.1054 0.5612 78 2.1201
2.027 0.6043 84 2.1396
2.1459 0.6475 90 2.1223
2.0967 0.6906 96 2.1113
2.1131 0.7338 102 2.1283
2.0769 0.7770 108 2.1267
2.0293 0.8201 114 2.1059
2.0288 0.8633 120 2.1166
1.9989 0.9065 126 2.1163
2.1579 0.9496 132 2.1041
1.9982 0.9928 138 2.1103
2.0953 1.0360 144 2.1216
1.9626 1.0791 150 2.1030
2.1126 1.1223 156 2.1256
2.0291 1.1655 162 2.1370
2.0219 1.2086 168 2.1236
2.014 1.2518 174 2.1176
2.0008 1.2950 180 2.1286
2.0728 1.3381 186 2.1221
2.0873 1.3813 192 2.1235
2.1341 1.4245 198 2.1250
2.0258 1.4676 204 2.1253
2.0804 1.5108 210 2.1213
1.9285 1.5540 216 2.1091
2.0789 1.5971 222 2.1192
2.0234 1.6403 228 2.1141
1.9992 1.6835 234 2.1120
2.0681 1.7266 240 2.1243
2.0501 1.7698 246 2.1073
1.9897 1.8129 252 2.1228
2.016 1.8561 258 2.1291
2.0801 1.8993 264 2.1172
2.051 1.9424 270 2.0833
2.0864 1.9856 276 2.1147
2.0431 2.0288 282 2.1215
2.0321 2.0719 288 2.1119
2.1107 2.1151 294 2.1023
2.0375 2.1583 300 2.1155
1.979 2.2014 306 2.1224
2.0081 2.2446 312 2.1010
2.06 2.2878 318 2.1260
2.0285 2.3309 324 2.1282
2.0394 2.3741 330 2.1087
2.0224 2.4173 336 2.0999
2.0705 2.4604 342 2.1132
2.0153 2.5036 348 2.1028
2.0899 2.5468 354 2.1298
2.0474 2.5899 360 2.1162
2.0441 2.6331 366 2.0987
2.0019 2.6763 372 2.1071
1.9176 2.7194 378 2.1005
1.982 2.7626 384 2.0925
2.064 2.8058 390 2.1295
2.0284 2.8489 396 2.0917
2.0648 2.8921 402 2.1241
1.9668 2.9353 408 2.1286
1.9427 2.9784 414 2.0850

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1