tfa_output_2025_m05_d10_t23h_34m_59s

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0191
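
A minimal usage sketch, assuming the checkpoint is published as brando/tfa_output_2025_m05_d10_t23h_34m_59s and follows the standard Llama-3 chat format (device placement and dtype are illustrative, not taken from the original card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "brando/tfa_output_2025_m05_d10_t23h_34m_59s"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Say hello in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```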

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: paged AdamW (OptimizerNames.PAGED_ADAMW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
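
The effective batch size works out as train_batch_size × gradient_accumulation_steps × num_devices = 1 × 8 × 1 = 8, matching total_train_batch_size above. As a sketch, these settings map onto transformers.TrainingArguments roughly as follows; the output directory is hypothetical, and OptimizerNames.PAGED_ADAMW corresponds to the "paged_adamw_32bit" optim string (which requires bitsandbytes):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments; "tfa_output" is a
# hypothetical output directory, not taken from the original run.
training_args = TrainingArguments(
    output_dir="tfa_output",
    learning_rate=1e-06,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # effective train batch size: 1 * 8 = 8
    seed=42,
    optim="paged_adamw_32bit",      # OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                      # assumed from the BF16 checkpoint
    eval_strategy="steps",          # evaluation ran every 50 steps (see table below)
    eval_steps=50,
)
```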

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log | 0 | 0 | 1.1126 |
| 2.2647 | 0.0101 | 50 | 1.1123 |
| 2.2524 | 0.0203 | 100 | 1.1094 |
| 2.072 | 0.0304 | 150 | 1.1027 |
| 2.3878 | 0.0406 | 200 | 1.0913 |
| 2.0925 | 0.0507 | 250 | 1.0792 |
| 2.0948 | 0.0609 | 300 | 1.0673 |
| 2.136 | 0.0710 | 350 | 1.0587 |
| 2.0376 | 0.0811 | 400 | 1.0522 |
| 2.0947 | 0.0913 | 450 | 1.0485 |
| 2.1447 | 0.1014 | 500 | 1.0452 |
| 2.1028 | 0.1116 | 550 | 1.0432 |
| 2.0132 | 0.1217 | 600 | 1.0417 |
| 2.1878 | 0.1318 | 650 | 1.0401 |
| 2.0967 | 0.1420 | 700 | 1.0388 |
| 2.0371 | 0.1521 | 750 | 1.0379 |
| 2.3369 | 0.1623 | 800 | 1.0371 |
| 2.017 | 0.1724 | 850 | 1.0361 |
| 2.0818 | 0.1826 | 900 | 1.0354 |
| 2.1155 | 0.1927 | 950 | 1.0352 |
| 2.1819 | 0.2028 | 1000 | 1.0344 |
| 1.8637 | 0.2130 | 1050 | 1.0338 |
| 2.0307 | 0.2231 | 1100 | 1.0329 |
| 2.2259 | 0.2333 | 1150 | 1.0326 |
| 2.1553 | 0.2434 | 1200 | 1.0321 |
| 2.1097 | 0.2535 | 1250 | 1.0315 |
| 2.107 | 0.2637 | 1300 | 1.0310 |
| 2.0916 | 0.2738 | 1350 | 1.0307 |
| 1.9564 | 0.2840 | 1400 | 1.0301 |
| 2.0589 | 0.2941 | 1450 | 1.0294 |
| 2.0271 | 0.3043 | 1500 | 1.0289 |
| 2.0601 | 0.3144 | 1550 | 1.0288 |
| 2.2035 | 0.3245 | 1600 | 1.0284 |
| 2.1796 | 0.3347 | 1650 | 1.0280 |
| 2.0038 | 0.3448 | 1700 | 1.0275 |
| 2.0133 | 0.3550 | 1750 | 1.0275 |
| 2.2494 | 0.3651 | 1800 | 1.0271 |
| 1.9862 | 0.3753 | 1850 | 1.0273 |
| 2.2446 | 0.3854 | 1900 | 1.0269 |
| 2.241 | 0.3955 | 1950 | 1.0267 |
| 1.8817 | 0.4057 | 2000 | 1.0264 |
| 2.3231 | 0.4158 | 2050 | 1.0261 |
| 2.3223 | 0.4260 | 2100 | 1.0261 |
| 2.3235 | 0.4361 | 2150 | 1.0259 |
| 2.0343 | 0.4462 | 2200 | 1.0256 |
| 2.018 | 0.4564 | 2250 | 1.0253 |
| 2.1532 | 0.4665 | 2300 | 1.0251 |
| 2.0791 | 0.4767 | 2350 | 1.0250 |
| 1.8937 | 0.4868 | 2400 | 1.0249 |
| 2.0474 | 0.4970 | 2450 | 1.0246 |
| 1.9105 | 0.5071 | 2500 | 1.0242 |
| 2.0524 | 0.5172 | 2550 | 1.0241 |
| 1.829 | 0.5274 | 2600 | 1.0241 |
| 1.985 | 0.5375 | 2650 | 1.0237 |
| 2.2954 | 0.5477 | 2700 | 1.0236 |
| 2.1254 | 0.5578 | 2750 | 1.0235 |
| 1.9017 | 0.5680 | 2800 | 1.0235 |
| 2.1831 | 0.5781 | 2850 | 1.0232 |
| 2.0031 | 0.5882 | 2900 | 1.0231 |
| 1.9792 | 0.5984 | 2950 | 1.0230 |
| 1.8149 | 0.6085 | 3000 | 1.0226 |
| 2.0161 | 0.6187 | 3050 | 1.0225 |
| 2.1239 | 0.6288 | 3100 | 1.0224 |
| 1.9753 | 0.6389 | 3150 | 1.0222 |
| 1.848 | 0.6491 | 3200 | 1.0220 |
| 2.0922 | 0.6592 | 3250 | 1.0220 |
| 2.0263 | 0.6694 | 3300 | 1.0218 |
| 2.0812 | 0.6795 | 3350 | 1.0216 |
| 2.1709 | 0.6897 | 3400 | 1.0216 |
| 2.0482 | 0.6998 | 3450 | 1.0218 |
| 2.0617 | 0.7099 | 3500 | 1.0216 |
| 2.1892 | 0.7201 | 3550 | 1.0215 |
| 1.8795 | 0.7302 | 3600 | 1.0215 |
| 2.0765 | 0.7404 | 3650 | 1.0214 |
| 2.1375 | 0.7505 | 3700 | 1.0211 |
| 2.3386 | 0.7606 | 3750 | 1.0210 |
| 2.1539 | 0.7708 | 3800 | 1.0208 |
| 2.076 | 0.7809 | 3850 | 1.0210 |
| 1.9461 | 0.7911 | 3900 | 1.0208 |
| 1.9757 | 0.8012 | 3950 | 1.0206 |
| 2.1436 | 0.8114 | 4000 | 1.0207 |
| 2.0764 | 0.8215 | 4050 | 1.0207 |
| 2.0771 | 0.8316 | 4100 | 1.0205 |
| 2.1269 | 0.8418 | 4150 | 1.0207 |
| 2.211 | 0.8519 | 4200 | 1.0204 |
| 2.0004 | 0.8621 | 4250 | 1.0203 |
| 1.9485 | 0.8722 | 4300 | 1.0202 |
| 1.9821 | 0.8824 | 4350 | 1.0200 |
| 2.1556 | 0.8925 | 4400 | 1.0200 |
| 1.9863 | 0.9026 | 4450 | 1.0198 |
| 1.7163 | 0.9128 | 4500 | 1.0199 |
| 2.0893 | 0.9229 | 4550 | 1.0197 |
| 2.1352 | 0.9331 | 4600 | 1.0196 |
| 1.7597 | 0.9432 | 4650 | 1.0196 |
| 2.193 | 0.9533 | 4700 | 1.0194 |
| 2.0867 | 0.9635 | 4750 | 1.0195 |
| 2.3983 | 0.9736 | 4800 | 1.0192 |
| 2.1052 | 0.9838 | 4850 | 1.0192 |
| 1.9144 | 0.9939 | 4900 | 1.0191 |
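
For reference, cross-entropy loss converts to perplexity via ppl = exp(loss), so the final validation loss of 1.0191 corresponds to a perplexity of roughly 2.77 (a quick check, assuming the loss is mean per-token cross-entropy in nats):

```python
import math

eval_loss = 1.0191  # final validation loss from the table above
print(f"perplexity ≈ {math.exp(eval_loss):.2f}")  # ≈ 2.77
```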

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.1.2+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.1