tfa_output_2025_m05_d10_t23h_34m_59s

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.0191
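
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language-model training with Transformers but is not stated in this card. The helper name `perplexity` is illustrative.

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)

# Final evaluation loss reported in this card.
final_eval_loss = 1.0191
final_ppl = perplexity(final_eval_loss)

print(f"eval loss {final_eval_loss} -> perplexity {final_ppl:.3f}")
```

If the loss were averaged differently (for example per sequence rather than per token), this conversion would not apply.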

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 1
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: paged AdamW (OptimizerNames.PAGED_ADAMW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
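
The batch-size and warmup values above can be cross-checked with a little arithmetic. This is a minimal sketch, not the training code: it assumes the effective batch size is the product of per-device batch size, accumulation steps, and device count, estimates the steps per epoch from the last row of the training results table (step 4900 at epoch 0.9939), and assumes warmup steps are the warmup ratio times total steps, rounded up. The function `lr_at` is a hypothetical re-implementation of a linear-warmup-then-constant schedule.

```python
import math

# Values listed in the hyperparameters above.
learning_rate = 1e-6
train_batch_size = 1
gradient_accumulation_steps = 8
warmup_ratio = 0.05

# Assumption: a single device, which matches total_train_batch_size = 8.
num_devices = 1
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Estimate of optimizer steps in one epoch, from the results table
# (step 4900 is logged at epoch 0.9939).
steps_per_epoch = round(4900 / 0.9939)

# Assumption: warmup steps = ceil(ratio * total steps) for the single epoch.
warmup_steps = math.ceil(warmup_ratio * steps_per_epoch)

def lr_at(step: int, base_lr: float = learning_rate, warmup: int = warmup_steps) -> float:
    """Linear warmup from 0 to base_lr, then constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr

print(effective_batch, steps_per_epoch, warmup_steps)
print(lr_at(0), lr_at(warmup_steps // 2), lr_at(steps_per_epoch))
```

Under these assumptions the learning rate reaches its full value of 1e-06 after roughly the first 5% of the epoch and stays there for the rest of training.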

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0	0	1.1126
2.2647	0.0101	50	1.1123
2.2524	0.0203	100	1.1094
2.072	0.0304	150	1.1027
2.3878	0.0406	200	1.0913
2.0925	0.0507	250	1.0792
2.0948	0.0609	300	1.0673
2.136	0.0710	350	1.0587
2.0376	0.0811	400	1.0522
2.0947	0.0913	450	1.0485
2.1447	0.1014	500	1.0452
2.1028	0.1116	550	1.0432
2.0132	0.1217	600	1.0417
2.1878	0.1318	650	1.0401
2.0967	0.1420	700	1.0388
2.0371	0.1521	750	1.0379
2.3369	0.1623	800	1.0371
2.017	0.1724	850	1.0361
2.0818	0.1826	900	1.0354
2.1155	0.1927	950	1.0352
2.1819	0.2028	1000	1.0344
1.8637	0.2130	1050	1.0338
2.0307	0.2231	1100	1.0329
2.2259	0.2333	1150	1.0326
2.1553	0.2434	1200	1.0321
2.1097	0.2535	1250	1.0315
2.107	0.2637	1300	1.0310
2.0916	0.2738	1350	1.0307
1.9564	0.2840	1400	1.0301
2.0589	0.2941	1450	1.0294
2.0271	0.3043	1500	1.0289
2.0601	0.3144	1550	1.0288
2.2035	0.3245	1600	1.0284
2.1796	0.3347	1650	1.0280
2.0038	0.3448	1700	1.0275
2.0133	0.3550	1750	1.0275
2.2494	0.3651	1800	1.0271
1.9862	0.3753	1850	1.0273
2.2446	0.3854	1900	1.0269
2.241	0.3955	1950	1.0267
1.8817	0.4057	2000	1.0264
2.3231	0.4158	2050	1.0261
2.3223	0.4260	2100	1.0261
2.3235	0.4361	2150	1.0259
2.0343	0.4462	2200	1.0256
2.018	0.4564	2250	1.0253
2.1532	0.4665	2300	1.0251
2.0791	0.4767	2350	1.0250
1.8937	0.4868	2400	1.0249
2.0474	0.4970	2450	1.0246
1.9105	0.5071	2500	1.0242
2.0524	0.5172	2550	1.0241
1.829	0.5274	2600	1.0241
1.985	0.5375	2650	1.0237
2.2954	0.5477	2700	1.0236
2.1254	0.5578	2750	1.0235
1.9017	0.5680	2800	1.0235
2.1831	0.5781	2850	1.0232
2.0031	0.5882	2900	1.0231
1.9792	0.5984	2950	1.0230
1.8149	0.6085	3000	1.0226
2.0161	0.6187	3050	1.0225
2.1239	0.6288	3100	1.0224
1.9753	0.6389	3150	1.0222
1.848	0.6491	3200	1.0220
2.0922	0.6592	3250	1.0220
2.0263	0.6694	3300	1.0218
2.0812	0.6795	3350	1.0216
2.1709	0.6897	3400	1.0216
2.0482	0.6998	3450	1.0218
2.0617	0.7099	3500	1.0216
2.1892	0.7201	3550	1.0215
1.8795	0.7302	3600	1.0215
2.0765	0.7404	3650	1.0214
2.1375	0.7505	3700	1.0211
2.3386	0.7606	3750	1.0210
2.1539	0.7708	3800	1.0208
2.076	0.7809	3850	1.0210
1.9461	0.7911	3900	1.0208
1.9757	0.8012	3950	1.0206
2.1436	0.8114	4000	1.0207
2.0764	0.8215	4050	1.0207
2.0771	0.8316	4100	1.0205
2.1269	0.8418	4150	1.0207
2.211	0.8519	4200	1.0204
2.0004	0.8621	4250	1.0203
1.9485	0.8722	4300	1.0202
1.9821	0.8824	4350	1.0200
2.1556	0.8925	4400	1.0200
1.9863	0.9026	4450	1.0198
1.7163	0.9128	4500	1.0199
2.0893	0.9229	4550	1.0197
2.1352	0.9331	4600	1.0196
1.7597	0.9432	4650	1.0196
2.193	0.9533	4700	1.0194
2.0867	0.9635	4750	1.0195
2.3983	0.9736	4800	1.0192
2.1052	0.9838	4850	1.0192
1.9144	0.9939	4900	1.0191
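
The validation-loss trend in the table can be summarized numerically. The sketch below uses a handful of (step, validation loss) pairs copied from the rows above; the choice of checkpoints and the helper `relative_drop` are illustrative only.

```python
# (step, validation_loss) pairs copied from the table above.
checkpoints = [
    (0, 1.1126),
    (500, 1.0452),
    (1000, 1.0344),
    (2500, 1.0242),
    (4900, 1.0191),
]

def relative_drop(start: float, end: float) -> float:
    """Fractional decrease from start to end."""
    return (start - end) / start

initial_loss = checkpoints[0][1]
final_loss = checkpoints[-1][1]

# Overall relative improvement across the epoch.
total_drop = relative_drop(initial_loss, final_loss)

# Share of the absolute improvement already reached by step 500.
share_by_500 = (initial_loss - checkpoints[1][1]) / (initial_loss - final_loss)

print(f"total relative drop: {total_drop:.2%}")
print(f"share of improvement by step 500: {share_by_500:.2%}")
```

Most of the decrease in validation loss occurs in the first few hundred steps, after which the curve flattens.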

Framework versions

Transformers 4.51.3
PyTorch 2.1.2+cu121
Datasets 3.6.0
Tokenizers 0.21.1
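
To compare a local environment against the versions listed above, something like the following can help. It is a sketch using only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here, and `version_report` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.51.3",
    "torch": "2.1.2+cu121",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}

def installed_version(dist_name: str):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def version_report(pinned=PINNED):
    """Map each distribution to (wanted, found, matches)."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report

for name, (wanted, found, ok) in version_report().items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.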

Model repository: brando/tfa_output_2025_m05_d10_t23h_34m_59s
