finetuned_llm

This model is a fine-tuned version of distilgpt2 on an unknown dataset. At the last logged evaluation (step 18000, epoch 0.9804), it achieves a validation loss of 1.3810.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 1
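
The linear schedule listed above can be sketched in plain Python. This is a minimal illustration, not the training code: it assumes zero warmup steps (the card does not list a warmup setting) and estimates the total step count from the last logged row of the results (step 18000 at epoch 0.9804). The names `linear_lr`, `BASE_LR`, and `TOTAL_STEPS` are hypothetical and chosen only for this example.

```python
# Sketch of a linear learning-rate schedule with optional warmup.
# Assumption: warmup_steps = 0, since the card does not report a warmup setting.

BASE_LR = 5e-05  # learning_rate from the card

# Assumption: total steps for 1 epoch, estimated from the last logged row
# (step 18000 corresponds to epoch 0.9804).
TOTAL_STEPS = round(18000 / 0.9804)


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the estimated learning rate at a few of the logged evaluation steps.
for step in (500, 6000, 12000, 18000):
    print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate is already close to zero by step 18000, which is consistent with the small changes in validation loss over the last rows.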

Training results

Training Loss	Epoch	Step	Validation Loss
1.2829	0.0272	500	1.4731
1.4874	0.0545	1000	1.4516
1.5212	0.0817	1500	1.4447
1.4601	0.1089	2000	1.4374
1.4151	0.1362	2500	1.4319
1.2995	0.1634	3000	1.4261
1.3719	0.1906	3500	1.4228
1.2652	0.2179	4000	1.4177
1.4001	0.2451	4500	1.4182
1.2865	0.2723	5000	1.4157
1.2739	0.2996	5500	1.4133
0.9939	0.3268	6000	1.4148
1.5027	0.3540	6500	1.4097
1.2803	0.3813	7000	1.4071
1.2642	0.4085	7500	1.4058
1.5353	0.4358	8000	1.4044
1.4802	0.4630	8500	1.4036
1.4642	0.4902	9000	1.4009
1.3732	0.5175	9500	1.3990
1.2863	0.5447	10000	1.3984
1.5622	0.5719	10500	1.3967
1.1612	0.5992	11000	1.3938
1.2709	0.6264	11500	1.3945
1.3817	0.6536	12000	1.3917
1.3170	0.6809	12500	1.3916
1.4059	0.7081	13000	1.3896
1.4351	0.7353	13500	1.3895
1.3698	0.7626	14000	1.3868
1.2773	0.7898	14500	1.3863
1.3970	0.8170	15000	1.3856
1.4358	0.8443	15500	1.3843
1.4635	0.8715	16000	1.3828
1.3525	0.8987	16500	1.3830
1.3133	0.9260	17000	1.3822
1.2064	0.9532	17500	1.3816
1.3223	0.9804	18000	1.3810
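
The validation losses above can be converted to perplexity for easier comparison. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is typical for causal language-model fine-tuning but is not stated on this card. The helper name `perplexity` is hypothetical.

```python
import math

# (step, validation loss) for the first and last rows of the table above.
first_step, first_val_loss = 500, 1.4731
last_step, last_val_loss = 18000, 1.3810


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


ppl_first = perplexity(first_val_loss)
ppl_last = perplexity(last_val_loss)

print(f"step {first_step}: validation perplexity ~ {ppl_first:.3f}")
print(f"step {last_step}: validation perplexity ~ {ppl_last:.3f}")
print(f"relative reduction: {(1 - ppl_last / ppl_first) * 100:.1f}%")
```

If the loss were instead averaged per sequence or reported in bits, the conversion would differ, so these figures should be treated as indicative only.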