train_3

This model was trained from scratch on an unspecified dataset. Its per-epoch results on the evaluation set are listed in the table of training results below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 128
eval_batch_size: 128
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 5000
num_epochs: 200
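
The learning-rate settings above can be made concrete with a small sketch. It assumes the usual shape of a linear schedule with warmup (a linear ramp from zero to the base rate, then linear decay to zero), 568 optimizer steps per epoch as read off the results table, and a run that completes all 200 epochs, which the card does not confirm. The names `linear_lr`, `STEPS_PER_EPOCH`, and `TOTAL_STEPS` are illustrative and not taken from the training code.

```python
# Sketch of the learning-rate schedule implied by the hyperparameters above.
BASE_LR = 5e-05          # learning_rate
WARMUP_STEPS = 5000      # lr_scheduler_warmup_steps
STEPS_PER_EPOCH = 568    # assumption: inferred from the results table (step 568 at epoch 1.0)
NUM_EPOCHS = 200         # num_epochs
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # assumption: the run completes all epochs


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the rate at the end of a few epochs.
    for epoch in (1, 9, 32, 200):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>3} (step {step:>6}): lr = {linear_lr(step):.3e}")
```

Under these assumptions the warmup ends during epoch 9 (5000 / 568 is roughly 8.8), so the first eight rows of the results table were logged while the learning rate was still ramping up.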

Training results

Training Loss	Epoch	Step	Validation Loss
0.1074	1.0	568	0.3485
0.1080	2.0	1136	0.3540
0.1060	3.0	1704	0.3306
0.1045	4.0	2272	0.3545
0.1052	5.0	2840	0.3218
0.1033	6.0	3408	0.3230
0.1023	7.0	3976	0.3480
0.0995	8.0	4544	0.4591
0.1000	9.0	5112	0.3469
0.0978	10.0	5680	0.3328
0.0974	11.0	6248	0.3641
0.0965	12.0	6816	0.3167
0.0951	13.0	7384	0.3220
0.0953	14.0	7952	0.3034
0.0935	15.0	8520	0.3595
0.0934	16.0	9088	0.3090
0.0942	17.0	9656	0.2997
0.0939	18.0	10224	0.3231
0.0918	19.0	10792	0.3788
0.0933	20.0	11360	0.3888
0.0916	21.0	11928	0.5056
0.0907	22.0	12496	0.3029
0.0905	23.0	13064	0.3338
0.0898	24.0	13632	0.3883
0.0892	25.0	14200	0.4280
0.0884	26.0	14768	0.3281
0.0894	27.0	15336	0.3609
0.0879	28.0	15904	0.3560
0.0881	29.0	16472	0.3502
0.0873	30.0	17040	0.3961
0.0866	31.0	17608	0.3481
0.0864	32.0	18176	0.3609