gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5355
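
If this loss is the usual mean per-token cross-entropy, it corresponds to a perplexity of exp(4.5355) ≈ 93.3, as the sketch below computes; this is an interpretation, not a figure reported on the card.

```python
import math

# Assuming the reported loss is mean per-token cross-entropy,
# perplexity is its exponential.
eval_loss = 4.5355
print(math.exp(eval_loss))  # ≈ 93.3
```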

Model description

More information needed

Intended uses & limitations

More information needed
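
Pending details from the authors, here is a minimal loading sketch using the standard transformers API; the repo id below is a hypothetical placeholder inferred from the card title, not a confirmed Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id; replace with the checkpoint's actual path.
model_id = "gpt-small-c4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```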

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
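
As a rough illustration, these settings could map onto transformers.TrainingArguments as sketched below; this is a minimal sketch assuming the standard Trainer API, and output_dir is a placeholder rather than the authors' actual configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-small-c4",       # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 24
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                       # Native AMP mixed precision
)
```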

Training results

Training Loss | Epoch   | Step  | Validation Loss
7.1748        | 0.3116  | 1000  | 6.5566
6.3822        | 0.6232  | 2000  | 6.1967
6.1053        | 0.9349  | 3000  | 5.9622
5.892         | 1.2465  | 4000  | 5.7887
5.7427        | 1.5581  | 5000  | 5.6520
5.6236        | 1.8697  | 6000  | 5.5419
5.4991        | 2.1814  | 7000  | 5.4486
5.4177        | 2.4930  | 8000  | 5.3717
5.3543        | 2.8046  | 9000  | 5.3053
5.2758        | 3.1162  | 10000 | 5.2414
5.2012        | 3.4279  | 11000 | 5.1860
5.1618        | 3.7395  | 12000 | 5.1336
5.1123        | 4.0511  | 13000 | 5.0856
5.0395        | 4.3627  | 14000 | 5.0440
5.0134        | 4.6744  | 15000 | 5.0032
4.983         | 4.9860  | 16000 | 4.9643
4.9113        | 5.2976  | 17000 | 4.9329
4.8946        | 5.6092  | 18000 | 4.9002
4.8683        | 5.9208  | 19000 | 4.8749
4.8187        | 6.2325  | 20000 | 4.8469
4.7957        | 6.5441  | 21000 | 4.8241
4.7783        | 6.8557  | 22000 | 4.8015
4.7395        | 7.1673  | 23000 | 4.7823
4.7114        | 7.4790  | 24000 | 4.7671
4.7065        | 7.7906  | 25000 | 4.7499
4.6786        | 8.1022  | 26000 | 4.7338
4.6408        | 8.4138  | 27000 | 4.7195
4.639         | 8.7255  | 28000 | 4.7046
4.6306        | 9.0371  | 29000 | 4.6947
4.5925        | 9.3487  | 30000 | 4.6806
4.5793        | 9.6603  | 31000 | 4.6735
4.5797        | 9.9720  | 32000 | 4.6602
4.5404        | 10.2836 | 33000 | 4.6556
4.5374        | 10.5952 | 34000 | 4.6461
4.5373        | 10.9068 | 35000 | 4.6375
4.4964        | 11.2184 | 36000 | 4.6307
4.501         | 11.5301 | 37000 | 4.6238
4.4999        | 11.8417 | 38000 | 4.6140
4.4756        | 12.1533 | 39000 | 4.6090
4.4597        | 12.4649 | 40000 | 4.6053
4.4628        | 12.7766 | 41000 | 4.5970
4.4539        | 13.0882 | 42000 | 4.5910
4.4265        | 13.3998 | 43000 | 4.5901
4.4316        | 13.7114 | 44000 | 4.5829
4.4333        | 14.0231 | 45000 | 4.5771
4.4019        | 14.3347 | 46000 | 4.5721
4.412         | 14.6463 | 47000 | 4.5709
4.4036        | 14.9579 | 48000 | 4.5706
4.3814        | 15.2696 | 49000 | 4.5631
4.3843        | 15.5812 | 50000 | 4.5588
4.3849        | 15.8928 | 51000 | 4.5580
4.3716        | 16.2044 | 52000 | 4.5535
4.3648        | 16.5160 | 53000 | 4.5539
4.3672        | 16.8277 | 54000 | 4.5488
4.3582        | 17.1393 | 55000 | 4.5467
4.348         | 17.4509 | 56000 | 4.5454
4.351         | 17.7625 | 57000 | 4.5420
4.3499        | 18.0742 | 58000 | 4.5412
4.3387        | 18.3858 | 59000 | 4.5411
4.3347        | 18.6974 | 60000 | 4.5398
4.338         | 19.0090 | 61000 | 4.5383
4.3294        | 19.3207 | 62000 | 4.5375
4.3277        | 19.6323 | 63000 | 4.5351
4.3303        | 19.9439 | 64000 | 4.5355
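
As a back-of-envelope check (an inference from the table, not a figure reported on the card), the epoch/step columns imply roughly 3,209 optimizer steps per epoch; with the effective batch size of 24, that suggests on the order of 77k training examples per epoch.

```python
# Assumes each logged step is one optimizer update over the
# effective batch of 24 (total_train_batch_size above).
steps_per_epoch = 1000 / 0.3116          # ≈ 3209 steps per epoch
samples_per_epoch = steps_per_epoch * 24
print(round(samples_per_epoch))          # ≈ 77022 examples per epoch
```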

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu118
  • Datasets 3.0.1
  • Tokenizers 0.19.1

Model size

  • 70.4M parameters (F32 tensors, Safetensors format)