gpt2-large-countdown

This model is a fine-tuned version of openai-community/gpt2-large on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.1586
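The card does not say what the loss measures. If it is the mean per-token cross-entropy in nats, which is the usual quantity for a causal language model fine-tuned with the Hugging Face Trainer, it can be converted to perplexity with exp(loss). A minimal sketch under that assumption:

```python
import math

def perplexity_from_loss(mean_ce_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_ce_nats)

# Final evaluation loss reported above (assumed to be per-token cross-entropy).
eval_loss = 0.1586
print(f"perplexity about {perplexity_from_loss(eval_loss):.3f}")  # perplexity about 1.172
```

This is a token-level figure. It does not directly measure how often the model solves the task it was fine-tuned for.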

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 5
mixed_precision_training: Native AMP
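The hyperparameters above map fairly directly onto transformers.TrainingArguments keyword arguments. The sketch below is a hypothetical reconstruction, not the author's training script. It assumes a single GPU (so 8 × 2 = 16 matches total_train_batch_size), zero warmup steps (none are listed), and fp16=True for "Native AMP" (bf16=True would be reported the same way). It also reproduces the linear schedule's learning rate, mirroring the formula used by transformers.get_linear_schedule_with_warmup, so it can be checked without installing anything.

```python
# Hypothetical reconstruction of the listed hyperparameters as keyword
# arguments for transformers.TrainingArguments (not the author's script).
training_kwargs = {
    "learning_rate": 3e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16=True
}

def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples per optimizer step: per-device batch x accumulation x devices."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)

def linear_lr(step: int, total_steps: int, base_lr: float,
              warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup then linear decay to zero (warmup assumed 0)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(effective_batch_size(training_kwargs))  # 16
# Usage (requires transformers): TrainingArguments(output_dir="out", **training_kwargs)
```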

Training results

Training Loss	Epoch	Step	Validation Loss
0.241	0.0533	500	0.2299
0.2104	0.1067	1000	0.2008
0.1994	0.16	1500	0.1927
0.1927	0.2133	2000	0.1869
0.1894	0.2667	2500	0.1828
0.1847	0.32	3000	0.1803
0.1825	0.3733	3500	0.1785
0.1803	0.4267	4000	0.1761
0.1781	0.48	4500	0.1748
0.1768	0.5333	5000	0.1730
0.1764	0.5867	5500	0.1721
0.1745	0.64	6000	0.1712
0.1727	0.6933	6500	0.1708
0.1719	0.7467	7000	0.1690
0.171	0.8	7500	0.1691
0.1704	0.8533	8000	0.1679
0.1697	0.9067	8500	0.1677
0.1696	0.96	9000	0.1670
0.1663	1.0133	9500	0.1665
0.1663	1.0667	10000	0.1665
0.1667	1.12	10500	0.1663
0.1661	1.1733	11000	0.1657
0.166	1.2267	11500	0.1656
0.1655	1.28	12000	0.1651
0.1645	1.3333	12500	0.1649
0.1645	1.3867	13000	0.1646
0.1642	1.44	13500	0.1642
0.1642	1.4933	14000	0.1637
0.1641	1.5467	14500	0.1639
0.1635	1.6	15000	0.1635
0.1634	1.6533	15500	0.1631
0.1637	1.7067	16000	0.1629
0.1636	1.76	16500	0.1630
0.1628	1.8133	17000	0.1627
0.1623	1.8667	17500	0.1624
0.1623	1.92	18000	0.1620
0.1621	1.9733	18500	0.1621
0.1596	2.0267	19000	0.1619
0.1597	2.08	19500	0.1619
0.159	2.1333	20000	0.1618
0.1594	2.1867	20500	0.1616
0.1591	2.24	21000	0.1615
0.1595	2.2933	21500	0.1613
0.1593	2.3467	22000	0.1611
0.1591	2.4	22500	0.1612
0.1591	2.4533	23000	0.1609
0.159	2.5067	23500	0.1607
0.1586	2.56	24000	0.1606
0.1592	2.6133	24500	0.1607
0.1581	2.6667	25000	0.1604
0.1586	2.72	25500	0.1601
0.1584	2.7733	26000	0.1602
0.1581	2.8267	26500	0.1600
0.1579	2.88	27000	0.1599
0.1584	2.9333	27500	0.1598
0.1581	2.9867	28000	0.1597
0.1553	3.04	28500	0.1601
0.1554	3.0933	29000	0.1599
0.155	3.1467	29500	0.1601
0.1551	3.2	30000	0.1600
0.1554	3.2533	30500	0.1597
0.1549	3.3067	31000	0.1597
0.1549	3.36	31500	0.1596
0.1548	3.4133	32000	0.1597
0.1545	3.4667	32500	0.1594
0.1548	3.52	33000	0.1595
0.1549	3.5733	33500	0.1593
0.1544	3.6267	34000	0.1592
0.1551	3.68	34500	0.1592
0.1549	3.7333	35000	0.1591
0.1547	3.7867	35500	0.1590
0.1544	3.84	36000	0.1588
0.1545	3.8933	36500	0.1587
0.1547	3.9467	37000	0.1588
0.1549	4.0	37500	0.1588
0.1519	4.0533	38000	0.1591
0.1514	4.1067	38500	0.1592
0.1516	4.16	39000	0.1593
0.1518	4.2133	39500	0.1592
0.1514	4.2667	40000	0.1591
0.1516	4.32	40500	0.1591
0.1514	4.3733	41000	0.1590
0.152	4.4267	41500	0.1589
0.1512	4.48	42000	0.1589
0.152	4.5333	42500	0.1588
0.1511	4.5867	43000	0.1588
0.1511	4.64	43500	0.1588
0.1514	4.6933	44000	0.1588
0.1513	4.7467	44500	0.1586
0.1511	4.8	45000	0.1586
0.1513	4.8533	45500	0.1586
0.1514	4.9067	46000	0.1586
0.1511	4.96	46500	0.1586
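A few figures can be read off the table. Epoch 4.0 falls at step 37,500, which implies 9,375 optimizer steps per epoch and about 46,875 steps for all five epochs; at 16 examples per step, that suggests a training set of roughly 150,000 examples (an inference only, since the dataset is not documented). Training loss also drops at each epoch boundary (for example 0.1621 to 0.1596 between epochs 1.97 and 2.03) while validation loss barely moves, a pattern often associated with the model starting to fit repeated training examples. The sketch below recomputes these figures from a handful of rows copied from the table and finds where validation loss bottoms out.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above;
# only a subset is included here for brevity.
rows = [
    (0.241, 0.0533, 500, 0.2299),
    (0.1663, 1.0133, 9500, 0.1665),
    (0.1596, 2.0267, 19000, 0.1619),
    (0.1553, 3.04, 28500, 0.1601),
    (0.1549, 4.0, 37500, 0.1588),
    (0.1513, 4.7467, 44500, 0.1586),
    (0.1511, 4.96, 46500, 0.1586),
]

TOTAL_TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 5

# Epoch 4.0 lands exactly on step 37,500, so steps per epoch = step / epoch.
_, epoch, step, _ = rows[4]
steps_per_epoch = round(step / epoch)
total_steps = steps_per_epoch * NUM_EPOCHS
# Approximate: the exact size depends on how partial batches are handled.
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# min() returns the first row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

print(steps_per_epoch, total_steps, approx_train_examples)  # 9375 46875 150000
print(f"best validation loss {best[3]} at step {best[2]}")  # best validation loss 0.1586 at step 44500
```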

Framework versions

Transformers 4.51.1
Pytorch 2.5.1+cu121
Datasets 3.5.0
Tokenizers 0.21.1
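To check whether a local environment matches the versions listed above, the installed distributions can be compared using the standard library's importlib.metadata. The keys below are the usual PyPI distribution names (transformers, torch, datasets, tokenizers). The PyTorch version string includes a +cu121 local build suffix, so a CPU-only or different-CUDA build will be reported as a mismatch even when the base version is 2.5.1.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.51.1",
    "torch": "2.5.1+cu121",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}

def version_mismatches(expected: dict) -> dict:
    """Return {name: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches

for name, (have, want) in version_mismatches(EXPECTED).items():
    print(f"{name}: installed {have}, card used {want}")
```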

Repository: giordanorogers/gpt2-large-countdown