gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5355
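
If this loss is the usual mean per-token cross-entropy, it corresponds to a perplexity of exp(4.5355) ≈ 93.3, as the sketch below computes; this is an interpretation, not a figure reported on the card.

```python
import math

# Assuming the reported loss is mean per-token cross-entropy,
# perplexity is its exponential.
eval_loss = 4.5355
print(math.exp(eval_loss))  # ≈ 93.3
```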

Model description

More information needed

Intended uses & limitations

More information needed
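
Pending details from the authors, here is a minimal loading sketch using the standard transformers API; the repo id below is a hypothetical placeholder inferred from the card title, not a confirmed Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id; replace with the checkpoint's actual path.
model_id = "gpt-small-c4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```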

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
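
As a rough illustration, these settings could map onto transformers.TrainingArguments as sketched below; this is a minimal sketch assuming the standard Trainer API, and output_dir is a placeholder rather than the authors' actual configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-small-c4",       # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 24
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                       # Native AMP mixed precision
)
```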

Training results

Training Loss | Epoch   | Step  | Validation Loss
7.1748        | 0.3116  | 1000  | 6.5566
6.3822        | 0.6232  | 2000  | 6.1967
6.1053        | 0.9349  | 3000  | 5.9622
5.892         | 1.2465  | 4000  | 5.7887
5.7427        | 1.5581  | 5000  | 5.6520
5.6236        | 1.8697  | 6000  | 5.5419
5.4991        | 2.1814  | 7000  | 5.4486
5.4177        | 2.4930  | 8000  | 5.3717
5.3543        | 2.8046  | 9000  | 5.3053
5.2758        | 3.1162  | 10000 | 5.2414
5.2012        | 3.4279  | 11000 | 5.1860
5.1618        | 3.7395  | 12000 | 5.1336
5.1123        | 4.0511  | 13000 | 5.0856
5.0395        | 4.3627  | 14000 | 5.0440
5.0134        | 4.6744  | 15000 | 5.0032
4.983         | 4.9860  | 16000 | 4.9643
4.9113        | 5.2976  | 17000 | 4.9329
4.8946        | 5.6092  | 18000 | 4.9002
4.8683        | 5.9208  | 19000 | 4.8749
4.8187        | 6.2325  | 20000 | 4.8469
4.7957        | 6.5441  | 21000 | 4.8241
4.7783        | 6.8557  | 22000 | 4.8015
4.7395        | 7.1673  | 23000 | 4.7823
4.7114        | 7.4790  | 24000 | 4.7671
4.7065        | 7.7906  | 25000 | 4.7499
4.6786        | 8.1022  | 26000 | 4.7338
4.6408        | 8.4138  | 27000 | 4.7195
4.639         | 8.7255  | 28000 | 4.7046
4.6306        | 9.0371  | 29000 | 4.6947
4.5925        | 9.3487  | 30000 | 4.6806
4.5793        | 9.6603  | 31000 | 4.6735
4.5797        | 9.9720  | 32000 | 4.6602
4.5404        | 10.2836 | 33000 | 4.6556
4.5374        | 10.5952 | 34000 | 4.6461
4.5373        | 10.9068 | 35000 | 4.6375
4.4964        | 11.2184 | 36000 | 4.6307
4.501         | 11.5301 | 37000 | 4.6238
4.4999        | 11.8417 | 38000 | 4.6140
4.4756        | 12.1533 | 39000 | 4.6090
4.4597        | 12.4649 | 40000 | 4.6053
4.4628        | 12.7766 | 41000 | 4.5970
4.4539        | 13.0882 | 42000 | 4.5910
4.4265        | 13.3998 | 43000 | 4.5901
4.4316        | 13.7114 | 44000 | 4.5829
4.4333        | 14.0231 | 45000 | 4.5771
4.4019        | 14.3347 | 46000 | 4.5721
4.412         | 14.6463 | 47000 | 4.5709
4.4036        | 14.9579 | 48000 | 4.5706
4.3814        | 15.2696 | 49000 | 4.5631
4.3843        | 15.5812 | 50000 | 4.5588
4.3849        | 15.8928 | 51000 | 4.5580
4.3716        | 16.2044 | 52000 | 4.5535
4.3648        | 16.5160 | 53000 | 4.5539
4.3672        | 16.8277 | 54000 | 4.5488
4.3582        | 17.1393 | 55000 | 4.5467
4.348         | 17.4509 | 56000 | 4.5454
4.351         | 17.7625 | 57000 | 4.5420
4.3499        | 18.0742 | 58000 | 4.5412
4.3387        | 18.3858 | 59000 | 4.5411
4.3347        | 18.6974 | 60000 | 4.5398
4.338         | 19.0090 | 61000 | 4.5383
4.3294        | 19.3207 | 62000 | 4.5375
4.3277        | 19.6323 | 63000 | 4.5351
4.3303        | 19.9439 | 64000 | 4.5355
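
As a back-of-envelope check (an inference from the table, not a figure reported on the card), the epoch/step columns imply roughly 3,209 optimizer steps per epoch; with the effective batch size of 24, that suggests on the order of 77k training examples per epoch.

```python
# Assumes each logged step is one optimizer update over the
# effective batch of 24 (total_train_batch_size above).
steps_per_epoch = 1000 / 0.3116          # ≈ 3209 steps per epoch
samples_per_epoch = steps_per_epoch * 24
print(round(samples_per_epoch))          # ≈ 77022 examples per epoch
```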

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu118
  • Datasets 3.0.1
  • Tokenizers 0.19.1

Model size

  • 70.4M parameters (F32 tensors, Safetensors format)