# gpt-small-c4
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.5355
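Assuming the reported loss is the mean token-level cross-entropy in nats (as the Hugging Face `Trainer` logs it for causal language models), this corresponds to a perplexity of roughly 93.3:

```python
# Perplexity from mean cross-entropy loss, assuming the eval loss is
# token-level cross-entropy in nats (the Trainer's default for causal LMs).
import math

eval_loss = 4.5355
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # perplexity ≈ 93.3
```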
## Model description
More information needed
## Intended uses & limitations
More information needed
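Pending details from the authors, the sketch below shows minimal text generation with this checkpoint, assuming it is a GPT-style causal language model as the name suggests; the repo id is a placeholder and will likely need an owner namespace.

```python
# Minimal generation sketch; NOT the authors' documented usage.
# "gpt-small-c4" is a placeholder repo id and may need a namespace prefix.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt-small-c4"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```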
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
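As a reproducibility aid, the list above maps onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the authors' training script; the output directory is a placeholder.

```python
# Sketch of the reported configuration as TrainingArguments.
# Everything mirrors the hyperparameter list above; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-small-c4",      # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    gradient_accumulation_steps=2,  # total train batch size: 24
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                      # native AMP mixed precision
)
```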
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| 7.1748 | 0.3116 | 1000 | 6.5566 |
| 6.3822 | 0.6232 | 2000 | 6.1967 |
| 6.1053 | 0.9349 | 3000 | 5.9622 |
| 5.892 | 1.2465 | 4000 | 5.7887 |
| 5.7427 | 1.5581 | 5000 | 5.6520 |
| 5.6236 | 1.8697 | 6000 | 5.5419 |
| 5.4991 | 2.1814 | 7000 | 5.4486 |
| 5.4177 | 2.4930 | 8000 | 5.3717 |
| 5.3543 | 2.8046 | 9000 | 5.3053 |
| 5.2758 | 3.1162 | 10000 | 5.2414 |
| 5.2012 | 3.4279 | 11000 | 5.1860 |
| 5.1618 | 3.7395 | 12000 | 5.1336 |
| 5.1123 | 4.0511 | 13000 | 5.0856 |
| 5.0395 | 4.3627 | 14000 | 5.0440 |
| 5.0134 | 4.6744 | 15000 | 5.0032 |
| 4.983 | 4.9860 | 16000 | 4.9643 |
| 4.9113 | 5.2976 | 17000 | 4.9329 |
| 4.8946 | 5.6092 | 18000 | 4.9002 |
| 4.8683 | 5.9208 | 19000 | 4.8749 |
| 4.8187 | 6.2325 | 20000 | 4.8469 |
| 4.7957 | 6.5441 | 21000 | 4.8241 |
| 4.7783 | 6.8557 | 22000 | 4.8015 |
| 4.7395 | 7.1673 | 23000 | 4.7823 |
| 4.7114 | 7.4790 | 24000 | 4.7671 |
| 4.7065 | 7.7906 | 25000 | 4.7499 |
| 4.6786 | 8.1022 | 26000 | 4.7338 |
| 4.6408 | 8.4138 | 27000 | 4.7195 |
| 4.639 | 8.7255 | 28000 | 4.7046 |
| 4.6306 | 9.0371 | 29000 | 4.6947 |
| 4.5925 | 9.3487 | 30000 | 4.6806 |
| 4.5793 | 9.6603 | 31000 | 4.6735 |
| 4.5797 | 9.9720 | 32000 | 4.6602 |
| 4.5404 | 10.2836 | 33000 | 4.6556 |
| 4.5374 | 10.5952 | 34000 | 4.6461 |
| 4.5373 | 10.9068 | 35000 | 4.6375 |
| 4.4964 | 11.2184 | 36000 | 4.6307 |
| 4.501 | 11.5301 | 37000 | 4.6238 |
| 4.4999 | 11.8417 | 38000 | 4.6140 |
| 4.4756 | 12.1533 | 39000 | 4.6090 |
| 4.4597 | 12.4649 | 40000 | 4.6053 |
| 4.4628 | 12.7766 | 41000 | 4.5970 |
| 4.4539 | 13.0882 | 42000 | 4.5910 |
| 4.4265 | 13.3998 | 43000 | 4.5901 |
| 4.4316 | 13.7114 | 44000 | 4.5829 |
| 4.4333 | 14.0231 | 45000 | 4.5771 |
| 4.4019 | 14.3347 | 46000 | 4.5721 |
| 4.412 | 14.6463 | 47000 | 4.5709 |
| 4.4036 | 14.9579 | 48000 | 4.5706 |
| 4.3814 | 15.2696 | 49000 | 4.5631 |
| 4.3843 | 15.5812 | 50000 | 4.5588 |
| 4.3849 | 15.8928 | 51000 | 4.5580 |
| 4.3716 | 16.2044 | 52000 | 4.5535 |
| 4.3648 | 16.5160 | 53000 | 4.5539 |
| 4.3672 | 16.8277 | 54000 | 4.5488 |
| 4.3582 | 17.1393 | 55000 | 4.5467 |
| 4.348 | 17.4509 | 56000 | 4.5454 |
| 4.351 | 17.7625 | 57000 | 4.5420 |
| 4.3499 | 18.0742 | 58000 | 4.5412 |
| 4.3387 | 18.3858 | 59000 | 4.5411 |
| 4.3347 | 18.6974 | 60000 | 4.5398 |
| 4.338 | 19.0090 | 61000 | 4.5383 |
| 4.3294 | 19.3207 | 62000 | 4.5375 |
| 4.3277 | 19.6323 | 63000 | 4.5351 |
| 4.3303 | 19.9439 | 64000 | 4.5355 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.1+cu118
- Datasets 3.0.1
- Tokenizers 0.19.1
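To reproduce results, it is advisable to match these versions; a quick sanity check in an installed environment:

```python
# Print installed versions to compare against those listed above.
import datasets, tokenizers, torch, transformers

print("Transformers", transformers.__version__)  # expect 4.44.2
print("PyTorch", torch.__version__)              # expect 2.4.1+cu118
print("Datasets", datasets.__version__)          # expect 3.0.1
print("Tokenizers", tokenizers.__version__)      # expect 0.19.1
```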