train_cola_1744902669

This model is a fine-tuned version of google/gemma-3-1b-it on the cola dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1136
  • Num Input Tokens Seen: 31253176
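This checkpoint is published as a PEFT adapter (see the framework versions below), so it is loaded on top of the base model rather than as a standalone model. Below is a minimal loading sketch; the adapter repo id follows this card's model name, and the prompt template is an illustrative assumption, since the card does not document the format used during fine-tuning.

```python
# Minimal loading sketch for the PEFT adapter on top of google/gemma-3-1b-it.
# The prompt below is a hypothetical template, not taken from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-3-1b-it"
adapter_id = "rbelanec/train_cola_1744902669"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = (
    "Is the following sentence grammatically acceptable? Answer yes or no.\n"
    "Sentence: The book was written by John."
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```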

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
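The card does not document the training and evaluation splits, but the dataset id suggests the CoLA (Corpus of Linguistic Acceptability) task from GLUE. A hedged loading sketch, assuming the GLUE version hosted on the Hugging Face Hub:

```python
# Hedged sketch: assumes "cola" refers to the CoLA subset of GLUE on the
# Hugging Face Hub; this card does not name the exact data source or splits.
from datasets import load_dataset

cola = load_dataset("glue", "cola")
print(cola)              # train / validation / test splits
print(cola["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}
```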

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
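
These values map roughly onto a standard Transformers `TrainingArguments` configuration as sketched below; the output directory and anything not listed above are assumptions, since the training script itself is not part of this card.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above.
# output_dir is hypothetical; arguments not listed in the card are left at
# their defaults or are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_1744902669",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,   # 4 x 4 = total train batch size of 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
)
```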

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.1486 0.4158 200 0.1388 156832
0.1359 0.8316 400 0.1436 313248
0.1519 1.2474 600 0.1377 469520
0.1529 1.6632 800 0.1216 625360
0.1103 2.0790 1000 0.1448 782304
0.0885 2.4948 1200 0.1172 938560
0.1365 2.9106 1400 0.1182 1094144
0.0822 3.3264 1600 0.1136 1250544
0.0472 3.7422 1800 0.1163 1407440
0.0569 4.1580 2000 0.1181 1563512
0.0738 4.5738 2200 0.1314 1719064
0.0523 4.9896 2400 0.1549 1875384
0.0289 5.4054 2600 0.1586 2031440
0.0208 5.8212 2800 0.1612 2187952
0.0186 6.2370 3000 0.1915 2344864
0.0309 6.6528 3200 0.2197 2500448
0.0053 7.0686 3400 0.2014 2656400
0.0009 7.4844 3600 0.2372 2812912
0.0028 7.9002 3800 0.2015 2968816
0.0336 8.3160 4000 0.2057 3124448
0.0461 8.7318 4200 0.2184 3280320
0.0188 9.1476 4400 0.2262 3437072
0.0194 9.5634 4600 0.2845 3593520
0.0212 9.9792 4800 0.2370 3750544
0.0049 10.3950 5000 0.2759 3905920
0.0018 10.8108 5200 0.2150 4063008
0.0027 11.2266 5400 0.1926 4219472
0.0052 11.6424 5600 0.2233 4376048
0.001 12.0582 5800 0.3083 4531752
0.0015 12.4740 6000 0.2678 4687112
0.0072 12.8898 6200 0.2843 4843464
0.0117 13.3056 6400 0.3738 4999648
0.0599 13.7214 6600 0.3714 5157152
0.0025 14.1372 6800 0.2460 5312328
0.0005 14.5530 7000 0.3300 5468680
0.0206 14.9688 7200 0.3044 5624776
0.0012 15.3846 7400 0.3596 5782032
0.0001 15.8004 7600 0.4537 5938000
0.0187 16.2162 7800 0.4085 6094536
0.0204 16.6320 8000 0.3620 6250760
0.0135 17.0478 8200 0.5138 6406616
0.0025 17.4636 8400 0.4107 6563416
0.0004 17.8794 8600 0.3036 6719288
0.0086 18.2952 8800 0.4243 6875592
0.0238 18.7110 9000 0.3896 7032392
0.0003 19.1268 9200 0.3224 7188120
0.0335 19.5426 9400 0.3270 7344760
0.0025 19.9584 9600 0.2940 7501144
0.029 20.3742 9800 0.2817 7657160
0.0 20.7900 10000 0.4502 7813128
0.0196 21.2058 10200 0.3967 7969880
0.0003 21.6216 10400 0.3231 8126392
0.0102 22.0374 10600 0.4725 8282480
0.0002 22.4532 10800 0.4422 8438992
0.0022 22.8690 11000 0.4707 8595376
0.0002 23.2848 11200 0.5725 8751352
0.0046 23.7006 11400 0.3873 8907960
0.0116 24.1164 11600 0.4293 9064424
0.0002 24.5322 11800 0.3991 9220456
0.0 24.9480 12000 0.3167 9376488
0.0 25.3638 12200 0.5012 9533208
0.0 25.7796 12400 0.4632 9689464
0.0 26.1954 12600 0.4233 9845048
0.009 26.6112 12800 0.4824 10001784
0.008 27.0270 13000 0.5094 10157800
0.0 27.4428 13200 0.3998 10313128
0.0005 27.8586 13400 0.3729 10469384
0.0 28.2744 13600 0.4658 10625944
0.0002 28.6902 13800 0.3126 10782456
0.0 29.1060 14000 0.4560 10938304
0.0019 29.5218 14200 0.4240 11094528
0.0038 29.9376 14400 0.4524 11250976
0.0001 30.3534 14600 0.4690 11406672
0.0 30.7692 14800 0.4113 11562768
0.0072 31.1850 15000 0.3631 11719016
0.0 31.6008 15200 0.4143 11875368
0.0014 32.0166 15400 0.3420 12031048
0.0 32.4324 15600 0.3402 12187432
0.0002 32.8482 15800 0.3676 12343432
0.0 33.2640 16000 0.3827 12500472
0.0001 33.6798 16200 0.3322 12656248
0.0001 34.0956 16400 0.3601 12811752
0.0001 34.5114 16600 0.3241 12968104
0.0 34.9272 16800 0.3508 13124392
0.0 35.3430 17000 0.4569 13281144
0.0 35.7588 17200 0.4360 13437720
0.0048 36.1746 17400 0.4481 13594448
0.0 36.5904 17600 0.4795 13750544
0.0032 37.0062 17800 0.4704 13906304
0.0 37.4220 18000 0.3902 14062784
0.0 37.8378 18200 0.4358 14219168
0.0 38.2536 18400 0.4856 14375024
0.0039 38.6694 18600 0.3733 14530800
0.0365 39.0852 18800 0.3963 14687808
0.0 39.5010 19000 0.3972 14843360
0.0001 39.9168 19200 0.3093 14999808
0.042 40.3326 19400 0.3385 15155496
0.0001 40.7484 19600 0.3630 15311688
0.0 41.1642 19800 0.3730 15468264
0.0029 41.5800 20000 0.3523 15624072
0.0001 41.9958 20200 0.4160 15780456
0.0 42.4116 20400 0.4530 15936432
0.0019 42.8274 20600 0.4244 16092272
0.0 43.2432 20800 0.4572 16249048
0.0 43.6590 21000 0.3548 16405368
0.0041 44.0748 21200 0.3602 16561000
0.0 44.4906 21400 0.4284 16718312
0.0 44.9064 21600 0.4154 16874632
0.0 45.3222 21800 0.4509 17031680
0.0 45.7380 22000 0.4369 17188288
0.0 46.1538 22200 0.5120 17345048
0.0 46.5696 22400 0.4886 17501560
0.0051 46.9854 22600 0.5075 17657336
0.0 47.4012 22800 0.5012 17813576
0.0 47.8170 23000 0.4887 17970024
0.0 48.2328 23200 0.5224 18126280
0.0023 48.6486 23400 0.5204 18282568
0.0 49.0644 23600 0.5279 18438872
0.0 49.4802 23800 0.5578 18595416
0.0054 49.8960 24000 0.4464 18751672
0.0 50.3119 24200 0.4530 18906848
0.0 50.7277 24400 0.4825 19064192
0.0 51.1435 24600 0.4913 19219856
0.0 51.5593 24800 0.5009 19376464
0.0 51.9751 25000 0.5192 19532272
0.0029 52.3909 25200 0.5196 19688288
0.0 52.8067 25400 0.5195 19844672
0.0 53.2225 25600 0.5287 20001552
0.0 53.6383 25800 0.5364 20157424
0.003 54.0541 26000 0.5265 20313440
0.0025 54.4699 26200 0.5382 20469664
0.0058 54.8857 26400 0.5314 20625984
0.002 55.3015 26600 0.5449 20781904
0.0 55.7173 26800 0.5369 20938512
0.0 56.1331 27000 0.5501 21095008
0.0 56.5489 27200 0.5638 21251264
0.0 56.9647 27400 0.5525 21407744
0.0 57.3805 27600 0.5526 21564560
0.0 57.7963 27800 0.5543 21720560
0.0 58.2121 28000 0.5647 21877024
0.0 58.6279 28200 0.5636 22033344
0.0 59.0437 28400 0.5684 22189872
0.0 59.4595 28600 0.5754 22345712
0.0 59.8753 28800 0.5927 22502352
0.0 60.2911 29000 0.5581 22658440
0.0 60.7069 29200 0.5676 22814056
0.0 61.1227 29400 0.5812 22970680
0.0 61.5385 29600 0.5788 23126776
0.0 61.9543 29800 0.5783 23283064
0.0 62.3701 30000 0.5960 23440000
0.0035 62.7859 30200 0.5771 23596224
0.0 63.2017 30400 0.5874 23751880
0.0 63.6175 30600 0.5837 23907624
0.0 64.0333 30800 0.5827 24063864
0.0 64.4491 31000 0.5825 24219608
0.0 64.8649 31200 0.5768 24376856
0.0 65.2807 31400 0.5760 24533352
0.0023 65.6965 31600 0.5757 24688616
0.0026 66.1123 31800 0.5854 24844832
0.0 66.5281 32000 0.5968 25002240
0.003 66.9439 32200 0.5836 25158144
0.0 67.3597 32400 0.5944 25314384
0.0 67.7755 32600 0.5914 25470704
0.0 68.1913 32800 0.5901 25627200
0.0 68.6071 33000 0.5959 25783456
0.0 69.0229 33200 0.5941 25940304
0.0025 69.4387 33400 0.5907 26096432
0.0 69.8545 33600 0.5958 26253360
0.0027 70.2703 33800 0.5947 26408736
0.0 70.6861 34000 0.5995 26565056
0.0 71.1019 34200 0.5997 26721176
0.0 71.5177 34400 0.5993 26877368
0.0 71.9335 34600 0.5985 27033912
0.0 72.3493 34800 0.6059 27190376
0.0 72.7651 35000 0.5963 27347112
0.0024 73.1809 35200 0.5994 27503480
0.0 73.5967 35400 0.5972 27660280
0.0 74.0125 35600 0.6042 27815536
0.0028 74.4283 35800 0.6032 27971600
0.0 74.8441 36000 0.5979 28127664
0.0 75.2599 36200 0.5976 28284736
0.0 75.6757 36400 0.6029 28440672
0.0 76.0915 36600 0.6010 28596968
0.0025 76.5073 36800 0.6039 28753672
0.0027 76.9231 37000 0.6040 28909800
0.0 77.3389 37200 0.6021 29066104
0.0 77.7547 37400 0.6025 29222328
0.0027 78.1705 37600 0.6012 29378344
0.0 78.5863 37800 0.6025 29534888
0.0 79.0021 38000 0.5997 29690392
0.0 79.4179 38200 0.6019 29846936
0.0 79.8337 38400 0.5982 30002424
0.0 80.2495 38600 0.6050 30158536
0.0025 80.6653 38800 0.6021 30314984
0.0 81.0811 39000 0.6039 30471288
0.0 81.4969 39200 0.6009 30628024
0.0 81.9127 39400 0.6022 30784376
0.0027 82.3285 39600 0.6016 30940904
0.0 82.7443 39800 0.6060 31097352
0.0 83.1601 40000 0.6031 31253176

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1