train_stsb_1745333588

This model is a PEFT adapter fine-tuned from google/gemma-3-1b-it on the stsb (Semantic Textual Similarity Benchmark) dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.2676 (the best validation loss, reached at step 1,800; see the results table below)
  • Num Input Tokens Seen: 61089232
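The sketch below shows one way to load the adapter on top of its base model. It assumes the adapter is hosted as rbelanec/train_stsb_1745333588; the prompt format used during training is not documented, so the prompt here is purely illustrative.

```python
# Minimal loading sketch. Assumptions: the adapter repo id below, and a
# generation-style prompt (the actual training prompt format is not recorded).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "rbelanec/train_stsb_1745333588")

prompt = "sentence1: A man is playing a guitar. sentence2: A person plays an instrument. similarity:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```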

Model description

This is a PEFT adapter for google/gemma-3-1b-it fine-tuned on the stsb dataset; the base model's weights are left unchanged. No further details about the adapter configuration are recorded in this card.

Intended uses & limitations

The adapter is intended for semantic textual similarity scoring in the style of STS-B. No evaluations beyond the stsb validation loss reported on this card are available.

Training and evaluation data

The adapter was trained and evaluated on the stsb dataset: pairs of sentences annotated with a similarity score from 0 to 5.
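If the data corresponds to the standard GLUE release of STS-B (an assumption; this card names only "stsb"), it can be loaded with the datasets library:

```python
# Sketch: load STS-B, assuming the GLUE config (the card only says "stsb").
from datasets import load_dataset

stsb = load_dataset("glue", "stsb")
print(stsb["train"][0])  # fields: sentence1, sentence2, label (a 0-5 similarity score), idx
```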

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
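The listed values map onto transformers.TrainingArguments as sketched below. This is a reconstruction, not the original training script; the 200-step evaluation and logging cadence is inferred from the results table, and anything not listed above is left at its library default.

```python
# Sketch reconstructing the hyperparameters above with TrainingArguments.
# Assumptions: the output_dir, and the 200-step eval/logging cadence inferred
# from the results table; all other settings are library defaults.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_stsb_1745333588",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    seed=123,
    lr_scheduler_type="cosine",
    max_steps=40_000,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",
    eval_steps=200,
    logging_steps=200,
)
```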

Training results

Validation loss reaches its minimum of 0.2676 at step 1,800 and rises steadily thereafter while training loss approaches zero, a typical overfitting pattern; the evaluation loss reported at the top of this card corresponds to that best step.

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.3275 0.6182 200 0.4026 305312
0.221 1.2349 400 0.3055 610048
0.2426 1.8532 600 0.2938 917664
0.2098 2.4699 800 0.2834 1223104
0.2086 3.0866 1000 0.2775 1528432
0.3556 3.7048 1200 0.2740 1837520
0.2249 4.3215 1400 0.2798 2143216
0.2019 4.9397 1600 0.2718 2448176
0.1746 5.5564 1800 0.2676 2752768
0.2006 6.1731 2000 0.2784 3059504
0.1994 6.7913 2200 0.2786 3364688
0.2069 7.4080 2400 0.2786 3672432
0.1646 8.0247 2600 0.2956 3978272
0.1708 8.6430 2800 0.2893 4285856
0.1487 9.2597 3000 0.3124 4588608
0.1676 9.8779 3200 0.3109 4894432
0.1486 10.4946 3400 0.3226 5200528
0.1304 11.1113 3600 0.3695 5504960
0.1203 11.7295 3800 0.3689 5808800
0.1129 12.3462 4000 0.3892 6114608
0.1536 12.9645 4200 0.4068 6419376
0.1164 13.5811 4400 0.3944 6725664
0.0889 14.1978 4600 0.4298 7030208
0.1248 14.8161 4800 0.4356 7335712
0.0907 15.4328 5000 0.4585 7641232
0.0892 16.0495 5200 0.5003 7945360
0.0685 16.6677 5400 0.5207 8252048
0.0894 17.2844 5600 0.5641 8557024
0.0896 17.9026 5800 0.5424 8862080
0.0709 18.5193 6000 0.5848 9167248
0.0591 19.1360 6200 0.5905 9472816
0.0805 19.7543 6400 0.5791 9779344
0.0392 20.3709 6600 0.6397 10085888
0.0585 20.9892 6800 0.6511 10391904
0.0437 21.6059 7000 0.6394 10697664
0.056 22.2226 7200 0.6989 11000832
0.0462 22.8408 7400 0.6881 11308384
0.0347 23.4575 7600 0.7002 11614048
0.0281 24.0742 7800 0.7577 11917328
0.0427 24.6924 8000 0.7156 12224848
0.0331 25.3091 8200 0.7304 12530128
0.0283 25.9274 8400 0.7480 12838032
0.0382 26.5440 8600 0.7612 13142096
0.0248 27.1607 8800 0.7911 13447712
0.0368 27.7790 9000 0.7727 13751968
0.0228 28.3957 9200 0.8362 14060176
0.0215 29.0124 9400 0.8463 14362928
0.0207 29.6306 9600 0.8463 14669168
0.0132 30.2473 9800 0.8564 14973568
0.0241 30.8655 10000 0.8121 15279840
0.0179 31.4822 10200 0.8873 15586352
0.0163 32.0989 10400 0.8885 15891232
0.0097 32.7172 10600 0.9024 16197472
0.0258 33.3338 10800 0.9062 16500992
0.0122 33.9521 11000 0.9393 16807808
0.0129 34.5688 11200 0.9574 17112928
0.0141 35.1855 11400 0.9508 17420016
0.011 35.8037 11600 0.9517 17726608
0.0148 36.4204 11800 0.9903 18030288
0.0134 37.0371 12000 0.9711 18337584
0.0064 37.6553 12200 0.9344 18640720
0.003 38.2720 12400 0.9471 18946400
0.0037 38.8903 12600 1.0090 19254240
0.0073 39.5070 12800 1.0206 19558592
0.0023 40.1236 13000 1.0002 19861168
0.004 40.7419 13200 1.0219 20169712
0.0063 41.3586 13400 1.0153 20475008
0.0032 41.9768 13600 1.0315 20782016
0.0188 42.5935 13800 1.0707 21085440
0.0031 43.2102 14000 1.0690 21391616
0.0116 43.8284 14200 1.0848 21696768
0.0024 44.4451 14400 1.0729 22001488
0.0045 45.0618 14600 1.0530 22307216
0.0061 45.6801 14800 1.0578 22612016
0.0034 46.2968 15000 1.0937 22917744
0.0044 46.9150 15200 1.1081 23224720
0.0048 47.5317 15400 1.0822 23531040
0.0059 48.1484 15600 1.0939 23836048
0.0009 48.7666 15800 1.0975 24140240
0.0028 49.3833 16000 1.1265 24445104
0.0034 50.0 16200 1.0952 24750256
0.0009 50.6182 16400 1.1263 25055056
0.0004 51.2349 16600 1.0917 25360976
0.0044 51.8532 16800 1.1134 25669136
0.0019 52.4699 17000 1.0966 25972400
0.0008 53.0866 17200 1.1220 26280272
0.0015 53.7048 17400 1.1494 26583184
0.001 54.3215 17600 1.1380 26891152
0.0004 54.9397 17800 1.1527 27197008
0.0007 55.5564 18000 1.1521 27500160
0.0007 56.1731 18200 1.1340 27805616
0.0006 56.7913 18400 1.1533 28112336
0.0027 57.4080 18600 1.1523 28419888
0.0063 58.0247 18800 1.1742 28724096
0.0003 58.6430 19000 1.1629 29031328
0.0002 59.2597 19200 1.1756 29336560
0.0009 59.8779 19400 1.1314 29642224
0.0018 60.4946 19600 1.1558 29947456
0.0005 61.1113 19800 1.1776 30252288
0.0064 61.7295 20000 1.1481 30557408
0.0001 62.3462 20200 1.1813 30862656
0.0013 62.9645 20400 1.1702 31169472
0.0001 63.5811 20600 1.1900 31474928
0.0002 64.1978 20800 1.2230 31778496
0.0013 64.8161 21000 1.1976 32086304
0.0009 65.4328 21200 1.2225 32389328
0.0009 66.0495 21400 1.1915 32696656
0.0008 66.6677 21600 1.2035 33001008
0.0004 67.2844 21800 1.1833 33306288
0.0001 67.9026 22000 1.1998 33612592
0.0026 68.5193 22200 1.2146 33914992
0.002 69.1360 22400 1.2088 34219808
0.0001 69.7543 22600 1.2275 34525536
0.0002 70.3709 22800 1.2042 34829856
0.0004 70.9892 23000 1.2163 35134560
0.0 71.6059 23200 1.2090 35439168
0.0008 72.2226 23400 1.2194 35744608
0.0012 72.8408 23600 1.1988 36050688
0.0 73.4575 23800 1.2098 36353808
0.0 74.0742 24000 1.2206 36660560
0.0018 74.6924 24200 1.2063 36968464
0.0002 75.3091 24400 1.2526 37273264
0.0001 75.9274 24600 1.2034 37578896
0.0012 76.5440 24800 1.1966 37882832
0.0001 77.1607 25000 1.2145 38187312
0.0029 77.7790 25200 1.2366 38492720
0.0 78.3957 25400 1.2173 38796864
0.0003 79.0124 25600 1.2300 39103824
0.0 79.6306 25800 1.2622 39410448
0.0 80.2473 26000 1.2182 39715280
0.0001 80.8655 26200 1.2078 40021520
0.0 81.4822 26400 1.2485 40325376
0.0002 82.0989 26600 1.2378 40631296
0.0014 82.7172 26800 1.2316 40937312
0.0 83.3338 27000 1.2401 41240464
0.0001 83.9521 27200 1.2655 41550128
0.0006 84.5688 27400 1.2645 41855152
0.0 85.1855 27600 1.2606 42158912
0.0001 85.8037 27800 1.2502 42461856
0.0 86.4204 28000 1.2512 42769760
0.0001 87.0371 28200 1.2645 43074800
0.0 87.6553 28400 1.2440 43378640
0.0 88.2720 28600 1.2596 43683840
0.0 88.8903 28800 1.2711 43988256
0.0012 89.5070 29000 1.2651 44294256
0.0002 90.1236 29200 1.2797 44598464
0.0018 90.7419 29400 1.2737 44904928
0.0 91.3586 29600 1.2572 45208784
0.0 91.9768 29800 1.2705 45516336
0.0 92.5935 30000 1.2617 45820432
0.0 93.2102 30200 1.2716 46127408
0.0 93.8284 30400 1.2695 46431888
0.0 94.4451 30600 1.2839 46736368
0.0 95.0618 30800 1.2772 47043472
0.0 95.6801 31000 1.2825 47348976
0.0 96.2968 31200 1.2849 47652864
0.0 96.9150 31400 1.2851 47959872
0.0 97.5317 31600 1.2911 48265392
0.0 98.1484 31800 1.3045 48569984
0.0 98.7666 32000 1.3080 48874016
0.0015 99.3833 32200 1.2938 49181056
0.0 100.0 32400 1.2963 49485120
0.0 100.6182 32600 1.3041 49790304
0.0 101.2349 32800 1.3001 50097008
0.0 101.8532 33000 1.2975 50403088
0.001 102.4699 33200 1.2996 50707088
0.0 103.0866 33400 1.3146 51010144
0.0 103.7048 33600 1.3123 51318976
0.0006 104.3215 33800 1.3119 51623136
0.0 104.9397 34000 1.3114 51930240
0.0 105.5564 34200 1.3167 52233888
0.0006 106.1731 34400 1.3102 52541008
0.0 106.7913 34600 1.3217 52845904
0.0 107.4080 34800 1.3157 53150720
0.0 108.0247 35000 1.3300 53456816
0.0 108.6430 35200 1.3226 53760816
0.0 109.2597 35400 1.3300 54066160
0.0 109.8779 35600 1.3214 54371888
0.0 110.4946 35800 1.3269 54676672
0.0 111.1113 36000 1.3257 54983008
0.0 111.7295 36200 1.3201 55289472
0.0 112.3462 36400 1.3271 55591632
0.0 112.9645 36600 1.3276 55898640
0.0 113.5811 36800 1.3308 56202288
0.0005 114.1978 37000 1.3314 56510384
0.0 114.8161 37200 1.3328 56816880
0.0001 115.4328 37400 1.3345 57119232
0.0 116.0495 37600 1.3253 57424224
0.0 116.6677 37800 1.3277 57729856
0.0 117.2844 38000 1.3280 58034352
0.0 117.9026 38200 1.3345 58342576
0.0 118.5193 38400 1.3325 58648384
0.0 119.1360 38600 1.3329 58953568
0.0 119.7543 38800 1.3302 59257088
0.0 120.3709 39000 1.3315 59562208
0.0 120.9892 39200 1.3316 59867712
0.0 121.6059 39400 1.3352 60173616
0.0 122.2226 39600 1.3316 60476592
0.0 122.8408 39800 1.3298 60782960
0.0 123.4575 40000 1.3301 61089232

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1