train_rte_1744902656

This model is a fine-tuned version of google/gemma-3-1b-it on the rte dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0910
  • Num Input Tokens Seen: 102120968
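
This model is a PEFT adapter (see the framework versions below), so the base model and the adapter are loaded together at inference time. The snippet below is a minimal sketch; the adapter repository id and the prompt wording are assumptions, since the card does not document the prompt template used during fine-tuning.

```python
# Minimal loading sketch for this PEFT adapter on top of the base model.
# The adapter repo id and the prompt template are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-3-1b-it"
adapter_id = "rbelanec/train_rte_1744902656"  # assumed repository id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)

# RTE is a two-sentence entailment task; the exact prompt used in training is
# not documented here, so this wording is illustrative only.
prompt = (
    "Premise: The cat sat on the mat.\n"
    "Hypothesis: There is a cat on the mat.\n"
    "Does the premise entail the hypothesis? Answer yes or no:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```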

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
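
The card does not describe the data beyond naming "rte". Assuming this refers to the RTE task of the GLUE benchmark, it can be loaded with the datasets library as sketched below; the actual preprocessing and prompt construction used for this run are not documented.

```python
# Sketch assuming "rte" means GLUE's RTE subset (an assumption; the card does
# not say which RTE dataset or preprocessing was used).
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print(rte)              # splits: train (2,490), validation (277), test (3,000)
print(rte["train"][0])  # fields: sentence1, sentence2, label (0 = entailment, 1 = not_entailment), idx
```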

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
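
The values above map onto a standard Trainer configuration. The sketch below is a rough reconstruction, not the exact training script: the output directory, the evaluation interval (inferred from the 200-step spacing in the results table), and the PEFT configuration actually used are assumptions.

```python
# Rough reconstruction of the training arguments from the list above; the
# dataset preparation, PEFT config, and Trainer subclass are not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_1744902656",   # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,       # effective train batch size 4 * 4 = 16
    seed=123,
    lr_scheduler_type="cosine",
    max_steps=40000,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",
    eval_steps=200,                      # inferred from the evaluation table below
)
```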

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.0766 1.4207 200 0.0910 514544
0.05 2.8414 400 0.1007 1025888
0.0277 4.2567 600 0.1036 1532968
0.0481 5.6774 800 0.1126 2047232
0.0029 7.0927 1000 0.1400 2553328
0.0018 8.5134 1200 0.1731 3064584
0.0007 9.9340 1400 0.1923 3578720
0.0001 11.3494 1600 0.2826 4086872
0.0074 12.7701 1800 0.2753 4597072
0.0 14.1854 2000 0.3234 5107408
0.0014 15.6061 2200 0.3322 5618600
0.0 17.0214 2400 0.3375 6127088
0.0006 18.4421 2600 0.2542 6640448
0.0 19.8627 2800 0.3356 7149392
0.0001 21.2781 3000 0.3316 7655512
0.0 22.6988 3200 0.3616 8170864
0.0001 24.1141 3400 0.3057 8679240
0.0001 25.5348 3600 0.2613 9186256
0.0034 26.9554 3800 0.2324 9701496
0.0 28.3708 4000 0.3714 10208536
0.0001 29.7914 4200 0.3458 10718920
0.0 31.2068 4400 0.2944 11238024
0.0001 32.6275 4600 0.2806 11745784
0.0001 34.0428 4800 0.2851 12256384
0.0 35.4635 5000 0.3899 12764464
0.0 36.8841 5200 0.4043 13274464
0.0 38.2995 5400 0.4146 13783792
0.0 39.7201 5600 0.4235 14300048
0.0 41.1355 5800 0.4292 14802928
0.0 42.5561 6000 0.4423 15310736
0.0 43.9768 6200 0.4502 15826504
0.0 45.3922 6400 0.4570 16328920
0.0 46.8128 6600 0.4632 16846456
0.0 48.2282 6800 0.4743 17353504
0.0 49.6488 7000 0.4799 17866072
0.0 51.0642 7200 0.4847 18373808
0.0 52.4848 7400 0.4939 18884136
0.0 53.9055 7600 0.4967 19402904
0.0 55.3209 7800 0.5068 19913320
0.0 56.7415 8000 0.5081 20425320
0.0 58.1569 8200 0.5173 20932224
0.0 59.5775 8400 0.5122 21443632
0.0 60.9982 8600 0.5242 21958520
0.0 62.4135 8800 0.5309 22465568
0.0 63.8342 9000 0.5373 22978872
0.0 65.2496 9200 0.5365 23489416
0.0 66.6702 9400 0.5463 23998568
0.0 68.0856 9600 0.5540 24508040
0.0 69.5062 9800 0.5643 25021816
0.0 70.9269 10000 0.5673 25536008
0.0 72.3422 10200 0.5545 26049360
0.0 73.7629 10400 0.5728 26563296
0.0 75.1783 10600 0.5705 27069112
0.0 76.5989 10800 0.5677 27583320
0.0 78.0143 11000 0.5812 28092696
0.0 79.4349 11200 0.5774 28604784
0.0 80.8556 11400 0.5948 29118496
0.0 82.2709 11600 0.5939 29628968
0.0 83.6916 11800 0.5948 30142648
0.0 85.1070 12000 0.6109 30650416
0.0 86.5276 12200 0.6012 31164200
0.0 87.9483 12400 0.6139 31680176
0.0 89.3636 12600 0.6123 32191784
0.0 90.7843 12800 0.6302 32704016
0.0 92.1996 13000 0.6097 33211832
0.0 93.6203 13200 0.6330 33725272
0.0 95.0357 13400 0.6259 34239520
0.0 96.4563 13600 0.6247 34749688
0.0 97.8770 13800 0.6368 35255888
0.0 99.2923 14000 0.6383 35764264
0.0 100.7130 14200 0.6436 36272560
0.0 102.1283 14400 0.6400 36779392
0.0 103.5490 14600 0.6422 37288688
0.0 104.9697 14800 0.6506 37798808
0.0 106.3850 15000 0.6455 38306176
0.0 107.8057 15200 0.6585 38818056
0.0 109.2210 15400 0.6545 39327152
0.0 110.6417 15600 0.6747 39834472
0.0 112.0570 15800 0.6589 40347168
0.0 113.4777 16000 0.6638 40861968
0.0 114.8984 16200 0.6717 41373408
0.0 116.3137 16400 0.6828 41884568
0.0 117.7344 16600 0.6751 42393232
0.0 119.1497 16800 0.6763 42901968
0.0 120.5704 17000 0.6716 43418880
0.0 121.9911 17200 0.6751 43930128
0.0 123.4064 17400 0.6885 44439224
0.0 124.8271 17600 0.6910 44949200
0.0 126.2424 17800 0.6787 45456488
0.0 127.6631 18000 0.6914 45966968
0.0 129.0784 18200 0.6963 46478752
0.0 130.4991 18400 0.7079 46990144
0.0 131.9198 18600 0.7064 47496384
0.0 133.3351 18800 0.7120 48002064
0.0 134.7558 19000 0.7207 48514080
0.0 136.1711 19200 0.7277 49020824
0.0 137.5918 19400 0.7413 49536472
0.0 139.0071 19600 0.7401 50047616
0.0 140.4278 19800 0.7618 50560928
0.0 141.8485 20000 0.7576 51077560
0.0 143.2638 20200 0.7553 51589728
0.0 144.6845 20400 0.7623 52091960
0.0 146.0998 20600 0.7690 52599968
0.0 147.5205 20800 0.7673 53105080
0.0 148.9412 21000 0.7606 53615088
0.0 150.3565 21200 0.7605 54126520
0.0 151.7772 21400 0.7900 54637568
0.0 153.1925 21600 0.7664 55145992
0.0 154.6132 21800 0.7468 55658768
0.0 156.0285 22000 0.7568 56165752
0.0 157.4492 22200 0.7631 56680344
0.0 158.8699 22400 0.7582 57190184
0.0 160.2852 22600 0.7520 57701264
0.0 161.7059 22800 0.7708 58206800
0.0 163.1212 23000 0.7630 58714616
0.0 164.5419 23200 0.7519 59223288
0.0 165.9626 23400 0.7630 59731816
0.0 167.3779 23600 0.7588 60238984
0.0 168.7986 23800 0.7505 60751824
0.0 170.2139 24000 0.3984 61264200
0.0 171.6346 24200 0.3611 61774128
0.0 173.0499 24400 0.3774 62287528
0.0 174.4706 24600 0.3918 62802568
0.0 175.8913 24800 0.4074 63313384
0.0 177.3066 25000 0.4131 63824360
0.0 178.7273 25200 0.4258 64334200
0.0 180.1426 25400 0.4303 64843720
0.0 181.5633 25600 0.4473 65355856
0.0 182.9840 25800 0.4498 65867080
0.0 184.3993 26000 0.4586 66376432
0.0 185.8200 26200 0.4608 66891552
0.0 187.2353 26400 0.4639 67395432
0.0 188.6560 26600 0.4765 67911272
0.0 190.0713 26800 0.4785 68421624
0.0 191.4920 27000 0.4857 68928904
0.0 192.9127 27200 0.4869 69438872
0.0 194.3280 27400 0.4987 69956344
0.0 195.7487 27600 0.5055 70469232
0.0 197.1640 27800 0.5042 70980552
0.0 198.5847 28000 0.5134 71494368
0.0 200.0 28200 0.5235 71999768
0.0 201.4207 28400 0.5179 72508480
0.0 202.8414 28600 0.5304 73019472
0.0 204.2567 28800 0.5323 73527632
0.0 205.6774 29000 0.5366 74041024
0.0 207.0927 29200 0.5414 74544840
0.0 208.5134 29400 0.5502 75056168
0.0 209.9340 29600 0.5589 75568152
0.0 211.3494 29800 0.5496 76079080
0.0 212.7701 30000 0.5585 76588464
0.0 214.1854 30200 0.5567 77091184
0.0 215.6061 30400 0.5617 77604592
0.0 217.0214 30600 0.5656 78117872
0.0 218.4421 30800 0.5774 78636456
0.0 219.8627 31000 0.5764 79145648
0.0 221.2781 31200 0.5792 79656680
0.0 222.6988 31400 0.5826 80170976
0.0 224.1141 31600 0.5820 80680568
0.0 225.5348 31800 0.5906 81190016
0.0 226.9554 32000 0.5976 81700080
0.0 228.3708 32200 0.5978 82211496
0.0 229.7914 32400 0.5975 82723360
0.0 231.2068 32600 0.6007 83233760
0.0 232.6275 32800 0.6094 83744152
0.0 234.0428 33000 0.6164 84252696
0.0 235.4635 33200 0.6051 84766920
0.0 236.8841 33400 0.6043 85270792
0.0 238.2995 33600 0.6227 85785816
0.0 239.7201 33800 0.6132 86297192
0.0 241.1355 34000 0.6190 86800112
0.0 242.5561 34200 0.6188 87308856
0.0 243.9768 34400 0.6290 87823904
0.0 245.3922 34600 0.6303 88328184
0.0 246.8128 34800 0.6306 88842368
0.0 248.2282 35000 0.6350 89352024
0.0 249.6488 35200 0.6324 89859880
0.0 251.0642 35400 0.6408 90371864
0.0 252.4848 35600 0.6337 90889904
0.0 253.9055 35800 0.6411 91398024
0.0 255.3209 36000 0.6357 91909952
0.0 256.7415 36200 0.6342 92415808
0.0 258.1569 36400 0.6490 92924744
0.0 259.5775 36600 0.6418 93438136
0.0 260.9982 36800 0.6424 93945688
0.0 262.4135 37000 0.6511 94456488
0.0 263.8342 37200 0.6493 94967816
0.0 265.2496 37400 0.6465 95480152
0.0 266.6702 37600 0.6456 95992952
0.0 268.0856 37800 0.6453 96503784
0.0 269.5062 38000 0.6600 97017704
0.0 270.9269 38200 0.6490 97525848
0.0 272.3422 38400 0.6451 98034592
0.0 273.7629 38600 0.6451 98546400
0.0 275.1783 38800 0.6476 99055216
0.0 276.5989 39000 0.6474 99570208
0.0 278.0143 39200 0.6456 100077240
0.0 279.4349 39400 0.6475 100585296
0.0 280.8556 39600 0.6512 101096120
0.0 282.2709 39800 0.6429 101609904
0.0 283.6916 40000 0.6471 102120968

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1