train_multirc_1745950262

This model is a fine-tuned version of google/gemma-3-1b-it on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3166
  • Num Input Tokens Seen: 76963024
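
A minimal sketch of loading this adapter on top of the base model with PEFT and Transformers. The repository id rbelanec/train_multirc_1745950262 and the MultiRC-style prompt are illustrative assumptions, not a documented usage example from this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this fine-tuned adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "rbelanec/train_multirc_1745950262")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Illustrative MultiRC-style input: a passage, a question, and a candidate answer.
prompt = "Passage: ... Question: ... Candidate answer: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```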

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
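
These settings map directly onto transformers.TrainingArguments. A minimal sketch of the corresponding configuration; output_dir is a placeholder, and anything not listed above is left at its library default:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_multirc_1745950262",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
)
```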

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
2.857 0.0326 200 3.5027 385088
3.2623 0.0653 400 3.4216 770352
2.9273 0.0979 600 3.3862 1160480
2.3565 0.1305 800 3.3748 1543296
2.9929 0.1631 1000 3.3255 1931808
3.3083 0.1958 1200 3.3574 2315744
2.5698 0.2284 1400 3.3604 2710208
3.5809 0.2610 1600 3.3383 3095216
3.2428 0.2937 1800 3.3427 3483504
3.0597 0.3263 2000 3.3370 3872976
3.813 0.3589 2200 3.3336 4254272
3.866 0.3915 2400 3.3476 4637376
3.2625 0.4242 2600 3.3691 5019664
4.0913 0.4568 2800 3.3289 5406912
3.2895 0.4894 3000 3.3188 5786080
3.27 0.5221 3200 3.3495 6167600
3.4807 0.5547 3400 3.3346 6553904
3.9413 0.5873 3600 3.3282 6936656
3.0852 0.6200 3800 3.3407 7321136
3.3249 0.6526 4000 3.3292 7709856
2.8785 0.6852 4200 3.3577 8100560
3.0068 0.7178 4400 3.3204 8482208
3.9827 0.7505 4600 3.3427 8868016
3.1844 0.7831 4800 3.3166 9254560
2.9282 0.8157 5000 3.3262 9634544
3.329 0.8484 5200 3.3274 10013984
3.8631 0.8810 5400 3.3282 10397792
3.3116 0.9136 5600 3.3371 10784512
2.5499 0.9462 5800 3.3406 11165168
2.2533 0.9789 6000 3.3248 11553056
2.9946 1.0114 6200 3.3201 11940352
2.331 1.0440 6400 3.3198 12331920
2.64 1.0767 6600 3.3347 12726352
3.6418 1.1093 6800 3.3414 13105200
3.6926 1.1419 7000 3.3382 13483648
2.6131 1.1746 7200 3.3405 13862816
3.8571 1.2072 7400 3.3480 14252288
3.1471 1.2398 7600 3.3307 14638816
3.6328 1.2725 7800 3.3687 15024560
2.5802 1.3051 8000 3.3462 15412000
3.9663 1.3377 8200 3.3536 15789456
3.242 1.3703 8400 3.3473 16173616
2.9885 1.4030 8600 3.3319 16558464
3.2899 1.4356 8800 3.3373 16945488
3.8357 1.4682 9000 3.3396 17338800
2.511 1.5009 9200 3.3474 17729104
3.2625 1.5335 9400 3.3420 18107328
3.4682 1.5661 9600 3.3343 18497776
3.3546 1.5987 9800 3.3381 18881008
3.7871 1.6314 10000 3.3457 19266960
3.1788 1.6640 10200 3.3292 19650480
3.2014 1.6966 10400 3.3497 20041120
3.1338 1.7293 10600 3.3413 20421120
4.135 1.7619 10800 3.3325 20808496
3.0333 1.7945 11000 3.3395 21195024
2.8654 1.8271 11200 3.3553 21570368
2.9058 1.8598 11400 3.3378 21950896
3.2689 1.8924 11600 3.3470 22333376
2.5088 1.9250 11800 3.3491 22714512
3.3349 1.9577 12000 3.3541 23099888
2.5775 1.9903 12200 3.3717 23482400
3.0117 2.0228 12400 3.3611 23860160
3.3181 2.0555 12600 3.3665 24249008
3.4065 2.0881 12800 3.3510 24639552
3.3034 2.1207 13000 3.3573 25026880
2.4809 2.1534 13200 3.3575 25410448
2.7989 2.1860 13400 3.3628 25785744
3.3774 2.2186 13600 3.3717 26163104
3.7661 2.2512 13800 3.3681 26546240
3.3281 2.2839 14000 3.3671 26923408
3.3407 2.3165 14200 3.3710 27309344
3.6008 2.3491 14400 3.3624 27698752
3.2149 2.3818 14600 3.3701 28082208
3.1563 2.4144 14800 3.3692 28468576
3.712 2.4470 15000 3.3657 28856272
3.1448 2.4796 15200 3.3596 29234704
2.6872 2.5123 15400 3.3429 29617728
4.1079 2.5449 15600 3.3574 30004032
3.3531 2.5775 15800 3.3597 30386752
3.3259 2.6102 16000 3.3508 30774224
3.0081 2.6428 16200 3.3413 31164304
3.8045 2.6754 16400 3.3482 31548832
2.5507 2.7081 16600 3.3499 31943568
3.342 2.7407 16800 3.3370 32327088
3.1806 2.7733 17000 3.3335 32713728
2.9748 2.8059 17200 3.3408 33093744
3.2799 2.8386 17400 3.3411 33484336
2.7973 2.8712 17600 3.3399 33875072
3.497 2.9038 17800 3.3334 34264832
3.1674 2.9365 18000 3.3369 34652800
3.3584 2.9691 18200 3.3395 35036144
3.1948 3.0016 18400 3.3313 35410304
3.6125 3.0343 18600 3.3329 35808688
2.9242 3.0669 18800 3.3340 36200720
3.0657 3.0995 19000 3.3365 36580112
3.4129 3.1321 19200 3.3343 36961872
2.4262 3.1648 19400 3.3348 37345136
3.3369 3.1974 19600 3.3297 37732992
3.0873 3.2300 19800 3.3333 38118784
2.6554 3.2627 20000 3.3385 38503392
3.5597 3.2953 20200 3.3338 38885696
3.5319 3.3279 20400 3.3332 39270320
2.9296 3.3606 20600 3.3318 39665472
3.0795 3.3932 20800 3.3292 40049680
3.4726 3.4258 21000 3.3370 40436560
3.3311 3.4584 21200 3.3357 40820704
3.94 3.4911 21400 3.3415 41202080
2.414 3.5237 21600 3.3442 41588560
3.3365 3.5563 21800 3.3424 41977888
3.1493 3.5890 22000 3.3374 42361392
3.7413 3.6216 22200 3.3442 42746416
2.7835 3.6542 22400 3.3415 43126400
3.1416 3.6868 22600 3.3417 43513248
4.1077 3.7195 22800 3.3385 43896720
3.1847 3.7521 23000 3.3420 44278640
3.1111 3.7847 23200 3.3425 44666464
2.8387 3.8174 23400 3.3423 45047360
3.5614 3.8500 23600 3.3412 45426496
2.8613 3.8826 23800 3.3445 45813536
3.2588 3.9152 24000 3.3387 46192656
2.5633 3.9479 24200 3.3436 46576928
4.2529 3.9805 24400 3.3428 46965120
3.0276 4.0131 24600 3.3475 47347920
3.1426 4.0457 24800 3.3347 47741360
2.9586 4.0783 25000 3.3380 48131120
3.51 4.1109 25200 3.3403 48513200
3.5387 4.1436 25400 3.3383 48894496
2.7833 4.1762 25600 3.3424 49280736
3.2627 4.2088 25800 3.3448 49662304
3.2861 4.2415 26000 3.3386 50049312
2.8297 4.2741 26200 3.3398 50433008
3.7263 4.3067 26400 3.3382 50815824
3.2323 4.3393 26600 3.3389 51200224
3.079 4.3720 26800 3.3459 51585680
3.7935 4.4046 27000 3.3449 51969184
2.6226 4.4372 27200 3.3459 52363216
3.8058 4.4699 27400 3.3450 52737552
3.5748 4.5025 27600 3.3485 53112128
3.168 4.5351 27800 3.3442 53489200
3.2045 4.5677 28000 3.3454 53870832
3.3387 4.6004 28200 3.3447 54260848
3.2715 4.6330 28400 3.3464 54647840
2.7614 4.6656 28600 3.3428 55035376
2.8811 4.6983 28800 3.3451 55421296
3.2825 4.7309 29000 3.3448 55807776
3.315 4.7635 29200 3.3454 56188960
3.6957 4.7961 29400 3.3451 56576864
3.9208 4.8288 29600 3.3394 56959888
3.2552 4.8614 29800 3.3442 57347776
3.0983 4.8940 30000 3.3445 57727072
3.3484 4.9267 30200 3.3448 58119904
3.6294 4.9593 30400 3.3468 58503776
3.1149 4.9919 30600 3.3455 58892528
2.9508 5.0245 30800 3.3448 59278112
3.7158 5.0571 31000 3.3395 59663264
2.9366 5.0897 31200 3.3395 60047056
2.7281 5.1224 31400 3.3434 60433680
3.9076 5.1550 31600 3.3407 60809376
3.2993 5.1876 31800 3.3402 61186608
2.8036 5.2202 32000 3.3395 61567504
3.2689 5.2529 32200 3.3391 61958976
2.7509 5.2855 32400 3.3434 62346176
4.7217 5.3181 32600 3.3395 62734064
2.6531 5.3508 32800 3.3395 63124752
2.744 5.3834 33000 3.3407 63517792
2.8734 5.4160 33200 3.3407 63894896
3.7619 5.4486 33400 3.3407 64277584
3.6114 5.4813 33600 3.3402 64661856
3.2589 5.5139 33800 3.3402 65043136
3.401 5.5465 34000 3.3402 65439360
3.4793 5.5792 34200 3.3402 65819600
2.8978 5.6118 34400 3.3402 66199376
2.7495 5.6444 34600 3.3402 66583936
3.9339 5.6771 34800 3.3402 66968960
2.2472 5.7097 35000 3.3402 67361344
3.9815 5.7423 35200 3.3402 67746288
4.0091 5.7749 35400 3.3402 68131952
3.0225 5.8076 35600 3.3402 68514656
3.0446 5.8402 35800 3.3402 68904544
3.671 5.8728 36000 3.3402 69286320
3.2561 5.9055 36200 3.3402 69676640
3.7261 5.9381 36400 3.3402 70057024
3.8545 5.9707 36600 3.3402 70432848
2.8251 6.0033 36800 3.3402 70819440
3.6945 6.0359 37000 3.3402 71203008
3.9083 6.0685 37200 3.3402 71588672
3.5624 6.1012 37400 3.3402 71972608
3.4654 6.1338 37600 3.3402 72358032
3.7411 6.1664 37800 3.3402 72749840
3.2175 6.1990 38000 3.3402 73128448
3.3323 6.2317 38200 3.3402 73518048
3.9256 6.2643 38400 3.3402 73911328
3.2926 6.2969 38600 3.3402 74293168
3.0041 6.3296 38800 3.3402 74668864
3.8996 6.3622 39000 3.3402 75058640
3.4163 6.3948 39200 3.3402 75440784
3.9653 6.4274 39400 3.3402 75822528
3.666 6.4601 39600 3.3402 76198368
2.5767 6.4927 39800 3.3402 76581104
3.2328 6.5253 40000 3.3402 76963024
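
The evaluation loss reported at the top of this card (3.3166) corresponds to the lowest validation loss in the table, reached at step 4800, rather than to the final step; from step 33,600 onward the validation loss holds steady at 3.3402.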

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
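
A quick sanity check that a local environment matches the versions above (a sketch; it only prints what is installed):

```python
import datasets, peft, tokenizers, torch, transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```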