train_qnli_1744902614

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the QNLI dataset (Question-answering NLI, part of the GLUE benchmark). It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 0.0316
  • Num Input Tokens Seen: 74724160
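No usage snippet ships with this card, so here is a minimal loading sketch. The adapter repo id rbelanec/train_qnli_1744902614 is taken from this card; the QNLI prompt template below is an illustrative assumption, since the card does not document the prompt format used during training.

```python
# Minimal loading sketch -- not from the card itself.
# The prompt template is illustrative only; the card does not document
# the template actually used during fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "rbelanec/train_qnli_1744902614"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the PEFT adapter
model.eval()

# QNLI asks whether the sentence answers the question
# (labels: entailment / not_entailment).
prompt = (
    "Does the sentence answer the question? Reply with entailment or not_entailment.\n"
    "Question: What is the capital of France?\n"
    "Sentence: Paris is the capital and largest city of France.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```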

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments appears after the list):

  • learning_rate: 0.3
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
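The original training script is not included in the card. Below is a minimal sketch, assuming the run used the Hugging Face Trainer, of how the listed values map onto transformers.TrainingArguments; the output_dir is hypothetical.

```python
# Reconstruction of the listed hyperparameters -- not the original script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_qnli_1744902614",  # hypothetical output path
    learning_rate=0.3,                   # unusually high; consistent with prompt-tuning-style PEFT
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,       # 4 x 4 = total train batch size of 16
    lr_scheduler_type="cosine",
    max_steps=40_000,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```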

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.1403 0.0339 200 0.1379 375872
0.1382 0.0679 400 0.1370 754656
0.1236 0.1018 600 0.1185 1127296
0.1512 0.1358 800 0.1434 1500832
0.1154 0.1697 1000 0.1269 1870752
0.1121 0.2037 1200 0.1039 2248448
0.1074 0.2376 1400 0.0942 2622784
0.0663 0.2716 1600 0.0766 2995616
0.0599 0.3055 1800 0.0593 3370144
0.0636 0.3395 2000 0.0803 3747936
0.0434 0.3734 2200 0.0447 4126560
0.0328 0.4073 2400 0.0444 4497920
0.0356 0.4413 2600 0.0593 4870432
0.056 0.4752 2800 0.0489 5242976
0.0374 0.5092 3000 0.0473 5615808
0.039 0.5431 3200 0.0407 5984672
0.0668 0.5771 3400 0.0394 6356832
0.0352 0.6110 3600 0.0430 6732928
0.0353 0.6450 3800 0.0391 7111456
0.0274 0.6789 4000 0.0406 7481824
0.0373 0.7129 4200 0.0376 7857440
0.0283 0.7468 4400 0.0400 8229632
0.0367 0.7808 4600 0.0387 8601824
0.0279 0.8147 4800 0.0371 8974688
0.0372 0.8486 5000 0.0377 9345088
0.0395 0.8826 5200 0.0378 9720928
0.0279 0.9165 5400 0.0391 10090976
0.0322 0.9505 5600 0.0356 10461824
0.056 0.9844 5800 0.0364 10837568
0.0379 1.0183 6000 0.0380 11211008
0.0378 1.0523 6200 0.0386 11582528
0.0369 1.0862 6400 0.0357 11958208
0.0585 1.1202 6600 0.0386 12334752
0.0421 1.1541 6800 0.0363 12710176
0.0202 1.1881 7000 0.0357 13083200
0.0364 1.2220 7200 0.0394 13458944
0.0389 1.2560 7400 0.0360 13836256
0.0582 1.2899 7600 0.0425 14209248
0.0686 1.3238 7800 0.0377 14585344
0.0314 1.3578 8000 0.0344 14955328
0.0167 1.3917 8200 0.0365 15331776
0.021 1.4257 8400 0.0431 15706624
0.0217 1.4596 8600 0.0335 16075392
0.0285 1.4936 8800 0.0388 16445568
0.0705 1.5275 9000 0.0363 16819648
0.0413 1.5615 9200 0.0374 17191872
0.0967 1.5954 9400 0.0344 17561280
0.0372 1.6294 9600 0.0344 17936128
0.0583 1.6633 9800 0.0345 18307616
0.0593 1.6972 10000 0.0531 18683168
0.0296 1.7312 10200 0.0378 19053408
0.0555 1.7651 10400 0.0358 19427296
0.0365 1.7991 10600 0.0348 19802400
0.0244 1.8330 10800 0.0360 20173056
0.0426 1.8670 11000 0.0337 20550720
0.0393 1.9009 11200 0.0336 20920224
0.04 1.9349 11400 0.0355 21289344
0.0205 1.9688 11600 0.0337 21666048
0.0417 2.0027 11800 0.0334 22041760
0.0263 2.0367 12000 0.0350 22412256
0.0388 2.0706 12200 0.0334 22782848
0.03 2.1046 12400 0.0345 23151392
0.0277 2.1385 12600 0.0355 23523648
0.0279 2.1724 12800 0.0370 23892992
0.0176 2.2064 13000 0.0340 24264192
0.0311 2.2403 13200 0.0337 24635264
0.0665 2.2743 13400 0.0340 25009664
0.0389 2.3082 13600 0.0339 25382432
0.0443 2.3422 13800 0.0356 25755616
0.0414 2.3761 14000 0.0348 26131424
0.0413 2.4101 14200 0.0357 26504960
0.0291 2.4440 14400 0.0342 26877888
0.0446 2.4780 14600 0.0343 27248384
0.0487 2.5119 14800 0.0328 27625376
0.0329 2.5458 15000 0.0357 28005696
0.0415 2.5798 15200 0.0333 28379936
0.0319 2.6137 15400 0.0333 28749536
0.0199 2.6477 15600 0.0332 29128672
0.024 2.6816 15800 0.0324 29503456
0.0393 2.7156 16000 0.0340 29874176
0.054 2.7495 16200 0.0340 30251904
0.0168 2.7835 16400 0.0330 30626560
0.0251 2.8174 16600 0.0378 30999968
0.0355 2.8514 16800 0.0342 31376704
0.0345 2.8853 17000 0.0322 31749472
0.039 2.9193 17200 0.0326 32128320
0.0255 2.9532 17400 0.0322 32501056
0.0341 2.9871 17600 0.0329 32872640
0.0246 3.0210 17800 0.0356 33243744
0.0205 3.0550 18000 0.0323 33619808
0.0476 3.0889 18200 0.0336 33994048
0.0108 3.1229 18400 0.0335 34361920
0.0366 3.1568 18600 0.0326 34735392
0.0173 3.1908 18800 0.0334 35107872
0.0292 3.2247 19000 0.0337 35486976
0.0389 3.2587 19200 0.0348 35862880
0.023 3.2926 19400 0.0327 36237280
0.0393 3.3266 19600 0.0338 36614176
0.0259 3.3605 19800 0.0332 36987200
0.021 3.3944 20000 0.0333 37357312
0.0167 3.4284 20200 0.0344 37728448
0.0357 3.4623 20400 0.0324 38104736
0.0305 3.4963 20600 0.0338 38477696
0.0262 3.5302 20800 0.0331 38847808
0.015 3.5642 21000 0.0329 39222464
0.0269 3.5981 21200 0.0351 39595392
0.0226 3.6321 21400 0.0323 39971968
0.0376 3.6660 21600 0.0323 40341952
0.0289 3.7000 21800 0.0329 40713376
0.0246 3.7339 22000 0.0335 41085856
0.0225 3.7679 22200 0.0320 41461568
0.0214 3.8018 22400 0.0319 41833280
0.0295 3.8357 22600 0.0317 42205152
0.0266 3.8697 22800 0.0319 42578144
0.0238 3.9036 23000 0.0317 42956608
0.0186 3.9376 23200 0.0316 43327904
0.0178 3.9715 23400 0.0323 43700960
0.0152 4.0054 23600 0.0318 44077568
0.0102 4.0394 23800 0.0318 44449632
0.0318 4.0733 24000 0.0331 44825184
0.0223 4.1073 24200 0.0351 45195872
0.0171 4.1412 24400 0.0353 45566816
0.0199 4.1752 24600 0.0350 45945824
0.0164 4.2091 24800 0.0363 46322304
0.0106 4.2431 25000 0.0341 46694976
0.0236 4.2770 25200 0.0338 47069472
0.0321 4.3109 25400 0.0353 47444064
0.0127 4.3449 25600 0.0332 47819744
0.043 4.3788 25800 0.0342 48190912
0.0119 4.4128 26000 0.0352 48563040
0.0415 4.4467 26200 0.0347 48936320
0.0175 4.4807 26400 0.0343 49306944
0.0268 4.5146 26600 0.0334 49683712
0.0069 4.5486 26800 0.0340 50057824
0.0141 4.5825 27000 0.0334 50431552
0.0099 4.6165 27200 0.0332 50808576
0.0232 4.6504 27400 0.0336 51182144
0.0133 4.6843 27600 0.0350 51554016
0.0285 4.7183 27800 0.0336 51925888
0.0206 4.7522 28000 0.0340 52295168
0.0159 4.7862 28200 0.0339 52664096
0.0134 4.8201 28400 0.0340 53038784
0.0297 4.8541 28600 0.0334 53412352
0.0241 4.8880 28800 0.0332 53788608
0.0168 4.9220 29000 0.0336 54166176
0.029 4.9559 29200 0.0341 54541216
0.0257 4.9899 29400 0.0330 54916928
0.0135 5.0238 29600 0.0346 55288160
0.0023 5.0577 29800 0.0358 55662784
0.0044 5.0917 30000 0.0362 56034432
0.0101 5.1256 30200 0.0361 56405792
0.0486 5.1595 30400 0.0376 56777504
0.0207 5.1935 30600 0.0368 57149760
0.0126 5.2274 30800 0.0365 57521536
0.0039 5.2614 31000 0.0370 57889408
0.0134 5.2953 31200 0.0370 58258624
0.0134 5.3293 31400 0.0368 58635520
0.0303 5.3632 31600 0.0380 59006592
0.0086 5.3972 31800 0.0383 59381312
0.0214 5.4311 32000 0.0363 59761568
0.0218 5.4651 32200 0.0365 60138720
0.0248 5.4990 32400 0.0358 60511168
0.0291 5.5329 32600 0.0359 60884448
0.0246 5.5669 32800 0.0371 61259680
0.014 5.6008 33000 0.0365 61636416
0.0335 5.6348 33200 0.0375 62013760
0.0276 5.6687 33400 0.0377 62389440
0.0313 5.7027 33600 0.0367 62764512
0.0146 5.7366 33800 0.0373 63139872
0.0055 5.7706 34000 0.0374 63517632
0.0281 5.8045 34200 0.0373 63889248
0.0073 5.8385 34400 0.0372 64262048
0.0153 5.8724 34600 0.0372 64632256
0.0094 5.9064 34800 0.0374 65006944
0.0494 5.9403 35000 0.0374 65382656
0.0178 5.9742 35200 0.0380 65756992
0.008 6.0081 35400 0.0387 66125280
0.0164 6.0421 35600 0.0390 66493536
0.015 6.0760 35800 0.0392 66867936
0.0131 6.1100 36000 0.0397 67243328
0.0232 6.1439 36200 0.0398 67616992
0.0015 6.1779 36400 0.0406 67995520
0.0233 6.2118 36600 0.0406 68370624
0.0046 6.2458 36800 0.0404 68746880
0.0144 6.2797 37000 0.0404 69119328
0.0356 6.3137 37200 0.0405 69490336
0.0216 6.3476 37400 0.0401 69862688
0.0035 6.3816 37600 0.0401 70238592
0.0208 6.4155 37800 0.0399 70612608
0.0144 6.4494 38000 0.0401 70985568
0.0045 6.4834 38200 0.0400 71360704
0.0039 6.5173 38400 0.0401 71738432
0.0028 6.5513 38600 0.0401 72112640
0.0152 6.5852 38800 0.0401 72484256
0.0065 6.6192 39000 0.0401 72858912
0.0217 6.6531 39200 0.0401 73232576
0.0166 6.6871 39400 0.0402 73604352
0.0083 6.7210 39600 0.0401 73975648
0.0159 6.7550 39800 0.0402 74349632
0.009 6.7889 40000 0.0402 74724160
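Note that validation loss reaches its minimum (0.0316, around step 23,200) and drifts upward over the remaining steps, so the headline evaluation loss above appears to correspond to the best checkpoint rather than the final one.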

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1