modernbert-llm-router

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5828
  • F1: 0.6346
  • Macro F1: 0.6346
  • Precision: 0.6742
  • Cross Entropy: 0.8262
  • Min Class Accuracy: 0.469
  • Confusion Matrix: [[927, 67, 6], [445, 469, 86], [170, 291, 539]]
  • Accuracy Class 0: 0.927
  • Accuracy Class 1: 0.469
  • Accuracy Class 2: 0.539

Model description

More information needed

Intended uses & limitations

More information needed
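
The card does not document intended uses, but the metrics above imply a three-class text classifier (an LLM router). A minimal inference sketch, assuming the standard Transformers sequence-classification head; the class-to-model mapping is not documented in this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Arisp123/modernbert-llm-router"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Route an incoming prompt to one of three classes; what each class means
# (e.g. which downstream LLM it selects) is not stated in the card.
prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(int(logits.argmax(dim=-1)))  # predicted class: 0, 1, or 2
```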

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 3
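
A hedged reconstruction of these settings as Transformers `TrainingArguments`; the output directory and every argument not listed above (warmup, weight decay, etc.) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-llm-router",  # assumed; not stated in the card
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 4 * 16 = total train batch size 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```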

Training results

| Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 6.6541 | 0.0939 | 4000 | 0.6395 | 0.5917 | 0.5917 | 0.6473 | 0.8784 | 0.396 | [[948, 47, 5], [521, 396, 83], [219, 298, 483]] | 0.948 | 0.396 | 0.483 |
| 6.5576 | 0.0986 | 4200 | 0.6015 | 0.6205 | 0.6205 | 0.6615 | 0.8458 | 0.464 | [[925, 67, 8], [446, 464, 90], [178, 314, 508]] | 0.925 | 0.464 | 0.508 |
| 6.4177 | 0.1033 | 4400 | 0.5828 | 0.6346 | 0.6346 | 0.6742 | 0.8262 | 0.469 | [[927, 67, 6], [445, 469, 86], [170, 291, 539]] | 0.927 | 0.469 | 0.539 |
| 6.5136 | 0.1080 | 4600 | 0.6152 | 0.6052 | 0.6052 | 0.6613 | 0.8551 | 0.46 | [[934, 61, 5], [475, 463, 62], [178, 362, 460]] | 0.934 | 0.463 | 0.46 |
| 6.457 | 0.1127 | 4800 | 0.5772 | 0.6214 | 0.6214 | 0.6724 | 0.8266 | 0.467 | [[923, 72, 5], [426, 509, 65], [174, 359, 467]] | 0.923 | 0.509 | 0.467 |
| 6.3853 | 0.1174 | 5000 | 0.6392 | 0.5743 | 0.5743 | 0.6504 | 0.8829 | 0.384 | [[940, 57, 3], [493, 458, 49], [215, 401, 384]] | 0.94 | 0.458 | 0.384 |
| 6.2768 | 0.1221 | 5200 | 0.6634 | 0.5896 | 0.5896 | 0.6577 | 0.8922 | 0.425 | [[949, 48, 3], [494, 448, 58], [225, 350, 425]] | 0.949 | 0.448 | 0.425 |
| 6.2838 | 0.1268 | 5400 | 0.6526 | 0.5862 | 0.5862 | 0.6543 | 0.8866 | 0.398 | [[963, 32, 5], [539, 398, 63], [237, 308, 455]] | 0.963 | 0.398 | 0.455 |
| 6.2685 | 0.1314 | 5600 | 0.5996 | 0.6250 | 0.6250 | 0.6681 | 0.8341 | 0.472 | [[928, 66, 6], [447, 472, 81], [171, 320, 509]] | 0.928 | 0.472 | 0.509 |
| 6.2514 | 0.1361 | 5800 | 0.6076 | 0.6162 | 0.6162 | 0.6706 | 0.8416 | 0.448 | [[952, 43, 5], [489, 448, 63], [179, 330, 491]] | 0.952 | 0.448 | 0.491 |
| 6.1262 | 0.1408 | 6000 | 0.6951 | 0.5636 | 0.5636 | 0.6533 | 0.9194 | 0.36 | [[956, 42, 2], [523, 444, 33], [198, 442, 360]] | 0.956 | 0.444 | 0.36 |
| 6.1849 | 0.1455 | 6200 | 0.5803 | 0.6143 | 0.6143 | 0.6708 | 0.8233 | 0.441 | [[927, 69, 4], [429, 514, 57], [152, 407, 441]] | 0.927 | 0.514 | 0.441 |
| 6.0257 | 0.1502 | 6400 | 0.6354 | 0.6039 | 0.6039 | 0.6683 | 0.8584 | 0.431 | [[944, 53, 3], [468, 483, 49], [165, 404, 431]] | 0.944 | 0.483 | 0.431 |
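
The per-class accuracy columns equal the diagonal of the confusion matrix divided by each row sum (e.g. 927 / (927 + 67 + 6) = 0.927 for class 0), i.e. per-class recall, and F1 equals Macro F1 in every row, consistent with both being macro-averaged. A sketch of a `compute_metrics` function that could produce these columns, assuming scikit-learn and macro averaging; the actual function is not included in the card:

```python
import numpy as np
from scipy.special import softmax
from sklearn.metrics import confusion_matrix, f1_score, log_loss, precision_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    cm = confusion_matrix(labels, preds)
    # Per-class accuracy as reported above: correct predictions per true class
    # (equivalently, per-class recall).
    per_class_acc = cm.diagonal() / cm.sum(axis=1)
    metrics = {
        "f1": f1_score(labels, preds, average="macro"),
        "macro_f1": f1_score(labels, preds, average="macro"),
        "precision": precision_score(labels, preds, average="macro"),
        "cross_entropy": log_loss(labels, softmax(logits, axis=-1)),
        "min_class_accuracy": per_class_acc.min(),
        "confusion_matrix": cm.tolist(),
    }
    metrics.update({f"accuracy_class_{i}": a for i, a in enumerate(per_class_acc)})
    return metrics
```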

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1