# modernbert-llm-router
This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.5828
- F1: 0.6346
- Macro F1: 0.6346
- Precision: 0.6742
- Cross Entropy: 0.8262
- Min Class Accuracy: 0.469
- Confusion Matrix: [[927, 67, 6], [445, 469, 86], [170, 291, 539]]
- Accuracy Class 0: 0.927
- Accuracy Class 1: 0.469
- Accuracy Class 2: 0.539
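The 3x3 confusion matrix implies the router distinguishes three classes. A minimal inference sketch, assuming the checkpoint is published under this card's repo id `Arisp123/modernbert-llm-router` and that the generic `LABEL_0`/`LABEL_1`/`LABEL_2` names apply, since the card does not document what each class routes to:

```python
from transformers import pipeline

# Load the router as a 3-class text classifier. The repo id comes from
# this card; the label names are the generic defaults, since the card
# does not document the class meanings.
router = pipeline("text-classification", model="Arisp123/modernbert-llm-router")

prompt = "Prove that there are infinitely many primes."
print(router(prompt))
# e.g. [{'label': 'LABEL_2', 'score': 0.87}]  (illustrative output only)
```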
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
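For reference, these settings map onto `transformers.TrainingArguments` roughly as follows. This is a reconstruction from the list above, not the original training script: `output_dir` is a placeholder, and anything not listed on the card (warmup, weight decay, logging) is left at the Trainer defaults.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="modernbert-llm-router",   # placeholder path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,       # 4 * 16 = total train batch size of 64
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```

The total train batch size of 64 reported above follows from the per-device batch size of 4 times 16 gradient-accumulation steps on a single device.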
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.6541 | 0.0939 | 4000 | 0.6395 | 0.5917 | 0.5917 | 0.6473 | 0.8784 | 0.396 | [[948, 47, 5], [521, 396, 83], [219, 298, 483]] | 0.948 | 0.396 | 0.483 |
| 6.5576 | 0.0986 | 4200 | 0.6015 | 0.6205 | 0.6205 | 0.6615 | 0.8458 | 0.464 | [[925, 67, 8], [446, 464, 90], [178, 314, 508]] | 0.925 | 0.464 | 0.508 |
| 6.4177 | 0.1033 | 4400 | 0.5828 | 0.6346 | 0.6346 | 0.6742 | 0.8262 | 0.469 | [[927, 67, 6], [445, 469, 86], [170, 291, 539]] | 0.927 | 0.469 | 0.539 |
| 6.5136 | 0.1080 | 4600 | 0.6152 | 0.6052 | 0.6052 | 0.6613 | 0.8551 | 0.46 | [[934, 61, 5], [475, 463, 62], [178, 362, 460]] | 0.934 | 0.463 | 0.46 |
| 6.457 | 0.1127 | 4800 | 0.5772 | 0.6214 | 0.6214 | 0.6724 | 0.8266 | 0.467 | [[923, 72, 5], [426, 509, 65], [174, 359, 467]] | 0.923 | 0.509 | 0.467 |
| 6.3853 | 0.1174 | 5000 | 0.6392 | 0.5743 | 0.5743 | 0.6504 | 0.8829 | 0.384 | [[940, 57, 3], [493, 458, 49], [215, 401, 384]] | 0.94 | 0.458 | 0.384 |
| 6.2768 | 0.1221 | 5200 | 0.6634 | 0.5896 | 0.5896 | 0.6577 | 0.8922 | 0.425 | [[949, 48, 3], [494, 448, 58], [225, 350, 425]] | 0.949 | 0.448 | 0.425 |
| 6.2838 | 0.1268 | 5400 | 0.6526 | 0.5862 | 0.5862 | 0.6543 | 0.8866 | 0.398 | [[963, 32, 5], [539, 398, 63], [237, 308, 455]] | 0.963 | 0.398 | 0.455 |
| 6.2685 | 0.1314 | 5600 | 0.5996 | 0.6250 | 0.6250 | 0.6681 | 0.8341 | 0.472 | [[928, 66, 6], [447, 472, 81], [171, 320, 509]] | 0.928 | 0.472 | 0.509 |
| 6.2514 | 0.1361 | 5800 | 0.6076 | 0.6162 | 0.6162 | 0.6706 | 0.8416 | 0.448 | [[952, 43, 5], [489, 448, 63], [179, 330, 491]] | 0.952 | 0.448 | 0.491 |
| 6.1262 | 0.1408 | 6000 | 0.6951 | 0.5636 | 0.5636 | 0.6533 | 0.9194 | 0.36 | [[956, 42, 2], [523, 444, 33], [198, 442, 360]] | 0.956 | 0.444 | 0.36 |
| 6.1849 | 0.1455 | 6200 | 0.5803 | 0.6143 | 0.6143 | 0.6708 | 0.8233 | 0.441 | [[927, 69, 4], [429, 514, 57], [152, 407, 441]] | 0.927 | 0.514 | 0.441 |
| 6.0257 | 0.1502 | 6400 | 0.6354 | 0.6039 | 0.6039 | 0.6683 | 0.8584 | 0.431 | [[944, 53, 3], [468, 483, 49], [165, 404, 431]] | 0.944 | 0.483 | 0.431 |
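The per-class accuracies reported above can be recovered directly from each confusion matrix. A short sketch using the best checkpoint (step 4400), assuming rows are true classes and columns are predictions:

```python
import numpy as np

# Best checkpoint's confusion matrix from the table above,
# with rows = true class and columns = predicted class.
cm = np.array([[927, 67, 6],
               [445, 469, 86],
               [170, 291, 539]])

per_class = cm.diagonal() / cm.sum(axis=1)   # recall of each class
print(per_class)         # [0.927 0.469 0.539]
print(per_class.min())   # 0.469 -> the "Min Class Accuracy" metric
```

Each row sums to 1,000, so the reported per-class accuracies are the class-wise recalls, and Min Class Accuracy is their minimum across the three classes.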
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.1
- Tokenizers 0.21.1