modernbert-llm-router

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5828
  • F1: 0.6346
  • Macro F1: 0.6346
  • Precision: 0.6742
  • Cross Entropy: 0.8262
  • Min Class Accuracy: 0.469
  • Confusion Matrix: [[927, 67, 6], [445, 469, 86], [170, 291, 539]]
  • Accuracy Class 0: 0.927
  • Accuracy Class 1: 0.469
  • Accuracy Class 2: 0.539

Model description

More information needed

Intended uses & limitations

More information needed
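
The card does not document intended uses, but the metrics above imply a three-class text classifier (an LLM router). A minimal inference sketch, assuming the standard Transformers sequence-classification head; the class-to-model mapping is not documented in this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Arisp123/modernbert-llm-router"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Route an incoming prompt to one of three classes; what each class means
# (e.g. which downstream LLM it selects) is not stated in the card.
prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(int(logits.argmax(dim=-1)))  # predicted class: 0, 1, or 2
```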

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 3
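
A hedged reconstruction of these settings as Transformers `TrainingArguments`; the output directory and every argument not listed above (warmup, weight decay, etc.) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-llm-router",  # assumed; not stated in the card
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 4 * 16 = total train batch size 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```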

Training results

| Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 6.6541 | 0.0939 | 4000 | 0.6395 | 0.5917 | 0.5917 | 0.6473 | 0.8784 | 0.396 | [[948, 47, 5], [521, 396, 83], [219, 298, 483]] | 0.948 | 0.396 | 0.483 |
| 6.5576 | 0.0986 | 4200 | 0.6015 | 0.6205 | 0.6205 | 0.6615 | 0.8458 | 0.464 | [[925, 67, 8], [446, 464, 90], [178, 314, 508]] | 0.925 | 0.464 | 0.508 |
| 6.4177 | 0.1033 | 4400 | 0.5828 | 0.6346 | 0.6346 | 0.6742 | 0.8262 | 0.469 | [[927, 67, 6], [445, 469, 86], [170, 291, 539]] | 0.927 | 0.469 | 0.539 |
| 6.5136 | 0.1080 | 4600 | 0.6152 | 0.6052 | 0.6052 | 0.6613 | 0.8551 | 0.46 | [[934, 61, 5], [475, 463, 62], [178, 362, 460]] | 0.934 | 0.463 | 0.46 |
| 6.457 | 0.1127 | 4800 | 0.5772 | 0.6214 | 0.6214 | 0.6724 | 0.8266 | 0.467 | [[923, 72, 5], [426, 509, 65], [174, 359, 467]] | 0.923 | 0.509 | 0.467 |
| 6.3853 | 0.1174 | 5000 | 0.6392 | 0.5743 | 0.5743 | 0.6504 | 0.8829 | 0.384 | [[940, 57, 3], [493, 458, 49], [215, 401, 384]] | 0.94 | 0.458 | 0.384 |
| 6.2768 | 0.1221 | 5200 | 0.6634 | 0.5896 | 0.5896 | 0.6577 | 0.8922 | 0.425 | [[949, 48, 3], [494, 448, 58], [225, 350, 425]] | 0.949 | 0.448 | 0.425 |
| 6.2838 | 0.1268 | 5400 | 0.6526 | 0.5862 | 0.5862 | 0.6543 | 0.8866 | 0.398 | [[963, 32, 5], [539, 398, 63], [237, 308, 455]] | 0.963 | 0.398 | 0.455 |
| 6.2685 | 0.1314 | 5600 | 0.5996 | 0.6250 | 0.6250 | 0.6681 | 0.8341 | 0.472 | [[928, 66, 6], [447, 472, 81], [171, 320, 509]] | 0.928 | 0.472 | 0.509 |
| 6.2514 | 0.1361 | 5800 | 0.6076 | 0.6162 | 0.6162 | 0.6706 | 0.8416 | 0.448 | [[952, 43, 5], [489, 448, 63], [179, 330, 491]] | 0.952 | 0.448 | 0.491 |
| 6.1262 | 0.1408 | 6000 | 0.6951 | 0.5636 | 0.5636 | 0.6533 | 0.9194 | 0.36 | [[956, 42, 2], [523, 444, 33], [198, 442, 360]] | 0.956 | 0.444 | 0.36 |
| 6.1849 | 0.1455 | 6200 | 0.5803 | 0.6143 | 0.6143 | 0.6708 | 0.8233 | 0.441 | [[927, 69, 4], [429, 514, 57], [152, 407, 441]] | 0.927 | 0.514 | 0.441 |
| 6.0257 | 0.1502 | 6400 | 0.6354 | 0.6039 | 0.6039 | 0.6683 | 0.8584 | 0.431 | [[944, 53, 3], [468, 483, 49], [165, 404, 431]] | 0.944 | 0.483 | 0.431 |
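
The per-class accuracy columns equal the diagonal of the confusion matrix divided by each row sum (e.g. 927 / (927 + 67 + 6) = 0.927 for class 0), i.e. per-class recall, and F1 equals Macro F1 in every row, consistent with both being macro-averaged. A sketch of a `compute_metrics` function that could produce these columns, assuming scikit-learn and macro averaging; the actual function is not included in the card:

```python
import numpy as np
from scipy.special import softmax
from sklearn.metrics import confusion_matrix, f1_score, log_loss, precision_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    cm = confusion_matrix(labels, preds)
    # Per-class accuracy as reported above: correct predictions per true class
    # (equivalently, per-class recall).
    per_class_acc = cm.diagonal() / cm.sum(axis=1)
    metrics = {
        "f1": f1_score(labels, preds, average="macro"),
        "macro_f1": f1_score(labels, preds, average="macro"),
        "precision": precision_score(labels, preds, average="macro"),
        "cross_entropy": log_loss(labels, softmax(logits, axis=-1)),
        "min_class_accuracy": per_class_acc.min(),
        "confusion_matrix": cm.tolist(),
    }
    metrics.update({f"accuracy_class_{i}": a for i, a in enumerate(per_class_acc)})
    return metrics
```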

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1