smartmind-cyberone-20250410_x2

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0159

Model description

More information needed

Intended uses & limitations

More information needed
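Pending fuller documentation, the checkpoint should load through the standard transformers causal-LM API, like its SmallThinker/Qwen2.5 base. The sketch below is an assumption, not an official usage recipe; the `generate` helper and its defaults are illustrative.

```python
def generate(prompt: str,
             model_id: str = "yangwooko/smartmind-cyberone-20250410_x2",
             max_new_tokens: int = 256) -> str:
    """Minimal generation sketch; downloads the FP16 weights (~6 GB) on first use."""
    # Imported inside the function so the file can be imported without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```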

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
  • mixed_precision_training: Native AMP
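As a sanity check, the settings above imply an effective batch size of 8 × 8 = 64 and, over the 27,600 logged steps, a warmup of 2,760 steps. The pure-Python sketch below mirrors the shape of a warmup + cosine-with-hard-restarts schedule; the number of restart cycles is an assumption, since the card does not record it.

```python
import math

# Values from the hyperparameter list above
learning_rate = 1e-05
train_batch_size = 8            # per device
gradient_accumulation_steps = 8
total_train_batch_size = 64     # as reported
warmup_ratio = 0.1
total_steps = 27600             # final step in the training log below

# Effective batch size: per-device batch * accumulation steps
# (times the data-parallel world size, which the card does not record).
assert train_batch_size * gradient_accumulation_steps == total_train_batch_size

warmup_steps = int(total_steps * warmup_ratio)  # 2760

def cosine_with_restarts_lr(step: int, num_cycles: int = 1) -> float:
    """LR at `step`: linear warmup, then cosine decay with hard restarts.

    Mirrors the shape of transformers'
    get_cosine_with_hard_restarts_schedule_with_warmup; `num_cycles`
    is an assumption (not recorded in the card).
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
```

With `num_cycles=1` this reduces to ordinary cosine decay; the LR peaks at 1e-05 at step 2760 and decays to 0 by step 27,600.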

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.6761 | 0.0499 | 276 | 0.2245 |
| 0.2072 | 0.0998 | 552 | 0.1757 |
| 0.1812 | 0.1498 | 828 | 0.1140 |
| 0.1469 | 0.1997 | 1104 | 0.1493 |
| 0.1224 | 0.2496 | 1380 | 0.0789 |
| 0.1142 | 0.2995 | 1656 | 0.1227 |
| 0.1194 | 0.3494 | 1932 | 0.0812 |
| 0.1048 | 0.3994 | 2208 | 0.0452 |
| 0.1145 | 0.4493 | 2484 | 0.0593 |
| 0.0943 | 0.4992 | 2760 | 0.0880 |
| 0.1149 | 0.5491 | 3036 | 0.2158 |
| 0.2192 | 0.5990 | 3312 | 0.1650 |
| 0.123 | 0.6490 | 3588 | 0.1046 |
| 0.1071 | 0.6989 | 3864 | 0.0775 |
| 0.0936 | 0.7488 | 4140 | 0.1638 |
| 0.0867 | 0.7987 | 4416 | 0.0447 |
| 0.0832 | 0.8486 | 4692 | 0.0624 |
| 0.1466 | 0.8986 | 4968 | 0.3147 |
| 0.0932 | 0.9485 | 5244 | 0.0552 |
| 0.0897 | 0.9984 | 5520 | 0.0408 |
| 0.0694 | 1.0485 | 5796 | 0.0458 |
| 0.0714 | 1.0984 | 6072 | 0.0582 |
| 0.0737 | 1.1483 | 6348 | 0.0550 |
| 0.0796 | 1.1982 | 6624 | 0.0386 |
| 0.0621 | 1.2482 | 6900 | 0.0586 |
| 0.0578 | 1.2981 | 7176 | 0.0283 |
| 0.0539 | 1.3480 | 7452 | 0.0320 |
| 0.0491 | 1.3979 | 7728 | 0.0518 |
| 0.0448 | 1.4478 | 8004 | 0.0360 |
| 0.0475 | 1.4978 | 8280 | 0.0403 |
| 0.0411 | 1.5477 | 8556 | 0.0217 |
| 0.0382 | 1.5976 | 8832 | 0.0255 |
| 0.0453 | 1.6475 | 9108 | 0.0215 |
| 0.0424 | 1.6974 | 9384 | 0.0250 |
| 0.039 | 1.7473 | 9660 | 0.0247 |
| 0.0393 | 1.7973 | 9936 | 0.0230 |
| 0.0384 | 1.8472 | 10212 | 0.0200 |
| 0.032 | 1.8971 | 10488 | 0.0210 |
| 0.0352 | 1.9470 | 10764 | 0.0234 |
| 0.0346 | 1.9969 | 11040 | 0.0228 |
| 0.0331 | 2.0470 | 11316 | 0.0276 |
| 0.0314 | 2.0969 | 11592 | 0.0219 |
| 0.0355 | 2.1469 | 11868 | 0.0208 |
| 0.0271 | 2.1968 | 12144 | 0.0235 |
| 0.0258 | 2.2467 | 12420 | 0.0197 |
| 0.0286 | 2.2966 | 12696 | 0.0191 |
| 0.0284 | 2.3465 | 12972 | 0.0203 |
| 0.0251 | 2.3965 | 13248 | 0.0177 |
| 0.0273 | 2.4464 | 13524 | 0.0171 |
| 0.0244 | 2.4963 | 13800 | 0.0157 |
| 0.0247 | 2.5462 | 14076 | 0.0150 |
| 0.0256 | 2.5961 | 14352 | 0.0149 |
| 0.0227 | 2.6461 | 14628 | 0.0156 |
| 0.0257 | 2.6960 | 14904 | 0.0155 |
| 0.0217 | 2.7459 | 15180 | 0.0156 |
| 0.0243 | 2.7958 | 15456 | 0.0688 |
| 0.047 | 2.8457 | 15732 | 0.0269 |
| 0.0511 | 2.8957 | 16008 | 0.0220 |
| 0.0526 | 2.9456 | 16284 | 0.0311 |
| 0.0441 | 2.9955 | 16560 | 0.0264 |
| 0.0383 | 3.0456 | 16836 | 0.0263 |
| 0.0333 | 3.0955 | 17112 | 0.0239 |
| 0.0484 | 3.1454 | 17388 | 0.0328 |
| 0.0431 | 3.1953 | 17664 | 0.0268 |
| 0.0394 | 3.2453 | 17940 | 0.0409 |
| 0.0406 | 3.2952 | 18216 | 0.0388 |
| 0.038 | 3.3451 | 18492 | 0.0312 |
| 0.0391 | 3.3950 | 18768 | 0.0261 |
| 0.0361 | 3.4449 | 19044 | 0.0259 |
| 0.0485 | 3.4949 | 19320 | 0.0393 |
| 0.0394 | 3.5448 | 19596 | 0.0564 |
| 0.0391 | 3.5947 | 19872 | 0.0466 |
| 0.0388 | 3.6446 | 20148 | 0.0571 |
| 0.0326 | 3.6945 | 20424 | 0.0354 |
| 0.0428 | 3.7445 | 20700 | 0.0282 |
| 0.0342 | 3.7944 | 20976 | 0.0212 |
| 0.0389 | 3.8443 | 21252 | 0.0304 |
| 0.0369 | 3.8942 | 21528 | 0.0273 |
| 0.0298 | 3.9441 | 21804 | 0.0215 |
| 0.027 | 3.9941 | 22080 | 0.0234 |
| 0.0334 | 4.0441 | 22356 | 0.0218 |
| 0.0316 | 4.0941 | 22632 | 0.0241 |
| 0.0296 | 4.1440 | 22908 | 0.0228 |
| 0.0324 | 4.1939 | 23184 | 0.0183 |
| 0.0286 | 4.2438 | 23460 | 0.0196 |
| 0.0213 | 4.2937 | 23736 | 0.0219 |
| 0.0299 | 4.3437 | 24012 | 0.0226 |
| 0.0253 | 4.3936 | 24288 | 0.0223 |
| 0.0222 | 4.4435 | 24564 | 0.0186 |
| 0.0228 | 4.4934 | 24840 | 0.0209 |
| 0.0265 | 4.5433 | 25116 | 0.0166 |
| 0.0224 | 4.5932 | 25392 | 0.0196 |
| 0.0257 | 4.6432 | 25668 | 0.0198 |
| 0.0278 | 4.6931 | 25944 | 0.0178 |
| 0.0236 | 4.7430 | 26220 | 0.0174 |
| 0.0225 | 4.7929 | 26496 | 0.0165 |
| 0.024 | 4.8428 | 26772 | 0.0163 |
| 0.0244 | 4.8928 | 27048 | 0.0159 |
| 0.0233 | 4.9427 | 27324 | 0.0159 |
| 0.0252 | 4.9926 | 27600 | 0.0159 |
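Validation loss bottoms out mid-training rather than at the final step: the minimum in the table is 0.0149 at step 14352 (epoch ≈ 2.60), versus 0.0159 at the end. A quick check over a few rows copied from the table:

```python
# (step, validation_loss) pairs copied from the table above:
# the neighborhood of the minimum, plus the final step.
rows = [
    (13800, 0.0157),
    (14076, 0.0150),
    (14352, 0.0149),
    (14628, 0.0156),
    (27600, 0.0159),
]
best_step, best_loss = min(rows, key=lambda r: r[1])
print(best_step, best_loss)  # 14352 0.0149
```

If an earlier checkpoint was retained, step 14352 may be worth comparing against the final one.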

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
Model size

  • 3.09B params (Safetensors, FP16)

Model tree for yangwooko/smartmind-cyberone-20250410_x2

  • Base model: Qwen/Qwen2.5-3B