smartmind-cyberone-20250410_x10

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0078

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
  • mixed_precision_training: Native AMP
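
To illustrate how the learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio settings above interact, here is a minimal sketch of a linear-warmup, cosine-with-hard-restarts schedule in plain Python. It follows the usual formulation of this scheduler rather than the exact training code, which the card does not include. Two values are assumptions, not taken from the card: TOTAL_STEPS (an estimate of the total number of optimizer steps) and NUM_CYCLES (set to 1, in which case the curve reduces to a single cosine decay).

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 1e-5
WARMUP_RATIO = 0.1

# Assumptions (not stated in the card).
TOTAL_STEPS = 31_065  # hypothetical estimate of total optimizer steps over 5 epochs
NUM_CYCLES = 1        # hypothetical; the card does not list the number of cycles


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR,
          warmup_ratio=WARMUP_RATIO, num_cycles=NUM_CYCLES):
    """Learning rate at an optimizer step: linear warmup, then cosine decay
    with hard restarts (the rate jumps back to base_lr at each new cycle)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cosine cycle, in [0, 1).
    cycle_position = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_position)))


if __name__ == "__main__":
    for step in (0, 1_000, 3_107, 15_000, 31_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate climbs for roughly the first 10% of steps, peaks at 1e-05, and then decays toward zero by the end of training.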

Training results

Training Loss  Epoch   Step   Validation Loss
0.5867         0.0499  310    0.1835
0.2091         0.0998  620    0.1088
0.1618         0.1497  930    0.0802
0.1325         0.1996  1240   0.0467
0.1496         0.2495  1550   0.0908
0.1206         0.2994  1860   0.0129
0.0787         0.3493  2170   0.0497
0.1031         0.3992  2480   0.0679
0.1326         0.4491  2790   0.1064
0.0932         0.4990  3100   0.0284
0.0869         0.5488  3410   0.0149
0.0765         0.5987  3720   0.0170
0.074          0.6486  4030   0.0338
0.073          0.6985  4340   0.0443
0.0862         0.7484  4650   0.0349
0.0961         0.7983  4960   0.0203
0.1037         0.8482  5270   0.0373
0.0705         0.8981  5580   0.0240
0.0695         0.9480  5890   0.0704
0.0686         0.9979  6200   0.0189
0.061          1.0478  6510   0.0178
0.0562         1.0977  6820   0.0262
0.0707         1.1476  7130   0.0189
0.0538         1.1975  7440   0.0137
0.0498         1.2474  7750   0.0146
0.0419         1.2973  8060   0.0193
0.0373         1.3472  8370   0.0120
0.0305         1.3971  8680   0.0126
0.0276         1.4470  8990   0.0098
0.0257         1.4969  9300   0.0125
0.0288         1.5468  9610   0.0128
0.0281         1.5967  9920   0.0072
0.0273         1.6465  10230  0.0085
0.0238         1.6964  10540  0.0157
0.0237         1.7463  10850  0.0088
0.0227         1.7962  11160  0.0125
0.0237         1.8461  11470  0.0107
0.0244         1.8960  11780  0.0063
0.0201         1.9459  12090  0.0047
0.023          1.9958  12400  0.0049
0.0211         2.0457  12710  0.0038
0.0171         2.0956  13020  0.0057
0.0229         2.1455  13330  0.0097
0.018          2.1954  13640  0.0060
0.0162         2.2453  13950  0.0089
0.0202         2.2952  14260  0.0098
0.0171         2.3451  14570  0.0072
0.0195         2.3950  14880  0.0044
0.0195         2.4449  15190  0.0043
0.0173         2.4948  15500  0.0046
0.015          2.5447  15810  0.0039
0.0149         2.5946  16120  0.0041
0.0204         2.6445  16430  0.0041
0.0173         2.6944  16740  0.0041
0.0181         2.7442  17050  0.0041
0.0165         2.7941  17360  0.0067
0.0326         2.8440  17670  0.0464
0.0732         2.8939  17980  0.0393
0.0367         2.9438  18290  0.0190
0.0515         2.9937  18600  0.0347
0.0348         3.0436  18910  0.0107
0.0288         3.0935  19220  0.0103
0.0363         3.1434  19530  0.0140
0.0409         3.1933  19840  0.0131
0.0211         3.2432  20150  0.0091
0.0279         3.2931  20460  0.0164
0.0286         3.3430  20770  0.0212
0.0244         3.3929  21080  0.0140
0.0301         3.4428  21390  0.0317
0.0274         3.4927  21700  0.0140
0.0245         3.5426  22010  0.0175
0.0216         3.5925  22320  0.0160
0.0209         3.6424  22630  0.0150
0.0243         3.6923  22940  0.0137
0.0255         3.7422  23250  0.0192
0.0233         3.7920  23560  0.0168
0.021          3.8419  23870  0.0210
0.021          3.8918  24180  0.0104
0.0174         3.9417  24490  0.0121
0.0195         3.9916  24800  0.0090
0.0168         4.0415  25110  0.0100
0.0198         4.0914  25420  0.0093
0.0208         4.1413  25730  0.0103
0.0197         4.1912  26040  0.0103
0.0204         4.2411  26350  0.0097
0.0156         4.2910  26660  0.0101
0.0163         4.3409  26970  0.0120
0.0168         4.3908  27280  0.0104
0.0192         4.4407  27590  0.0095
0.0175         4.4906  27900  0.0089
0.0185         4.5405  28210  0.0089
0.0163         4.5904  28520  0.0077
0.0135         4.6403  28830  0.0074
0.0136         4.6902  29140  0.0078
0.0138         4.7401  29450  0.0077
0.016          4.7900  29760  0.0076
0.0136         4.8399  30070  0.0078
0.0199         4.8897  30380  0.0078
0.0155         4.9396  30690  0.0078
0.0136         4.9895  31000  0.0078
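
The Step and Epoch columns above can be used to estimate how many optimizer steps make up one epoch and, together with total_train_batch_size, roughly how many training examples there are. The sketch below does that arithmetic on a few rows copied from the table and also picks the row with the lowest validation loss. It assumes that Step counts optimizer updates and that each update consumes 64 examples; neither is stated explicitly in the card, so the result is a rough estimate only. The helper names are illustrative.

```python
# A few (step, epoch, validation_loss) rows copied from the table above.
ROWS = [
    (310, 0.0499, 0.1835),
    (12710, 2.0457, 0.0038),
    (15810, 2.5447, 0.0039),
    (31000, 4.9895, 0.0078),
]

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list


def steps_per_epoch(step, epoch):
    """Estimated optimizer steps in one epoch, from a single table row."""
    return step / epoch


def approx_train_examples(step, epoch, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Rough training-set size, assuming every step consumes a full batch."""
    return round(steps_per_epoch(step, epoch) * batch_size)


def best_row(rows):
    """Row with the lowest validation loss among the rows given."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    last_step, last_epoch, _ = ROWS[-1]
    print("steps per epoch (est.):", steps_per_epoch(last_step, last_epoch))
    print("training examples (est.):", approx_train_examples(last_step, last_epoch))
    print("lowest validation loss row:", best_row(ROWS))
```

Among all rows in the table, the lowest validation loss is 0.0038 at step 12710, whereas the loss reported at the top of the card (0.0078) matches the final row at step 31000.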

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
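
When reproducing results it can help to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and mapping "Pytorch" to the "torch" distribution name is an assumption about how the package is installed.

```python
from importlib import metadata

# Versions listed in the card, keyed by distribution name.
# Mapping "Pytorch" to "torch" is an assumption about packaging.
CARD_VERSIONS = {
    "transformers": "4.51.3",
    "torch": "2.5.1+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def installed_version(package):
    """Installed version string for a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=CARD_VERSIONS, lookup=installed_version):
    """Map each package whose version differs to a (listed, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {found or 'not installed'}")
```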

Safetensors

  • Model size: 3.09B params
  • Tensor type: FP16

Model tree for yangwooko/smartmind-cyberone-20250410_x10

  • Base model: Qwen/Qwen2.5-3B
  • Finetuned (14): this model