smartmind-cyberone-20250410_x2
This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.0159
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
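As a rough illustration, the hyperparameters above could be written as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the original training script: the mapping from the card's labels to parameter names is an assumption, and so is the choice of `fp16` for "Native AMP" (the card does not say fp16 or bf16). The helper `effective_batch_size` is hypothetical and only checks the batch-size arithmetic, assuming the usual formula of per-device batch size times accumulation steps times number of processes.

```python
# Values copied from the hyperparameter list above.
# Keys follow transformers.TrainingArguments parameter names; the mapping
# from the card's labels to these keys is an assumption.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.1,
    "num_train_epochs": 5,
    # "Native AMP" does not specify the dtype; fp16 is an assumption.
    "fp16": True,
}


def effective_batch_size(per_device, accumulation_steps, num_processes):
    """Hypothetical helper: examples consumed per optimizer step."""
    return per_device * accumulation_steps * num_processes


# The card lists total_train_batch_size = 64 but no device count.
# Under the formula above, 8 * 8 * n = 64 implies n = 1 process.
implied_processes = 64 // (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
total = effective_batch_size(8, 8, implied_processes)
print(implied_processes, total)
```

If the `transformers` library is installed, the dictionary could be passed as `TrainingArguments(output_dir="...", **training_kwargs)`; the output directory and any evaluation or logging settings are not stated in this card and would have to be supplied separately.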
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.6761 | 0.0499 | 276 | 0.2245 |
0.2072 | 0.0998 | 552 | 0.1757 |
0.1812 | 0.1498 | 828 | 0.1140 |
0.1469 | 0.1997 | 1104 | 0.1493 |
0.1224 | 0.2496 | 1380 | 0.0789 |
0.1142 | 0.2995 | 1656 | 0.1227 |
0.1194 | 0.3494 | 1932 | 0.0812 |
0.1048 | 0.3994 | 2208 | 0.0452 |
0.1145 | 0.4493 | 2484 | 0.0593 |
0.0943 | 0.4992 | 2760 | 0.0880 |
0.1149 | 0.5491 | 3036 | 0.2158 |
0.2192 | 0.5990 | 3312 | 0.1650 |
0.123 | 0.6490 | 3588 | 0.1046 |
0.1071 | 0.6989 | 3864 | 0.0775 |
0.0936 | 0.7488 | 4140 | 0.1638 |
0.0867 | 0.7987 | 4416 | 0.0447 |
0.0832 | 0.8486 | 4692 | 0.0624 |
0.1466 | 0.8986 | 4968 | 0.3147 |
0.0932 | 0.9485 | 5244 | 0.0552 |
0.0897 | 0.9984 | 5520 | 0.0408 |
0.0694 | 1.0485 | 5796 | 0.0458 |
0.0714 | 1.0984 | 6072 | 0.0582 |
0.0737 | 1.1483 | 6348 | 0.0550 |
0.0796 | 1.1982 | 6624 | 0.0386 |
0.0621 | 1.2482 | 6900 | 0.0586 |
0.0578 | 1.2981 | 7176 | 0.0283 |
0.0539 | 1.3480 | 7452 | 0.0320 |
0.0491 | 1.3979 | 7728 | 0.0518 |
0.0448 | 1.4478 | 8004 | 0.0360 |
0.0475 | 1.4978 | 8280 | 0.0403 |
0.0411 | 1.5477 | 8556 | 0.0217 |
0.0382 | 1.5976 | 8832 | 0.0255 |
0.0453 | 1.6475 | 9108 | 0.0215 |
0.0424 | 1.6974 | 9384 | 0.0250 |
0.039 | 1.7473 | 9660 | 0.0247 |
0.0393 | 1.7973 | 9936 | 0.0230 |
0.0384 | 1.8472 | 10212 | 0.0200 |
0.032 | 1.8971 | 10488 | 0.0210 |
0.0352 | 1.9470 | 10764 | 0.0234 |
0.0346 | 1.9969 | 11040 | 0.0228 |
0.0331 | 2.0470 | 11316 | 0.0276 |
0.0314 | 2.0969 | 11592 | 0.0219 |
0.0355 | 2.1469 | 11868 | 0.0208 |
0.0271 | 2.1968 | 12144 | 0.0235 |
0.0258 | 2.2467 | 12420 | 0.0197 |
0.0286 | 2.2966 | 12696 | 0.0191 |
0.0284 | 2.3465 | 12972 | 0.0203 |
0.0251 | 2.3965 | 13248 | 0.0177 |
0.0273 | 2.4464 | 13524 | 0.0171 |
0.0244 | 2.4963 | 13800 | 0.0157 |
0.0247 | 2.5462 | 14076 | 0.0150 |
0.0256 | 2.5961 | 14352 | 0.0149 |
0.0227 | 2.6461 | 14628 | 0.0156 |
0.0257 | 2.6960 | 14904 | 0.0155 |
0.0217 | 2.7459 | 15180 | 0.0156 |
0.0243 | 2.7958 | 15456 | 0.0688 |
0.047 | 2.8457 | 15732 | 0.0269 |
0.0511 | 2.8957 | 16008 | 0.0220 |
0.0526 | 2.9456 | 16284 | 0.0311 |
0.0441 | 2.9955 | 16560 | 0.0264 |
0.0383 | 3.0456 | 16836 | 0.0263 |
0.0333 | 3.0955 | 17112 | 0.0239 |
0.0484 | 3.1454 | 17388 | 0.0328 |
0.0431 | 3.1953 | 17664 | 0.0268 |
0.0394 | 3.2453 | 17940 | 0.0409 |
0.0406 | 3.2952 | 18216 | 0.0388 |
0.038 | 3.3451 | 18492 | 0.0312 |
0.0391 | 3.3950 | 18768 | 0.0261 |
0.0361 | 3.4449 | 19044 | 0.0259 |
0.0485 | 3.4949 | 19320 | 0.0393 |
0.0394 | 3.5448 | 19596 | 0.0564 |
0.0391 | 3.5947 | 19872 | 0.0466 |
0.0388 | 3.6446 | 20148 | 0.0571 |
0.0326 | 3.6945 | 20424 | 0.0354 |
0.0428 | 3.7445 | 20700 | 0.0282 |
0.0342 | 3.7944 | 20976 | 0.0212 |
0.0389 | 3.8443 | 21252 | 0.0304 |
0.0369 | 3.8942 | 21528 | 0.0273 |
0.0298 | 3.9441 | 21804 | 0.0215 |
0.027 | 3.9941 | 22080 | 0.0234 |
0.0334 | 4.0441 | 22356 | 0.0218 |
0.0316 | 4.0941 | 22632 | 0.0241 |
0.0296 | 4.1440 | 22908 | 0.0228 |
0.0324 | 4.1939 | 23184 | 0.0183 |
0.0286 | 4.2438 | 23460 | 0.0196 |
0.0213 | 4.2937 | 23736 | 0.0219 |
0.0299 | 4.3437 | 24012 | 0.0226 |
0.0253 | 4.3936 | 24288 | 0.0223 |
0.0222 | 4.4435 | 24564 | 0.0186 |
0.0228 | 4.4934 | 24840 | 0.0209 |
0.0265 | 4.5433 | 25116 | 0.0166 |
0.0224 | 4.5932 | 25392 | 0.0196 |
0.0257 | 4.6432 | 25668 | 0.0198 |
0.0278 | 4.6931 | 25944 | 0.0178 |
0.0236 | 4.7430 | 26220 | 0.0174 |
0.0225 | 4.7929 | 26496 | 0.0165 |
0.024 | 4.8428 | 26772 | 0.0163 |
0.0244 | 4.8928 | 27048 | 0.0159 |
0.0233 | 4.9427 | 27324 | 0.0159 |
0.0252 | 4.9926 | 27600 | 0.0159 |
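The table can be read programmatically to find the evaluation step with the lowest validation loss and to estimate the number of optimizer steps per epoch. The sketch below uses a hypothetical `parse_rows` helper and a small subset of rows copied from the table above. In the full table the lowest validation loss is 0.0149 at step 14352, slightly below the final 0.0159; whether a checkpoint from that step was kept is not stated. The examples-per-epoch figure assumes 64 examples per optimizer step (the listed total_train_batch_size) and is only an estimate.

```python
def parse_rows(text):
    """Parse 'train_loss | epoch | step | val_loss |' lines into dicts.

    Header and separator lines are skipped because their cells
    do not convert to numbers.
    """
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if len(cells) != 4:
            continue
        try:
            row = {
                "train_loss": float(cells[0]),
                "epoch": float(cells[1]),
                "step": int(cells[2]),
                "val_loss": float(cells[3]),
            }
        except ValueError:
            continue  # header or separator line
        rows.append(row)
    return rows


# A subset of rows copied verbatim from the table above.
sample = """
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.6761 | 0.0499 | 276 | 0.2245 |
0.0897 | 0.9984 | 5520 | 0.0408 |
0.0256 | 2.5961 | 14352 | 0.0149 |
0.0243 | 2.7958 | 15456 | 0.0688 |
0.0252 | 4.9926 | 27600 | 0.0159 |
"""

rows = parse_rows(sample)

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Estimate optimizer steps per epoch from the last logged row.
last = rows[-1]
steps_per_epoch = last["step"] / last["epoch"]

# Assumption: 64 examples per optimizer step (total_train_batch_size).
approx_examples_per_epoch = round(steps_per_epoch) * 64

print(best["step"], best["val_loss"])
print(round(steps_per_epoch), approx_examples_per_epoch)
```

Pasting the complete table into `sample` would give the same best step, since the rows above include the table's minimum.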
Framework versions
- Transformers 4.50.3
- Pytorch 2.5.1+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
Model tree for yangwooko/smartmind-cyberone-20250410_x2
- Base model: Qwen/Qwen2.5-3B
- Finetuned: Qwen/Qwen2.5-3B-Instruct
- Finetuned: PowerInfer/SmallThinker-3B-Preview