smartmind-cyberone-20250410_x2

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0159 (equal to the validation loss at the last logged step, 27600)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
mixed_precision_training: Native AMP
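
The hyperparameters above interact in two ways that a short sketch may make easier to see: how the total batch size of 64 arises, and how the learning rate might evolve under warmup followed by a cosine schedule with restarts. This is an illustration under stated assumptions, not the training code itself. The card lists no device count, so a single process is assumed. `total_steps` is a hypothetical value, since the card only shows the last logged step. The number of cosine cycles is not stated, so `num_cycles=1` is assumed as well.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_processes: int = 1) -> int:
    """Samples consumed per optimizer step.

    num_processes=1 is an assumption: the card does not list a device count.
    """
    return per_device * grad_accum * num_processes


def lr_at_step(
    step: int,
    total_steps: int,
    base_lr: float = 1e-5,      # learning_rate from the card
    warmup_ratio: float = 0.1,  # lr_scheduler_warmup_ratio from the card
    num_cycles: int = 1,        # assumption: not stated in the card
) -> float:
    """Linear warmup, then cosine decay that restarts at the start of each cycle."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # The modulo makes the cosine jump back to its peak at each new cycle.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return base_lr * max(0.0, cosine)


if __name__ == "__main__":
    print(effective_batch_size(8, 8))  # train_batch_size * gradient_accumulation_steps
    total = 27_600  # hypothetical: approximates the run length by the last logged step
    for s in (0, 1_380, 2_760, 15_180, 27_599):
        print(s, lr_at_step(s, total))
```

With a single cycle the restart never triggers, so the curve reduces to plain warmup plus cosine decay; a larger `num_cycles` would produce the periodic jumps back to the peak rate.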

Training results

Training Loss	Epoch	Step	Validation Loss
0.6761	0.0499	276	0.2245
0.2072	0.0998	552	0.1757
0.1812	0.1498	828	0.1140
0.1469	0.1997	1104	0.1493
0.1224	0.2496	1380	0.0789
0.1142	0.2995	1656	0.1227
0.1194	0.3494	1932	0.0812
0.1048	0.3994	2208	0.0452
0.1145	0.4493	2484	0.0593
0.0943	0.4992	2760	0.0880
0.1149	0.5491	3036	0.2158
0.2192	0.5990	3312	0.1650
0.123	0.6490	3588	0.1046
0.1071	0.6989	3864	0.0775
0.0936	0.7488	4140	0.1638
0.0867	0.7987	4416	0.0447
0.0832	0.8486	4692	0.0624
0.1466	0.8986	4968	0.3147
0.0932	0.9485	5244	0.0552
0.0897	0.9984	5520	0.0408
0.0694	1.0485	5796	0.0458
0.0714	1.0984	6072	0.0582
0.0737	1.1483	6348	0.0550
0.0796	1.1982	6624	0.0386
0.0621	1.2482	6900	0.0586
0.0578	1.2981	7176	0.0283
0.0539	1.3480	7452	0.0320
0.0491	1.3979	7728	0.0518
0.0448	1.4478	8004	0.0360
0.0475	1.4978	8280	0.0403
0.0411	1.5477	8556	0.0217
0.0382	1.5976	8832	0.0255
0.0453	1.6475	9108	0.0215
0.0424	1.6974	9384	0.0250
0.039	1.7473	9660	0.0247
0.0393	1.7973	9936	0.0230
0.0384	1.8472	10212	0.0200
0.032	1.8971	10488	0.0210
0.0352	1.9470	10764	0.0234
0.0346	1.9969	11040	0.0228
0.0331	2.0470	11316	0.0276
0.0314	2.0969	11592	0.0219
0.0355	2.1469	11868	0.0208
0.0271	2.1968	12144	0.0235
0.0258	2.2467	12420	0.0197
0.0286	2.2966	12696	0.0191
0.0284	2.3465	12972	0.0203
0.0251	2.3965	13248	0.0177
0.0273	2.4464	13524	0.0171
0.0244	2.4963	13800	0.0157
0.0247	2.5462	14076	0.0150
0.0256	2.5961	14352	0.0149
0.0227	2.6461	14628	0.0156
0.0257	2.6960	14904	0.0155
0.0217	2.7459	15180	0.0156
0.0243	2.7958	15456	0.0688
0.047	2.8457	15732	0.0269
0.0511	2.8957	16008	0.0220
0.0526	2.9456	16284	0.0311
0.0441	2.9955	16560	0.0264
0.0383	3.0456	16836	0.0263
0.0333	3.0955	17112	0.0239
0.0484	3.1454	17388	0.0328
0.0431	3.1953	17664	0.0268
0.0394	3.2453	17940	0.0409
0.0406	3.2952	18216	0.0388
0.038	3.3451	18492	0.0312
0.0391	3.3950	18768	0.0261
0.0361	3.4449	19044	0.0259
0.0485	3.4949	19320	0.0393
0.0394	3.5448	19596	0.0564
0.0391	3.5947	19872	0.0466
0.0388	3.6446	20148	0.0571
0.0326	3.6945	20424	0.0354
0.0428	3.7445	20700	0.0282
0.0342	3.7944	20976	0.0212
0.0389	3.8443	21252	0.0304
0.0369	3.8942	21528	0.0273
0.0298	3.9441	21804	0.0215
0.027	3.9941	22080	0.0234
0.0334	4.0441	22356	0.0218
0.0316	4.0941	22632	0.0241
0.0296	4.1440	22908	0.0228
0.0324	4.1939	23184	0.0183
0.0286	4.2438	23460	0.0196
0.0213	4.2937	23736	0.0219
0.0299	4.3437	24012	0.0226
0.0253	4.3936	24288	0.0223
0.0222	4.4435	24564	0.0186
0.0228	4.4934	24840	0.0209
0.0265	4.5433	25116	0.0166
0.0224	4.5932	25392	0.0196
0.0257	4.6432	25668	0.0198
0.0278	4.6931	25944	0.0178
0.0236	4.7430	26220	0.0174
0.0225	4.7929	26496	0.0165
0.024	4.8428	26772	0.0163
0.0244	4.8928	27048	0.0159
0.0233	4.9427	27324	0.0159
0.0252	4.9926	27600	0.0159
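
Because validation loss does not fall monotonically in the log above, it can be useful to locate the evaluation step with the lowest value rather than rely on the final one. The sketch below parses rows in the same four-column layout and picks the minimum. The names `LogRow`, `parse_rows`, and `best_row` are hypothetical helpers, and only a handful of rows copied from the table are embedded to keep the example short.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A few rows copied from the table above (whitespace-separated).
SAMPLE = """
0.6761 0.0499 276 0.2245
0.0247 2.5462 14076 0.0150
0.0256 2.5961 14352 0.0149
0.0243 2.7958 15456 0.0688
0.0252 4.9926 27600 0.0159
"""


def parse_rows(text: str) -> List[LogRow]:
    """Parse 'train_loss epoch step val_loss' lines, skipping anything else."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # e.g. a header line or a blank line
        rows.append(LogRow(float(parts[0]), float(parts[1]), int(parts[2]), float(parts[3])))
    return rows


def best_row(rows: List[LogRow]) -> LogRow:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    best = best_row(parse_rows(SAMPLE))
    print(f"lowest validation loss {best.val_loss} at step {best.step} (epoch {best.epoch})")
```

Feeding the full table to `parse_rows` works the same way, since the header line does not split into exactly four fields and is skipped.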

Framework versions

Transformers 4.50.3
Pytorch 2.5.1+cu124
Datasets 3.5.0
Tokenizers 0.21.1
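
To reproduce results it may help to compare a local environment against the versions listed above. The following is a minimal sketch, assuming the usual PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`); the helper names are hypothetical, and an exact string match is a deliberately strict choice.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Versions as listed in the card; distribution names are assumed.
EXPECTED: Dict[str, str] = {
    "transformers": "4.50.3",
    "torch": "2.5.1+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None means the package is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    expected: Dict[str, str], found: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (expected_version, found_version)."""
    return {
        name: (want, found.get(name))
        for name, want in expected.items()
        if found.get(name) != want
    }
```

Calling `mismatches(EXPECTED, installed_versions(EXPECTED))` returns an empty dict when the environment matches the listed versions exactly.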

yangwooko/smartmind-cyberone-20250410_x2
