smartmind-cyberone-20250410_x10

This model is a fine-tuned version of PowerInfer/SmallThinker-3B-Preview on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0078
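
A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the repository ID `yangwooko/smartmind-cyberone-20250410_x10` and is a causal language model like its base, PowerInfer/SmallThinker-3B-Preview. The helper name `generate_text`, the plain-text prompt handling, and the generation length are illustrative assumptions; the card does not document an intended prompt format.

```python
# Assumed Hub repository ID for this checkpoint.
MODEL_ID = "yangwooko/smartmind-cyberone-20250410_x10"


def generate_text(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and return a generated continuation of `prompt`.

    Hypothetical helper: the prompt is passed as plain text because the card
    does not specify a chat or instruction format.
    """
    # Imported lazily so that importing this module does not pull in the
    # heavy dependencies until generation is actually requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_text("...")` downloads the weights on first use, so it needs network access and enough memory for a 3B-parameter model.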

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
mixed_precision_training: Native AMP
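
The sketch below relates some of the values listed above. It rests on several assumptions. The total batch size is taken to be the per-device batch size times the gradient accumulation steps times the number of devices, and the card does not list a device count. The total step count of 31,065 is an estimate extrapolated from the last logged row (step 31000 at epoch 4.9895). The number of cosine cycles is not stated in the card and is assumed to be 1. The schedule function approximates the usual shape of warmup followed by cosine decay with hard restarts; it is not the trainer's own code.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 64
WARMUP_RATIO = 0.1

# Assumptions, not stated in the card.
TOTAL_STEPS = 31_065  # estimated from the last logged row of the results table
NUM_CYCLES = 1        # assumed number of cosine cycles


def implied_device_count() -> float:
    """Device count implied by total = per-device batch * accumulation * devices."""
    return TOTAL_TRAIN_BATCH_SIZE / (TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS)


def lr_at(step: int, total_steps: int = TOTAL_STEPS, num_cycles: int = NUM_CYCLES) -> float:
    """Approximate learning rate at `step`: linear warmup, then cosine with hard restarts."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the peak towards 0, then restarts at the peak.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return LEARNING_RATE * max(0.0, cosine)


if __name__ == "__main__":
    print("implied device count:", implied_device_count())
    for step in (0, 1_000, 3_107, 15_000, 31_000):
        print(step, lr_at(step))
```

Under these assumptions the listed batch sizes imply a single device, which is worth checking against the actual run because the card reports distributed_type: multi-GPU.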

Training results

Training Loss	Epoch	Step	Validation Loss
0.5867	0.0499	310	0.1835
0.2091	0.0998	620	0.1088
0.1618	0.1497	930	0.0802
0.1325	0.1996	1240	0.0467
0.1496	0.2495	1550	0.0908
0.1206	0.2994	1860	0.0129
0.0787	0.3493	2170	0.0497
0.1031	0.3992	2480	0.0679
0.1326	0.4491	2790	0.1064
0.0932	0.4990	3100	0.0284
0.0869	0.5488	3410	0.0149
0.0765	0.5987	3720	0.0170
0.074	0.6486	4030	0.0338
0.073	0.6985	4340	0.0443
0.0862	0.7484	4650	0.0349
0.0961	0.7983	4960	0.0203
0.1037	0.8482	5270	0.0373
0.0705	0.8981	5580	0.0240
0.0695	0.9480	5890	0.0704
0.0686	0.9979	6200	0.0189
0.061	1.0478	6510	0.0178
0.0562	1.0977	6820	0.0262
0.0707	1.1476	7130	0.0189
0.0538	1.1975	7440	0.0137
0.0498	1.2474	7750	0.0146
0.0419	1.2973	8060	0.0193
0.0373	1.3472	8370	0.0120
0.0305	1.3971	8680	0.0126
0.0276	1.4470	8990	0.0098
0.0257	1.4969	9300	0.0125
0.0288	1.5468	9610	0.0128
0.0281	1.5967	9920	0.0072
0.0273	1.6465	10230	0.0085
0.0238	1.6964	10540	0.0157
0.0237	1.7463	10850	0.0088
0.0227	1.7962	11160	0.0125
0.0237	1.8461	11470	0.0107
0.0244	1.8960	11780	0.0063
0.0201	1.9459	12090	0.0047
0.023	1.9958	12400	0.0049
0.0211	2.0457	12710	0.0038
0.0171	2.0956	13020	0.0057
0.0229	2.1455	13330	0.0097
0.018	2.1954	13640	0.0060
0.0162	2.2453	13950	0.0089
0.0202	2.2952	14260	0.0098
0.0171	2.3451	14570	0.0072
0.0195	2.3950	14880	0.0044
0.0195	2.4449	15190	0.0043
0.0173	2.4948	15500	0.0046
0.015	2.5447	15810	0.0039
0.0149	2.5946	16120	0.0041
0.0204	2.6445	16430	0.0041
0.0173	2.6944	16740	0.0041
0.0181	2.7442	17050	0.0041
0.0165	2.7941	17360	0.0067
0.0326	2.8440	17670	0.0464
0.0732	2.8939	17980	0.0393
0.0367	2.9438	18290	0.0190
0.0515	2.9937	18600	0.0347
0.0348	3.0436	18910	0.0107
0.0288	3.0935	19220	0.0103
0.0363	3.1434	19530	0.0140
0.0409	3.1933	19840	0.0131
0.0211	3.2432	20150	0.0091
0.0279	3.2931	20460	0.0164
0.0286	3.3430	20770	0.0212
0.0244	3.3929	21080	0.0140
0.0301	3.4428	21390	0.0317
0.0274	3.4927	21700	0.0140
0.0245	3.5426	22010	0.0175
0.0216	3.5925	22320	0.0160
0.0209	3.6424	22630	0.0150
0.0243	3.6923	22940	0.0137
0.0255	3.7422	23250	0.0192
0.0233	3.7920	23560	0.0168
0.021	3.8419	23870	0.0210
0.021	3.8918	24180	0.0104
0.0174	3.9417	24490	0.0121
0.0195	3.9916	24800	0.0090
0.0168	4.0415	25110	0.0100
0.0198	4.0914	25420	0.0093
0.0208	4.1413	25730	0.0103
0.0197	4.1912	26040	0.0103
0.0204	4.2411	26350	0.0097
0.0156	4.2910	26660	0.0101
0.0163	4.3409	26970	0.0120
0.0168	4.3908	27280	0.0104
0.0192	4.4407	27590	0.0095
0.0175	4.4906	27900	0.0089
0.0185	4.5405	28210	0.0089
0.0163	4.5904	28520	0.0077
0.0135	4.6403	28830	0.0074
0.0136	4.6902	29140	0.0078
0.0138	4.7401	29450	0.0077
0.016	4.7900	29760	0.0076
0.0136	4.8399	30070	0.0078
0.0199	4.8897	30380	0.0078
0.0155	4.9396	30690	0.0078
0.0136	4.9895	31000	0.0078
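
In the table above, the lowest validation loss is 0.0038 at step 12710 (epoch 2.0457), which is below the final value of 0.0078. The following sketch shows one way to find such a row programmatically. It parses a short excerpt of rows copied from the table; the names `Row`, `parse_rows`, `best_checkpoint`, and `steps_per_epoch` are illustrative and do not come from the training code.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A few rows copied from the results table (tab-separated, header omitted).
EXCERPT = [
    "0.5867\t0.0499\t310\t0.1835",
    "0.023\t1.9958\t12400\t0.0049",
    "0.0211\t2.0457\t12710\t0.0038",
    "0.015\t2.5447\t15810\t0.0039",
    "0.0326\t2.8440\t17670\t0.0464",
    "0.0136\t4.9895\t31000\t0.0078",
]


def parse_rows(lines: List[str]) -> List[Row]:
    """Convert tab-separated table lines into typed rows."""
    rows = []
    for line in lines:
        train_loss, epoch, step, val_loss = line.split("\t")
        rows.append(Row(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


def best_checkpoint(rows: List[Row]) -> Row:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def steps_per_epoch(row: Row) -> float:
    """Estimate optimizer steps per epoch from one logged row."""
    return row.step / row.epoch


if __name__ == "__main__":
    rows = parse_rows(EXCERPT)
    best = best_checkpoint(rows)
    print(f"best step: {best.step}, validation loss: {best.val_loss}")
    print(f"approx. steps per epoch: {steps_per_epoch(rows[-1]):.0f}")
```

Whether the published weights correspond to the final step or to an earlier checkpoint is not stated in the card.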

Framework versions

Transformers 4.51.3
PyTorch 2.5.1+cu124
Datasets 3.5.0
Tokenizers 0.21.1

Repository: yangwooko/smartmind-cyberone-20250410_x10