noneUsername/Forgotten-Safeword-24B-V3.0-W8A8

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.92	±	0.0172
		strict-match	5	exact_match	↑	0.92	±	0.0172

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.912	±	0.0127
		strict-match	5	exact_match	↑	0.904	±	0.0132
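
The Stderr column in the gsm8k tables above can be sanity-checked by hand. The sketch below assumes that each example is scored 0 or 1, that `limit` is the number of examples actually evaluated, and that the reported value is the sample standard error of the mean (variance with an n - 1 denominator). The helper name `binomial_stderr` is hypothetical and is not part of the evaluation harness.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumption: this mirrors how the Stderr column was produced; it is a
    hand check, not the harness's own code.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# exact_match values and limits taken from the two gsm8k runs above
print(round(binomial_stderr(0.92, 250), 4))   # limit 250, both filters
print(round(binomial_stderr(0.912, 500), 4))  # limit 500, flexible-extract
print(round(binomial_stderr(0.904, 500), 4))  # limit 500, strict-match
```

Under these assumptions the printed values agree with the reported ±0.0172, ±0.0127 and ±0.0132. This suggests that the lower exact_match in the 500-example run lies within roughly one standard error of the 250-example result.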

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7953	±	0.0131
- humanities	2	none	acc	↑	0.8000	±	0.0270
- other	2	none	acc	↑	0.8051	±	0.0272
- social sciences	2	none	acc	↑	0.8667	±	0.0244
- stem	2	none	acc	↑	0.7404	±	0.0248

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.888	±	0.0200
		strict-match	5	exact_match	↑	0.884	±	0.0203

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.904	±	0.0132
		strict-match	5	exact_match	↑	0.894	±	0.0138

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7860	±	0.0131
- humanities	2	none	acc	↑	0.8051	±	0.0252
- other	2	none	acc	↑	0.7846	±	0.0276
- social sciences	2	none	acc	↑	0.8667	±	0.0240
- stem	2	none	acc	↑	0.7228	±	0.0255

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.94	±	0.0151
		strict-match	5	exact_match	↑	0.94	±	0.0151

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.920	±	0.0121
		strict-match	5	exact_match	↑	0.916	±	0.0124

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8012	±	0.0130
- humanities	2	none	acc	↑	0.8000	±	0.0267
- other	2	none	acc	↑	0.8000	±	0.0275
- social sciences	2	none	acc	↑	0.8778	±	0.0234
- stem	2	none	acc	↑	0.7544	±	0.0246
