vllm (pretrained=/root/autodl-tmp/Mistral-Small-24B-Instruct-2501-writer,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.924 | ± 0.0168 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.920 | ± 0.0172 |
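The Stderr column appears to be the usual binomial standard error of the accuracy over the sampled subset, which is why it shrinks when `limit` grows from 250 to 500. A quick sanity check against the values reported above:

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# GSM8K flexible-extract, limit 250: exact_match = 0.924
print(round(binomial_se(0.924, 250), 4))  # → 0.0168, matching the table
# GSM8K flexible-extract, limit 500: exact_match = 0.918
print(round(binomial_se(0.918, 500), 4))  # → 0.0123, matching the table
```

Both reproduce the reported ± values, so differences between checkpoints should be read against error bars of roughly this size.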

vllm (pretrained=/root/autodl-tmp/Mistral-Small-24B-Instruct-2501-writer,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.918 | ± 0.0123 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.908 | ± 0.0129 |

vllm (pretrained=/root/autodl-tmp/Mistral-Small-24B-Instruct-2501-writer,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.7977 | ± 0.0131 |
| - humanities      | 2       | none   |        | acc ↑  | 0.8256 | ± 0.0263 |
| - other           | 2       | none   |        | acc ↑  | 0.8205 | ± 0.0265 |
| - social sciences | 2       | none   |        | acc ↑  | 0.8556 | ± 0.0256 |
| - stem            | 2       | none   |        | acc ↑  | 0.7263 | ± 0.0249 |
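The header lines above are lm-evaluation-harness run configurations. A run like this MMLU one can presumably be reproduced with a CLI invocation along these lines (a sketch assembled from the header fields; the harness version and any additional flags used originally are unknown):

```shell
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Mistral-Small-24B-Instruct-2501-writer,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3 \
  --tasks mmlu \
  --limit 15 \
  --batch_size 1
```

Note that with `--tasks mmlu`, `--limit 15` caps each of the MMLU subtasks at 15 documents, so the group accuracies above are aggregates over small per-subject samples.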

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.920 | ± 0.0172 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.920 | ± 0.0172 |

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.910 | ± 0.0128 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.906 | ± 0.0131 |

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.7836 | ± 0.0133 |
| - humanities      | 2       | none   |        | acc ↑  | 0.8103 | ± 0.0267 |
| - other           | 2       | none   |        | acc ↑  | 0.7949 | ± 0.0271 |
| - social sciences | 2       | none   |        | acc ↑  | 0.8556 | ± 0.0252 |
| - stem            | 2       | none   |        | acc ↑  | 0.7123 | ± 0.0257 |

vllm (pretrained=/root/autodl-tmp/86-256,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.928 | ± 0.0164 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.924 | ± 0.0168 |

vllm (pretrained=/root/autodl-tmp/86-256,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.928 | ± 0.0116 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.920 | ± 0.0121 |

vllm (pretrained=/root/autodl-tmp/86-256,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.7942 | ± 0.0131 |
| - humanities      | 2       | none   |        | acc ↑  | 0.8103 | ± 0.0270 |
| - other           | 2       | none   |        | acc ↑  | 0.7949 | ± 0.0280 |
| - social sciences | 2       | none   |        | acc ↑  | 0.8556 | ± 0.0252 |
| - stem            | 2       | none   |        | acc ↑  | 0.7439 | ± 0.0242 |
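Across the three checkpoints, the limit-500 GSM8K scores sit within one another's error bars. A rough two-proportion z-check (treating the reported Stderr values as independent, which is an approximation since the runs share the same 500 questions) illustrates this for the largest gap, 86-256 (0.928 ± 0.0116) versus the writer baseline (0.918 ± 0.0123):

```python
import math

# flexible-extract exact_match at limit=500, from the tables above
diff = 0.928 - 0.918
se_combined = math.sqrt(0.0116**2 + 0.0123**2)  # SEs add in quadrature
z = diff / se_combined
print(f"z = {z:.2f}")  # → z = 0.59, well below 1.96 (95% threshold)
```

So at this sample size, none of the checkpoints is distinguishably better on GSM8K; the MMLU spread is similarly within noise.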
Safetensors · 23.6B params · tensor types FP16 / I8
Model tree for noneUsername/Mistral-Small-24B-Instruct-2501-writer-W8A8-better