vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ |  0.92 | ± | 0.0172 |
|       |         | strict-match     |      5 | exact_match | ↑ |  0.92 | ± | 0.0172 |
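For reference, a run like the one above can be reproduced with the lm-evaluation-harness on the vLLM backend. The following is a minimal sketch, assuming the `lm_eval.simple_evaluate` Python API and the `lm_eval.utils.make_table` helper; only the model path and settings from the header above are taken from this card, the rest is illustrative.

```python
# Sketch: GSM8K, 5-shot, 250-sample limit, vLLM backend (mirrors the header above).
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,"
        "add_bos_token=true,max_model_len=4096,dtype=bfloat16"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    limit=250,
    batch_size="auto",
)
print(make_table(results))  # prints a markdown table like the one above
```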

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.912 | ± | 0.0127 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.904 | ± | 0.0132 |

vllm (pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.7953 | ± | 0.0131 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.8000 | ± | 0.0270 |
| - other            |       2 | none   |        | acc    | ↑ | 0.8051 | ± | 0.0272 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8667 | ± | 0.0244 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7404 | ± | 0.0248 |
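The MMLU runs use the same harness with `max_num_seqs=3` added to the vLLM model args, batch size 1, the tasks' default few-shot setting, and a limit of 15 samples (applied per subtask in the harness, which is why the stderrs are large). A hedged sketch under the same assumptions as the block above:

```python
# Sketch: MMLU group run with the vLLM backend, mirroring the header above.
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Forgotten-Safeword-24B-V3.0,"
        "add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3"
    ),
    tasks=["mmlu"],
    limit=15,       # small per-subtask limit, as in the card
    batch_size=1,
)
print(make_table(results))
```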

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.888 | ± | 0.0200 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.884 | ± | 0.0203 |

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.904 | ± | 0.0132 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.894 | ± | 0.0138 |

vllm (pretrained=/root/autodl-tmp/70-128-df10,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.7860 | ± | 0.0131 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.8051 | ± | 0.0252 |
| - other            |       2 | none   |        | acc    | ↑ | 0.7846 | ± | 0.0276 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8667 | ± | 0.0240 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7228 | ± | 0.0255 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ |  0.94 | ± | 0.0151 |
|       |         | strict-match     |      5 | exact_match | ↑ |  0.94 | ± | 0.0151 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.920 | ± | 0.0121 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.916 | ± | 0.0124 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups            | Version | Filter | n-shot | Metric |   | Value  |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              |       2 | none   |        | acc    | ↑ | 0.8012 | ± | 0.0130 |
| - humanities      |       2 | none   |        | acc    | ↑ | 0.8000 | ± | 0.0267 |
| - other            |       2 | none   |        | acc    | ↑ | 0.8000 | ± | 0.0275 |
| - social sciences |       2 | none   |        | acc    | ↑ | 0.8778 | ± | 0.0234 |
| - stem            |       2 | none   |        | acc    | ↑ | 0.7544 | ± | 0.0246 |