# Model Card for Notbad v1.0 Mistral 24B

📣 New model available: Notbad v1.1 Mistral 24B

Notbad v1.0 Mistral 24B is a reasoning model trained on math and Python coding. It is built upon Mistral-Small-24B-Instruct-2501 and has been further trained with reinforcement learning on math and coding.

One of the key features of Notbad v1.0 is its ability to produce shorter and cleaner reasoning outputs. We used open datasets and reinforcement learning techniques that continue our work on Quiet-STaR and are similar to Dr. GRPO. The reasoning capabilities of this model come from self-improvement and are not distilled from any other model. The model is the result of fine-tuning on data sampled from several of our RL models, each starting from Mistral-Small-24B-Instruct-2501.
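For readers unfamiliar with this family of methods, the sketch below illustrates the group-relative advantage at the heart of GRPO-style training; Dr. GRPO's key change is dropping GRPO's per-group standard-deviation normalization (and the per-token length normalization). This is an illustrative sketch under those assumptions, not our training code; the function name and tensor shapes are made up for the example.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Illustrative Dr. GRPO-style advantage (not the actual training code).

    rewards: (num_prompts, group_size) tensor holding one verifier score
    per sampled completion in each prompt's group.
    """
    # Baseline each completion against the mean reward of its group.
    baseline = rewards.mean(dim=-1, keepdim=True)
    # Dr. GRPO omits GRPO's division by the per-group std, so the
    # advantage is simply the centered reward.
    return rewards - baseline
```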

Special thanks to Lambda and Deep Infra for providing compute resources for our research and for the training of this model.

You can try the model on chat.labml.ai.
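For local use, below is a minimal sketch with Hugging Face Transformers; the prompt and generation settings are illustrative, not recommended defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "notbadai/notbad_v1_0_mistral_24b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt only; any chat-formatted conversation works.
messages = [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```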

## Benchmark results

| Evaluation | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|---|
| mmlu_pro | 0.642 | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main | 0.447 | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |

### Math & Coding

| Evaluation | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|---|
| humaneval | 0.869 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math | 0.752 | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |

### Instruction following

| Evaluation | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|---|
| ifeval | 0.514 | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |


Model size: 23.6B params · Tensor type: BF16 (Safetensors)