---
base_model: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
library_name: transformers
model_name: Llama-3.2-3B-Instruct-Thinking
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- grpo
license: apache-2.0
datasets:
- AI-MO/NuminaMath-TIR
model-index:
- name: Llama-3.2-3B-Instruct-Thinking
  results:
  - task:
      type: text-generation
    dataset:
      name: openai/gsm8k
      type: GradeSchoolMath8K
    metrics:
    - name: GSM8k (0-Shot)
      type: GSM8k (0-Shot)
      value: 31.61%
    - name: GSM8k (Few-Shot)
      type: GSM8k (Few-Shot)
      value: 54.51%
co2_eq_emissions:
  emissions: 49600
  source: "https://mlco2.github.io/impact#compute"
  training_type: "GRPO"
  geographical_location: "North Europe"
  hardware_used: "1 x H100 96GB"
---

# Model Card for Llama-3.2-3B-Instruct-Thinking

This model is a fine-tuned version of [unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit). It has been trained using [TRL](https://github.com/huggingface/trl) and Unsloth.

## Evals

| Model                          | GSM8k 0-Shot (%) | GSM8k Few-Shot (%) |
|--------------------------------|------------------|--------------------|
| Mistral-7B-v0.1                | 10               | 41                 |
| Llama-3.2-3B-Instruct-Thinking | 31.61            | 54.51              |

## Training procedure

Training was logged with Weights & Biases. The model was trained on 1 x H100 96GB via Azure Cloud (North Europe). This is the model at checkpoint 3200, after which accuracy across the reward functions started to drop.

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

## System Prompt

Make sure to set the system prompt to establish the tone and guidelines for the responses; otherwise, the model falls back to its default behaviour, which may not be what you want.

Recommended system prompt:

```
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
```

### Usage Recommendations

**We recommend adhering to the following configurations when using the model, including for benchmarking, to achieve the expected performance** (a minimal inference sketch is included at the end of this card):

1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
2. When evaluating model performance, conduct multiple tests and average the results.
3. The model was trained on mathematical reasoning data only and is not enhanced for domains other than maths.

### Framework versions

- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- Pytorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citations

Cite Unsloth as:

```bibtex
@software{unsloth,
  author = {Daniel Han and Michael Han and Unsloth team},
  title  = {Unsloth},
  url    = {http://github.com/unslothai/unsloth},
  year   = {2023}
}
```

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
  title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year   = 2024,
  eprint = {arXiv:2402.03300}
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```
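## Usage Example

A minimal inference sketch using the 🤗 Transformers `pipeline` API with the recommended system prompt and a temperature of 0.6. The model identifier and the example question are placeholders, not part of this card; replace the identifier with the full Hub repository id, and note that loading may additionally require `bitsandbytes` and `accelerate`, since the base model is a 4-bit Unsloth checkpoint.

```python
from transformers import pipeline

# Recommended system prompt from the "System Prompt" section above.
SYSTEM_PROMPT = (
    "A conversation between User and Assistant. The user asks a question, and the "
    "Assistant solves it. The assistant first thinks about the reasoning process in "
    "the mind and then provides the user with the answer. The reasoning process and "
    "answer are enclosed within <think> </think> and <answer> </answer> tags, "
    "respectively, i.e., <think> reasoning process here </think> "
    "<answer> answer here </answer>."
)

# Placeholder model id: replace with the actual Hub repository id of this model.
generator = pipeline(
    "text-generation",
    model="Llama-3.2-3B-Instruct-Thinking",
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then she "
                   "sold half as many clips in May. How many clips did Natalia sell "
                   "altogether in April and May?",
    },
]

# Temperature 0.6 sits inside the recommended 0.5-0.7 range and helps avoid
# endless repetitions or incoherent outputs.
output = generator(messages, max_new_tokens=1024, do_sample=True, temperature=0.6)

# With chat-formatted input, the pipeline returns the full conversation;
# the last message is the assistant's <think>/<answer> response.
print(output[0]["generated_text"][-1]["content"])
```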