---
license: apache-2.0
tags:
- qwen
- math
- fine-tuned
- open-r1
- supervised-finetuning
- evaluation
datasets:
- open-r1/OpenR1-Math-220k
- Idavidrein/gpqa
- HuggingFaceH4/MATH-500
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-0.5B
pipeline_tag: text-generation
library_name: transformers
language:
- en
model-index:
- name: Qwen2.5-0.5B-Math220k (Checkpoint-15000)
  results:
  - task:
      type: multiple-choice
    dataset:
      name: GPQA
      type: open
    metrics:
    - name: Accuracy (Clean Extraction)
      type: accuracy
      value: 0.386
    - name: Accuracy (All Extraction)
      type: accuracy
      value: 0.410
  - task:
      type: mathematical-reasoning
    dataset:
      name: MATH500
      type: open
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.219
---
# Qwen2.5-0.5B-Math220k (Checkpoint-15000)
This model is a supervised fine-tuned variant of [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), trained on the **default subset of math220k** (OpenR1-Math-220k) for step-by-step mathematical reasoning and standardized final-answer formatting.
## Training
- **Base model:** Qwen2.5-0.5B
- **Dataset:** math220k `default` subset (83k train, 10k test, filtered for verified answers)
- **Training steps:** 15,000
- **Checkpoint interval:** 500 steps
- **Learning rate:** 2.5e-6 with **cosine decay scheduler**
- **Batch size:** 64
- **Prompting format:** guided step-by-step reasoning, with enforced final answer formatting (`Answer:` or `\boxed{}`); see the configuration sketch after this list
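
As a rough illustration, the sketch below maps these settings onto the Hugging Face `transformers` Trainer API. The actual training script is not included in this repository; the per-device batch size / gradient accumulation split, the output path, and the exact prompt wording are assumptions, not values taken from the original run.

```python
from transformers import TrainingArguments

# Illustrative training configuration matching the hyperparameters above.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-math220k-sft",  # hypothetical output path
    max_steps=15_000,                        # total training steps
    save_steps=500,                          # checkpoint interval
    learning_rate=2.5e-6,                    # peak learning rate
    lr_scheduler_type="cosine",              # cosine decay scheduler
    per_device_train_batch_size=8,           # assumption: 8 x 8 accumulation = 64 effective batch
    gradient_accumulation_steps=8,
    logging_steps=50,
)

# Hypothetical prompt template illustrating the enforced answer format;
# the template used during training may differ.
PROMPT_TEMPLATE = (
    "Solve the following problem step by step. "
    "End your solution with 'Answer: <final answer>' or \\boxed{{<final answer>}}.\n\n"
    "Problem: {problem}\n"
)
prompt = PROMPT_TEMPLATE.format(problem="What is 17 * 24?")
```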
## Evaluation
All evaluations were performed on **bootstrap-resampled datasets (1,000 examples each)** to ensure fair, stable comparisons.
| Dataset | Accuracy (Clean) | Accuracy (All) |
|----------------|------------------|----------------|
| GPQA (merged) | 0.386 | 0.410 |
| MATH500 | 0.219 | N/A |
- **Clean extraction:** only answers in canonical form (`Answer: X`, `\boxed{X}`)
- **All extraction:** additionally accepts fuzzy-matched final answers in phrases like “the correct answer is X” (a minimal extraction sketch appears at the end of this section)
Evaluation was performed with `eval_checkpoints_auto.py` using local bootstrapped datasets.
For detailed evaluation results and charts, see: [DexinRen/open-r1_DR_test/dexin_src/eval_output](https://github.com/DexinRen/open-r1_DR_test/tree/master/dexin_src/eval_output)
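
For intuition, here is a minimal sketch of the two extraction modes and the bootstrap resampling. This is not the code from `eval_checkpoints_auto.py`; the regular expressions and sampling details used there may differ.

```python
import random
import re

def extract_clean(text: str):
    """Clean extraction: only canonical final-answer formats."""
    m = re.search(r"\\boxed\{([^}]*)\}", text)
    if m:
        return m.group(1).strip()
    m = re.search(r"Answer:\s*(.+)", text)
    return m.group(1).strip() if m else None

def extract_all(text: str):
    """All extraction: fall back to fuzzy phrases when no canonical answer is found."""
    ans = extract_clean(text)
    if ans is not None:
        return ans
    m = re.search(r"the correct answer is\s+(\S+)", text, re.IGNORECASE)
    return m.group(1).strip() if m else None

def bootstrap_sample(rows, k=1000, seed=0):
    """Bootstrap resample (with replacement) to a fixed evaluation size."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(k)]
```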
## Limitations
- The math220k dataset contains noisy, unverified solution traces, so the model may pick up flawed reasoning patterns.
- This checkpoint prioritizes **formatting discipline and correctness of final answers** over full reasoning transparency.
- MATH500 generalization is slightly degraded vs. the base model (expected for SFT).
## Files Included
- `model.safetensors`: model weights
- `tokenizer.json`, `vocab.json`, `config.json`: tokenizer and model config
- All files are stored using **Git LFS** for proper large file support.
## Citation
If you use this model, please cite:
Dexin Ren. "Fine-Tuning Qwen2.5-0.5B for Mathematical Reasoning." 2025. Available at: https://huggingface.co/DexinR/qwen2.5-math220k-ckpt15000
## Recommended Usage
For basic use:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("DexinR/qwen2.5-math220k-ckpt15000")
model = AutoModelForCausalLM.from_pretrained("DexinR/qwen2.5-math220k-ckpt15000", trust_remote_code=True)
```
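Continuing from the snippet above, a short generation example; the prompt wording here is illustrative and may not match the exact training template:
```python
import torch

prompt = (
    "Solve the following problem step by step and finish with "
    "'Answer: <final answer>'.\n\nProblem: What is 17 * 24?\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated continuation.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```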
For reproducible evaluation, use the [custom formatter and evaluation code](https://github.com/DexinRen/open-r1_DR_test):
```python
from transformers import AutoTokenizer

from dexin_src.utils.formatter import Formatter

tokenizer = AutoTokenizer.from_pretrained("DexinR/qwen2.5-math220k-ckpt15000")
formatter = Formatter(tokenizer)
formatted_prompt = formatter.format_prompt(example) # example is a row from your dataset
``` |