---
language: en
license: mit
tags:
- llama.cpp
- gguf
- quantized
- mimo
- reasoning
base_model: XiaomiMiMo/MiMo-7B-RL
base_model_relation: quantized
---

# MiMo-7B-RL (GGUF)

This is a GGUF quantized version of [XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL), prepared for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines. The model has been converted from the original SafeTensors format to GGUF.

## Model Description

MiMo-7B-RL is a 7-billion-parameter language model developed by Xiaomi and trained specifically for reasoning in mathematics and code. The original model matches the performance of OpenAI's o1-mini on many benchmarks.

### Model Details

- **Original Model**: MiMo-7B-RL by Xiaomi
- **Parameters**: 7 billion
- **Context Length**: 32,768 tokens
- **Architecture**: Modified transformer with 36 layers, 32 attention heads
- **Original Format**: SafeTensors
- **Converted Format**: GGUF
- **License**: MIT

Key features of the original model:

- Trained with a specialized pre-training strategy focused on reasoning tasks
- Fine-tuned with reinforcement learning on 130K mathematics and code problems
- Strong performance on both mathematical reasoning and coding tasks
- Matches the reasoning performance of much larger models

## Usage

### With Ollama

```bash
ollama run mimo-7b-rl-q8
```

### With LM Studio

1. Load the model through the LM Studio interface
2. Select the GGUF file
3. Configure your desired settings
4. Start chatting!

### With llama.cpp

```bash
./main -m mimo-7b-rl-q8.gguf -n 4096 -p "Your prompt here"
```

## Performance

The original model reports the following benchmark results:

| Benchmark                 | Score |
| ------------------------- | :---: |
| MATH-500 (Pass@1)         | 95.8% |
| AIME 2024 (Pass@1)        | 68.2% |
| AIME 2025 (Pass@1)        | 55.4% |
| LiveCodeBench v5 (Pass@1) | 57.8% |
| LiveCodeBench v6 (Pass@1) | 49.3% |

_Note: Performance metrics are from the original model. The GGUF conversion may show slightly different results due to quantization._

## Limitations and Biases

The model inherits any limitations and biases present in the original MiMo-7B-RL model. Additionally:

- Q8 quantization may result in slightly reduced performance compared to the original model
- The model requires careful prompt engineering for optimal results in reasoning tasks
- Performance may vary depending on the specific GGUF inference implementation used

## Training Details

The model was trained by Xiaomi using:

- Pre-training on approximately 25 trillion tokens
- A three-stage data mixture strategy
- Multiple-Token Prediction as an additional training objective
- RL fine-tuning on 130K mathematics and code problems

For detailed training information, please refer to the [original model card](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL).

## Citation

If you use this model, please cite the original work:

```bibtex
@misc{xiaomi2025mimo,
  title={MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining},
  author={{Xiaomi LLM-Core Team}},
  year={2025},
  primaryClass={cs.CL},
  url={https://github.com/XiaomiMiMo/MiMo},
}
```

## Acknowledgments

Original model development by Xiaomi LLM-Core Team.
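
## Example: Registering the GGUF with Ollama

The `ollama run` command in the Usage section assumes the quantized file has already been imported into Ollama under the name `mimo-7b-rl-q8`. A minimal sketch of that import step, assuming the file `mimo-7b-rl-q8.gguf` from the llama.cpp example sits in the current directory:

```bash
# Write a minimal Modelfile pointing Ollama at the local GGUF file.
# num_ctx is set to the model's 32,768-token context limit.
cat > Modelfile <<'EOF'
FROM ./mimo-7b-rl-q8.gguf
PARAMETER num_ctx 32768
EOF

# Register the model under the name used in the Usage section, then try it.
ollama create mimo-7b-rl-q8 -f Modelfile
ollama run mimo-7b-rl-q8 "Prove that the sum of two even integers is even."
```

The file name and model name here are taken from the examples above; adjust both if your download uses a different quantization suffix.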
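
## Example: Serving over HTTP with llama.cpp

The llama.cpp example above uses the interactive CLI binary. The same GGUF file can also be exposed through llama.cpp's built-in HTTP server, which provides an OpenAI-compatible API. A minimal sketch, assuming a recent llama.cpp build (where the server binary is named `llama-server`) and the quantized file in the working directory:

```bash
# Start the server with the full 32,768-token context on port 8080.
./llama-server -m mimo-7b-rl-q8.gguf -c 32768 --port 8080

# In another terminal: query the OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Solve 2x + 3 = 11 and explain each step."}
        ]
      }'
```

Older llama.cpp builds ship the same functionality as `./server`; the basic flags shown here are the same.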