---
language: en
license: mit
tags:
- llama.cpp
- gguf
- quantized
- mimo
- reasoning
base_model: XiaomiMiMo/MiMo-7B-RL
base_model_relation: quantized
---

# MiMo-7B-RL (GGUF)

This is a GGUF quantized version of [XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL), prepared for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines. The model has been converted from the original SafeTensors format to GGUF.

## Model Description

MiMo-7B-RL is a 7-billion-parameter language model developed by Xiaomi and trained specifically for reasoning in mathematics and code. The original model matches the performance of OpenAI's o1-mini on many benchmarks.

### Model Details

- **Original Model**: MiMo-7B-RL by Xiaomi
- **Parameters**: 7 billion
- **Context Length**: 32,768 tokens
- **Architecture**: Modified transformer with 36 layers, 32 attention heads
- **Original Format**: SafeTensors
- **Converted Format**: GGUF
- **License**: MIT

Key features of the original model:

- Trained with a specialized pre-training strategy focused on reasoning tasks
- Fine-tuned with reinforcement learning on 130K mathematics and code problems
- Strong performance on both mathematical reasoning and coding tasks
- Matches the reasoning performance of much larger models

## Usage

### With Ollama

```bash
ollama run mimo-7b-rl-q8
```

### With LM Studio

1. Load the model through the LM Studio interface
2. Select the GGUF file
3. Configure your desired settings
4. Start chatting!

### With llama.cpp

```bash
./main -m mimo-7b-rl-q8.gguf -n 4096 -p "Your prompt here"
```

## Performance

The original model reports the following benchmark results:

| Benchmark                 | Score |
| ------------------------- | :---: |
| MATH-500 (Pass@1)         | 95.8% |
| AIME 2024 (Pass@1)        | 68.2% |
| AIME 2025 (Pass@1)        | 55.4% |
| LiveCodeBench v5 (Pass@1) | 57.8% |
| LiveCodeBench v6 (Pass@1) | 49.3% |

_Note: Performance metrics are from the original model. The GGUF conversion may show slightly different results due to quantization._

## Limitations and Biases

The model inherits any limitations and biases present in the original MiMo-7B-RL model. Additionally:

- Q8 quantization may result in slightly reduced performance compared to the original model
- The model requires careful prompt engineering for optimal results in reasoning tasks
- Performance may vary depending on the specific GGUF inference implementation used

## Training Details

The model was trained by Xiaomi using:

- Pre-training on approximately 25 trillion tokens
- A three-stage data mixture strategy
- Multiple-Token Prediction as an additional training objective
- RL fine-tuning on 130K mathematics and code problems

For detailed training information, please refer to the [original model card](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL).

## Citation

If you use this model, please cite the original work:

```bibtex
@misc{xiaomi2025mimo,
  title={MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining},
  author={{Xiaomi LLM-Core Team}},
  year={2025},
  primaryClass={cs.CL},
  url={https://github.com/XiaomiMiMo/MiMo},
}
```

## Acknowledgments

Original model development by Xiaomi LLM-Core Team.
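
## Example: Registering the GGUF with Ollama

The `ollama run` command in the Usage section assumes the quantized file has already been imported into Ollama under the name `mimo-7b-rl-q8`. A minimal sketch of that import step, assuming the file `mimo-7b-rl-q8.gguf` from the llama.cpp example sits in the current directory:

```bash
# Write a minimal Modelfile pointing Ollama at the local GGUF file.
# num_ctx is set to the model's 32,768-token context limit.
cat > Modelfile <<'EOF'
FROM ./mimo-7b-rl-q8.gguf
PARAMETER num_ctx 32768
EOF

# Register the model under the name used in the Usage section, then try it.
ollama create mimo-7b-rl-q8 -f Modelfile
ollama run mimo-7b-rl-q8 "Prove that the sum of two even integers is even."
```

The file name and model name here are taken from the examples above; adjust both if your download uses a different quantization suffix.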
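
## Example: Serving over HTTP with llama.cpp

The llama.cpp example above uses the interactive CLI binary. The same GGUF file can also be exposed through llama.cpp's built-in HTTP server, which provides an OpenAI-compatible API. A minimal sketch, assuming a recent llama.cpp build (where the server binary is named `llama-server`) and the quantized file in the working directory:

```bash
# Start the server with the full 32,768-token context on port 8080.
./llama-server -m mimo-7b-rl-q8.gguf -c 32768 --port 8080

# In another terminal: query the OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Solve 2x + 3 = 11 and explain each step."}
        ]
      }'
```

Older llama.cpp builds ship the same functionality as `./server`; the basic flags shown here are the same.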