---
base_model:
- llama-3.2-3b-instruct-bnb-4bit
- unsloth/Llama-3.2-3B-Instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
- GRPO
license: apache-2.0
language:
- en
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/669777597cb32718c20d97e9/4emWK_PB-RrifIbrCUjE8.png"
alt="Title card"
style="width: 500px;
height: auto;
object-position: center top;">
</div>
# Uploaded model
- **Developed by:** alphaaico
- **License:** apache-2.0
- **Finetuned from model:** llama-3.2-3b-instruct-bnb-4bit
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
**Deep-Reason-SMALL-V0**
**Overview**
Deep-Reason-SMALL-V0 is a fine-tuned version of llama-3.2-3b-instruct, designed for advanced reasoning and thinking capabilities. It was trained with GRPO (Group Relative Policy Optimization) on a custom dataset curated to enhance logical inference, decision-making, and structured reasoning.
Built with Unsloth and Hugging Face’s TRL, this model is optimized for faster inference and superior logical performance.
The model is available in GGUF and 16-bit formats and has been quantized to several levels to support various hardware configurations.
**Model Details**
- Base Model: Llama 3.2 3B Instruct
- Fine-tuned By: Alpha AI
- Training Framework: Unsloth
**Quantization Levels Available**
- q4_k_m
- q5_k_m
- q8_0
- 16-bit ([alphaaico/Deep-Reason-SMALL-V0](https://huggingface.co/alphaaico/Deep-Reason-SMALL-V0))
**Key Features**
- Enhanced Reasoning: Fine-tuned using GRPO to improve problem-solving and structured thought processes.
- Optimized for Thinking Tasks: Excels in logical, multi-step, and causal reasoning.
- Structured XML Responses: Outputs are formatted using structured `<reasoning>...</reasoning>` and `<answer>...</answer>` sections for easy parsing.
- Efficient Deployment: Available in GGUF format for local AI deployments on consumer hardware.
**Response Format & Parsing Instructions**
Deep-Reason-SMALL-V0 follows a structured response format with designated XML-like tags: responses contain `<reasoning>...</reasoning>` and `<answer>...</answer>` sections. When consuming responses programmatically, extract the content between these tags. This keeps the model's decision-making clear and traceable.
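The tag extraction described above can be sketched with a small helper. This is an illustrative parser, not part of the model release; `parse_response` is a hypothetical name:

```python
import re

def parse_response(text):
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns a dict mapping each tag name to its inner text, or None if the
    tag is absent (e.g. when generation was cut off mid-response).
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Example with a response in the documented format:
raw = """<reasoning>
Two groups of two items give 2 + 2 = 4.
</reasoning>
<answer>
4
</answer>"""

parsed = parse_response(raw)
print(parsed["answer"])  # → 4
```

Using non-greedy matching with `re.DOTALL` keeps the parser robust to multi-line reasoning sections.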
**Ideal Configuration for using the GGUF Models**
- temperature = 0.8
- top_p = 0.95
- max_tokens = 1024
- SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
**Use Cases**
Deep-Reason-SMALL-V0 is best suited for:
- Conversational AI – Improving chatbot and AI assistant reasoning.
- AI Research – Studying logical thought modeling in AI.
- Automated Decision Making – Use in AI-powered business intelligence systems.
- Education & Tutoring – Helping students and professionals with structured learning.
- Legal & Financial Analysis – Generating step-by-step arguments for case studies.
**Limitations & Considerations**
- May require further fine-tuning for domain-specific logic.
- Not a factual knowledge base – Focused on reasoning, not general knowledge retrieval.
- Potential biases – Results depend on training data.
- Computational Trade-off – Reasoning performance comes at the cost of slightly longer inference times.
**License**
This model is released under the Apache 2.0 license.
**Acknowledgments**
Special thanks to the Unsloth team for providing an optimized training pipeline for LLaMA models.