---
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- text-generation
- instruction-tuned
- hallucination-reduction
- transformers
- unsloth
- llama
- fine-tuned
- gguf
- quantized
license: apache-2.0
language:
- en
datasets:
- skshmjn/RAG-INSTRUCT-1.1
pipeline_tag: text-generation
library_name: transformers
---

# 🚀 RAG-Instruct Llama-3.2-3B (Fine-tuned)  

- **Developed by:** skshmjn  
- **License:** apache-2.0  
- **Finetuned from model:** [unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)  
- **Dataset Used:** [skshmjn/RAG-INSTRUCT-1.1](https://huggingface.co/datasets/skshmjn/RAG-INSTRUCT-1.1)  
- **Supports:** Transformers & GGUF (for fast inference on CPU/GPU)  

---

## 📌 **Model Overview**  
This model is Llama-3.2-3B-Instruct fine-tuned with **Unsloth** on the **RAG-INSTRUCT-1.1** dataset for retrieval-augmented (RAG) question answering.  
It is tuned for **instruction-following** with reduced hallucination, aiming to keep responses grounded in the supplied context and concise.  

- **Instruction-Tuned**: Follows structured queries effectively.  
- **Hallucination Reduction**: Avoids fabricating information when context is missing.  
- **Trained with Unsloth, available as GGUF**: Unsloth was used for fine-tuning; the GGUF quantized build allows fast inference on CPU/GPU.  
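
The missing-context behaviour relies on the prompt telling the model explicitly when retrieval found nothing. Here is a minimal sketch, assuming you assemble prompts yourself from retrieved chunks. `build_rag_prompt` is a hypothetical helper, not part of this repository. It reuses the instruction text and the `[No relevant context available]` placeholder from the Transformers example. Joining chunks with a blank line is also an assumption, not a documented training format.

```python
from typing import Sequence

# Instruction text taken from the Transformers example in this card.
INSTRUCTIONS = (
    "You are an assistant for question-answering tasks.\n"
    "Use the following pieces of retrieved context to answer the question.\n"
    "If you don't know the answer, just say that you don't know.\n"
    "Use three sentences maximum and keep the answer concise."
)

# Placeholder used in the card's example when no context was retrieved.
NO_CONTEXT = "[No relevant context available]"


def build_rag_prompt(question: str, chunks: Sequence[str]) -> str:
    """Hypothetical helper: build a prompt in the layout of the card's example.

    Assumption: multiple chunks are separated by a blank line.
    """
    # Ignore empty or whitespace-only chunks so they don't count as context.
    cleaned = [c.strip() for c in chunks if c.strip()]
    context = "\n\n".join(cleaned) if cleaned else NO_CONTEXT
    return (
        f"{INSTRUCTIONS}\n\n"
        f"Question: {question.strip()}\n"
        f"Context: {context}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_rag_prompt("Who discovered the first exoplanet?", []))
```

The returned string can be passed as `prompt` to the Transformers example. Using an explicit placeholder, rather than an empty `Context:` line, makes the absence of evidence visible to the model.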

---

## 📌 **Example Usage (Transformers)**  
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "skshmjn/Llama-3.2-3B-RAG-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Question: Who discovered the first exoplanet?  
Context: [No relevant context available]  
Answer:"""

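# Tokenize and move the tensors to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_new_tokens limits only the generated answer; max_length would also
# count the prompt tokens and could cut the answer short.
output = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens so the prompt is not echoed back.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

print(response.strip())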