---
library_name: transformers
license: mit
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card

## Model Details

- **Developed by:** Amar-89
- **Model type:** Quantized (8-bit)
- **License:** MIT
- **Quantized from model:** meta-llama/Llama-3.1-8B-Instruct
- **Model size:** 9.1 GB

Uses the tokenizer from the base model. No changes were made to the model other than quantization (a reproduction sketch is included at the end of this card).

Recommended: 12 GB of VRAM.

## How to use

```bash
pip install -q -U torch bitsandbytes transformers accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "Amar-89/Llama-3.1-8B-Instruct-8bit"

# The repository ships a quantization config, so the weights load in 8-bit
# automatically; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)


def terminal_chat(model, tokenizer, system_prompt):
    """
    Starts a terminal-based chat session with the given model, tokenizer,
    and system prompt.

    Args:
        model: The Hugging Face model object.
        tokenizer: The Hugging Face tokenizer object.
        system_prompt: The system instruction that defines the chat behavior.
    """
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    messages = [{"role": "system", "content": system_prompt}]

    print("Chat session started. Type 'exit' to quit.")
    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Ending chat session. Goodbye!")
            break

        messages.append({"role": "user", "content": user_input})
        outputs = pipe(messages, max_new_tokens=256)

        # The pipeline returns the full conversation; the last message is the
        # assistant's reply. Append it so context carries over to later turns.
        response = outputs[0]["generated_text"][-1]["content"]
        messages.append({"role": "assistant", "content": response})
        print(f"Assistant: {response}")


system_prompt = "You are a pirate chatbot who always responds in pirate speak!"
terminal_chat(model, tokenizer, system_prompt)
```
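
## Quantization (reference sketch)

The card states the model is the base checkpoint quantized to 8-bit with no other changes. The exact quantization script is not part of this repository; the snippet below is a minimal sketch, assuming the quantization was done with bitsandbytes (as the install requirements suggest) via `BitsAndBytesConfig(load_in_8bit=True)`, and that recent `transformers`/`bitsandbytes` versions with 8-bit serialization support are used. The output directory name is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-3.1-8B-Instruct"

# Load the base model with 8-bit bitsandbytes quantization (assumed setup).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Save the quantized weights alongside the unmodified base tokenizer.
# "Llama-3.1-8B-Instruct-8bit" is a hypothetical output directory.
model.save_pretrained("Llama-3.1-8B-Instruct-8bit")
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-8bit")
```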