Model Card

Model Details

  • Developed by: Amar-89
  • Model type: Quantized (8-bit)
  • License: MIT
  • Quantized from model: meta-llama/Llama-3.1-8B-Instruct
  • Model size: 9.1 GB (8.03B params)
  • Tensor types: F32, BF16, I8
  • Format: Safetensors

Uses the tokenizer from the base model. No modifications were made beyond 8-bit quantization. Recommended: 12 GB of VRAM.
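To check that your GPU meets this before downloading the weights, here is a minimal sketch (it assumes torch is already installed, as in the install step below):

import torch

# bitsandbytes 8-bit inference requires a CUDA-capable GPU.
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gb:.1f} GB VRAM (12 GB recommended)")
else:
    print("No CUDA GPU detected; 8-bit bitsandbytes models need one.")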

How to use

# Install dependencies (shell)
pip install -q -U torch bitsandbytes transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Amar-89/Llama-3.1-8B-Instruct-8bit"
# device_map="auto" places the 8-bit weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
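
# Optional sanity check (a sketch, not part of the original card):
# get_memory_footprint() reports the loaded weight size in bytes, which
# should sit comfortably under the recommended 12 GB of VRAM.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")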

def terminal_chat(model, tokenizer, system_prompt):
    """
    Starts a terminal-based chat session with a specified model, tokenizer, and system prompt.

    Args:
        model: The Hugging Face model object.
        tokenizer: The Hugging Face tokenizer object.
        system_prompt: The system role or instruction to define the chat behavior.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    messages = [{"role": "system", "content": system_prompt}]
    print("Chat session started. Type 'exit' to quit.")

    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Ending chat session. Goodbye!")
            break

        messages.append({"role": "user", "content": user_input})

        outputs = pipe(messages, max_new_tokens=256)

        # The pipeline returns the full chat history; the last message
        # is the newly generated assistant turn.
        response = outputs[0]["generated_text"][-1]["content"]
        print(f"Assistant: {response}")

        # Keep the assistant turn in the history so the model retains
        # multi-turn context.
        messages.append({"role": "assistant", "content": response})


system_prompt = "You are a pirate chatbot who always responds in pirate speak!"

terminal_chat(model, tokenizer, system_prompt)
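
For interactive use you may prefer streaming output. Below is a minimal sketch using transformers' TextStreamer, which prints tokens as they are generated (the example prompt is illustrative; the streamer argument is forwarded to generate):

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer.apply_chat_template(
    [{"role": "system", "content": system_prompt},
     {"role": "user", "content": "Where be the treasure?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Tokens are printed incrementally by the streamer; the return value
# still contains the full sequence if you need it afterwards.
model.generate(inputs, max_new_tokens=256, streamer=streamer)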