---
library_name: transformers
license: mit
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card

## Model Details

- **Developed by:** Amar-89
- **Model type:** Quantized (8-bit)
- **License:** MIT
- **Quantized from model:** meta-llama/Llama-3.1-8B-Instruct
- **Model size:** 9.1 GB

Uses the tokenizer from the base model. No changes were made to the model other than quantization (a reproduction sketch is included at the end of this card).

Recommended: 12 GB of VRAM.

## How to use

```bash
pip install -q -U torch bitsandbytes transformers accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "Amar-89/Llama-3.1-8B-Instruct-8bit"

# The repository ships a quantization config, so the weights load in 8-bit
# automatically; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)


def terminal_chat(model, tokenizer, system_prompt):
    """
    Starts a terminal-based chat session with the given model, tokenizer,
    and system prompt.

    Args:
        model: The Hugging Face model object.
        tokenizer: The Hugging Face tokenizer object.
        system_prompt: The system instruction that defines the chat behavior.
    """
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    messages = [{"role": "system", "content": system_prompt}]

    print("Chat session started. Type 'exit' to quit.")
    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Ending chat session. Goodbye!")
            break

        messages.append({"role": "user", "content": user_input})
        outputs = pipe(messages, max_new_tokens=256)

        # The pipeline returns the full conversation; the last message is the
        # assistant's reply. Append it so context carries over to later turns.
        response = outputs[0]["generated_text"][-1]["content"]
        messages.append({"role": "assistant", "content": response})
        print(f"Assistant: {response}")


system_prompt = "You are a pirate chatbot who always responds in pirate speak!"
terminal_chat(model, tokenizer, system_prompt)
```
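
## Quantization (reference sketch)

The card states the model is the base checkpoint quantized to 8-bit with no other changes. The exact quantization script is not part of this repository; the snippet below is a minimal sketch, assuming the quantization was done with bitsandbytes (as the install requirements suggest) via `BitsAndBytesConfig(load_in_8bit=True)`, and that recent `transformers`/`bitsandbytes` versions with 8-bit serialization support are used. The output directory name is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-3.1-8B-Instruct"

# Load the base model with 8-bit bitsandbytes quantization (assumed setup).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Save the quantized weights alongside the unmodified base tokenizer.
# "Llama-3.1-8B-Instruct-8bit" is a hypothetical output directory.
model.save_pretrained("Llama-3.1-8B-Instruct-8bit")
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-8bit")
```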