Quantizations of https://huggingface.co/MoxoffSpA/Azzurro
From original readme
Usage
Be sure to install these dependencies before running the program
!pip install transformers torch sentencepiece
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cpu" # if you want to use the gpu make sure to have cuda toolkit installed and change this to "cuda"
model = AutoModelForCausalLM.from_pretrained("MoxoffSpA/Azzurro")
tokenizer = AutoTokenizer.from_pretrained("MoxoffSpA/Azzurro")
question = """Quanto è alta la torre di Pisa?"""
context = """
La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione. Alta circa 56 metri.
"""
prompt = f"Domanda: {question}, contesto: {context}"
messages = [
{"role": "user", "content": prompt}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(
model_inputs, # The input to the model
max_new_tokens=128, # Limiting the maximum number of new tokens generated
do_sample=True, # Enabling sampling to introduce randomness in the generation
temperature=0.1, # Setting temperature to control the randomness, lower values make it more deterministic
top_p=0.95, # Using nucleus sampling with top-p filtering for more coherent generation
eos_token_id=tokenizer.eos_token_id # Specifying the token that indicates the end of a sequence
)
decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
trimmed_output = decoded_output.strip()
print(trimmed_output)
- Downloads last month
- 60
Hardware compatibility
Log In
to view the estimation
1-bit
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
🙋
Ask for provider support