---
license: mit
datasets:
- Replete-AI/code_bagel
---

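# Chatty-McChatterson-3-mini-128k

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324ce4d5d0cf5c62c6e3c5a/yGhmO6kGtf8m8bZ9xtkWq.png)
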
## Model Details

**Model Name:** Chatty-McChatterson-3-mini-128k

**Base Model:** [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)

**Fine-tuning Method:** Supervised Fine-Tuning (SFT)

**Dataset:** [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)

**Training Data:** 12884 conversations selected for being 512 input tokens or less

**Training Duration:** 4 hours

**Hardware:** Nvidia RTX A4500

**Epochs:** 3

## Training Procedure

This model was fine-tuned to provide better instructions on code.

The training was conducted using PEFT and SFTTrainer on selected conversations from the UltraChat 200k dataset.
Training completed 3 epochs (19,326 steps) in 4 hours on an Nvidia RTX A4500 GPU.
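
As a quick sanity check on these figures, the reported step count matches 12884 conversations over 3 epochs if the effective batch size was 2. The batch size is not stated in this card; it is an inference from the numbers, and `effective_batch_size` below is an assumed value. A minimal sketch of the arithmetic:

```python
import math

num_conversations = 12_884   # training conversations reported in this card
num_epochs = 3               # epochs reported in this card
effective_batch_size = 2     # ASSUMPTION: inferred from the step count, not stated in the card

# Optimizer steps per epoch, rounding up for a possibly partial final batch
steps_per_epoch = math.ceil(num_conversations / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)
```

Under that assumption, the total agrees with the 19326 steps reported above.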

The training data was a filtered subset of rows from the [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset, keeping only rows whose prompt template was 512 tokens or fewer.
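
The filtering script itself is not included in this card, so the following is only a rough sketch of what a length filter of this kind might look like. The helper names `filter_by_token_limit` and `whitespace_token_count` are hypothetical, and the whitespace-based counter is a stand-in for a real tokenizer so the example runs without downloading anything.

```python
from typing import Callable, Iterable, List


def filter_by_token_limit(
    prompts: Iterable[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> List[str]:
    """Keep only prompts whose token count is at most max_tokens."""
    return [p for p in prompts if count_tokens(p) <= max_tokens]


def whitespace_token_count(text: str) -> int:
    # ASSUMPTION: stand-in tokenizer that counts whitespace-separated words.
    # A real run would count tokens with the model's own tokenizer instead.
    return len(text.split())


short_prompt = "How do I reverse a list in Python?"
long_prompt = "word " * 600  # 600 "tokens" under the stand-in counter

kept = filter_by_token_limit([short_prompt, long_prompt], whitespace_token_count)
print(len(kept))
```

Passing the counter in as a function keeps the filter independent of any particular tokenizer, so the same logic could be reused with the base model's tokenizer.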

## Intended Use

This model is designed to improve the overall chat experience and response quality.

## Getting Started

### Instruct Template

```text
<|system|>
{system_message}<|end|>
<|user|>
{prompt}<|end|>
<|assistant|>
```
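
For illustration, the template above can be filled in with a small helper. `build_prompt` is a hypothetical function, not part of any library, and the example system message is a placeholder. The exact whitespace (a newline after each role tag, no space before `<|end|>`) is an assumption based on the template, so treat this as a sketch.

```python
from typing import Optional


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Format a single-turn prompt following the instruct template."""
    parts = []
    if system_message is not None:
        # The system block is optional and is omitted when no message is given
        parts.append(f"<|system|>\n{system_message}<|end|>\n")
    parts.append(f"<|user|>\n{user_message}<|end|>\n")
    # End with the assistant tag so the model continues from there
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt(
    "What is the name of the big tower in Toronto?",
    system_message="You are a helpful assistant.",  # placeholder system message
)
print(prompt)
```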

### Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name_or_path = "thesven/Chatty-McChatterson-3-mini-128k"

# BitsAndBytesConfig for loading the model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="main",
    quantization_config=bnb_config,
)
# Use the EOS token as the padding token during generation
model.generation_config.pad_token_id = tokenizer.eos_token_id

# Single-turn prompt in the instruct template format
prompt_template = '''
<|user|>
What is the name of the big tower in Toronto?<|end|>
<|assistant|>
'''

# Tokenize and move the inputs to the same device as the model
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
output = model.generate(
    inputs=input_ids,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
    top_k=40,
    max_new_tokens=256,
)

# Decode only the newly generated tokens, skipping the prompt
generated_text = tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)
print(generated_text)
```