Update README.md
The model's potential is promising, offering faster and more potent language models.
Introducing Mamba-Chat 🐍, the first chat language model built on a state-space model architecture rather than a transformer.
Built upon the research of Albert Gu and Tri Dao, specifically their paper **"Mamba: Linear-Time Sequence Modeling with Selective State Spaces"** (paper), this model leverages their implementation.
Mamba-Chat is based on **Mamba-2.8B** and was fine-tuned on 16,000 samples from the **HuggingFaceH4/ultrachat_200k** dataset.
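
For context, a comparable 16k-sample slice of that dataset can be pulled with the `datasets` library. This is only a sketch of the data source, not the actual training script; the `train_sft` split name and the `messages` column are assumptions about how ultrachat_200k is laid out.

```python
# Sketch only: grab a 16k-sample subset like the one used for fine-tuning.
# The "train_sft" split and "messages" column are assumptions about the
# HuggingFaceH4/ultrachat_200k layout, not part of this repo.
from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
subset = dataset.shuffle(seed=42).select(range(16_000))
print(subset[0]["messages"][0])  # each sample holds a list of chat turns
```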
To chat with the model, load the tokenizer and model, borrow Zephyr's chat template for formatting the conversation, and generate:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumption: the tokenizer ships with the model repo.
tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/Mamba-Chat-2.8B")
tokenizer.pad_token = tokenizer.eos_token
# Borrow Zephyr's chat template to format conversation turns.
tokenizer.chat_template = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").chat_template

# Load the model on the GPU in half precision.
model = MambaLMHeadModel.from_pretrained("ayoubkirouane/Mamba-Chat-2.8B", device="cuda", dtype=torch.float16)

messages = []
user_message = """
Write your message here ..
"""

messages.append(dict(role="user", content=user_message))
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

# Sample a reply; keep only the text after the last assistant marker.
out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(out)
messages.append(dict(role="assistant", content=decoded[0].split("<|assistant|>\n")[-1]))
print("Model:", decoded[0].split("<|assistant|>\n")[-1])
```
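
The `split("<|assistant|>\n")` works because Zephyr's chat template wraps each turn in `<|user|>`/`<|assistant|>` markers. If in doubt, you can inspect the exact prompt string before tokenizing; a quick check, reusing the objects above:

```python
# Render the templated prompt as plain text instead of token ids.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expect <|user|> ... <|assistant|> markers around each turn
```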
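
Since each reply is appended back to `messages`, the snippet extends naturally to multi-turn chat. A minimal sketch of such a loop, assuming `model`, `tokenizer`, and `messages` are set up as above:

```python
# Hypothetical REPL-style wrapper around the snippet above; not part of the repo.
while True:
    user_message = input("You: ")
    if user_message.strip().lower() in {"quit", "exit"}:
        break
    messages.append(dict(role="user", content=user_message))
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
    out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
    reply = tokenizer.batch_decode(out)[0].split("<|assistant|>\n")[-1]
    messages.append(dict(role="assistant", content=reply))
    print("Model:", reply)
```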