Update README.md
README.md
Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
To get the expected features and performance of the chat versions, a specific format must be followed, including the `[INST]` and `<<SYS>>` tags, the `BOS` and `EOS` tokens, and the whitespace and line breaks in between (we recommend calling `strip()` on inputs to avoid double spaces). See our reference code on GitHub for details: `chat_completion`.
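
For illustration, here is a minimal sketch of the single-turn prompt layout described above. The authoritative template is the `chat_completion` reference code; `build_prompt` is a hypothetical helper, and the `BOS`/`EOS` tokens are assumed to be added by the tokenizer or loader.

```
# Minimal sketch of the Llama 2 chat layout; the authoritative version
# is the chat_completion reference code.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(system_prompt: str, user_message: str) -> str:
    # strip() the inputs to avoid double spaces, as recommended above.
    # BOS/EOS tokens are assumed to be added by the tokenizer/loader.
    return f"{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_message.strip()} {E_INST}"
```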
First, install ctransformers:

```
pip install "ctransformers>=0.2.24"
```
Use the following to get started:

```
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to the GPU.
# Set it to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained(
    "Ranjanunicode/unicode-llama-2-chat-Hf-q4-2",
    model_file="unicode-llama-2-chat-Hf-q4-2.gguf",
    model_type="llama",
    gpu_layers=40,
)

print(llm("AI is going to"))
```
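
As a follow-up sketch, the chat layout from above can be combined with the loaded model. This assumes the GGUF file expects the standard Llama 2 chat template; `build_prompt` is the hypothetical helper sketched earlier, and `stream=True` streams tokens as they are generated.

```
# Hypothetical: wrap a user message in the chat template before generating.
prompt = build_prompt(
    "You are a helpful assistant.",            # example system prompt
    "Summarize what GGUF quantization does.",  # example user message
)

# Stream tokens as they are generated.
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)
```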
### Out-of-Scope Use