Commit 290ecb9 (1 parent: 5f0b6c1): Update README.md
README.md CHANGED
@@ -38,6 +38,27 @@ Bram Vanroy. (2023). Llama v2 13b: Finetuned on Dutch Conversational Data. Huggi
}
```

+## Usage
+
+```python
+from transformers import pipeline
+
+
+# If you want to add a system message, add a dictionary with role "system". However, this will likely have little
+# effect since the model was only finetuned using a single system message.
+messages = [{"role": "user", "content": "Welke talen worden er in België gesproken?"}]
+pipe = pipeline("text-generation", model="BramVanroy/Llama-2-13b-chat-dutch", device_map="auto")
+
+# Just apply the chat template but leave the tokenization for the pipeline to do
+prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
+
+# Only return the newly generated tokens, not prompt+new_tokens (return_full_text=False)
+generated = pipe(prompt, do_sample=True, max_new_tokens=128, return_full_text=False)
+
+generated[0]["generated_text"]
+# ' De officiële talen van België zijn Nederlands, Frans en Duits. Daarnaast worden er nog een aantal andere talen gesproken, waaronder Engels, Spaans, Italiaans, Portugees, Turks, Arabisch en veel meer. '
+```
+
## Model description

I could not get the original Llama 2 13B to produce much Dutch, even though the description paper indicates that it was trained on a (small) portion of Dutch data. I therefore
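The comment in the added example says a system message can be supplied as a dictionary with role "system", but the snippet itself only shows a user turn (a Dutch prompt asking which languages are spoken in Belgium). Below is a minimal sketch of what that could look like, reusing the same pipeline and chat-template pattern; the system prompt text is a hypothetical illustration, not part of this commit (it roughly says "You are a helpful assistant who answers in Dutch").

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="BramVanroy/Llama-2-13b-chat-dutch", device_map="auto")

# Hypothetical system prompt (roughly: "You are a helpful assistant who answers in Dutch.").
# As the README comment notes, its effect is likely small because the model was finetuned
# with only a single system message.
messages = [
    {"role": "system", "content": "Je bent een behulpzame assistent die in het Nederlands antwoordt."},
    {"role": "user", "content": "Welke talen worden er in België gesproken?"},
]

# Same pattern as the added example: render the chat template as text, let the pipeline
# tokenize it, and only return the newly generated tokens (return_full_text=False).
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
generated = pipe(prompt, do_sample=True, max_new_tokens=128, return_full_text=False)
print(generated[0]["generated_text"])
```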