Add paper abstract to model card
#2 by nielsr (HF Staff) · opened

README.md CHANGED
@@ -1,15 +1,15 @@
 ---
-license: mit
-license_link: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE
 language:
 - en
+library_name: transformers
+license: mit
+license_link: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE
 pipeline_tag: text-generation
 tags:
 - chat
 - bitnet
 - text-generation
 - large-language-model
-library_name: transformers
 ---
 
 # BitNet b1.58 2B4T - Scaling Native 1-bit LLM
@@ -22,6 +22,10 @@ Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-
 
 ➡️ **Official Inference Code:** [microsoft/BitNet (bitnet.cpp)](https://github.com/microsoft/BitNet)
 
+# Paper abstract
+
+The abstract of the paper is the following:
+
 ## Model Variants
 
 Several versions of the model weights are available on Hugging Face:
@@ -98,7 +102,8 @@ chat_input = tokenizer(prompt, return_tensors="pt").to(model.device)
 # Generate response
 chat_outputs = model.generate(**chat_input, max_new_tokens=50)
 response = tokenizer.decode(chat_outputs[0][chat_input['input_ids'].shape[-1]:], skip_special_tokens=True) # Decode only the response part
-print("
+print("
+Assistant Response:", response)
 ```
 
 ## How to Use (with `bitnet.cpp`)
@@ -141,4 +146,4 @@ BitNet b1.58 2B4T was evaluated against leading open-weight full-precision LLMs
 The model weights and code are released under the [MIT License](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE).
 
 ## Disclaimer
-This model is intended for research and development purposes. While efforts have been made to align it using SFT and DPO, it may still produce outputs that are unexpected, biased, or inaccurate. Please use responsibly.
+This model is intended for research and development purposes. While efforts have been made to align it using SFT and DPO, it may still produce outputs that are unexpected, biased, or inaccurate. Please use responsibly.
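Aside on the generation snippet touched by the third hunk: `model.generate` returns the prompt token ids followed by the newly generated ids, which is why the model card slices at `chat_input['input_ids'].shape[-1]` before decoding. A minimal, dependency-free sketch of that slicing pattern (the token ids below are mock values standing in for real tokenizer/model output, not from the model card):

```python
# Sketch of the "decode only the response part" pattern used in the README.
# generate() output = [prompt tokens] + [new tokens], so slicing at the
# prompt length isolates just the model's completion.

prompt_ids = [101, 2054, 2003, 1037]        # mock prompt token ids
generated = prompt_ids + [2307, 2154, 102]  # mock generate() output row

# Equivalent to chat_outputs[0][chat_input['input_ids'].shape[-1]:]
response_ids = generated[len(prompt_ids):]
print(response_ids)  # [2307, 2154, 102]
```

Decoding `response_ids` (rather than the full output row) keeps the prompt text out of the printed assistant response.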