Update README.md
README.md
@@ -65,7 +65,9 @@ YiXin-Distill-Qwen-72B was benchmarked against multiple models, including QwQ-32

YiXin-Distill-Qwen-72B demonstrates significant improvements across mathematical reasoning and general knowledge tasks.

-##

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -97,6 +99,34 @@ generated_ids = [
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Limitations

Despite its strong performance, YiXin-Distill-Qwen-72B has certain limitations:
@@ -65,7 +65,9 @@ YiXin-Distill-Qwen-72B was benchmarked against multiple models, including QwQ-32

YiXin-Distill-Qwen-72B demonstrates significant improvements across mathematical reasoning and general knowledge tasks.

+## How to Run Locally
+
+### Hugging Face's Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -97,6 +99,34 @@ generated_ids = [
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

+### vLLM or SGLang
+
+For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
+
+```shell
+vllm serve YiXin-AILab/YiXin-Distill-Qwen-72B --tensor-parallel-size 4 --max-model-len 32768 --enforce-eager
+```
+
+You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):
+
+```bash
+python3 -m sglang.launch_server --model YiXin-AILab/YiXin-Distill-Qwen-72B --trust-remote-code --tp 4 --port 8000
+```
+
+Then you can access the Chat API by:
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "YiXin-AILab/YiXin-Distill-Qwen-72B",
+        "messages": [
+            {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
+            {"role": "user", "content": "8+8=?"}
+        ]
+    }'
+```
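The same request can also be sent from Python. The sketch below is not part of the commit; it uses only the standard library and assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-compatible response shape. The helper names `build_chat_request` and `ask` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: a vLLM or SGLang server is listening on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "YiXin-AILab/YiXin-Distill-Qwen-72B"
SYSTEM_PROMPT = (
    "You are a helpful and harmless assistant. "
    "You are Qwen developed by Alibaba. You should think step-by-step."
)


def build_chat_request(question, base_url=BASE_URL):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL, timeout=600):
    """Send the request and return the assistant's reply text (hypothetical helper)."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("8+8=?"))` would print the model's answer. Building the request separately from sending it keeps the payload easy to inspect without a live server.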
+
## Limitations

Despite its strong performance, YiXin-Distill-Qwen-72B has certain limitations: