Qwen

yangapku committed on
Commit 82d62bb · verified · 1 Parent(s): 9e1b55c

Update README.md

Files changed (1): README.md (+5 −5)

README.md CHANGED
````diff
@@ -95,7 +95,7 @@ print("thinking content:", thinking_content)
 print("content:", content)
 ```
 
-For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` to create an OpenAI-compatible API endpoint:
+For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
 - SGLang:
 ```shell
 python -m sglang.launch_server --model-path Qwen/Qwen3-4B --reasoning-parser qwen3
@@ -105,7 +105,7 @@ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` to create
 vllm serve Qwen/Qwen3-4B --enable-reasoning --reasoning-parser deepseek_r1
 ```
 
-For local use, applications such as llama.cpp, Ollama, LMStudio, and MLX-LM have also supported Qwen3.
+For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
 
 ## Switching Between Thinking and Non-Thinking Mode
 
@@ -279,7 +279,7 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
 {
     ...,
     "rope_scaling": {
-        "type": "yarn",
+        "rope_type": "yarn",
         "factor": 4.0,
         "original_max_position_embeddings": 32768
     }
@@ -291,12 +291,12 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
 
 For `vllm`, you can use
 ```shell
-vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
+vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
 ```
 
 For `sglang`, you can use
 ```shell
-python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
+python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
 ```
 
 For `llama-server` from `llama.cpp`, you can use
````
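The YaRN hunks above rename the RoPE-scaling key from `type` to `rope_type`, while `factor` and `original_max_position_embeddings` stay the same; those two values are what determine the extended context window (4.0 × 32768 = 131072, matching the `--max-model-len 131072` in the vllm example). A minimal sketch of that arithmetic, with the dict literal transcribed from the diff rather than loaded from a real model config:

```python
# rope_scaling block as it reads after this commit (key renamed to "rope_type")
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# YaRN scales the original context window by `factor`; the result is the
# maximum model length the serving commands in the diff pass explicitly.
max_model_len = int(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"])
print(max_model_len)  # 131072
```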