What parameters should I use with vLLM?
Do I need to call apply_chat_template? I'm using the GGUF from https://hf-mirror.com/lmstudio-community/QwQ-32B-GGUF, and it doesn't seem possible to extract a tokenizer from it.
from vllm import LLM, SamplingParams

prompt_final = [{"role": "user", "content": "xxx"}]
tensor_parallel_size = 1
pipeline_parallel_size = 1
ckpt_path = "./QwQ-32B-Q4_K_M.gguf"
sampling_params = SamplingParams(temperature=0.6, max_tokens=1000)
batch_prompts = [prompt_final]
llm = LLM(model=ckpt_path,
          tensor_parallel_size=tensor_parallel_size,
          pipeline_parallel_size=pipeline_parallel_size,
          distributed_executor_backend="mp")
# llm.chat() applies the tokenizer's chat template itself
preds = llm.chat(batch_prompts, sampling_params)
for output in preds:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}\n")
Did you manage to get it deployed with vLLM?
The code above deploys fine with vLLM. The main problem now is that QwQ's CoT is very long compared to R1.
For some questions, R1-Distill-Qwen-32B finishes in around 500 tokens (CoT + response), while QwQ can spend over 1000 tokens on the CoT and still not be done thinking, which hurts performance significantly.
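One rough way to quantify this is to count tokens on each side of the closing think tag. A sketch, assuming the generated text follows QwQ's "thinking ... </think> answer" layout and that generated_text comes from the loop above:

from transformers import AutoTokenizer

# Assumption: tokenizer loaded from the base Qwen/QwQ-32B repo.
tok = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

def cot_stats(generated_text: str):
    # If </think> is missing, the CoT was cut off by max_tokens.
    if "</think>" in generated_text:
        cot, answer = generated_text.split("</think>", 1)
    else:
        cot, answer = generated_text, ""
    return len(tok.encode(cot)), len(tok.encode(answer))

cot_tokens, answer_tokens = cot_stats(generated_text)
print(f"CoT tokens: {cot_tokens}, answer tokens: {answer_tokens}")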
vllm serve ~/.cache/models--Qwen--QwQ-32B/snapshots/f28e641280ed3228b25df45b02ce6526b472cbea/ --tokenizer ~/Downloads/QwQ-32B/ --host 0.0.0.0 --port 21434 --tensor-parallel-size 4 --max-model-len 34576 --served-model-name qwq-32b
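Once that server is up, it can be queried through vLLM's OpenAI-compatible endpoint. A minimal sketch with the host/port and served model name taken from the command above (the api_key is a dummy value):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:21434/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="qwq-32b",   # must match --served-model-name
    messages=[{"role": "user", "content": "xxx"}],
    temperature=0.6,
    max_tokens=1000,
)
print(resp.choices[0].message.content)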
The official ModelScope community WeChat account has already published instructions for serving with vLLM and SGLang; take a look:
vllm serve /ModelPath/QwQ-32B --port 8000 --reasoning-parser deepseek_r1 --max_model_len 4096 --enable-auto-tool-choice --tool-call-parser hermes
python -m sglang.launch_server --model-path /ModelPath/QwQ-32B --port 3001 --host 0.0.0.0 --tool-call-parser qwen25
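With --reasoning-parser deepseek_r1 enabled, the OpenAI-compatible response separates the CoT from the final answer. A sketch against the vLLM command above (port 8000, dummy api_key; attribute access hedged in case the field is absent):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/ModelPath/QwQ-32B",
    messages=[{"role": "user", "content": "xxx"}],
)
msg = resp.choices[0].message
# The reasoning parser puts the CoT in reasoning_content and the
# final answer in content.
print("reasoning:", getattr(msg, "reasoning_content", None))
print("answer:", msg.content)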
The vLLM documentation mentions that structured output and tool calling are currently incompatible when reasoning parsing is used, and this matches what I observe in practice.