Error when using cuda:1: Expected all tensors to be on the same device
#5, opened by yg1031
The first GPU does not have enough memory, so I changed the model's device to 'cuda:1':
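import torch
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:1",  # load onto the second GPU instead of the default cuda:0
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True
)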
Testing a real-time voice call then fails with this error:
2025-02-17 19:35:58.812-ERROR-[model_server_int4.py:560] - Error happened during generation: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Traceback (most recent call last):
  File "/home/ps/yg/MiniCPM-o/web_demos/minicpm-o_2.6/model_server_int4.py", line 535, in generate
    for r in self.minicpmo_model.streaming_generate(
  File "/home/ps/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-o-2_6-int4/ee9d0bc0e30e3487f8bf632650fdb83962d06492/modeling_minicpmo.py", line 1506, in _generate_mel_spec_audio_streaming
    r = next(streamer)
  File "/home/ps/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-o-2_6-int4/ee9d0bc0e30e3487f8bf632650fdb83962d06492/modeling_minicpmo.py", line 1241, in llm_generate_chunk
    outputs = self.llm.generate(
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
    outputs = self.model(
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 878, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/home/ps/anaconda3/envs/yg-minicpm-o/lib/python3.12/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
This is because the model's base code uses cuda:0 while you loaded the model on cuda:1, so tensors end up on two different devices. Put everything on one device, cuda:0.
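Since the original problem is that the first card is short on memory, one common workaround (not confirmed in this thread for this model) is to hide that card from the process with the `CUDA_VISIBLE_DEVICES` environment variable. The second physical GPU is then exposed to PyTorch as cuda:0, so code that assumes cuda:0 lands on it. A minimal sketch, assuming physical GPU index 1 is the card you want; `DEVICE` is only an illustrative name:

```python
import os

# Assumption: physical GPU 1 is the card with enough free memory.
# This must run before torch (or anything else that initializes CUDA)
# is imported, otherwise the process has already enumerated all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# From here on the process sees a single GPU, addressed as cuda:0.
DEVICE = "cuda:0"

# The loading call from the post then targets the default device:
# model = AutoGPTQForCausalLM.from_quantized(
#     'openbmb/MiniCPM-o-2_6-int4',
#     torch_dtype=torch.bfloat16,
#     device=DEVICE,
#     trust_remote_code=True,
#     disable_exllama=True,
#     disable_exllamav2=True
# )
```

The same effect can be had without editing the script by setting the variable on the command line, for example `CUDA_VISIBLE_DEVICES=1 python model_server_int4.py`.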