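Following the official inference code on GitHub, I wrote my own inference function, but running it produces the error below. Is this a problem with the model files themselves?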

My code:

```python
import torch
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText, Qwen2VLForConditionalGeneration
from PIL import Image
import base64
from qwen_vl_utils import process_vision_info

# Load the model in half-precision on the available device(s)

path = "/home/chensq/svg/pythonProject/model/Qwen/Qwen/Qwen2-VL-2B"
model = Qwen2VLForConditionalGeneration.from_pretrained(path).to("cuda:0")
processor = AutoProcessor.from_pretrained(path)

def image_base64(image_path):
    with open(image_path, "rb") as img_file:
        return base64.b64encode(img_file.read()).decode("utf-8")
```
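`image_base64` returns only the raw Base64 payload; the data-URI message variant then wraps it as `data:image/png;base64,...`. Below is a minimal stdlib-only sketch of that wrapping, with a decode step to confirm the round trip. `to_data_uri` and `from_data_uri` are hypothetical helpers, and the MIME type is assumed to be `image/png` as in the script.

```python
import base64


def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    # Hypothetical helper: Base64-encode raw image bytes and wrap them in a data URI,
    # the same shape as f"data:image/png;base64,{image}" in the script.
    payload = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{payload}"


def from_data_uri(uri: str) -> bytes:
    # Hypothetical inverse: split off the "data:<mime>;base64," header and decode.
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError(f"not a base64 data URI: {header!r}")
    return base64.b64decode(payload)


if __name__ == "__main__":
    # Stand-in bytes (the 8-byte PNG signature) instead of a real file on disk.
    png_signature = b"\x89PNG\r\n\x1a\n"
    uri = to_data_uri(png_signature)
    print(uri)  # data:image/png;base64,iVBORw0KGgo=
    assert from_data_uri(uri) == png_signature
```

For a real file, pass the bytes read in `image_base64` instead of the stand-in signature.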

```python
sy_prompt = ''' A conversation between User and Assistant. The user provides an image and image description, and the Assistant generates corresponding SVG code that strictly follows the required format. The Assistant must follow these SVG formatting rules: 1. Always start with this exact opening tag: . SVG path d command strict parsing rules: 1. Command letters and numbers must have spaces between them: e.g., "M 100 100" not "M100 100" 2. All values must be separated by spaces, not commas: e.g., "L 200 300" not "L200,300" 3. Don't mix relative and absolute commands, preferably use only uppercase commands (M, L, H, V, C, S, Q, T, A, Z) 4. Pay special attention to elliptical arc command (A) format: separate all 7 parameters with spaces, ensure flag parameters are integers 0 or 1, e.g., A 50 50 0 1 1 100 100 5. Each path must start with an M command, and each subpath must start with a separate M command 6. Avoid using abbreviated forms of path commands (omitting repeated command letters) 7. Use complete form for each command: e.g., M followed by 2 parameters, L followed by 2 parameters, such as "M 100 100 L 200 200" 8. Don't use scientific notation for values, use regular numbers 9. Ensure each arc command has all 7 complete parameters 10. Bezier curve commands need complete parameters: C needs 6, S needs 4, Q needs 4, T needs 2. Example of correct format: . The assistant first thinks about the reasoning process in the mind (limited to 100-300 words), and then provides the user with the answer. The reasoning process and answer are enclosed within and tags, respectively: [concise analysis of the image and reasoning about how to convert it to SVG paths, 100-300 words] , [complete SVG code in the required format] . max_pixels: 4194304
'''
```
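`sy_prompt` lists concrete syntax rules for path `d` strings, so the model's SVG output can be checked mechanically. Below is a minimal sketch of such a checker, assuming whitespace-separated tokens and taking the stricter "uppercase commands only" reading of rule 3. `check_path_d` and `PARAM_COUNTS` are hypothetical names; the sketch covers most of the listed path rules in simplified form and does not parse whole SVG documents.

```python
import re

# Parameter count for each uppercase command (rules 7, 9 and 10 of the prompt).
PARAM_COUNTS = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6, "S": 4,
                "Q": 4, "T": 2, "A": 7, "Z": 0}

# Plain decimal numbers only; no exponent part (rule 8).
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


def check_path_d(d: str) -> list[str]:
    """Return the rule violations found in an SVG path `d` string ([] means none)."""
    problems = []
    if "," in d:
        problems.append("rule 2: values must be separated by spaces, not commas")
    if not d.lstrip().startswith("M"):
        problems.append("rule 5: path must start with an M command")
    tokens = d.split()
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in PARAM_COUNTS:
            # Catches "M100" (rule 1), lowercase commands (rule 3) and
            # implicit repeated commands such as "L 1 1 2 2" (rule 6).
            problems.append(f"token {i} ({cmd!r}): expected a separate uppercase command letter")
            break
        n = PARAM_COUNTS[cmd]
        args = tokens[i + 1 : i + 1 + n]
        if len(args) < n or not all(NUMBER.fullmatch(a) for a in args):
            problems.append(f"token {i} ({cmd}): needs {n} plain numeric parameters")
            break
        if cmd == "A" and not (args[3] in ("0", "1") and args[4] in ("0", "1")):
            problems.append(f"token {i} (A): large-arc and sweep flags must be 0 or 1 (rule 4)")
        i += 1 + n
    return problems


if __name__ == "__main__":
    print(check_path_d("M 100 100 L 200 200 Z"))  # []
    print(check_path_d("M100 100"))
```

The checker stops at the first malformed command, since token alignment is unreliable after that point.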
```python
my_prompt = ''' The image is a simple line drawing of a stylized ghost. The ghost has a rounded top with a large circle representing an eye in the upper half and a smaller circle below it, possibly indicating a mouth. The bottom of the ghost is depicted with wavy lines to give the impression of a floating or ethereal form. The drawing is in black and white, with a clear, uncluttered background. The style is reminiscent of icons or symbols used in digital media for representing ghosts or spirits in a friendly, cartoonish manner.'''
image_path = "/home/chensq/svg/pythonProject/my-open-r1/data/vgen/images/svg_1860.png"
image = image_base64(image_path)

# messages = [
#     {
#         "role": "user",
#         "content": [
#             {"type": "image", "image": f"data:image/png;base64,{image}"},
#             {"type": "text", "text": "describe the image"},
#         ],
#     },
# ]

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///home/chensq/svg/pythonProject/my-open-r1/data/vgen/images/svg_1860.png"},
            {"type": "text", "text": "describe the image"},
        ],
    },
]

# messages = [
#     {
#         "role": "user",
#         "content": [
#             {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
#             {"type": "text", "text": "describe the image"},
#         ],
#     },
# ]
```
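The script tries three ways of pointing at an image in `messages`: a Base64 data URI, a local `file://` path, and an HTTP(S) URL. Below is a minimal sketch of a hypothetical `build_messages` helper (with a hypothetical `image_ref`) that produces the same single-turn structure for any of these forms, assuming absolute POSIX paths for local files. It prefixes `file://` exactly as the script's literal does and does not percent-encode the path.

```python
def image_ref(source: str) -> str:
    # Hypothetical helper: pass URLs, file:// URIs and data URIs through unchanged.
    if source.startswith(("http://", "https://", "file://", "data:")):
        return source
    # Otherwise expect an absolute local path and prefix it the same way as the
    # script's "file:///home/..." literal (no percent-encoding).
    if not source.startswith("/"):
        raise ValueError(f"expected an absolute path, got {source!r}")
    return "file://" + source


def build_messages(source: str, prompt: str = "describe the image") -> list[dict]:
    # One user turn: an image entry followed by a text entry, as in the script.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref(source)},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    msgs = build_messages("/home/chensq/svg/pythonProject/my-open-r1/data/vgen/images/svg_1860.png")
    print(msgs[0]["content"][0]["image"])
```

Keeping the message construction in one place makes it easier to switch between the three image forms without leaving stale `messages` assignments in the script.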

```python
# image = Image.open("/data/home13/chensq/iconfont/train/train_5232/C端图标_43795_icon_26.png")

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda:0")

# Inference: Generation of the output
# Modified generation parameters
generated_ids = model.generate(
    **inputs,
    max_new_tokens=6280,
    do_sample=True,  # you can try adding this parameter
    temperature=0.7,  # controls the randomness of generation
    use_cache=True,  # make sure the cache is used
)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
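`model.generate` returns each prompt followed by its continuation, which is why the script slices `len(in_ids)` tokens off every row before decoding. Below is the same step on plain lists with hypothetical token IDs. Unlike the original comprehension, this sketch (a hypothetical `trim_prompt`) also checks that each output row really starts with its prompt.

```python
def trim_prompt(input_ids: list[list[int]], generated_ids: list[list[int]]) -> list[list[int]]:
    # Each generated row begins with its own (possibly padded) prompt; drop those
    # leading tokens so only the newly generated part is decoded.
    trimmed = []
    for in_ids, out_ids in zip(input_ids, generated_ids):
        if out_ids[: len(in_ids)] != in_ids:
            raise ValueError("generated row does not start with its prompt")
        trimmed.append(out_ids[len(in_ids):])
    return trimmed


if __name__ == "__main__":
    print(trim_prompt([[101, 102, 103]], [[101, 102, 103, 7, 8, 9]]))  # [[7, 8, 9]]
```

If a row were trimmed with the wrong prompt length, the decoded text would either repeat part of the prompt or lose the first generated tokens, so the prefix check fails loudly instead.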

Here is the error I get:

```text
  File "/home/chensq/anaconda3/envs/test/lib/python3.10/site-packages/transformers/generation/utils.py", line 2326, in generate
    result = self._sample(
  File "/home/chensq/anaconda3/envs/test/lib/python3.10/site-packages/transformers/generation/utils.py", line 3279, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/home/chensq/anaconda3/envs/test/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1792, in prepare_inputs_for_generation
    model_inputs = super().prepare_inputs_for_generation(
  File "/home/chensq/anaconda3/envs/test/lib/python3.10/site-packages/transformers/generation/utils.py", line 419, in prepare_inputs_for_generation
    or cache_position[-1] >= input_ids.shape[1]  # Exception 3
IndexError: index -1 is out of bounds for dimension 0 with size 0
```
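The failing line compares `cache_position[-1]` with the prompt length, and indexing `[-1]` can only fail when `cache_position` is empty, i.e. when no positions remain to be processed at that step. Below is a torch-free sketch of that condition under a simplified assumption about how positions are derived (all positions of the sequence minus those already held in the KV cache); the real bookkeeping lives inside `transformers` and differs between versions. `remaining_positions` and `exception3_check` are hypothetical names.

```python
def remaining_positions(input_len: int, past_length: int) -> list[int]:
    # Assumed, simplified model: positions 0..input_len-1, minus the first
    # past_length positions that the KV cache already covers.
    return list(range(input_len))[past_length:]


def exception3_check(cache_position: list[int], input_len: int) -> bool:
    # Same comparison as the line marked "Exception 3" in the traceback.
    # With an empty cache_position, the [-1] lookup raises IndexError; a size-0
    # torch tensor raises the "index -1 is out of bounds ..." message instead.
    return cache_position[-1] >= input_len


if __name__ == "__main__":
    fresh = remaining_positions(input_len=5, past_length=0)
    print(fresh, exception3_check(fresh, 5))  # [0, 1, 2, 3, 4] False
    stale = remaining_positions(input_len=5, past_length=5)
    print(stale)  # []
    try:
        exception3_check(stale, 5)
    except IndexError as err:
        print("IndexError:", err)
```

Under this simplified model, the empty case appears exactly when the cache already claims to cover the whole input, so a useful first check is what the cache state and `input_ids` length are when `generate` reaches this line.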