Can't reproduce given example (no meaningful output)

#8
by pzarzycki - opened

I tried the model using the provided example, but it doesn't generate any meaningful response:

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
...
inputs = (processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
          .to('cuda'))
output = model.generate(**inputs, max_new_tokens=250)
print(processor.decode(output[0][2:], skip_special_tokens=True))
What is shown in this image? Name all the objects!assistant
The object is the object is the object is the object is the object is the object is the object is the object is the object ....
Llava Hugging Face org

Hey! I tried the demo code on the main branch and it worked for me. Which version of transformers are you using? Can you run generation by applying the chat template with tokenize=False and then calling the processor directly?
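
As a side note, a quick way to report the installed version is to print it from Python (this is just the standard attribute, nothing model-specific):

import transformers
print(transformers.__version__)  # include this version string in your reply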

Code that I used to check:

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True, 
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
        ],
    },
]
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
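
If the tokenized path keeps producing repeated tokens, here is a minimal sketch of the tokenize=False variant mentioned above, assuming the same imports and the model, processor, and conversation objects from the snippet: render the prompt as a plain string, open the image manually, and pass both to the processor.

# Render the chat template to a text prompt instead of token IDs
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Load the same COCO image by hand
raw_image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# Call the processor directly with text + image, then generate as before
inputs = processor(images=raw_image, text=prompt, return_tensors="pt").to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))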