Clarification on Box Coordinates Scaling

#1
by demitsuki - opened

Hi Ferret Team,

Thanks for sharing this checkpoint!

I ran the following example on your demo, using the Ferret-UI-Llama8b model with default parameters:
{
"id": 0,
"image": "appstore_reminders.png",
"image_h": 2532,
"image_w": 1170,
"conversations": [
{
"from": "human",
"value": "\nWhere is the Games Tab located?"
}
]
}

The response returned: Games Tab [[0, 906, 256, 965]]. However, this box doesn’t seem to align with the "Games Tab" in the image, whether scaled or unscaled.

Could you clarify the scaling logic applied to the box and how I should interpret it?

Thanks!
Demi

Hi @demitsuki, sorry for the late reply!
You can check the scaling logic here: https://github.com/apple/ml-ferret/blob/main/ferretui/ferretui/eval/model_UI.py

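In short:

# Ratio between the model's coordinate space and the image size in pixels
ratio_w = VOCAB_IMAGE_W * 1.0 / image_width
ratio_h = VOCAB_IMAGE_H * 1.0 / image_height

def get_bbox_coor(box, ratio_w, ratio_h):
    # Scale a box (x1, y1, x2, y2) by the width and height ratios
    return box[0] * ratio_w, box[1] * ratio_h, box[2] * ratio_w, box[3] * ratio_h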
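Note that multiplying by VOCAB_IMAGE_W / image_width takes pixel coordinates into the model's coordinate space; reading a model output back in pixels is the inverse operation. Here is a minimal sketch of both directions, assuming VOCAB_IMAGE_W = VOCAB_IMAGE_H = 1000 (the constant that the drawing example in this thread divides by). The helper names `to_model_space` and `to_pixel_space` are hypothetical and not taken from the repository.

```python
# Assumption: the model's coordinate grid is 1000 x 1000.
VOCAB_IMAGE_W = 1000
VOCAB_IMAGE_H = 1000


def to_model_space(box, image_width, image_height):
    """Scale a pixel-space box (x1, y1, x2, y2) into the model's grid."""
    ratio_w = VOCAB_IMAGE_W / image_width
    ratio_h = VOCAB_IMAGE_H / image_height
    x1, y1, x2, y2 = box
    return (x1 * ratio_w, y1 * ratio_h, x2 * ratio_w, y2 * ratio_h)


def to_pixel_space(box, image_width, image_height):
    """Inverse of to_model_space: scale a model-grid box back to pixels."""
    x1, y1, x2, y2 = box
    return (
        x1 * image_width / VOCAB_IMAGE_W,
        y1 * image_height / VOCAB_IMAGE_H,
        x2 * image_width / VOCAB_IMAGE_W,
        y2 * image_height / VOCAB_IMAGE_H,
    )


# Round trip on a 1170 x 2532 screenshot (the size used in this thread).
pixel_box = (0, 2294, 300, 2443)
model_box = to_model_space(pixel_box, 1170, 2532)
restored = to_pixel_space(model_box, 1170, 2532)
```

The round trip should return the original pixel box up to floating-point error, which is a quick way to check that the two directions have not been mixed up.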

An example that converts a box from the model's 0-1000 range to pixels and draws it:

from PIL import Image, ImageDraw

image_path = "temp_image.png"

# Box as returned by the model: [x1, y1, x2, y2]
box_coordinates = [195, 906, 402, 970]

image = Image.open(image_path)
image_width, image_height = image.size

# Divide by 1000, then multiply by the image size to get pixel coordinates
adjusted_box = [
    int(box_coordinates[0] / 1000 * image_width),
    int(box_coordinates[1] / 1000 * image_height),
    int(box_coordinates[2] / 1000 * image_width),
    int(box_coordinates[3] / 1000 * image_height),
]

draw = ImageDraw.Draw(image)
draw.rectangle(adjusted_box, outline="red", width=3)
image.show()
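To apply the same conversion directly to a response string such as the one reported above, the box can be pulled out with a regular expression first. This is only a sketch: `parse_boxes` and `box_to_pixels` are hypothetical helper names, the `[[x1, y1, x2, y2]]` text format is taken from the response quoted in this thread, and the 1000 grid size is the same assumption as in the drawing example.

```python
import re


def parse_boxes(response):
    """Return every [x1, y1, x2, y2] group found in a response string."""
    pattern = r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]"
    return [tuple(int(v) for v in match) for match in re.findall(pattern, response)]


def box_to_pixels(box, image_width, image_height, grid=1000):
    """Scale a box from the assumed 0-1000 grid to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * image_width // grid,
        y1 * image_height // grid,
        x2 * image_width // grid,
        y2 * image_height // grid,
    )


response = "Games Tab [[0, 906, 256, 965]]"
boxes = parse_boxes(response)

# image_w = 1170 and image_h = 2532 come from the example input in this thread.
pixel_boxes = [box_to_pixels(b, 1170, 2532) for b in boxes]
print(pixel_boxes)  # [(0, 2293, 299, 2443)]
```

For the 1170 x 2532 screenshot, the reported box therefore covers roughly the left quarter of the width, near the bottom of the screen.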

@jadechoghari great project, but I am having the same problem. For this input:

{
"id": 0,
"image": "appstore_reminders.png",
"image_h": 2532,
"image_w": 1170,
"conversations": [
{
"from": "human",
"value": "\nWhere is the Games Tab located?"
}
]
}

The response returned: Games Tab [[0, 906, 256, 965]]. However, this box doesn’t seem to align with the "Games Tab" in the image, whether scaled or unscaled.

How do you scale it? The coordinates still look wrong to me.

Thank you
