Ready Inference Script
Is there a ready-to-use inference script available? I want to use this model through a Python script with some modifications, but I can't find one.
Do you mean something like https://github.com/bytedance/UI-TARS-desktop/releases/tag/v0.1.0?
As far as I can see, there's only an .exe program that can be used with vLLM, but I would like a Python script that is 100% compatible with the custom UI-TARS system prompt on GitHub and can translate, for example, "right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')" into a normal right click (with pyautogui/pynput):
COMPUTER_USE = """You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
Thought: ...
Action: ...
## Action Space
click(start_box='<|box_start|>(x1,y1)<|box_end|>')
left_double(start_box='<|box_start|>(x1,y1)<|box_end|>')
right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
hotkey(key='')
type(content='xxx') # Use escape characters \\', \\\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.
## Note
- Use {language} in `Thought` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part.
## User Instruction
{instruction}
"""
I think you mean something like this
https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/uitars_agent.py
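If you only need the action-parsing step rather than a whole agent, here is a minimal sketch of how a reply in the "Thought: ... Action: ..." format above could be mapped to pyautogui calls. It is not taken from the linked file. The helper names (parse_action, to_pixels, execute) are made up for illustration, and several things are assumptions you should check against your own setup: that coordinates come back on a 0-1000 relative grid (pass grid=None if your model returns absolute pixels), that hotkey keys are space-separated (e.g. 'ctrl c'), and the scroll amount of 5 clicks.

```python
import re
import time

# "(x,y)" inside a start_box/end_box value, with or without the box tokens.
_POINT = re.compile(r"\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)")
# "Action: name(args)" at the end of the model reply.
_CALL = re.compile(r"Action:\s*(\w+)\((.*)\)\s*$", re.DOTALL)
# key='value' pairs; the value may contain backslash-escaped characters.
_KWARG = re.compile(r"(\w+)\s*=\s*'((?:\\.|[^'\\])*)'")


def parse_action(response):
    """Return (action_name, kwargs) parsed from a model reply."""
    match = _CALL.search(response.strip())
    if match is None:
        raise ValueError("no 'Action: name(...)' found in response")
    name, arg_text = match.group(1), match.group(2)
    return name, dict(_KWARG.findall(arg_text))


def to_pixels(box, screen_width, screen_height, grid=1000):
    """Convert a box string to screen pixels.

    Assumption: coordinates are relative to a grid x grid canvas.
    Pass grid=None to treat them as absolute pixels.
    """
    match = _POINT.search(box)
    if match is None:
        raise ValueError(f"no (x,y) point in {box!r}")
    x, y = float(match.group(1)), float(match.group(2))
    if grid is None:
        return round(x), round(y)
    return round(x / grid * screen_width), round(y / grid * screen_height)


def execute(name, kwargs):
    """Run one parsed action with pyautogui. Returns False when finished."""
    import pyautogui  # imported lazily so parsing works without a display

    width, height = pyautogui.size()
    if name in ("click", "left_double", "right_single"):
        x, y = to_pixels(kwargs["start_box"], width, height)
        handler = {
            "click": pyautogui.click,
            "left_double": pyautogui.doubleClick,
            "right_single": pyautogui.rightClick,
        }[name]
        handler(x, y)
    elif name == "drag":
        x1, y1 = to_pixels(kwargs["start_box"], width, height)
        x2, y2 = to_pixels(kwargs["end_box"], width, height)
        pyautogui.moveTo(x1, y1)
        pyautogui.dragTo(x2, y2, duration=0.5, button="left")
    elif name == "hotkey":
        # Assumption: keys are space-separated, e.g. 'ctrl c'.
        pyautogui.hotkey(*kwargs["key"].split())
    elif name == "type":
        # Undo the escapes the prompt asks the model to use.
        text = kwargs["content"]
        text = text.replace("\\n", "\n").replace("\\'", "'").replace('\\"', '"')
        pyautogui.write(text)
    elif name == "scroll":
        x, y = to_pixels(kwargs["start_box"], width, height)
        direction = kwargs["direction"]
        clicks = 5  # arbitrary amount; tune for your application
        if direction in ("up", "down"):
            pyautogui.scroll(clicks if direction == "up" else -clicks, x=x, y=y)
        else:
            # Horizontal scrolling support varies by platform.
            pyautogui.hscroll(clicks if direction == "right" else -clicks, x=x, y=y)
    elif name == "wait":
        time.sleep(5)
    elif name == "finished":
        return False
    else:
        raise ValueError(f"unknown action {name!r}")
    return True
```

Usage would look like name, kwargs = parse_action(model_output) followed by execute(name, kwargs) in a loop that takes a fresh screenshot after each step. Keeping the parser separate from the executor means you can swap pyautogui for pynput without touching the parsing code.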
Thanks!