Running 270 Qwen2.5 Omni 7B Demo: Generate text and speech responses from text, images, or audio input
Running on Zero 62 VLM R1 Referral Expression: Mark regions in images based on text descriptions