Orr Zohar PRO

orrzohar

https://orrzohar.github.io

AI & ML interests

Large Multi-Modal Models, Foundation Models, Video Understanding

orrzohar's activity

upvoted a paper about 21 hours ago

Transformers without Normalization

Paper • 2503.10622 • Published 3 days ago • 85

upvoted 4 papers 2 days ago

Long Context Tuning for Video Generation

Paper • 2503.10589 • Published 3 days ago • 13

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published 3 days ago • 16

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

Paper • 2503.09669 • Published 4 days ago • 31

CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

Paper • 2503.10613 • Published 3 days ago • 61

upvoted a paper 3 days ago

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Paper • 2503.09402 • Published 4 days ago • 6

upvoted a paper 4 days ago

Video Action Differencing

Paper • 2503.07860 • Published 6 days ago • 28

upvoted 3 papers 9 days ago

upvoted an article 12 days ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Article • 13 days ago • 66

upvoted a collection 12 days ago

C4AI Aya Vision

Collection

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 12 days ago • 64

upvoted 2 papers 19 days ago

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Paper • 2502.16033 • Published 23 days ago • 16

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published 21 days ago • 34

upvoted a collection 23 days ago

SigLIP2

Collection

36 items • Updated 4 days ago • 64

upvoted a paper 23 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 24 days ago • 129

upvoted a collection 24 days ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection

11 items • Updated 19 days ago • 59

upvoted an article 24 days ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • 25 days ago • 207

upvoted a paper 25 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 28 days ago • 143

upvoted a paper about 1 month ago

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 46