VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published 5 days ago • 56
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published 6 days ago • 50
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 11 days ago • 63
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published Apr 29 • 43
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10 • 69
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 194
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17, 2024 • 24
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17, 2024 • 20
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published Sep 13, 2024 • 13
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14, 2024 • 9
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published Sep 16, 2024 • 33
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024 • 26
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Paper • 2409.06277 • Published Sep 10, 2024 • 16
Click2Mask: Local Editing with Dynamic Mask Generation Paper • 2409.08272 • Published Sep 12, 2024 • 6