On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published 4 days ago • 66
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Paper • 2504.12080 • Published 26 days ago • 7
RelationBooth: Towards Relation-Aware Customized Object Generation Paper • 2410.23280 • Published Oct 30, 2024 • 1
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Paper • 2409.15179 • Published Sep 23, 2024
PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners Paper • 2410.04733 • Published Oct 7, 2024
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Paper • 2501.04670 • Published Jan 8, 2025
Point Cloud Mamba: Point Cloud Learning via State Space Model Paper • 2403.00762 • Published Mar 1, 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Paper • 2401.02361 • Published Jan 4, 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection Paper • 2404.06564 • Published Apr 9, 2024
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model Paper • 2412.04292 • Published Dec 5, 2024
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation Paper • 2410.10676 • Published Oct 14, 2024
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer Paper • 2503.17350 • Published Mar 21, 2025 • 1
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild Paper • 2504.11326 • Published 26 days ago • 6
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published 27 days ago • 28
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published 27 days ago • 15
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer Paper • 2411.10781 • Published Nov 16, 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Paper • 2411.09691 • Published Nov 14, 2024
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published Mar 12, 2025 • 34
POSTA: A Go-to Framework for Customized Artistic Poster Generation Paper • 2503.14908 • Published Mar 19, 2025