SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 30 days ago • 180
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 8 days ago • 461
Arbitrary-steps Image Super-resolution via Diffusion Inversion Paper • 2412.09013 • Published Dec 12, 2024 • 13
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 48
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13, 2024 • 39
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 33
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 50
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 78
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
Evaluating the Ripple Effects of Knowledge Editing in Language Models Paper • 2307.12976 • Published Jul 24, 2023 • 12