-
LLMs for Engineering: Teaching Models to Design High Powered Rockets
Paper • 2504.19394 • Published • 13
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Paper • 2504.19056 • Published • 16
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
Paper • 2505.00551 • Published • 36
The Leaderboard Illusion
Paper • 2504.20879 • Published • 69
Collections
Collections including paper arxiv:2505.14683
-
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published • 85
Qwen3 Technical Report
Paper • 2505.09388 • Published • 168
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Paper • 2505.11049 • Published • 58
Emerging Properties in Unified Multimodal Pretraining
Paper • 2505.14683 • Published • 124
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 42
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
CoLLM: A Large Language Model for Composed Image Retrieval
Paper • 2503.19910 • Published • 14
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Paper • 2503.21541 • Published • 1
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
Paper • 2504.03536 • Published • 13
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Paper • 2504.04842 • Published • 36
-
microsoft/Phi-4-multimodal-instruct
Automatic Speech Recognition • Updated • 426k • 1.41k
microsoft/Phi-4-mini-instruct
Text Generation • Updated • 358k • 489
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 108
Emerging Properties in Unified Multimodal Pretraining
Paper • 2505.14683 • Published • 124
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 105
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 55
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 33
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 114
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 26
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 59
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 43
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 61