- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2504.17192

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 108
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 102

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 14
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 149

- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
  Paper • 2504.08791 • Published • 125
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 91
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 90

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 63
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 118
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 122

- BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
  Paper • 2503.13434 • Published • 26
- Edit Transfer: Learning Image Editing via Vision In-Context Relations
  Paper • 2503.13327 • Published • 29
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
  Paper • 2503.13435 • Published • 17
- MediaTek-Research/Llama-Breeze2-8B-Instruct
  Updated • 2.45k • 35

- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
  Paper • 2502.02508 • Published • 23
- Chain of Draft: Thinking Faster by Writing Less
  Paper • 2502.18600 • Published • 48
- Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
  Paper • 2406.02818 • Published
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 56

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 133
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27