Collections
Collections including paper arxiv:2505.12504
- CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
  Paper • 2505.12504 • Published • 23
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
  Paper • 2505.15277 • Published • 98
- T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
  Paper • 2505.00703 • Published • 42
- OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
  Paper • 2505.08617 • Published • 41

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 10
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 84

- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 128
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
  Paper • 2504.05118 • Published • 25
- SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
  Paper • 2504.08600 • Published • 29
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
  Paper • 2504.11343 • Published • 17

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 22
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 70