Collections

Collections including paper arxiv:2504.16084

RL+reason model

about 7 hours ago

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 120
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

about 1 hour ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 101

about 6 hours ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 6 days ago • 94
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Paper • 2503.24235 • Published 24 days ago • 53

about 6 hours ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70

about 20 hours ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 3 days ago • 66

about 13 hours ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published 2 days ago • 45

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published 17 days ago • 123
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 2 days ago • 70

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published 24 days ago • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published about 1 month ago • 118
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 111
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 122

about 23 hours ago

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 122
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 6 days ago • 94
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 3 days ago • 66
