Meng Qu
mnqu
AI & ML interests
None yet
Recent Activity
upvoted a paper 15 days ago
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
upvoted a paper 28 days ago
RM-R1: Reward Modeling as Reasoning
upvoted a paper 2 months ago
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Organizations
None yet
models 0
None public yet
datasets 0
None public yet