J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published 19 days ago • 22
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization Paper • 2505.13346 • Published 15 days ago • 2
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published 14 days ago • 53
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Paper • 2504.12216 • Published Apr 16 • 2
CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models Paper • 2505.13559 • Published 15 days ago • 13
Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity Paper • 2505.11107 • Published 18 days ago • 28
Article Gotchas in Tokenizer Behavior Every Developer Should Know By qgallouedec • Apr 18 • 37