YangWang92

yangwang92

AI & ML interests

None yet

Recent Activity

liked a dataset 1 day ago

nvidia/HelpSteer2

liked a model 1 day ago

nvidia/Llama-3_3-Nemotron-Super-49B-v1

liked a dataset 3 days ago

rwkv-x-dev/rwkv-world-3-subsample-preview

yangwang92's activity

upvoted a paper 14 days ago

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published 15 days ago • 36

upvoted a collection 18 days ago

Qwen2.5-1M

Collection

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated 23 days ago • 107

upvoted a paper 28 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 29 days ago • 166

upvoted a paper about 1 month ago

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14 • 51

upvoted a collection about 1 month ago

CodeI/O

Collection

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated Feb 13 • 6

upvoted a paper about 1 month ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 47

upvoted an article about 1 month ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 819

upvoted 2 papers about 1 month ago

Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10 • 30

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Paper • 2502.05003 • Published Feb 7 • 43

upvoted a collection about 2 months ago

Reasoning Datasets

Collection

Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 57

upvoted 7 papers about 2 months ago

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 44

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 105

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 354

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 63

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Paper • 2501.12202 • Published Jan 21 • 36