23. Info - pages
1. TinyGSM: achieving >80% on GSM8k with small language models
Paper • 2312.09241 • Published • 39
2. ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper • 2403.03853 • Published • 65
3. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Paper • 2403.18795 • Published • 21
4. Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Paper • 2404.04478 • Published • 13
5. Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 94
6. Universal Guidance for Diffusion Models
Paper • 2302.07121 • Published
7. 2BP: 2-Stage Backpropagation
Paper • 2405.18047 • Published • 27
8. LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 35
9. LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Paper • 2409.12960 • Published • 25
10. GRIN: GRadient-INformed MoE
Paper • 2409.12136 • Published • 16
11. Addition is All You Need for Energy-efficient Language Models
Paper • 2410.00907 • Published • 151
12. Reinforcement Learning Textbook
Paper • 2201.09746 • Published
13. Training-Free Long-Context Scaling of Large Language Models
Paper • 2402.17463 • Published • 25