Think Only When You Need with Large Hybrid-Reasoning Models Paper • 2505.14631 • Published May 2025 • 19
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 2025 • 16
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20, 2025 • 13
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published Feb 17, 2025 • 16
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation Paper • 2502.08690 • Published Feb 12, 2025 • 44
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper • 2502.01941 • Published Feb 4, 2025 • 15
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation Paper • 2502.01068 • Published Feb 3, 2025 • 17
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks Paper • 2402.09025 • Published Feb 14, 2024 • 8
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 60