Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published 3 days ago • 17
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs Paper • 2504.17768 • Published 8 days ago • 12