A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency • Paper 2505.01658 • Published May 3
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness • Paper 2205.14135 • Published May 27, 2022