RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale • Paper • 2505.03005 • Published 6 days ago • 26
[Llama 3.3] Model Rock Smashing • Collection • Merges of recent Llama 3.3 models • 8 items • Updated 10 days ago