Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Paper • 2504.02821 • Published Apr 3 • 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published Apr 24 • 12
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published Apr 22 • 7
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published Apr 23 • 6