MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published 22 days ago • 123
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published 5 days ago • 65
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline Paper • 2505.19314 • Published 9 days ago • 4
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 11 days ago • 63
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published 9 days ago • 141
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published 12 days ago • 55
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits Paper • 2505.14648 • Published 14 days ago • 8