DiarizationLM: Speaker Diarization Post-Processing with Large Language Models • Paper • 2401.03506 • Published Jan 7, 2024 • 14
The VoxCeleb Speaker Recognition Challenge: A Retrospective • Paper • 2408.14886 • Published Aug 27, 2024 • 11
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? • Paper • 2402.00340 • Published Feb 1, 2024 • 2
SuperBPE • Collection • SuperBPE tokenizers and models trained with them • 8 items • Updated 24 days ago • 14
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders • Paper • 2410.22366 • Published Oct 28, 2024 • 83
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published Feb 27, 2024 • 615
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model • Paper • 2402.03766 • Published Feb 6, 2024 • 15
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases • Paper • 2312.15011 • Published Dec 22, 2023 • 18
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution • Paper • 2401.00935 • Published Jan 1, 2024 • 18
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis • Paper • 2311.12454 • Published Nov 21, 2023 • 31
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • Paper • 2311.12024 • Published Nov 20, 2023 • 20
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning • Paper • 2311.12631 • Published Nov 21, 2023 • 15
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • Paper • 2311.10794 • Published Nov 17, 2023 • 28
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • Paper • 2311.10709 • Published Nov 17, 2023 • 26
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework • Paper • 2311.10125 • Published Nov 16, 2023 • 6