RAHUL YASHWANTKUMAR GUPTA
ryg81
AI & ML interests: None yet
Recent Activity
Reacted to KaiChen1998's post with ❤️ about 20 hours ago
📢 Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
🤗 EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA generates both textual and speech responses with vivid emotional control, using its speech decoder and style controller. (A hedged loading sketch follows the links below.)
✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA simultaneously achieves results comparable to the state of the art on both vision-language and speech benchmarks.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, including the most recent DeepSeekMoE-tiny!
🔥 You are all welcome to try it out and star the repo!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: https://huggingface.co/spaces/Emova-ollm/EMOVA-demo
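
As a rough illustration of the post above, here is a minimal, hedged sketch of how one of the released checkpoints might be loaded from the Hub with Transformers. The repo id "Emova-ollm/EMOVA-3B" is an assumption (the post only names EMOVA-3B/7B/72B and the Emova-ollm org from the demo URL), and since EMOVA ships custom modeling code, the exact input and generation interface is defined by the repo itself; the project's GitHub is the authoritative reference.

```python
# Hedged sketch: loading an EMOVA checkpoint released with the post above.
# The repo id below is an assumption; the model ships custom code, so
# trust_remote_code=True is required, and the exact prompt/generation
# interface is defined by the repo (not shown here).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "Emova-ollm/EMOVA-3B"  # hypothetical repo id under the org seen in the demo URL

model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # half precision to reduce memory; adjust to your hardware
    trust_remote_code=True,       # pulls the custom EMOVA modeling code from the repo
).eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Move to whichever accelerator is available
# (the post notes both NVIDIA GPU and Ascend NPU support in the codebase).
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```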
New activity 1 day ago in spacepxl/Wan2.1-control-loras: "Got an error"
New activity 2 days ago in THUDM/CogView4-6B: "Why not small fp8 models"
Organizations: None yet
Collections: 18
Models: None public yet
Datasets: None public yet