KaiChen1998 committed · verified
Commit 33e645f · 1 Parent(s): 362cf7e

Update README.md

Files changed (1): README.md (+8 −0)
README.md CHANGED
@@ -11,8 +11,16 @@ base_model:
   - Emova-ollm/emova_speech_tokenizer
 ---
 
+<div align="center">
+
+<img src="./examples/images/emova_icon2.png" width="300em"></img>
+
 # EMOVA Speech Tokenizer HF
 
+🤗 [HuggingFace](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) | 💻 [EMOVA-Main-Repo](https://github.com/emova-ollm/EMOVA) | 📄 [EMOVA-Paper](https://arxiv.org/abs/2409.18042) | 🌐 [Project-Page](https://emova-ollm.github.io/)
+
+</div>
+
 ## Model Summary
 
 This repo contains the discrete speech tokenizer used to train the [EMOVA](https://emova-ollm.github.io/) series of models. With a semantic-acoustic disentangled design, it not only facilitates seamless omni-modal alignment among vision, language and audio modalities, but also empowers flexible speech style controls including emotions and pitches. It contains a **speech-to-unit (S2U)** tokenizer to convert speech signals to discrete speech units, and a **unit-to-speech (U2S)** de-tokenizer to reconstruct speech signals from the speech units.
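
The S2U → units → U2S round trip described above can be pictured as a vector-quantization loop: per-frame speech features are snapped to the nearest entry of a learned codebook (yielding discrete unit IDs), and the de-tokenizer consumes those IDs to synthesize speech. The sketch below is a toy illustration of that idea only, assuming made-up function names, a 1-D codebook, and scalar frame features; the actual API and codebook of `emova_speech_tokenizer_hf` differ.

```python
# Toy illustration of S2U/U2S as vector quantization (NOT the repo's real API).
# `speech_to_units`, `units_to_features`, the codebook, and the frame features
# are all hypothetical stand-ins for illustration.

def speech_to_units(frames, codebook):
    """S2U sketch: map each frame feature to its nearest codebook index."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - f))
            for f in frames]

def units_to_features(units, codebook):
    """U2S sketch: look the centroid back up for each discrete unit ID."""
    return [codebook[u] for u in units]

codebook = [0.0, 0.25, 0.5, 0.75, 1.0]   # K = 5 toy "speech unit" centroids
frames = [0.1, 0.52, 0.9, 0.48]          # toy 1-D per-frame speech features

units = speech_to_units(frames, codebook)
print(units)   # discrete speech units: [0, 2, 4, 2]
recon = units_to_features(units, codebook)
print(recon)   # coarse reconstruction from units: [0.0, 0.5, 1.0, 0.5]
```

In the real tokenizer the codebook entries are high-dimensional vectors and the de-tokenizer is a neural vocoder rather than a table lookup, but the discrete-unit bottleneck is what enables the semantic-acoustic disentanglement described above.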