KaiChen1998 committed · verified
Commit 33e645f · 1 Parent(s): 362cf7e

Update README.md

Files changed (1): README.md (+8 −0)
README.md CHANGED
@@ -11,8 +11,16 @@ base_model:
   - Emova-ollm/emova_speech_tokenizer
 ---
 
+<div align="center">
+
+<img src="./examples/images/emova_icon2.png" width="300em"></img>
+
 # EMOVA Speech Tokenizer HF
 
+🤗 [HuggingFace](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) | 💻 [EMOVA-Main-Repo](https://github.com/emova-ollm/EMOVA) | 📄 [EMOVA-Paper](https://arxiv.org/abs/2409.18042) | 🌐 [Project-Page](https://emova-ollm.github.io/)
+
+</div>
+
 ## Model Summary
 
 This repo contains the discrete speech tokenizer used to train the [EMOVA](https://emova-ollm.github.io/) series of models. With a semantic-acoustic disentangled design, it not only facilitates seamless omni-modal alignment among vision, language and audio modalities, but also empowers flexible speech style controls including emotions and pitches. It contains a **speech-to-unit (S2U)** tokenizer to convert speech signals to discrete speech units, and a **unit-to-speech (U2S)** de-tokenizer to reconstruct speech signals from the speech units.
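
The S2U → units → U2S round trip described above can be pictured as a vector-quantization loop: per-frame speech features are snapped to the nearest entry of a learned codebook (yielding discrete unit IDs), and the de-tokenizer consumes those IDs to synthesize speech. The sketch below is a toy illustration of that idea only, assuming made-up function names, a 1-D codebook, and scalar frame features; the actual API and codebook of `emova_speech_tokenizer_hf` differ.

```python
# Toy illustration of S2U/U2S as vector quantization (NOT the repo's real API).
# `speech_to_units`, `units_to_features`, the codebook, and the frame features
# are all hypothetical stand-ins for illustration.

def speech_to_units(frames, codebook):
    """S2U sketch: map each frame feature to its nearest codebook index."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - f))
            for f in frames]

def units_to_features(units, codebook):
    """U2S sketch: look the centroid back up for each discrete unit ID."""
    return [codebook[u] for u in units]

codebook = [0.0, 0.25, 0.5, 0.75, 1.0]   # K = 5 toy "speech unit" centroids
frames = [0.1, 0.52, 0.9, 0.48]          # toy 1-D per-frame speech features

units = speech_to_units(frames, codebook)
print(units)   # discrete speech units: [0, 2, 4, 2]
recon = units_to_features(units, codebook)
print(recon)   # coarse reconstruction from units: [0.0, 0.5, 1.0, 0.5]
```

In the real tokenizer the codebook entries are high-dimensional vectors and the de-tokenizer is a neural vocoder rather than a table lookup, but the discrete-unit bottleneck is what enables the semantic-acoustic disentanglement described above.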