readme
README.md
CHANGED
@@ -15,15 +15,13 @@ short_description: Conversational speech generation
 
 **2025/03/13** - We are releasing the 1B CSM variant. Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm). Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
 
-Try out the interactive demo of our fine-tuned version [sesame.com/voicedemo](https://www.sesame.com/voicedemo).
-
-Generate from the open-source base model [hosted on HuggingFace](https://huggingface.co/spaces/sesame/csm-1b).
-
 ---
 
-CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs.
+CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs. The model architecture employs a [Llama](https://www.llama.com/) backbone and a smaller audio decoder that produces [Mimi](https://huggingface.co/kyutai/mimi) audio codes.
+
+A fine-tuned variant of CSM powers the [interactive voice demo](https://www.sesame.com/voicedemo) shown in our [blog post](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice).
 
-
+A hosted [HuggingFace space](https://huggingface.co/spaces/sesame/csm-1b) is also available for testing audio generation.
 
 ## Misuse and abuse ⚠️
 
@@ -40,4 +38,4 @@ Conversational prompts are from the [EdAcc dataset](https://groups.inf.ed.ac.uk/
 Read speech prompts are form the [LibriTTS-R dataset](https://google.github.io/df-conformer/librittsr/)
 
 **Authors**
-Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
+Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
app.py
CHANGED
@@ -22,7 +22,8 @@ Generate from CSM 1B (Conversational Speech Model).
 Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm).
 Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
 
-Try out
+Try out our interactive demo [sesame.com/voicedemo](https://www.sesame.com/voicedemo),
+this uses a fine-tuned variant of CSM.
 
 The model has some capacity for non-English languages due to data contamination in the training
 data, but it is likely not to perform well.