Zackh committed · Commit ef55fce · 1 Parent(s): d794e1d
Files changed (2)
  1. README.md +5 -7
  2. app.py +2 -1
README.md CHANGED
@@ -15,15 +15,13 @@ short_description: Conversational speech generation
 
 **2025/03/13** - We are releasing the 1B CSM variant. Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm). Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
 
-Try out the interactive demo of our fine-tuned version [sesame.com/voicedemo](https://www.sesame.com/voicedemo).
-
-Generate from the open-source base model [hosted on HuggingFace](https://huggingface.co/spaces/sesame/csm-1b).
-
 ---
 
-CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs. A fine-tuned version of this model powers the interactive demo in our [technical blog post](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice).
+CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs. The model architecture employs a [Llama](https://www.llama.com/) backbone and a smaller audio decoder that produces [Mimi](https://huggingface.co/kyutai/mimi) audio codes.
+
+A fine-tuned variant of CSM powers the [interactive voice demo](https://www.sesame.com/voicedemo) shown in our [blog post](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice).
 
-The model architecture employs a [Llama](https://www.llama.com/) backbone and a smaller audio decoder that produces [Mimi](https://huggingface.co/kyutai/mimi) audio codes.
+A hosted [HuggingFace space](https://huggingface.co/spaces/sesame/csm-1b) is also available for testing audio generation.
 
 ## Misuse and abuse ⚠️
 
@@ -40,4 +38,4 @@ Conversational prompts are from the [EdAcc dataset](https://groups.inf.ed.ac.uk/
 Read speech prompts are from the [LibriTTS-R dataset](https://google.github.io/df-conformer/librittsr/)
 
 **Authors**
-Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
+Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.
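
For readers who want to try the open checkpoint outside the hosted space, the sketch below shows roughly how generation works. It is a minimal sketch assuming the `load_csm_1b` loader and `generate` call described in the linked SesameAILabs/csm repository; exact names, arguments, and defaults may differ from the released code.

```python
# Minimal sketch: generating speech from the open csm-1b checkpoint.
# Assumes the `load_csm_1b` loader and generator API described in the
# SesameAILabs/csm repository; names and defaults may differ.
import torch
import torchaudio

from generator import load_csm_1b  # helper from the SesameAILabs/csm repo

device = "cuda" if torch.cuda.is_available() else "cpu"
generator = load_csm_1b(device=device)

# Generate a short utterance for speaker 0 with no conversational context.
audio = generator.generate(
    text="Hello from Sesame.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
)

# The model emits a mono waveform tensor at the generator's sample rate.
torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
```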
app.py CHANGED
@@ -22,7 +22,8 @@ Generate from CSM 1B (Conversational Speech Model).
 Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm).
 Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
 
-Try out the interactive demo of our fine-tuned model [sesame.com/voicedemo](https://www.sesame.com/voicedemo).
+Try out our interactive demo [sesame.com/voicedemo](https://www.sesame.com/voicedemo);
+this uses a fine-tuned variant of CSM.
 
 The model has some capacity for non-English languages due to data contamination in the training
 data, but it is likely not to perform well.