csm-1b

Zackh committed 3 days ago

Commit ef55fce

1 Parent(s): d794e1d

readme

Files changed (2)

README.md CHANGED

@@ -15,15 +15,13 @@ short_description: Conversational speech generation
 **2025/03/13** - We are releasing the 1B CSM variant. Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm). Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
-Try out the interactive demo of our fine-tuned version [sesame.com/voicedemo](https://www.sesame.com/voicedemo).
-Generate from the open-source base model [hosted on HuggingFace](https://huggingface.co/spaces/sesame/csm-1b).
 ---
-CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs. A fine-tuned version of this model powers the interactive demo in our [technical blog post](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice).
-The model architecture employs a [Llama](https://www.llama.com/) backbone and a smaller audio decoder that produces [Mimi](https://huggingface.co/kyutai/mimi) audio codes.
 ## Misuse and abuse ⚠️
@@ -40,4 +38,4 @@ Conversational prompts are from the [EdAcc dataset](https://groups.inf.ed.ac.uk/
 Read speech prompts are form the [LibriTTS-R dataset](https://google.github.io/df-conformer/librittsr/)
 **Authors**
-Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.

 **2025/03/13** - We are releasing the 1B CSM variant. Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm). Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
 ---
+CSM (Conversational Speech Model) is a speech generation model from [Sesame](sesame.com) that generates RVQ audio codes from text and audio inputs. The model architecture employs a [Llama](https://www.llama.com/) backbone and a smaller audio decoder that produces [Mimi](https://huggingface.co/kyutai/mimi) audio codes.
+A fine-tuned variant of CSM powers the [interactive voice demo](https://www.sesame.com/voicedemo) shown in our [blog post](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice).
+A hosted [HuggingFace space](https://huggingface.co/spaces/sesame/csm-1b) is also available for testing audio generation.
 ## Misuse and abuse ⚠️
 Read speech prompts are form the [LibriTTS-R dataset](https://google.github.io/df-conformer/librittsr/)
 **Authors**
+Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.

app.py CHANGED

@@ -22,7 +22,8 @@ Generate from CSM 1B (Conversational Speech Model).
 Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm).
 Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
-Try out the interactive demo of our fine-tuned model [sesame.com/voicedemo](https://www.sesame.com/voicedemo).
 The model has some capacity for non-English languages due to data contamination in the training
 data, but it is likely not to perform well.

 Code is available on GitHub: [SesameAILabs/csm](https://github.com/SesameAILabs/csm).
 Checkpoint is [hosted on HuggingFace](https://huggingface.co/sesame/csm-1b).
+Try out our interactive demo [sesame.com/voicedemo](https://www.sesame.com/voicedemo),
+this uses a fine-tuned variant of CSM.
 The model has some capacity for non-English languages due to data contamination in the training
 data, but it is likely not to perform well.