|
--- |
|
pipeline_tag: text-to-audio |
|
library_name: audiocraft |
|
language: en |
|
tags: |
|
- text-to-audio |
|
- musicgen |
|
- songstarter |
|
license: cc-by-nc-4.0 |
|
--- |
|
|
|
# Model Card for musicgen-songstarter-v0.2 |
|
|
|
<a target="_blank" href="https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
musicgen-songstarter-v0.2 is a [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) model fine-tuned on a dataset of melody loops from my Splice sample library. It is intended to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz. |
|
|
|
Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version: |
|
- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice |
|
- is twice the size, bumped up from a `medium` ➡️ `large` transformer LM |
|
|
|
If you find this model interesting, please consider: |
|
- following me on [GitHub](https://github.com/nateraw) |
|
- following me on [Twitter](https://twitter.com/nateraw) |
|
|
|
## Usage |
|
|
|
Install [audiocraft](https://github.com/facebookresearch/audiocraft): |
|
|
|
``` |
|
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft |
|
``` |
|
|
|
Then, you should be able to load this model just like any other musicgen checkpoint here on the Hub: |
|
|
|
```python |
|
import torchaudio |
|
from audiocraft.models import MusicGen |
|
from audiocraft.data.audio import audio_write |
|
|
|
model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2') |
|
model.set_generation_params(duration=8) # generate 8 seconds. |
|
wav = model.generate_unconditional(4) # generates 4 unconditional audio samples |
|
descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3 |
|
wav = model.generate(descriptions) # generates 3 samples. |
|
|
|
melody, sr = torchaudio.load('./assets/bach.mp3') |
|
# generates using the melody from the given audio and the provided descriptions. |
|
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr) |
|
|
|
for idx, one_wav in enumerate(wav): |
|
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS. |
|
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True) |
|
``` |
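
As a rough sanity check on the generated audio, the sketch below estimates the tensor shape to expect from the calls above. It assumes the returned tensor is laid out as `(batch, channels, samples)` and that the sample count is `duration * sample_rate`. `expected_shape` is a hypothetical helper, not an audiocraft function, so compare its result against `wav.shape` rather than relying on it.

```python
def expected_shape(batch_size, duration_s, sample_rate=32_000, channels=2):
    """Estimate the shape of a generated batch of audio.

    Hypothetical helper (not part of audiocraft). Assumes a
    (batch, channels, samples) layout, stereo output, and 32 kHz audio.
    """
    num_samples = int(duration_s * sample_rate)
    return (batch_size, channels, num_samples)


# Three prompts at duration=8, as in the usage example above.
shape = expected_shape(batch_size=3, duration_s=8)
print(shape)
```

If the real shape differs, the assumptions above (layout, channel count, or exact sample count) do not hold for your audiocraft version.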
|
|
|
## Prompt Format |
|
|
|
Use the following prompt format: |
|
|
|
``` |
|
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm |
|
``` |
|
|
|
For example: |
|
|
|
``` |
|
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm |
|
``` |
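
If you build prompts in code, the format above can be assembled with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical function, not part of audiocraft, and it only does string formatting.

```python
def build_prompt(tags, key, bpm):
    """Format tags, key, and tempo as '{tag_1}, ..., {tag_n}, {key}, {bpm} bpm'.

    Hypothetical helper (not part of audiocraft); it only builds a string.
    """
    # Drop empty tags and surrounding whitespace so separators stay clean.
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    return ", ".join(cleaned + [key.strip(), f"{int(bpm)} bpm"])


prompt = build_prompt(
    ["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"],
    key="G# minor",
    bpm=140,
)
print(prompt)
```

The resulting string can be passed in the `descriptions` list given to `model.generate`.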
|
|
|
## Samples |
|
|
|
<table style="width:100%; text-align:center;"> |
|
<tr> |
|
<th>Audio Prompt</th> |
|
<th>Text Prompt</th> |
|
<th>Output</th> |
|
</tr> |
|
<tr> |
|
<td> |
|
<audio controls> |
|
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho.wav?download=true" type="audio/wav"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
</td> |
|
<td> |
|
trap, synthesizer, songstarters, dark, G# minor, 140 bpm |
|
</td> |
|
<td> |
|
<audio controls> |
|
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho_trap.wav?download=true" type="audio/wav"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
</td> |
|
</tr> |
|
<tr> |
|
<td> |
|
<audio controls> |
|
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach.mp3?download=true" type="audio/mp3"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
</td> |
|
<td> |
|
acoustic, guitar, melody, trap, D minor, 90 bpm |
|
</td> |
|
<td> |
|
<audio controls> |
|
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach_guitar.wav?download=true" type="audio/wav"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
</td> |
|
</tr> |
|
</table> |
|
|
|
## Acknowledgements |
|
|
|
This work would not have been possible without: |
|
|
|
- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits |
|
- [Replicate](https://replicate.com/), for early development compute resources |
|
|
|
Thank you ❤️ |
|
|