---
pipeline_tag: text-to-audio
library_name: audiocraft
language: en
tags:
- text-to-audio
- musicgen
- songstarter
license: cc-by-nc-4.0
---
# Model Card for musicgen-songstarter-v0.2
[Replicate](https://replicate.com/nateraw/musicgen-songstarter-v0.2) [Open in Colab](https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb) [Hugging Face Space](https://huggingface.co/spaces/nateraw/singing-songstarter)
musicgen-songstarter-v0.2 is a fine-tuned version of [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large), trained on a dataset of melody loops from my Splice sample library. It's intended to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.
**👀 Update:** I wrote a [blog post](https://nateraw.com/posts/training_musicgen_songstarter.html) detailing how and why I trained this model, including training details, the dataset, Weights & Biases logs, etc.
Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- is twice the size, bumped up from a `medium` ➡️ `large` transformer LM
If you find this model interesting, please consider:
- following me on [GitHub](https://github.com/nateraw)
- following me on [Twitter](https://twitter.com/_nateraw)
## Usage
Install [audiocraft](https://github.com/facebookresearch/audiocraft):
```
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```
Then you should be able to load this model just like any other MusicGen checkpoint on the Hub:
```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.

wav = model.generate_unconditional(4)  # generates 4 unconditional audio samples

descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

# assets/bach.mp3 is in this model repo; download it first if running elsewhere.
melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
# melody is [channels, time]; melody[None].expand(3, -1, -1) repeats it once per description.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
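Because the model outputs stereo audio at 32 kHz, you can roughly estimate the size of each clip before generating anything. The sketch below is plain arithmetic, not part of audiocraft: `estimate_wav` is a hypothetical helper, and it assumes each clip is exactly `duration` seconds long and that WAV output is stored as 16-bit PCM (an assumption; audiocraft's WAV encoding settings may differ by version).

```python
def estimate_wav(duration_s: float, sample_rate: int = 32_000,
                 channels: int = 2, bytes_per_sample: int = 2) -> tuple[int, int]:
    """Return (samples per channel, approximate PCM payload size in bytes).

    Hypothetical helper. Assumes the clip is exactly `duration_s` long and
    stored as uncompressed PCM with `bytes_per_sample` bytes per sample
    (2 = 16-bit, an assumption). The WAV header is not included.
    """
    samples_per_channel = round(duration_s * sample_rate)
    payload_bytes = samples_per_channel * channels * bytes_per_sample
    return samples_per_channel, payload_bytes


# duration=8, as in the usage example above, at 32 kHz stereo.
samples, size = estimate_wav(8)
print(samples, size)  # 256000 1024000
```

So an 8-second stereo clip comes to about 1 MB of raw 16-bit audio under these assumptions; the actual number of generated samples may differ slightly.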
## Prompt Format
Use the following prompt format:
```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```
For example:
```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```
For example tags, see the [prompt format section of the musicgen-songstarter-v0.1 README](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format). Those tags come from the smaller v0.1 dataset, but they should give you an idea of what the model saw during training.
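If you generate prompts programmatically, a small helper keeps them in the `{tag_1}, ..., {tag_n}, {key}, {bpm} bpm` shape described above. This is a minimal sketch; `build_prompt` and `parse_prompt` are hypothetical names, not part of audiocraft, and the parser assumes tags themselves never contain commas.

```python
def build_prompt(tags: list[str], key: str, bpm: int) -> str:
    """Join tags, key, and tempo into '{tag_1}, ..., {tag_n}, {key}, {bpm} bpm'."""
    cleaned = [t.strip() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    return ", ".join([*cleaned, key.strip(), f"{int(bpm)} bpm"])


def parse_prompt(prompt: str) -> tuple[list[str], str, int]:
    """Split a prompt back into (tags, key, bpm); assumes no commas inside tags."""
    parts = [p.strip() for p in prompt.split(",")]
    if len(parts) < 3:
        raise ValueError("expected at least one tag, a key, and a tempo")
    *tags, key, tempo = parts
    if not tempo.endswith(" bpm"):
        raise ValueError(f"tempo must look like '<number> bpm', got {tempo!r}")
    return tags, key, int(tempo[: -len(" bpm")])


prompt = build_prompt(["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"], "G# minor", 140)
print(prompt)  # hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

The resulting string can be used directly as an entry in the `descriptions` list passed to `model.generate` in the usage example.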
## Samples
<table style="width:100%; text-align:center;">
<tr>
<th>Audio Prompt</th>
<th>Text Prompt</th>
<th>Output</th>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
trap, synthesizer, songstarters, dark, G# minor, 140 bpm
</td>
<td>
<audio controls>
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho_trap.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach.mp3?download=true" type="audio/mp3">
Your browser does not support the audio element.
</audio>
</td>
<td>
acoustic, guitar, melody, trap, D minor, 90 bpm
</td>
<td>
<audio controls>
<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach_guitar.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
</table>
## Training Details
For more details, check out the [blog post](https://nateraw.com/posts/training_musicgen_songstarter.html#training).
- **code**:
  - Repo is [here](https://github.com/nateraw/audiocraft). It's an undocumented fork of [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft) where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
- **data**:
  - around 1,700-1,800 samples that I manually listened to and purchased via my personal [Splice](https://splice.com) account, totaling about 7-8 hours of audio
  - Given the licensing terms, I cannot share the data.
- **hardware**:
  - 8x A100 (40GB) instance from Lambda Labs
- **procedure**:
  - trained for 10k steps, which took about 6 hours
  - reduced the segment duration at train time to 15 seconds
- **hparams/logs**:
  - See the wandb [run](https://wandb.ai/nateraw/musicgen-songstarter-v0.2/runs/63gh4l7m), which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used when I ran the training script.
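The figures above imply a few rough throughput numbers. The sketch below only does that arithmetic: `training_stats` is a hypothetical helper, and it uses the midpoints of the stated ranges (about 7.5 hours of audio across about 1,750 samples) as assumptions. Since "about 6 hours" is itself approximate, treat the results as ballpark figures rather than measured values; the wandb run has the real metrics.

```python
def training_stats(steps: int, wall_clock_hours: float, num_gpus: int,
                   audio_hours: float, num_samples: int) -> dict[str, float]:
    """Back-of-the-envelope numbers derived from the training figures listed above."""
    wall_clock_s = wall_clock_hours * 3600
    return {
        "seconds_per_step": wall_clock_s / steps,          # average time per optimizer step
        "gpu_hours": num_gpus * wall_clock_hours,          # total GPU time consumed
        "avg_sample_seconds": audio_hours * 3600 / num_samples,  # mean clip length
    }


# Midpoints of the stated ranges (assumption): ~7.5 hours of audio, ~1,750 samples.
stats = training_stats(steps=10_000, wall_clock_hours=6, num_gpus=8,
                       audio_hours=7.5, num_samples=1_750)
print(f"{stats['seconds_per_step']:.2f} s/step")          # 2.16 s/step
print(f"{stats['gpu_hours']} A100 GPU-hours")             # 48 A100 GPU-hours
print(f"{stats['avg_sample_seconds']:.1f} s per sample")  # 15.4 s per sample
```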
## Acknowledgements
This work would not have been possible without:
- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits
- [Replicate](https://replicate.com/), for early development compute resources
Thank you ❤️