metadata
pipeline_tag: text-to-audio
library_name: audiocraft
language: en
tags:
  - text-to-audio
  - musicgen
  - songstarter
license: cc-by-nc-4.0

Model Card for musicgen-songstarter-v0.2

musicgen-songstarter-v0.2 is a fine-tune of musicgen-stereo-melody-large on a dataset of melody loops from my Splice sample library. It is intended to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.

Usage

Install audiocraft:

pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft

Then, you should be able to load this model just like any other musicgen checkpoint here on the Hub:

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
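
As a rough sanity check on what the snippet above produces, the expected size of a generated batch can be worked out from the generation parameters. This is a minimal sketch, assuming the output is laid out as [batch, channels, samples] and using the stereo, 32 kHz figures stated in this card. `expected_wav_shape` is a hypothetical helper for illustration, not part of audiocraft, and the real tensor length may differ slightly.

```python
def expected_wav_shape(batch_size, duration_s, sample_rate=32000, channels=2):
    """Return the (batch, channels, samples) shape expected for generated audio.

    Assumption: stereo output at 32 kHz, laid out as [batch, channels, samples].
    """
    if batch_size < 1 or duration_s <= 0:
        raise ValueError("batch_size must be >= 1 and duration_s must be > 0")
    # Number of audio samples per channel for the requested duration.
    num_samples = int(round(duration_s * sample_rate))
    return (batch_size, channels, num_samples)


# Three text prompts with duration=8, matching the usage snippet.
print(expected_wav_shape(3, 8))  # (3, 2, 256000)
```

Comparing this against `wav.shape` after generation is a quick way to confirm that the duration and batch size were applied as intended.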

Prompt Format

Use the following prompt format:

{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm

For example:

hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
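
If prompts are assembled programmatically, a small helper can keep them in this shape. The sketch below is illustrative only: `build_prompt` and its parameters are hypothetical names, not part of audiocraft or this model's tooling, and it assumes the key is passed as a ready-made string such as "G# minor".

```python
def build_prompt(tags, key, bpm):
    """Join tags, key, and tempo as 'tag, tag, ..., key, N bpm'."""
    # Drop empty tags and surrounding whitespace so the separators stay clean.
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return ", ".join(cleaned + [key.strip(), f"{int(bpm)} bpm"])


prompt = build_prompt(
    ["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"], "G# minor", 140
)
print(prompt)  # hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

The resulting string can be passed in the `descriptions` list given to `model.generate`.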

Samples

Audio Prompt Text Prompt Output
trap, synthesizer, songstarters, dark, G# minor, 140 bpm
acoustic, guitar, melody, trap, D minor, 90 bpm

See the ./assets folder for some melody-conditioned samples 🎶