Model Card for RepeaTTS-level-3

See Emotive Icelandic for more information about this model and the data that it is trained on. The RepeaTTS series is trained on the same data as Emotive Icelandic, but without emotive content disclosure.

This model, level-3, is trained on a double-refined subset of the original training corpus. In addition, the model can be prompted
with a "neutral" label or an intensity label:

  • low intensity: the voice is less expressive
  • high intensity: the voice is very expressive
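
The labels above are passed to the model as part of a free-text description. The sketch below shows one way to build such descriptions. `build_description` and `INTENSITY_PHRASES` are hypothetical names introduced here for illustration. Only the high-intensity phrasing is taken from the usage example in this card; the low-intensity and neutral phrasings are assumptions and should be checked against the Emotive Icelandic documentation.

```python
# Hypothetical helper, not part of the model's API.
# Only the "high" phrasing appears in this model card; the "low" and
# "neutral" phrasings are assumptions made for illustration.
INTENSITY_PHRASES = {
    "high": "speaks at very high intensity",  # from the usage example
    "low": "speaks at very low intensity",    # assumed phrasing
    "neutral": "speaks in a neutral tone",    # assumed phrasing
}


def build_description(speaker: str, label: str) -> str:
    """Return a description string for the given speaker and label."""
    if label not in INTENSITY_PHRASES:
        raise ValueError(
            f"Unknown label {label!r}; expected one of {sorted(INTENSITY_PHRASES)}"
        )
    return (
        f"The recording is of very high quality, with {speaker}'s voice "
        f"sounding clear and very close up. {speaker} {INTENSITY_PHRASES[label]}."
    )


if __name__ == "__main__":
    for label in ("neutral", "low", "high"):
        print(build_description("Ingrid", label))
```

The returned string can be used as the `description` variable in the usage code.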

Usage

Use the code below to get started with the model.

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("atlithor/RepeaTTS-level-3").to(device)
# The prompt (text to be spoken) and the description use separate tokenizers.
tokenizer = AutoTokenizer.from_pretrained("atlithor/EmotiveIcelandic")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

prompt = "Þetta er frábær hugmynd!"  # English: "This is a great idea!"
# The description sets the recording quality, the speaker, and the intensity label.
description = "The recording is of very high quality, with Ingrid's voice sounding clear and very close up. Ingrid speaks at very high intensity."

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and write it to disk at the model's sampling rate.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("ingrid_intense.wav", audio_arr, model.config.sampling_rate)

Citation

coming later

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Model size: 938M params (Safetensors, tensor type F32)