---
datasets:
- KTH/hungarian-single-speaker-tts
language:
- hu
base_model:
- microsoft/speecht5_tts
license: mit
pipeline_tag: text-to-speech
---

An experimental SpeechT5 fine-tune for Hungarian, trained on the KTH/hungarian-single-speaker-tts dataset.

```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
from IPython.display import Audio

# The processor comes from the base model; the weights are the Hungarian fine-tune
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("GaborMadarasz/speecht5_tts_KTH_hu")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# "Eight years have passed since we last saw them."
inputs = processor(text="Azóta, hogy nem láttuk őket, nyolc esztendő telt el.", return_tensors="pt")

# Load an x-vector containing the speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7406]["xvector"]).unsqueeze(0)

# Generate a 16 kHz waveform; the HiFi-GAN vocoder converts the predicted spectrogram to audio
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# Play the audio in Jupyter
Audio(speech.numpy(), rate=16000)
```
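Outside Jupyter, `IPython.display.Audio` is not useful, so the generated waveform can be saved to a file instead. The sketch below is one dependency-free option using Python's standard `wave` module. `write_wav` is a hypothetical helper (not part of `transformers`), and it assumes the waveform is a 1-D sequence of floats in roughly [-1, 1] sampled at 16 kHz, matching the `rate=16000` used in the example.

```python
import struct
import wave
from typing import Sequence


def write_wav(path: str, samples: Sequence[float], rate: int = 16000) -> None:
    """Write mono float samples (expected in [-1.0, 1.0]) as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of transformers.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)     # mono
        wf.setsampwidth(2)     # 2 bytes = 16 bits per sample
        wf.setframerate(rate)  # 16 kHz, as assumed for the vocoder output
        wf.writeframes(bytes(frames))
```

For example, call `write_wav("speech.wav", speech.tolist())` after the generation step. A library such as `soundfile` does the same in a single call; the standard-library version only avoids an extra dependency.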