StyleTTS 2 - lite
Online Demo
Explore the model on Hugging Face Spaces:
https://huggingface.co/spaces/dangtr0408/StyleTTS2-lite-space
Fine-tune
https://github.com/dangtr0408/StyleTTS2-lite
Training Details
- Base Checkpoint: Initialized from the official StyleTTS 2 weights pre-trained on LibriTTS.
- Removed Components: PLBert, Diffusion, Prosodic Encoder, SLM, and Spectral Normalization.
- Training Data: LibriTTS corpus.
- Training Schedule: Trained for 100,000 steps.
Model Architecture
Component | Parameters |
---|---|
Decoder | 54,289,492 |
Predictor | 16,194,612 |
Style Encoder | 13,845,440 |
Text Encoder | 5,612,320 |
Total | 89,941,576 |
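To give a feel for what these counts mean in practice, here is a rough sketch that converts parameter counts into an approximate in-memory weight size. It assumes every parameter is stored as a plain float32 (4 bytes) or float16 (2 bytes) value; the precision of the released checkpoint is not stated here, and buffers, activations, and optimizer state are ignored. The function names are illustrative and not part of the repository.

```python
# Rough weight-size estimate from the parameter counts in the table above.
# Assumption: each parameter occupies 4 bytes (float32) or 2 bytes (float16).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Counts copied from the table; "Total" is the reported total row.
PARAM_COUNTS = {
    "Decoder": 54_289_492,
    "Predictor": 16_194_612,
    "Style Encoder": 13_845_440,
    "Text Encoder": 5_612_320,
    "Total": 89_941_576,
}


def weights_size_bytes(num_params, dtype="float32"):
    """Return the bytes needed to hold num_params values of the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def weights_size_mib(num_params, dtype="float32"):
    """Same estimate, expressed in MiB (1 MiB = 1024 * 1024 bytes)."""
    return weights_size_bytes(num_params, dtype) / (1024 * 1024)


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        fp32 = weights_size_mib(count, "float32")
        fp16 = weights_size_mib(count, "float16")
        print(f"{name:<14} {count:>12,} params  ~{fp32:7.1f} MiB fp32  ~{fp16:7.1f} MiB fp16")
```

Actual memory use at inference time will be higher than these figures, since they cover the weights only.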
Prerequisites
- Python: Version 3.7 or higher
- Git: To clone the repository
Installation & Setup
- Clone the repository:
git clone https://huggingface.co/dangtr0408/StyleTTS2-lite
cd StyleTTS2-lite
- Install dependencies:
pip install -r requirements.txt
- On Linux, install espeak-ng manually:
sudo apt-get install espeak-ng
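Before opening the notebook, it can help to confirm that the prerequisites are actually in place. The sketch below is one possible check, using only the Python standard library: it compares the interpreter version against the 3.7 minimum and looks for `git` and `espeak-ng` on `PATH`. The `check_environment` helper is hypothetical and not part of the repository, and the `espeak-ng` lookup is mainly relevant on Linux, where it is installed separately.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)                    # minimum version listed under Prerequisites
REQUIRED_TOOLS = ("git", "espeak-ng")  # executables expected on PATH (assumed names)


def check_environment(min_python=MIN_PYTHON, tools=REQUIRED_TOOLS):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("Missing prerequisite:", issue)
    else:
        print("All prerequisite checks passed.")
```

This does not verify that the packages in requirements.txt are installed; `pip install -r requirements.txt` reports those failures itself.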
Usage Example
See the run.ipynb notebook in the repository.
Disclaimer
Before using these pre-trained models, you agree to inform listeners that the speech samples are synthesized by the pre-trained models, unless you have permission to use the voice you synthesize. That is, you agree to use only voices whose speakers have granted permission to have their voice cloned, either directly or by license, before making synthesized voices public; if you do not have that permission, you must publicly announce that the voices are synthesized.
License
Code: MIT License
Model tree for dangtr0408/StyleTTS2-lite
Base model
yl4579/StyleTTS2-LibriTTS