valkiscute committed (verified)
Commit ad44c65 · Parent(s): 2892f55

Update README.md

Files changed (1): README.md (+22 -9)
README.md CHANGED
@@ -51,13 +51,13 @@ async def main(model_type, model_path):
         f.write(next(wavs))
     print(f"Speech generated in {output_path}")
 ```
-You need to specify the ```prompt speech```, including the ```ref_wav_path``` and its ```prompt text```, and the ```text``` to be synthesized. The synthesized speech is saved by default to ```logs/tts.wav```.
+You need to specify the prompt speech, including the ```ref_wav_path``` and its ```prompt_text```, and the ```text``` to be synthesized. The synthesized speech is saved by default to ```logs/tts.wav```.
 
 Additionally, you need to specify ```model_type``` as either ```base``` or ```sft```, with the default being ```base```.
 
-When you specify the ```model_type``` to be ```base```, you can change the ```prompt speech``` to arbitrary speaker for zero-shot TTS synthesis.
+When you specify the ```model_type``` to be ```base```, you can change the prompt speech to an arbitrary speaker for zero-shot TTS synthesis.
 
-When you specify the ```model_type``` to be ```sft```, you need to keep the ```prompt speech``` unchanged because the ```sft``` model is trained on Claire's voice.
+When you specify the ```model_type``` to be ```sft```, you need to keep the prompt speech unchanged because the ```sft``` model is trained on Claire's voice.
 
 ## API Usage
 ```sh
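For orientation, the quickstart snippet whose prose this hunk edits might look like the sketch below. The ```Inference``` class name, its constructor, and the ```generate``` signature are assumptions inferred from the visible context lines (```f.write(next(wavs))``` and the ```main(model_type, model_path)``` header), not confirmed by this diff:
```python
import asyncio

from inference.inference import Inference  # hypothetical import path

async def main(model_type, model_path):
    # model_type is "base" (zero-shot) or "sft"; the README's default is "base".
    tts = Inference(model_type, model_path)      # assumed constructor
    wavs = await tts.generate(                   # assumed method name
        ref_wav_path="assets/Claire.wav",        # prompt speech: reference audio
        prompt_text="Transcript of the reference audio.",
        text="Text to be synthesized.",
    )
    output_path = "logs/tts.wav"                 # default save location
    with open(output_path, "wb") as f:
        f.write(next(wavs))                      # matches the context line above
    print(f"Speech generated in {output_path}")

if __name__ == "__main__":
    asyncio.run(main("base", "pretrained_models/Muyan-TTS"))
```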
@@ -88,7 +88,7 @@ print(time.time() - start)
 
 By default, the synthesized speech will be saved at ```logs/tts.wav```.
 
-Additionally, you need to specify ```model_type``` as either ```base``` or ```sft```, with the default being ```base```.
+Similarly, you need to specify ```model_type``` as either ```base``` or ```sft```, with the default being ```base```.
 
 ## Training
 
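The API hunk shows only its tail (```print(time.time() - start)```), so here is a rough client-side sketch for completeness. The port, the ```/get_tts``` route, the payload keys, and the raw-audio response are all assumptions, not taken from this diff; check the repository's API server for the real interface:
```python
import time

import requests  # assumed HTTP client; any equivalent works

start = time.time()
# Hypothetical endpoint and payload shape.
resp = requests.post(
    "http://localhost:8020/get_tts",
    json={
        "ref_wav_path": "assets/Claire.wav",
        "prompt_text": "Transcript of the reference audio.",
        "text": "Text to be synthesized.",
    },
    timeout=120,
)
with open("logs/tts.wav", "wb") as f:  # default save location per the README
    f.write(resp.content)              # assumes the server returns raw audio bytes
print(time.time() - start)             # the timing line visible in the hunk
```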
@@ -96,11 +96,17 @@ We use ```LibriSpeech``` as an example. You can use your own dataset instead, bu
 
 If you haven't downloaded ```LibriSpeech``` yet, you can download the dev-clean set using:
 ```sh
-wget https://www.openslr.org/resources/12/dev-clean.tar.gz -P path/to/save
+wget --no-check-certificate https://www.openslr.org/resources/12/dev-clean.tar.gz
 ```
-After downloading, specify the ```librispeech_dir``` in ```prepare_sft_dataset.py``` to match the download location. Then run ```./train.sh```, which will automatically process the data and generate ```data/tts_sft_data.json```. We will use the first speaker from the LibriSpeech subset for fine-tuning. You can also specify a different speaker as needed in ```data_process/text_format_conversion.py```.
+After uncompressing the data, specify the ```librispeech_dir``` in ```prepare_sft_dataset.py``` to match the download location. Then run:
+```sh
+./train.sh
+```
+This will automatically process the data and generate ```data/tts_sft_data.json```.
+
+Note that we use the speaker ID "3752" from the dev-clean subset of LibriSpeech (which can be specified in ```data_process/text_format_conversion.py```) as an example because its data size is relatively large. If you organize your own dataset for training, please prepare at least a dozen minutes of speech from the target speaker.
 
-Note that if an error occurs during the process, resolve the error, delete the existing contents of the data folder, and then rerun ```train.sh```.
+If an error occurs during the process, resolve the error, delete the existing contents of the data folder, and then rerun ```train.sh```.
 
 After generating ```data/tts_sft_data.json```, train.sh will automatically copy it to ```llama-factory/data``` and add the following field to ```dataset_info.json```:
 ```json
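Since the note above asks for at least a dozen minutes of target-speaker speech, a quick way to sanity-check a candidate speaker is to tally their audio duration. A minimal sketch, assuming the standard LibriSpeech layout (```<root>/<speaker>/<chapter>/*.flac```) and the third-party ```soundfile``` package; neither is prescribed by this diff:
```python
import os

import soundfile as sf  # assumed audio reader; pip install soundfile

def speaker_minutes(librispeech_dir: str, speaker_id: str) -> float:
    """Sum one speaker's .flac durations in a LibriSpeech-style tree, in minutes."""
    total_sec = 0.0
    speaker_root = os.path.join(librispeech_dir, speaker_id)
    for dirpath, _dirs, filenames in os.walk(speaker_root):
        for name in filenames:
            if name.endswith(".flac"):
                total_sec += sf.info(os.path.join(dirpath, name)).duration
    return total_sec / 60.0

# Example: check speaker 3752 in an extracted dev-clean set.
print(f"{speaker_minutes('LibriSpeech/dev-clean', '3752'):.1f} min")  # want >= ~12
```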
@@ -108,6 +114,13 @@ After generating ```data/tts_sft_data.json```, train.sh will automatically copy
   "file_name": "tts_sft_data.json"
 }
 ```
-Finally, it will automatically execute the ```llamafactory-cli train``` command to start training. You can adjust training settings using ```training/sft.yaml```. By default, the trained weights will be saved to ```pretrained_models/Muyan-TTS-new-SFT```.
+Finally, it will automatically execute the ```llamafactory-cli train``` command to start training. You can adjust training settings using ```training/sft.yaml```.
+
+By default, the trained weights will be saved to ```pretrained_models/Muyan-TTS-new-SFT```.
+
+After training, you need to copy the ```sovits.pth``` of the base/sft model to your trained model path before inference:
+```sh
+cp pretrained_models/Muyan-TTS/sovits.pth pretrained_models/Muyan-TTS-new-SFT
+```
 
-You can directly deploy your trained model using the API tool above. During inference, you need to specify the ```model_type``` to be ```sft``` and replace the ```ref_wav_path``` and ```prompt text``` with a sample of the speaker's voice you trained on.
+You can directly deploy your trained model using the API tool above. During inference, you need to specify the ```model_type``` to be ```sft``` and replace the ```ref_wav_path``` and ```prompt_text``` with a sample of the speaker's voice you trained on.
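The ```dataset_info.json``` edit that train.sh performs amounts to merging in one key. A minimal sketch of that step, assuming the file sits at ```llama-factory/data/dataset_info.json``` and the dataset key is ```tts_sft_data``` (the hunk shows only the ```file_name``` field, so the key name is an assumption):
```python
import json

info_path = "llama-factory/data/dataset_info.json"  # assumed location

with open(info_path, "r", encoding="utf-8") as f:
    dataset_info = json.load(f)

# Register the generated SFT data so llamafactory-cli can resolve it by name.
dataset_info["tts_sft_data"] = {"file_name": "tts_sft_data.json"}  # assumed key

with open(info_path, "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, indent=2, ensure_ascii=False)
```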
 
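Tying the last two steps together, inference against the fine-tuned weights could look like the sketch below. It reuses the assumed ```Inference``` API from the first sketch, and the reference-clip path for the trained speaker is a placeholder:
```python
import asyncio

from inference.inference import Inference  # hypothetical import, as in the first sketch

async def main():
    # Point at the trained weights (with sovits.pth copied in, per the step above).
    tts = Inference("sft", "pretrained_models/Muyan-TTS-new-SFT")
    wavs = await tts.generate(
        ref_wav_path="path/to/speaker_3752_clip.wav",  # placeholder reference clip
        prompt_text="Transcript of that clip.",
        text="Text to be synthesized.",
    )
    with open("logs/tts.wav", "wb") as f:
        f.write(next(wavs))

asyncio.run(main())
```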