A few questions and suggestions
Hello Ibrahim,
I have a few questions and suggestions, and I'd like to hear your opinion on them.
Wouldn't training the model from scratch instead of fine-tuning yield better results?
Or, what if we fine-tune a model other than F5-TTS, perhaps one specifically designed for the Arabic language?
Also, wouldn’t preprocessing the text with diacritics before training lead to better outcomes?
And if we create a large custom dataset using ElevenLabs, given its strong TTS capabilities, could that make the model even more powerful?
Sorry for the late reply.
Wouldn’t training the model from scratch instead of fine-tuning lead to better results? It depends. If we have enough data, starting from scratch could be an option; otherwise, fine-tuning is likely the better approach. A well fine-tuned model can fully leverage its prior learning. For example, if you have Arabic data, I’d suggest continuing the fine-tuning from the checkpoints here. With proper hyperparameter tuning, you should achieve the results you’re aiming for.
What if we fine-tune a model other than F5-TTS, perhaps one tailored specifically for Arabic? I’d love to see this explored. Keep in mind that most existing models are generic rather than language-specific; they’re designed to be language-agnostic. But I understand your point: you want a model built to handle Arabic, particularly at the text stage (text embeddings and representation). I think this could indeed make a significant difference. However, at the acoustic modeling stage, I believe sticking to the modern TTS approaches (which are generic, as I mentioned) is the way to go. I don’t see Arabic-specific speech patterns embedded at the acoustic level outperforming current models. They are strong learners on their own.
Also, wouldn’t preprocessing the text with diacritics before training improve the outcomes? Absolutely, this is spot on.
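As a rough illustration of the diacritics point, here is a minimal sketch of how one might check how much of a transcript is already diacritized before training, so that undiacritized lines can be sent to a diacritizer first. The function names and the 0.5 threshold are hypothetical choices for this example, not part of F5-TTS or any other library, and the mark set only covers the common tashkeel code points (U+064B to U+0652, plus the superscript alef U+0670).

```python
# Common Arabic diacritics (tashkeel): fathatan..sukun, plus superscript alef.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}
TATWEEL = "\u0640"  # elongation character, not a letter


def is_arabic_letter(ch):
    """True for basic Arabic letters (U+0621..U+064A), excluding tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def diacritic_coverage(text):
    """Fraction of Arabic letters directly followed by a diacritic mark."""
    letters = 0
    marked = 0
    for i, ch in enumerate(text):
        if is_arabic_letter(ch):
            letters += 1
            if i + 1 < len(text) and text[i + 1] in DIACRITICS:
                marked += 1
    return marked / letters if letters else 0.0


def strip_diacritics(text):
    """Remove diacritic marks, e.g. to compare against an undiacritized source."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def needs_diacritization(text, threshold=0.5):
    """Flag lines whose coverage is below a (hypothetical) threshold."""
    return diacritic_coverage(text) < threshold
```

Running `needs_diacritization` over every transcript in the dataset would give a quick list of lines to diacritize before training. Note that fully diacritized text rarely reaches a coverage of 1.0, since long vowels and some word-final letters are often left unmarked, so the threshold should be tuned on real data.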
And if we create a large custom dataset using ElevenLabs, given its strong TTS capabilities, could that make the model even more powerful? Definitely, especially considering the scarcity of good Arabic datasets. It is actually the way to go.
Thank you for your response and clarifications! I have many ElevenLabs accounts, so I can collect a large dataset. I will try training the model on it and monitor the results.