When Fine-Tuning Fails and When It Generalises: The Role of Data Diversity and Mixed Training in LLM-based TTS
This paper demonstrates that LoRA fine-tuning of compact LLM backbones substantially improves voice cloning in terms of perceptual quality, speaker fidelity, and signal-to-noise ratio, provided the training data is sufficiently acoustically diverse.
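As background for readers unfamiliar with the technique, the sketch below illustrates the core LoRA idea (a frozen weight plus a trainable low-rank update); the dimensions, rank, and scaling are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Minimal LoRA sketch: adapt a frozen linear layer W with a low-rank
# update (alpha / r) * B @ A. Only A and B would be trained.
rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 16, 4, 8.0

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
base = W @ x
adapted = lora_forward(x)
# Because B starts at zero, the adapter is a no-op before training,
# so fine-tuning begins exactly at the pretrained model's behaviour.
print(np.allclose(base, adapted))  # True
```

This zero-initialisation of B is what lets LoRA fine-tuning start from the unmodified backbone and learn only a small task-specific update.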