This repository provides synthesized samples, training and evaluation data, source code, and parameters for the paper One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech. It contains an implementation of Tacotron 2 that supports multilingual experiments and implements different approaches to encoder parameter sharing, combining ideas from Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, End-to-End Code-Switched TTS with Mix of Monolingual Recordings, and Contextual Parameter Generation for Universal Neural Machine Translation.

We provide data for comparing three multilingual text-to-speech models. The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second uses a separate encoder for each language. The third, the generated model, follows the contextual parameter generation approach: a parameter generator conditioned on language embeddings produces the encoder parameters.
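The contextual parameter generation idea behind the generated model can be illustrated with a minimal NumPy sketch (not taken from the repository; all names and dimensions are hypothetical): a small generator maps a language embedding to the weights of an encoder layer, so all languages share the generator while each language gets its own effective encoder parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 4              # language embedding size (illustrative)
IN_DIM, OUT_DIM = 8, 8   # encoder layer dimensions (illustrative)

# Generator: a single linear map from a language embedding to the
# flattened weight matrix of one encoder layer.
G = rng.standard_normal((EMB_DIM, IN_DIM * OUT_DIM)) * 0.1

def generate_encoder_weights(lang_emb: np.ndarray) -> np.ndarray:
    """Produce a language-specific weight matrix from a language embedding."""
    return (lang_emb @ G).reshape(IN_DIM, OUT_DIM)

def encode(x: np.ndarray, lang_emb: np.ndarray) -> np.ndarray:
    """Run one encoder layer using the generated, language-specific weights."""
    W = generate_encoder_weights(lang_emb)
    return np.tanh(x @ W)

# Two languages obtain different encoder parameters from the same generator.
de_emb = rng.standard_normal(EMB_DIM)
fr_emb = rng.standard_normal(EMB_DIM)
x = rng.standard_normal((3, IN_DIM))   # toy batch of 3 input frames
out_de = encode(x, de_emb)
out_fr = encode(x, fr_emb)
```

Only the generator's parameters are trained across all languages, which is what makes the approach a form of meta-learning: knowledge is shared through the generator rather than through a single shared encoder.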
Features
- Interactive demos showcasing the code-switching abilities and joint multilingual training of the generated model (trained on an enhanced CSS10 dataset)
- Many samples synthesized using the three compared models
- Our best model, which supports code-switching and voice cloning, is available for download
- Training and evaluation data for comparing the three models
- An implementation of Tacotron 2 supporting multilingual experiments, with different approaches to encoder parameter sharing