Ryota Komatsu

Institute of Science Tokyo

The LibriTTS-R dataset is made available by Google LLC under the CC BY 4.0 license.

Y. Koizumi, H. Zen, S. Karita, Y. Ding, K. Yatabe, N. Morioka, M. Bacchiani, Y. Zhang, W. Han, and A. Bapna, "LibriTTS-R: A restored multi-speaker text-to-speech corpus," in Proc. Interspeech, 2023, pp. 5496–5500.
M. Le, A. Vyas, B. Shi, B. Karrer, L. Sari, R. Moritz, M. Williamson, V. Manohar, Y. Adi, J. Mahadeokar, and W.-N. Hsu, "Voicebox: Text-guided multilingual universal speech generation at scale," in Proc. Thirty-seventh Conference on Neural Information Processing Systems, vol. 36, 2023, pp. 14005–14034.
S.-g. Lee, W. Ping, B. Ginsburg, B. Catanzaro, and S. Yoon, "BigVGAN: A universal neural vocoder with large-scale training," in Proc. International Conference on Learning Representations, 2023.
T. A. Nguyen, W.-N. Hsu, A. D’Avirro, B. Shi, I. Gat, M. Fazel-Zarandi, T. Remez, J. Copet, G. Synnaeve, M. Hassid, F. Kreuk, Y. Adi, and E. Dupoux, "Expresso: A benchmark and analysis of discrete expressive speech resynthesis," in Proc. Interspeech, 2023, pp. 4823–4827.
R. Kumar, P. Seetharaman, A. Luebs, I. Kumar, and K. Kumar, "High-fidelity audio compression with improved RVQGAN," in Proc. Thirty-seventh Conference on Neural Information Processing Systems, vol. 36, 2023, pp. 27980–27993.