Hi, I’m Ryota Komatsu. Currently, I work on spoken language processing and its applications to spoken dialogue systems.
Education
- 2025.04 - 2028.03, doctoral student, Institute of Science Tokyo, advised by Prof. Takahiro Shinozaki.
- 2023.03, M.Eng., Tokyo Institute of Technology, advised by Prof. Takahiro Shinozaki.
- 2021.03, B.Eng., Tokyo Institute of Technology, advised by Prof. Isao Yamada.
Work experience
- 2023.04 - 2025.01, Research & Development Group, Hitachi, Ltd.
Publications
Speaker-Disentangled Chunk-Wise Regression for Syllabic Tokenization
R. Komatsu, K. Kawakita, T. Okamoto, and T. Shinozaki, "Speaker-Disentangled Chunk-Wise Regression for Syllabic Tokenization," IEEE Open Journal of Signal Processing, vol. 7, 2026.
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
R. Komatsu and T. Shinozaki, "Self-supervised syllable discovery based on speaker-disentangled HuBERT," in Proc. IEEE Spoken Language Technology Workshop (SLT), Dec. 2024, pp. 1131–1136.
Continuous Action Space-Based Spoken Language Acquisition Agent Using Residual Sentence Embedding and Transformer Decoder
R. Komatsu, Y. Kimura, T. Okamoto, and T. Shinozaki, "Continuous action space-based spoken language acquisition agent using residual sentence embedding and transformer decoder," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023.
Automatic Spoken Language Acquisition Based on Observation and Dialogue
R. Komatsu, S. Gao, W. Hou, M. Zhang, T. Tanaka, K. Toyoda, Y. Kimura, K. Hino, Y. Iwamoto, K. Mori, T. Okamoto, and T. Shinozaki, "Automatic spoken language acquisition based on observation and dialogue," IEEE Journal of Selected Topics in Signal Processing (JSTSP), vol. 16, no. 6, pp. 1480–1492, 2022.
Pronunciation Adaptive Self Speaking Agent Using WaveGrad
T. Tanaka, R. Komatsu, T. Okamoto, and T. Shinozaki, "Pronunciation adaptive self speaking agent using WaveGrad," in Proc. The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing, Feb. 2022.
A Graph Regularized RPCA by Generalized Moreau Enhanced Model
R. Komatsu, M. Yamagishi, and I. Yamada, "A Graph Regularized RPCA by Generalized Moreau Enhanced Model," in Proc. European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 2129–2133.
Invited Talks
Efficient Scaling of Spoken Language Models Based on Syllable-Text Interleaving
Talk at the 3rd Online Seminar of the Acoustical Society of Japan, Kyushu Chapter
A comprehensive overview of audio language models
Talk at the Sixth Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Honolulu, Hawaii
Introduction to Multimodal Large Language Models
Tutorial at OTOGAKU Symposium 2025, Waseda University, Tokyo, Japan
Awards
- Best Student Presentation Award, Acoustical Society of Japan (ASJ), 2023.
Scholarship
- Science Tokyo Tsubame Scholarship for Doctoral Students

Hugging Face