Academic Homepage
Kailai Shen
Media Engine Algorithm Engineer (Speech) @ Juphoon | M.S. in EECS, Ningbo University
Research focus: speaker recognition, speech quality assessment, and robust speech systems.
About
I am Kailai Shen, currently working as a Media Engine Algorithm Engineer (Speech) at Juphoon. I received my M.S. degree from the School of Electrical Engineering and Computer Science, Ningbo University. My research focuses on speaker recognition, speech quality assessment, and text-to-speech.
Email: kailai.shen@juphoon.com
Recent News
- 2026-02 Our paper "Pseudo-Reference Driven Non-Intrusive Speech Quality Assessment Via Multi-Task Learning" was published in IEEE TASLP.
- 2026-02 Personal homepage is online with updated publication records and links.
Research Interests
- Speaker Recognition
- Speech Quality Assessment
- Text-to-Speech
Competitions
- 2025 2nd Place (utterance-level SRCC for overall music quality), AudioMOS Challenge 2025, Track 1.
- 2023 1st Place, VoiceMOS Challenge 2023, Track 2.
Publications
Journal Articles
- Kailai Shen, Diqun Yan, Li Dong, Rangding Wang, Xiaoxun Wu, Xiang Xia, Xiaojiong Qian. Pseudo-Reference Driven Non-Intrusive Speech Quality Assessment Via Multi-Task Learning. IEEE Transactions on Audio, Speech and Language Processing, pages 1-15, 2026.
- Kailai Shen, Diqun Yan, Jing Hu, Zhe Ye. Non-intrusive speech quality assessment: A survey. Neurocomputing, 580:127471, 2024.
- Kailai Shen, Diqun Yan, Li Dong. MSQAT: A multi-dimension non-intrusive speech quality assessment transformer utilizing self-supervised representations. Applied Acoustics, 212:109584, 2023.
- Kailai Shen, Diqun Yan, Zhe Ye, Xianbo Xu, Jinxing Gao, Li Dong, Chengbin Peng, Kun Yang. Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM. Signal, Image and Video Processing, 17(7):3377-3385, 2023.
Conference Papers
- Xiaoxun Wu, Kailai Shen, Yuheng Huang, Naiyuan Li, Diqun Yan. DyMEvalNet: Dynamic Text-Audio-Personalization Fusion for Multimodal Music Quality Assessment. ASRU, 2025 (accepted).
- Zhe Ye, Diqun Yan, Li Dong, Kailai Shen. Breaking Speaker Recognition with Paddingback. ICASSP, 4435-4439, 2024.
- Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan. EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events. ICME, 1-6, 2024.
- Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu. SQAT-LD: Speech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction. ASRU, 1-6, 2023.