| 2025 | "Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding. Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis |
| 2025 | "Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking. Julio Cesar Cavalcanti, Gabriel Skantze |
| 2025 | "KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding. Alkis Koudounas, Moreno La Quatra, Eliana Pastor, Sabato Marco Siniscalchi, Elena Baralis |
| 2025 | 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025. Odette Scharenborg, Catharine Oertel, Khiet Truong |
| 2025 | 2D Immersed Boundary Method in Vocal Tract Acoustics: An Eulerian-Lagrangian Model for Simulation of Diphthongs. Rongshuai Wu, Debasish Ray Mohapatra, Sidney Fels |
| 2025 | 75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment. Xuan Shi, Yubin Zhang, Yijing Lu, Marcus Ma, Tiantian Feng, Asterios Toutios, Haley Hsu, Louis Goldstein, Shrikanth Narayanan |
| 2025 | A Bayesian Approach to L2 Fluency Ratings by Native and Nonnative Listeners. Kakeru Yazawa, Takayuki Konishi |
| 2025 | A Cascaded Multimodal Framework for Automatic Social Communication Severity Assessment in Children with Autism Spectrum Disorder. Jihyun Mun, Sunhee Kim, Minhwa Chung |
| 2025 | A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification. Yue Pan, Liwei Liu, Changxin Li, Xingyao Wang, Yili Xia, Hanyue Zhang, Ming Chu |
| 2025 | A Comparative Study on Proactive and Passive Detection of Deepfake Speech. Chia-Hua Wu, Wanying Ge, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang |
| 2025 | A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs? Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh, Wei-Hsiang Liao, Yuki Mitsufuji |
| 2025 | A Cookbook for Community-driven Data Collection of Impaired Speech in Low-Resource Languages. Sumaya Ahmed Salihs, Isaac Wiafe, Jamal-Deen Abdulai, Elikem Doe Atsakpo, Gifty Ayoka, Richard Cave, Akon Obu Ekpezu, Catherine Holloway, Katrin Tomanek, Fiifi Baffoe Payin Winful |
| 2025 | A Copula-Based Generative Score-Level Fusion Model for Speaker Verification. Sandro Cumani |
| 2025 | A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations. Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj |
| 2025 | A Dataset for Automatic Assessment of TTS Quality in Spanish. Alejandro Sosa Welford, Leonardo Pepino |
| 2025 | A Deformable Convolution GAN Approach for Speech Dereverberation in Cochlear Implant Users. Hsin-Tien Chiang, John H. L. Hansen |
| 2025 | A Domain Robust Pre-Training Method with Local Prototypes for Speaker Verification. Qing Gu, Yan Song, Haoyu Song, Nan Jiang, Lirong Dai, Ian McLoughlin |
| 2025 | A Gradient Effect of Hand Beat Timing on Spoken Word Recognition. Chengjia Ye, James M. McQueen, Hans Rutger Bosker |
| 2025 | A Hybrid Approach to Combining Role Diarization with ASR for Professional Conversations. Bongjun Kim, Arindam Ghosh, Mark C. Fuhs, Anurag Chowdhury, Deblin Bagchi, Monika Woszczyna |
| 2025 | A Joint Network for Singing Melody Extraction from Polyphonic Music with Attention Aggregation and Self-Consistency Training. Jiabo Jing, Ying Hu, Hao Huang, Liang He, Zhijian Ou |
| 2025 | A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions. Zheng Wang, Xiaobin Rong, Yu Sun, Tianchi Sun, Zhibin Lin, Jing Lu |
| 2025 | A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation. Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser, Barbara Plank |
| 2025 | A Multi-Stream Framework Utilizing 3D Human Reconstruction for Cued Speech Recognition. Katerina Papadimitriou, Gerasimos Potamianos |
| 2025 | A Multimodal Chinese Dataset for Cross-lingual Sarcasm Detection. Xiyuan Gao, Bruce Xiao Wang, Meiling Zhang, Shuming Huang, Zhu Li, Shekhar Nayak, Matt Coler |
| 2025 | A Naturally Elicited Multimodal Stress Database and Speech Breathing Based Stress Detection. Karumannil Mohamed Ismail Yasar Arafath, Mohammed Abeer K. C., Aurobinda Routray |
| 2025 | A Neural Codec Approach for Noise-Robust Bandwidth Expansion. Xi Liu, Mu Yang, Szu-Jui Chen, John H. L. Hansen |
| 2025 | A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control. Yuan-Kuei Wu, Juan Azcarreta Ortiz, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey |
| 2025 | A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater's Shadowing and Sequence-to-sequence Voice Conversion. Haopeng Geng, Daisuke Saito, Nobuaki Minematsu |
| 2025 | A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic. Ondrej Klejch, William Lamb, Peter Bell |
| 2025 | A Robust Hybrid ACC-PM Approach for Personal Sound Zones. Yaqi Zhu, Lei Zhou, Hongqing Liu, Liming Shi, Lu Gan |
| 2025 | A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition. Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin |
| 2025 | A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model. Yang Xiang, Canan Huang, Desheng Hu, Jingguang Tian, Xinhui Hu, Chao Zhang |
| 2025 | A Siamese Network-Based Framework for Voice Mimicry Proficiency Assessment Using X-Vector Embeddings. Bhasi K. C., Rajeev Rajan |
| 2025 | A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations. Masakazu Inoue, Motoshige Sato, Kenichi Tomeoka, Nathania Nah, Eri Hatakeyama, Kai Arulkumaran, Ilya Horiguchi, Shuntaro Sasai |
| 2025 | A Simple-Yet-Effective Data Augmentation Method for Speaker Identification in Novels. Wenjie Zhong, Jason Naradowsky, Yusuke Miyao |
| 2025 | A Study of Real-world Audio-Visual Corpus Design and Production: A Perspective from MISP Challenges. Hang Chen, Jun Du, Qing Wang, Juan Xie, Shi-Fu Xiong |
| 2025 | A Study of Speech Embedding Similarities Between Australian Aboriginal and High-Resource Languages. Eliathamby Ambikairajah, Jingyao Wu, Ting Dang, Vidhyasaharan Sethu |
| 2025 | A Study on Speech Assessment with Visual Cues. Shafique Ahmed, Ryandhimas E. Zezario, Nasir Saleem, Amir Hussain, Hsin-Min Wang, Yu Tsao |
| 2025 | A Study on The Impact of Foundation Models on Automatic Depression Detection from Speech Signals. Bubai Maji, Monorama Swain, Shazia Nasreen, Debabrata Majumdar, Rajlakshmi Guha, Aurobinda Routray, Anders Søgaard |
| 2025 | A Three-Stage Beamforming with Harmonic Guidance for Multi-Channel Speech Enhancement. Nurali Alip, Tianrui Wang, Rui Cao, Meng Ge, Jingru Lin, Longbiao Wang, Jianwu Dang |
| 2025 | A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement. Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li |
| 2025 | A Watermark for Auto-Regressive Speech Generation Models. Yihan Wu, Ruibo Chen, Georgios Milis, Junfeng Guo, Heng Huang |
| 2025 | A real-time MRI study on asymmetry in velum dynamics during VCV production with nasal sounds. Chetan Sharma, Vaishnavi Chandwanshi, Shreya Shrikant Karkun, Aditya Anand Gupta, Prasanta Kumar Ghosh |
| 2025 | A semi-automatic pipeline for transcribing and segmenting child speech. Polychronia Christodoulidou, James Tanner, Jane Stuart-Smith, Michael McAuliffe, Mridhula Murali, Amy Smith, Lauren Taylor, Joanne Cleland, Anja Kuschmann |
| 2025 | A simple method for predicting Clinical Scores in Huntington's Disease by leveraging ASR's uncertainty on spontaneous speech. Hadrien Titeux, Quang Tuan Rémy Nguyen, Andres Gil-Salcedo, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux |
| 2025 | A-SMiLE: Affective Sparse Mixture-of-Experts Adapter with Multi-Task Learning for Spoken Dialogue Models. Yi-Wen Chao, Yizhou Peng, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng |
| 2025 | AA-SLLM: An Acoustically Augmented Speech Large Language Model for Speech Emotion Recognition. Jialong Mai, Xiaofen Xing, Weidong Chen, Yuanbo Fang, Xiangmin Xu |
| 2025 | ABHINAYA - A System for Speech Emotion Recognition In Naturalistic Conditions Challenge. Soumya Dutta, Smruthi Balaji, Varada R, Viveka Salinamakki, Sriram Ganapathy |
| 2025 | AC/DC: LLM-based Audio Comprehension via Dialogue Continuation. Yusuke Fujita, Tomoya Mizumoto, Atsushi Kojima, Lianbo Liu, Yui Sudo |
| 2025 | ADCeleb: A Longitudinal Speech Dataset from Public Figures for Early Detection of Alzheimer's Disease. Kunxiao Gao, Anna Favaro, Najim Dehak, Laureano Moro-Velázquez |
| 2025 | ADI-20: Arabic Dialect Identification dataset and models. Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares |
| 2025 | AF-Vocoder: Artifact-Free Neural Vocoder with Global Artifact Filter. Zhuangqi Chen, Xianjun Xia, Xiaohuai Le, Siyu Sun, Chuanzeng Huang |
| 2025 | AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition. Yuhang Dai, He Wang, Xingchen Li, Zihan Zhang, Shuiyuan Wang, Lei Xie, Xin Xu, Hongxiao Guo, Shaoji Zhang, Hui Bu, Wei Chen |
| 2025 | APTTS: Adversarial Post-training in Latent Flow Matching for Fast and High-fidelity Text-to-Speech. Hyungchan Yoon, Chanwoo Lee, Hoodong Lee, Stanley Jungkyu Choi |
| 2025 | ARiSE: Auto-Regressive Multi-Channel Speech Enhancement. Pengjie Shen, Xueliang Zhang, Zhong-Qiu Wang |
| 2025 | ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning. Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang |
| 2025 | ASR Confidence Estimation using True Class Lexical Similarity Score. Nagarathna Ravi, Thishyan Raj T, Ravi Teja Chaganti, Vipul Arora |
| 2025 | ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems. Anand Kumar Rai, Satyam Rahangdale, Utkarsh Anand, Animesh Mukherjee |
| 2025 | ASR-based segmentation for the analysis of larger child-speech datasets: Performance evaluation on vowels from Australian-English speaking children aged 4 to 11 years. Rui Cai, Titia Benders |
| 2025 | ASVspoof2019 vs. ASVspoof5: Assessment and Comparison. Avishai Weizman, Yehuda Ben-Shimol, Itshak Lapidot |
| 2025 | ATMM-SAGA: Alternating Training for Multi-Module with Score-Aware Gated Attention SASV system. Amro Asali, Yehuda Ben-Shimol, Itshak Lapidot |
| 2025 | Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding. Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu |
| 2025 | Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment. Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen |
| 2025 | Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling. Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen |
| 2025 | Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data. Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li |
| 2025 | Accessible Delivery of Visual-Acoustic Biofeedback for Speech Sound Disorder. Tara McAllister, Peter Traver, Amanda Eads, William Haack, Helen Carey, Yi Shan, Wendy Liang, Tae Hong Park |
| 2025 | Accessible Real-time Eye-gaze Tracking for Neurocognitive Health Assessment: A Multimodal Web-based Approach. Daniel Tisdale, Jackson Liscombe, David Pautler, Michael Neumann, Vikram Ramanarayanan |
| 2025 | Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR. Martin Ratajczak, Jean-Philippe Robichaud, Jennifer Drexler Fox |
| 2025 | Acoustic Detection of UAV Abnormality Using One Ground-Based Acoustic Vector Sensor. Dengjian Zhou, Jianghan Hai, Sijia Liao, Yue Ivan Wu, Kainam Thomas Wong, Xiujuan Zheng |
| 2025 | Acoustic Features of Mandarin Tone Production in Noise: A Comparison Between Chinese Native Speakers and Korean L2 Learners. Jinxin Ji, Yiying Hu, Xiaohu Yang, Gang Peng |
| 2025 | Acoustic Representation and Realization of Weak Elements Subcategories: In the Case of Tianjin Mandarin. Zhijie Li, Hui Feng |
| 2025 | Acoustic and Linguistic Biomarkers for Cognitive Impairment Detection from Speech. Catarina Botelho, David Gimeno-Gómez, Francisco Teixeira, John Mendonça, Patrícia Pereira, Diogo A. P. Nunes, Thomas Rolland, Anna Pompili, Rubén Solera-Ureña, Maria Ponte, David Martins de Matos, Carlos D. Martínez-Hinarejos, Isabel Trancoso, Alberto Abad |
| 2025 | Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment. Long-Vu Hoang, Tuan Nguyen, Huy Dat Tran |
| 2025 | Acoustic similarities, articulatory uniqueness: Speech production mechanisms in individuals with congenital lip paralysis. Anne Hermes, Ivana Didirková, Philipp Buech, Gilles Vannuscorps |
| 2025 | Acquiring Pronunciation from Speech Audio via Multi-task Learning. Siqi Sun, Korin Richmond |
| 2025 | AdaKWS: Towards Robust Keyword Spotting with Test-Time Adaptation. Yang Xiao, Tianyi Peng, Yanghao Zhou, Rohan Kumar Das |
| 2025 | Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning. Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li |
| 2025 | Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding. Haoran Zhou, Xingchen Song, Brendan Fahy, Qiaochu Song, Binbin Zhang, Zhendong Peng, Anshul Wadhawan, Denglin Jiang, Apurv Verma, Vinay Ramesh, Srivas Prasad, Michele M. Franceschini |
| 2025 | Adapting Whisper for low-resource Hindi-English Code-Mix speech with on-the-fly Augmentation & LLM-Synthesised Data. Astik Biswas, Oleg Shevelev, Amine Abdaoui, Vivek Tyagi, Abdelmoumene Boumadane |
| 2025 | Adaptive Across-Subcenter Representation Learning for Imbalanced Anomalous Sound Detection. Dong Wang, Jiqing Han, Guibin Zheng, Tieran Zheng, Yongjun He |
| 2025 | Adaptive Differential Denoising for Respiratory Sounds Classification. Gaoyang Dong, Zhicheng Zhang, Ping Sun, Minghui Zhang |
| 2025 | Adaptive Knowledge Distillation for Device-Directed Speech Detection. Hyung-Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz |
| 2025 | Addressing Task Conflicts in Stuttering Detection via MMoE-Based Multi-Task Learning. Xiaokang Liu, Xingfeng Li, Yudong Yang, Lan Wang, Nan Yan |
| 2025 | Advancing Emotion Recognition via Ensemble Learning: Integrating Speech, Context, and Text Representations. Xiaohan Shi, Jinyi Mi, Xingfeng Li, Tomoki Toda |
| 2025 | Advancing Pediatric ASR: The Role of Voice Generation in Disordered Speech. Karen Rosero, Ali N. Salman, Shreeram Suresh Chandra, Berrak Sisman, Cortney Van't Slot, Alex A. Kane, Rami R. Hallac, Carlos Busso |
| 2025 | Adversarial Attacks on Text-dependent Speaker Verification System. Sreekanth Sankala, Venkatesh Parvathala, Ramesh Gundluru, K. Sri Rama Murty |
| 2025 | Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting. Youngmoon Jung, Yong-Hyeok Lee, Myunghun Jung, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho |
| 2025 | AfriHuBERT: A self-supervised speech representation model for African languages. Jesujoba O. Alabi, Xuechen Liu, Dietrich Klakow, Junichi Yamagishi |
| 2025 | Age-related changes in multisensory integration of emotions in an audiovisual face-prosody-semantics Stroop task. Yi Lin, Shumeng Ni, Yangfan Lu |
| 2025 | Agent-based modelling, sound change, and metaphony in Southern Italian varieties of Italo-Romance. Lilian von Bressensdorf, Pia Greca, Jonathan Harrington |
| 2025 | Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches. Bornali Phukon, Xiuwen Zheng, Mark Hasegawa-Johnson |
| 2025 | Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models. Yao Xiao, Heidi Christensen, Stefan Goetze |
| 2025 | Alzheimer's Disease Detection Using Co-Attention Mechanism for Acoustic and ASR-Transcribed Text Features. Yongqi Shao, Tao Fang |
| 2025 | Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing. Thanapat Trachu, Thanathai Lertpetchpun, Ekapol Chuangsuwanich |
| 2025 | An Effective Anomalous Sound Detection Method Based on Global and Local Attribute Mining. Nan Jiang, Yan Song, Qing Gu, Haoyu Song, Lirong Dai, Ian McLoughlin |
| 2025 | An Effective Training Framework for Light-Weight Automatic Speech Recognition Models. Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman |
| 2025 | An Exploration of Interpretable Deep Learning Models for the Assessment of Mild Cognitive Impairment. Emma Cathrine Liisborg Leschly, Oliver Roesler, Michael Neumann, Jackson Liscombe, Abhishek Hosamath, Lakshmi Arbatti, Line H. Clemmensen, Melanie Ganz, Vikram Ramanarayanan |
| 2025 | An Exploratory Framework for LLM-assisted Human Annotation of Speech Datasets. Alexander Johnson, Harsh Deshpande, Emmy Phung, Ahmad Emami |
| 2025 | An Investigative Study on Recent Sharpness- and Flatness-Based Optimizers for Enhanced Self-Supervised Speaker Verification. Abderrahim Fathan, Jahangir Alam, Xiaolin Zhu |
| 2025 | An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications. Sujith Pulikodan, Sahapthan K, Prasanta Kumar Ghosh, Visruth Sanka, Nihar Desai |
| 2025 | An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech. Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia |
| 2025 | Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection. Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary A. Miller, Jet Vonk, Brittany Morin, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli |
| 2025 | Analysis and Extension of a Near-End Listening Enhancement Method Based on Long-Term Fractile Noise Statistics. Filippo Villani, Wai-Yip Chan, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen |
| 2025 | Analysis of ABC Frontend Audio Systems for the NIST-SRE24. Sara Barahona, Anna Silnova, Ladislav Mosner, Junyi Peng, Oldrich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Pálka, Federico Landini, Lukás Burget, Themos Stafylakis, Sandro Cumani, Dominik Bobos, Miroslav Hlavácek, Martin Kodovsky, Tomás Pavlícek |
| 2025 | Analysis of Avian Biphonic Vocalization Using Computational Modelling. Noumida A, Rajeev Rajan |
| 2025 | Analysis of Phonetic Level Similarities Across Languages in Emotional Speech. Pravin Mote, Abinay Reddy Naini, Donita Robinson, Elizabeth Richerson, Carlos Busso |
| 2025 | Analysis of Semantic and Acoustic Token Variability Across Speech, Music, and Audio Domains. Takanori Ashihara, Marc Delcroix, Tsubasa Ochiai, Kohei Matsuura, Shota Horiguchi |
| 2025 | Analysis of the ABC Classification Backends for NIST SRE24. Sandro Cumani, Anna Silnova, Sara Barahona, Ladislav Mosner, Oldrich Plchot, Johan Rohdin |
| 2025 | Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models. Chi-Yuan Hsiao, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Wei-Chih Chen, Hung-yi Lee |
| 2025 | Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives. Gowtham Premananth, Vinith Kugathasan, Carol Y. Espy-Wilson |
| 2025 | Analyzing the Importance of Blank for CTC-Based Knowledge Distillation. Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter |
| 2025 | Anne Rowling Neurological Speech Corpus: clinically annotated longitudinal dataset for developing speech biomarkers in neurodegenerative disorders. Johnny Tam, Christine Weaver, Oliver Watts, Siddharthan Chandran, Suvankar Pal, Rowling Speech Consortium |
| 2025 | Anomalous Sound Detection Based Feature Fusion and Dual-path Non-linear Independent Components Estimation. Yawei Wang, Qiaoling Zhang, Yi Zhang, Junyao Hu |
| 2025 | Apical vs. Regular Vowel Duration: A Corpus-based Analysis of Contextual Influences in Standard Mandarin. Jingyi Sun, Bowei Shao, Martine Adda-Decker |
| 2025 | Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs. Simon Sedlácek, Bolaji Yusuf, Jan Svec, Pradyoth Hegde, Santosh Kesiraju, Oldrich Plchot, Jan Cernocký |
| 2025 | ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis. Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki |
| 2025 | Are You Being Sarcastic? Prosodic Cues to Irony Perception in German. Sophia Fünfgeld, Angelika Braun, Katharina Zahner-Ritter |
| 2025 | Are loan sequences different from foreign sequences? A perception study with Japanese listeners on coronal obstruent - high front vowel sequences. Silke Hamann, Andrea Alicehajic |
| 2025 | ArticulateX: End-to-End Monolingual Speech Translation in Articulator Space. Vishal Kumar, Vinayak Abrol |
| 2025 | Articulatory Feature Prediction from Surface EMG during Speech Production. Jihwan Lee, Kevin Huang, Kleanthis Avramidis, Simon Pistrosch, Monica González Machorro, Yoonjeong Lee, Björn W. Schuller, Louis Goldstein, Shrikanth Narayanan |
| 2025 | Articulatory Strategy in Vowel Production as a Basis for Speaker Discrimination. Justin J. H. Lo, Patrycja Strycharczuk, Sam Kirkham |
| 2025 | Articulatory Vowel Distinctiveness in Spanish. Kristin Teplansky, Emily Rangel, Mimi LaValley, Jinuk Kwon, Beiming Cao, Jun Wang |
| 2025 | Articulatory clarity and variability before and after surgery for tongue cancer. Thomas Tienkamp, Fleur van Ast, Roos van der Veen, Teja Rebernik, Raoul Buurke, Nikki Hoekzema, Katharina Polsterer, Hedwig Sekeres, Rob van Son, Martijn Wieling, Max J. H. Witjes, Sebastiaan A. H. J. de Visscher, Defne Abur |
| 2025 | Articulatory modeling of the S-shaped F2 trajectories observed in Öhman's spectrographic analysis of VCV syllables. Frédéric Berthommier |
| 2025 | Articulatory variations in Apical Vowels in Southwestern Mandarin. Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang |
| 2025 | Assessing the Performance and Efficiency of Mamba ASR in Low-Resource Scenarios. Rodolfo Zevallos, Martí Cortada Garcia, Sarah Solito, Carlos Mena, Alex Peiró Lilja, Javier Hernando |
| 2025 | Assessing the feasibility of Large Language Models for detecting micro-behaviors in team interactions during space missions. Ankush Raut, Projna Paromita, Sydney R. Begerowski, Suzanne T. Bell, Theodora Chaspari |
| 2025 | Assessment of L2 Oral Proficiency using Speech Large Language Models. Rao Ma, Mengjie Qian, Siyuan Tang, Stefano Bannò, Kate M. Knill, Mark J. F. Gales |
| 2025 | Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis. Ryo Setoguchi, Yoshiko Arimoto |
| 2025 | Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion. Kumud Tripathi, Chowdam Venkata Kumar, Pankaj Wasnik |
| 2025 | Attention Models and Auditory Transduction Features for Noise Robustness. Cathal Ó Faoláin, Andrew Hines |
| 2025 | Attention-Free Dual-Mode ASR with Latency-Controlled Selective State Spaces. Takafumi Moriya, Masato Mimura, Kiyoaki Matsui, Hiroshi Sato, Kohei Matsuura |
| 2025 | AttentiveMOS: A Lightweight Attention-Only Model for Speech Quality Prediction. Imran E. Kibria, Donald S. Williamson |
| 2025 | Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers. Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen |
| 2025 | Audio Deepfake Source Tracing using Multi-Attribute Open-Set Identification and Verification. Pierre Falez, Tony Marteau, Damien Lolive, Arnaud Delhay |
| 2025 | Audio-Based Classification and Geographic Regression of Austrian Dialects. Lorenz Gutscher, Michael Pucher |
| 2025 | Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra |
| 2025 | Augment Mandarin to Cantonese Speech Databases via Retrieval-Augmented Generation and Speech Synthesis. Fan Liu, Cheng Gong, Boyu Zhu, Ruihao Jing, Chunyu Qiang, Tianrui Wang, Xiao-Lei Zhang, Xuelong Li |
| 2025 | AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers. Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong |
| 2025 | AusKidTalk: Using Strategic Data Collection and Out-of-Domain Tools to Semi-Automate Novel Corpora Annotation. Tünde Szalay, Mostafa Shahin, Tharmakulasingam Sirojan, Zheng Nan, Renata Huang, Kirrie J. Ballard, Beena Ahmed |
| 2025 | Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction. Xiangyu Zhang, Daijiao Liu, Tianyi Xiao, Cihan Xiao, Tünde Szalay, Mostafa Shahin, Beena Ahmed, Julien Epps |
| 2025 | Automated evaluation of children's speech fluency for low-resource languages. Bowen Zhang, Nur Afiqah Abdul Latiff, Justin Kan, Rong Tong, Donny Soh, Xiaoxiao Miao, Ian McLoughlin |
| 2025 | Automatic Detection and Sub-typing of Primary Progressive Aphasia from Speech: Integrating Task-Specific Features and Spatio-Semantic Graphs. Fritz Peters, W. Richard Bevan-Jones, Grace Threlfall, Jenny M. Harris, Julie S. Snowden, Matthew Jones, Jennifer C. Thompson, Daniel J. Blackburn, Heidi Christensen |
| 2025 | Automatic Dialectal Transcription: An Evaluation on Finnish and Norwegian. Olli Kuparinen |
| 2025 | Automatic Labeling and Correction of Noisy Labels for Robust Self-Supervised Speaker Verification. Abderrahim Fathan, Jahangir Alam |
| 2025 | Automatic Speech Recognition Biases in Newcastle English: an Error Analysis. Dana Serditova, Kevin Tang, Jochen Steffens |
| 2025 | Automatic Speech Recognition for Low-Resourced Middle Eastern Languages. Razhan Hameed, Sina Ahmadi, Hanah Hadi, Rico Sennrich |
| 2025 | Automatic Speech Recognition of African American English: Lexical and Contextual Effects. Hamid Mojarad, Kevin Tang |
| 2025 | Automatic classification of stop realisation with wav2vec2.0. James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Jeff Mielke, Tyler Kendall |
| 2025 | Automatic detection of speech sound disorders in German-speaking children: augmenting the data with typically developed speech. Darline Monika Marx, Marco Matassoni, Alessio Brutti |
| 2025 | AxLSTMs: learning self-supervised audio representations with xLSTMs. Sarthak Yadav, Sergios Theodoridis, Zheng-Hua Tan |
| 2025 | BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM. Xun Gong, Anqi Lv, Wangyou Zhang, Zhiming Wang, Huijia Zhu, Yanmin Qian |
| 2025 | Backchannel prediction for natural spoken dialog systems using general speaker and listener information. Yoshinori Fukunaga, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka |
| 2025 | Band-SCNet: A Causal, Lightweight Model for High-Performance Real-Time Music Source Separation. Junqi Yang, Yuhong Yang, Weiping Tu, Xin Zhao, Cedar Lin |
| 2025 | Band-Split Self-supervised Mamba for Infant-centered Audio Analysis. Xulin Fan, Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain |
| 2025 | Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing. Jin Li, Man-Wai Mak, Johan Rohdin, Kong Aik Lee, Hynek Hermansky |
| 2025 | Beat gestures made by human-like avatars affect speech perception. Matteo Maran, Renske Rötjes, Anna R. E. Schreurs, Hans Rutger Bosker |
| 2025 | Benchmarking Neural Speech Codec Intelligibility with SITool. Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets |
| 2025 | Benchmarking Time-localized Explanations for Audio Classification Models. Cecilia Bolaños, Leonardo Pepino, Martín Meza, Luciana Ferrer |
| 2025 | Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning. Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy |
| 2025 | Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM. Jeena J. Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke |
| 2025 | Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering. Andrés Carofilis, Pradeep Rangappa, Srikanth R. Madikeri, Shashi Kumar, Sergio Burdisso, Jeena J. Prakash, Esaú Villatoro-Tello, Petr Motlícek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke |
| 2025 | Beyond Attacks: Advancing Fake Speech Detection with Attack-Agnostic Methods. Shilpa Chandra, Akansha Tyagi, Shiven Patel, Padmanabhan Rajan |
| 2025 | Beyond Conventional Metrics: using Entropic Triangles to Explain Balancing Methods in Acoustic Scene Classification. Claudia Montero-Ramírez, Alba Martínez-Serrano, Jorge Garcelán-Gómez, Francisco J. Valverde-Albacete, Carmen Peláez-Moreno |
| 2025 | Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts. Hojun Jin, Eunsoo Hong, Ziwon Hyung, Sungjun Lim, Seungjin Lee, Keunseok Cho |
| 2025 | Beyond Manual Transcripts: The Potential of Automated Speech Recognition Errors in Improving Alzheimer's Disease Detection. Yin-Long Liu, Rui Feng, Jia-Xin Chen, Yi-Ming Wang, Jia-Hong Yuan, Zhen-Hua Ling |
| 2025 | Beyond Similarity Scoring: Detecting Entailment and Contradiction in Multilingual and Multimodal Contexts. Othman Istaiteh, Salima Mdhaffar, Yannick Estève |
| 2025 | Beyond Traditional Speech Modifications: Utilizing Self Supervised Features for Enhanced Zero-Shot Children ASR. Abhijit Sinha, Hemant Kumar Kathania, Mikko Kurimo |
| 2025 | BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention. Yassine El Kheir, Tim Polzehl, Sebastian Möller |
| 2025 | Bidirectional Spoken-Written Text Conversion with Large Language Models. Muyeol Choi, HyunJung Choi, Yohan Lim, Jeong-Uk Bang, Minkyu Lee, Seon Hui Kim, Seung Yun, Donghyun Kim, Minsoo Kim, Sanghun Kim |
| 2025 | Bilingual Speakers Exhibit Cognitive Fatigue: A Speech Disfluencies Case Study on Research Talks. Ashwin Ram, Marisol Muñoz, Zoi Gkalitsiou, Alexandros G. Dimakis |
| 2025 | BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing. Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto |
| 2025 | Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems. Kwok Chin Yuen, Jia Qi Yip, Zhen Qiu, Chi-Hung Chi, Kwok-Yan Lam |
| 2025 | Boosting StoRM Convergence with Metric Guidance and Non-uniform State-Sampling for Optimal Dereverberation. Chandra Mohan Sharma, Arnab Kumar Roy, Anupam Mandal, Prasanta Kumar Ghosh, Prasanna Kumar Kr |
| 2025 | Boundary-Conscious Pruning: Hard Set-Aware Model Compression for Efficient Speaker Recognition. Seongkyu Mun, Jubum Han |
| 2025 | Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain. Omer Moussa, Mariya Toneva |
| 2025 | Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation. Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller |
| 2025 | Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches. Ahmed Aboeitta, Ahmed Sharshar, Youssef Nafea, Shady Shehata |
| 2025 | Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models. Seung-jae Lee, Paul Hongsuck Seo |
| 2025 | Bridging Speech and Singing: Multi-stage Speech-Prompted Singing Voice Conversion with Speaker Embedding Adaptation. Mingda Liu, Jiatong Shi |
| 2025 | Bridging the Training-Inference Gap in TTS: Training Strategies for Robust Generative Postprocessing for Low-Resource Speakers. Frank Zalkow, Paolo Sani, Kishor Kayyar Lakshminarayana, Emanuël A. P. Habets, Nicola Pia, Christian Dittmar |
| 2025 | Bringing Interpretability to Neural Audio Codecs. Samir Sadok, Julien Hauret, Éric Bavu |
| 2025 | Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing. Yanir Marmor, Yair Lifshitz, Yoad Snapir, Kinneret Misgav |
| 2025 | CAGCRN: Real-Time Speech Enhancement with a Lightweight Model for Joint Acoustic Echo Cancellation and Noise Suppression. Yuyang Wang, Yonghui Liu, Jianbing Liu, Kai Niu, Zhiqiang He |
| 2025 | CAMER: Contribution-Aware Multimodal Emotion Recognition. Sun-Kyung Lee, Jong-Hwan Kim |
| 2025 | CAPR: Confidence-Aware Prompt Refinement in Large Language Models. Jen-Tzung Chien, Po-Chun Huang |
| 2025 | CBA-Whisper: Curriculum Learning-Based AdaLoRA Fine-Tuning on Whisper for Low-Resource Dysarthric Speech Recognition. Tianyi Tan, Xin'an Chen, Xiaohuai Le, Wenzhi Fan, Xianjun Xia, Chuanzeng Huang, Jing Lu |
| 2025 | CBA: Backdoor Attack on Deep Speech Classification via Audio Compression. Yuheng Huang, Ying Ren, Wenjie Zhang, Diqun Yan |
| 2025 | CEREALES: a new dataset of Quebec French accented speech with applications to speech recognition. Lucas Maison, Thomas Soulas, Marie-Jean Meurs |
| 2025 | CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR. Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan |
| 2025 | CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer. Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada |
| 2025 | CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning. Jiacheng Shi, Yanfu Zhang, Ye Gao |
| 2025 | CMSP-ST: Cross-modal Mixup with Speech Purification for End-to-End Speech Translation. Jiale Ou, Hongying Zan |
| 2025 | CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models. Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda |
| 2025 | CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge. Zehua Liu, Xiaolou Li, Chen Chen, Lantian Li, Dong Wang |
| 2025 | CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset. Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Sai Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe |
| 2025 | CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-car Speech Separation with Distributed Heterogeneous Arrays. Runduo Han, Yanxin Hu, Yihui Fu, Zihan Zhang, Yukai Jv, Li Chen, Lei Xie |
| 2025 | Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down. Yingzhi Wang, Anas Alhmoud, Saad Alsahly, Muhammad Alqurishi, Mirco Ravanelli |
| 2025 | Can AI Understand Mandarin Speech Prosody? A Framework and Benchmark Showcase. Zilong Wang, Xiaoxue Zhang, Xinyang Jiang, Kaitao Song, Jue Yu |
| 2025 | Can ASR generate valid measures of child reading fluency? Wieke Harmsen, Roeland van Hout, Catia Cucchiarini, Helmer Strik |
| 2025 | Can Emotion Fool Anti-spoofing? Aurosweta Mahapatra, Ismail Rasim Ulgen, Abinay Reddy Naini, Carlos Busso, Berrak Sisman |
| 2025 | Can Multimodal Foundation Models Help Analyze Child-Inclusive Autism Diagnostic Videos? Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan |
| 2025 | Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection? Bikash Dutta, Rishabh Ranjan, Shyam Sathvik, Mayank Vatsa, Richa Singh |
| 2025 | Can Speech Accurately Detect Depression in Patients With Comorbid Dementia? An Approach for Mitigating Confounding Effects of Depression and Dementia. Sophie Young, Fuxiang Tao, Bahman Mirheidari, Madhurananda Pahar, Markus Reuber, Heidi Christensen |
| 2025 | Can We Reconstruct a Dysarthric Voice with the Large Speech Model Parler TTS? Ariadna Sanchez, Simon King |
| 2025 | Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling. Tahiya Chowdhury, Verónica Romero |
| 2025 | Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages. Tuan Nguyen, Huy Dat Tran |
| 2025 | Cantonese Punctuation Restoration using LLM Annotated Data. King Yiu Suen, Rudolf Chow, Albert Y. S. Lam |
| 2025 | Causal Structure Discovery for Error Diagnostics of Children's ASR. Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen |
| 2025 | Chain-of-Thought Distillation with Fine-Grained Acoustic Cues for Speech Emotion Recognition. Jialong Mai, Xiaofen Xing, Yangbiao Li, Xiangmin Xu |
| 2025 | Chain-of-Thought Training for Open E2E Spoken Dialogue Systems. Siddhant Arora, Jinchuan Tian, Hayato Futami, Jee-weon Jung, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe |
| 2025 | Challenges and practical guidelines for atypical speech data collection, annotation, usage and sharing: A multi-project perspective. Zhengjun Yue, Mara Barberis, Tanvina Patel, Judith Dineley, Willemijn Doedens, Lottie Stipdonk, Yuanyuan Zhang, Elke De Witte, Erfan Loweimi, Hugo Van hamme, Djaina Satoer, Marina B. Ruiter, Laureano Moro-Velázquez, Nicholas Cummins, Odette Scharenborg |
| 2025 | Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier. Tarek Kunze, Marianne Métais, Hadrien Titeux, Lucas Elbert, Joseph Coffey, Emmanuel Dupoux, Alejandrina Cristià, Marvin Lavechin |
| 2025 | Character Error Rate Estimation for Semi-Supervised Training of Speech Recognition for Arabic Dialects. Chanho Park, Oscar Saz |
| 2025 | Characterization of voice cue sensitivity and vocal emotion recognition across the adult lifespan. Laura Rachman, Deniz Baskent |
| 2025 | Children's Voice Privacy: First Steps and Emerging Challenges. Ajinkya Kulkarni, Francisco Teixeira, Enno Hermann, Thomas Rolland, Isabel Trancoso, Mathew Magimai-Doss |
| 2025 | ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech. Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Jianhao Ye, Hongbin Zhou, Lei Ma, Jianjun Zhao |
| 2025 | ClaritySpeech: Dementia Obfuscation in Speech. Dominika C. Woszczyk, Ranya Aloufi, Soteris Demetriou |
| 2025 | ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment. Shengkui Zhao, Zexu Pan, Bin Ma |
| 2025 | Clinical Annotations for Automatic Stuttering Severity Assessment. Ana Rita Valente, Rufael Marew, Hawau Olamide Toyin, Hamdan Al-Ali, Anelise Bohnen, Inma Becerra, Elsa Marta Soares, Gonçalo Leal, Hanan Aldarmaki |
| 2025 | Clustering-based Hard Negative Sampling for Supervised Contrastive Speaker Verification. Piotr Masztalski, Michal Romaniuk, Jakub Zak, Mateusz Matuszewski, Konrad Kowalczyk |
| 2025 | Co-Speech Motion for Virtual Agents in Dialogue Using LLM-Driven Primitive Action Selection. Muhammad Yeza Baihaqi, Angel F. Garcia Contreras, Seiya Kawano, Koichiro Yoshino |
| 2025 | Co-registration of real-time MRI and respiration for speech research. Yubin Zhang, Prakash Kumar, Ye Tian, Ziwei Zhao, Xuan Shi, Kevin Huang, Kevin Lee, Haley Hsu, Shrikanth Narayanan, Krishna S. Nayak, Louis Goldstein |
| 2025 | Cocktail-Party Audio-Visual Speech Recognition. Thai-Binh Nguyen, Ngoc-Quan Pham, Alexander Waibel |
| 2025 | Code Mix TTS: An Approach to Infer Human Like Speech for Multi-Lingual Input Texts. Vishal Gourav, Phanindra Mankale |
| 2025 | Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy. Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang |
| 2025 | Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges. Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Sai Adupa, Lekha Bollinani, Hafiz Malik |
| 2025 | CommissionsQC: a Québec French Speech Corpus for Automatic Speech Recognition. Coralie Serrand, Amira Morsli, Gilles Boulianne |
| 2025 | Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments. Reo Yoneyama, Masaya Kawamura, Ryo Terashima, Ryuichi Yamamoto, Tomoki Toda |
| 2025 | Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis. Anna Seo Gyeong Choi, Alexander Richardson, Ryan Partlan, Sunny X. Tang, Sunghye Cho |
| 2025 | Comparison of Acoustic and Textual Features for Dysarthria Severity Classification in Amyotrophic Lateral Sclerosis. Y. S. Upendra Vishwanath, Tanuka Bhattacharjee, Deekshitha G, Sathvik Udupa, Chowdam Venkata Thirumala Kumar, Madassu Keerthipriya, Darshan Chikktimmegowda, Dipti Baskar, Yamini Belur, Seena Vengalil, Atchayaram Nalini, Prasanta Kumar Ghosh |
| 2025 | Comparison-Based Automatic Evaluation for Meeting Summarization. Ziwei Gong, Lin Ai, Harsh Deshpande, Alexander Johnson, Emmy Phung, Zehui Wu, Ahmad Emami, Julia Hirschberg |
| 2025 | Concurrent Speech and Auditory Tag Clouds for Non-Visual Web Interaction. Dhia Eddine Merzougui, Nilesh Tete, Fabrice Maurel, Gaël Dias, Mohammed Hasanuzzaman, Aurélien Bournonville, Edgar Madelaine, Thomas Berthelin Le Tellier, François Ledoyen, Laure Poutrain-Lejeune, François Rioult, Jérémie Pantin |
| 2025 | Conformer-based Ultrasound-to-Speech Conversion. Ibrahim Ibrahimov, Csaba Zainkó, Gábor Gosztolya |
| 2025 | Constrained LDDMM for Dynamic Vocal Tract Morphing: Integrating Volumetric and Real-Time MRI. Tharinda Piyadasa, Joan Glaunès, Amelia Gully, Michael Proctor, Kirrie J. Ballard, Tünde Szalay, Naeim Sanaei, Sheryl Foster, David Waddington, Craig T. Jin |
| 2025 | Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker. Julian Linke, Jana Winkler, Barbara Schuppler |
| 2025 | Context-Driven Dynamic Pruning for Large Speech Foundation Models. Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, Shinji Watanabe |
| 2025 | Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation. Qiongqiong Wang, Hardik B. Sailor, Tianchi Liu, Ai Ti Aw |
| 2025 | Contextual predictability effects on acoustic distinctiveness in read Polish speech. Zofia Malisz, Jan Foremski, Malgorzata Kul |
| 2025 | Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation. Zhennan Lin, Kaixun Huang, Wei Ren, Linju Yang, Lei Xie |
| 2025 | Continual Speech Learning with Fused Speech Features. Guitao Wang, Jinming Zhao, Hao Yang, Guilin Qi, Tongtong Wu, Gholamreza Haffari |
| 2025 | Continuous Learning for Children's ASR: Overcoming Catastrophic Forgetting with Elastic Weight Consolidation and Synaptic Intelligence. Edem Ahadzi, Vishwanath Pratap Singh, Tomi Kinnunen, Ville Hautamäki |
| 2025 | Continuous prediction of backchannel timing for human-robot interaction. Michael Paierl, Martin Hagmüller, Barbara Schuppler |
| 2025 | Contrastive Learning-based Syllable-Level Mispronunciation Detection and Diagnosis for Speech Audiometry. Longbin Jin, Donghun Min, Jung Eun Shin, Eun Yi Kim |
| 2025 | Conveying Gender Through Speech: Insights from Trans Men. Alice Ross, Cliodhna Hughes, Eddie L. Ungless, Catherine Lai |
| 2025 | Coping with segmental-prosodic incongruity in spoken word recognition in Japanese. Terumichi Ariga |
| 2025 | Corpus-Based Insights into Mandarin Neutral Tone: Effects of Tonal Context and Structural Patterns in Spontaneous Speech. Jingyi Sun, Nicolas Audibert, Yaru Wu, Martine Adda-Decker |
| 2025 | Count Your Speakers! Multitask Learning for Multimodal Speaker Diarization. Prabhav Singh, Jesús Villalba, Najim Dehak |
| 2025 | Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models. Kyowoon Lee, Artyom Stitsyuk, Gunu Jho, Inchul Hwang, Jaesik Choi |
| 2025 | Creaky Voice Facilitates More Efficient Phonological Processing of Mandarin Tone 3. Zixia Fan, Ronny Ibrahim, Joshua Penney, Felicity Cox |
| 2025 | Cross-Attention-Based Target Sound Extraction by Fully Leveraging Enrollment in a Shared Latent Space. Xue Yang, Guiru Shen, Yu Yang |
| 2025 | Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries. Minyoung Kim, Sehwan Park, Sungmin Cha, Paul Hongsuck Seo |
| 2025 | Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge. Zhaoyang Li, Haodong Zhou, Longjie Luo, Xiaoxiao Li, Yongxin Chen, Lin Li, Qingyang Hong |
| 2025 | Cross-corpus open-set Speech Emotion Recognition Method Based on Spatiotemporal Features with Inverse-Entropy Regularization. Zhaohui Zhou, Hui Luo |
| 2025 | Cross-lingual Data Selection Using Clip-level Acoustic Similarity for Enhancing Low-resource Automatic Speech Recognition. Shunsuke Mitsumori, Sara Kashiwagi, Keitaro Tanaka, Shigeo Morishima |
| 2025 | Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR. Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai |
| 2025 | CrossPhon: An Auto Phone Mapping Tool to Streamline Cross-language Modeling for Phone Alignment of Low-resource Languages. Hongchen Wu, Yixin Gu |
| 2025 | Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods. Laura Lechler, Chamran Moradi, Ivana Balic |
| 2025 | Cryfish: On deep audio analysis with Large Language Models. Anton Mitrofanov, Sergey Novoselov, Tatiana Prisyach, Vladislav Marchevskiy, Arseniy Karelin, Nikita Khmelev, Dmitry Dutov, Stepan Malykh, Igor Agafonov, Aleksandr Nikitin, Oleg Petrov |
| 2025 | D-GAT: Dual Graph Attention Network for Global HRTF Interpolation. Junsheng Hu, Shaojie Li, Qintuya Si, De Hu |
| 2025 | DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching. Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu |
| 2025 | DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models. Heng-Jui Chang, Hongyu Gong, Changhan Wang, James R. Glass, Yu-An Chung |
| 2025 | DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization. Geonyoung Lee, Geonhee Han, Paul Hongsuck Seo |
| 2025 | DLF-EEND: Dynamic Layer Fusion for End-to-End Speaker Diarization. Wooil Kim, Bongsu Jung |
| 2025 | DRI-GAN: A Novel Dual Real Input GAN with Triplet Loss for Cross-Lingual and Noisy SLU. Ankit Kumar, Munir Georges |
| 2025 | DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec. Peijie Chen, Wenhao Guan, Kaidi Wang, Weijie Wu, Hukai Huang, Qingyang Hong, Lin Li |
| 2025 | DYNAC: Dynamic Vocabulary-based Non-Autoregressive Contextualization for Speech Recognition. Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, Shinji Watanabe |
| 2025 | Data Augmentation using Speech Synthesis for Speaker-Independent Dysarthria Severity Classification. Minseop Kim, Minsu Han, Seokyoung Hong, Myoung-Wan Koo |
| 2025 | Data-driven approaches to pitch modelling in two Mexican Spanish ethnolects: K-means Clustering & GAMMs. Gilly Marchini, Jeremy Steffman |
| 2025 | Decoding Alzheimer's: Interpretable Visual and Logical Attention in Picture Description Tasks. Ning Wang, Bingyang Wen, Minghui Wu, Yang Sun, Zongru Shao, Haojie Zhou, K. P. Subbalakshmi |
| 2025 | Decoding Listener's Identity: Person Identification from EEG Signals Using a Lightweight Spiking Transformer. Zheyuan Lin, Siqi Cai, Haizhou Li |
| 2025 | Decoding Speaker-Normalized Pitch from EEG for Mandarin Perception. Jia-Xin Chen, Yi-Ming Wang, Ziyu Zhang, Jiayang Han, Yin-Long Liu, Rui Feng, Xiuyuan Liang, Zhen-Hua Ling, Jia-Hong Yuan |
| 2025 | Deep learning based spatial aliasing reduction in beamforming for audio capture. Mateusz Guzik, Giulio Cengarle, Daniel Arteaga |
| 2025 | Deep-Simplex Multichannel Speech Separation. Tzlil Avidan, Bracha Laufer-Goldshtein |
| 2025 | DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration. Sanberk Serbest, Tijana Stojkovic, Milos Cernak, Andrew Harper |
| 2025 | Defend for Self-Vocoding: A Novel Enhanced Decoder Network for Watermark Recovery. Yu-Sheng Lin, Ching-Yu Yang, Hsing-Hang Chou, Ya-Tse Wu, Bo-Hao Su, Chi-Chun Lee |
| 2025 | Defending Speech-enabled LLMs Against Adversarial Jailbreak Threats. Antonios Alexos, Raghuveer Peri, Sai Muralidhar Jayanthi, Metehan Cekic, Srikanth Vishnubhotla, Kyu J. Han, Srikanth Ronanki |
| 2025 | Defending Unauthorized Voice Cloning with Watermark-Aware Codecs. Jiankun Zhao, Lingwei Meng, Chengxi Deng, Helen Meng, Xixin Wu |
| 2025 | Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR. Longhao Li, Yangze Li, Hongfei Xue, Jie Liu, Shuai Fang, Kai Wang, Lei Xie |
| 2025 | DepressGEN: Synthetic Data Generation Framework for Depression Detection. Wenrui Liang, Rong Zhang, Xuezhen Zhang, Ying Ma, Wei-Qiang Zhang |
| 2025 | Developing High-Quality TTS for Punjabi and Urdu: Benchmarking against MMS Models. Fatima Naseem, Maham Sajid, Farah Adeeba, Sahar Rauf, Asad Mustafa, Sarmad Hussain |
| 2025 | Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction. Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, Shrikanth Narayanan |
| 2025 | Developing a LeFF Transformer Model for Exacerbated Speech Detection in COPD and Asthma. Yuyang Yan, Sami O. Simons, Visara Urovi |
| 2025 | Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices. Tiantian Feng, Thanathai Lertpetchpun, Dani Byrd, Shrikanth Narayanan |
| 2025 | Development and Validation of a Wav2Vec 2.0-Based Cross-Language Methodology for Measurement of Articulatory Precision. Tanya Talkar, Kan Kawabata, Connor Higgins, Sean Tobyne |
| 2025 | Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi. Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth |
| 2025 | DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech. Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee |
| 2025 | Dialogue Response Prefetching Based on Semantic Similarity and Prediction Confidence of Language Model. Kiyotada Mori, Seiya Kawano, Angel F. Garcia Contreras, Koichiro Yoshino |
| 2025 | Diarization-Guided Multi-Speaker Embeddings. Joonas Kalda, Clément Pagés, Tanel Alumäe, Hervé Bredin |
| 2025 | DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective. Hyung-Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, Ahmed Hussen Abdelaziz |
| 2025 | DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model. Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng |
| 2025 | DiffEmotionVC: A Dual-Granularity Disentangled Diffusion Framework for Any-to-Any Emotional Voice Conversion. Xiaosu Su, Bowen Yang, Xiaowei Yi, Yun Cao |
| 2025 | DiffMV-ETS: Diffusion-based Multi-Voice Electromyography-to-Speech Conversion using Speaker-Independent Speech Training Targets. Kevin Scheck, Tom Dombeck, Zhao Ren, Peter Wu, Michael Wand, Tanja Schultz |
| 2025 | DiffStereo: End-to-End Mono-to-Stereo Audio Generation with Diffusion Transformer. Suqi Zhang, Zheqi Dai, Yongyi Zang, Yin Cao, Qiuqiang Kong |
| 2025 | Differentiable K-means for Fully-optimized Discrete Token-based ASR. Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, Shinji Watanabe |
| 2025 | Differentiable Reward Optimization for LLM based TTS system. Changfeng Gao, Zhihao Du, Shiliang Zhang |
| 2025 | Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency. Bunlong Lay, Rostilav Makarov, Timo Gerkmann |
| 2025 | Direct-path Relative Harmonic Coefficients Detection for Multi-source Direction-of-Arrival Estimation in Reverberant Environments. Liang Tao, Maoshen Jia, Yonggang Hu |
| 2025 | Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses. Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux |
| 2025 | Directional Speech Recognition with Full-Duplex Capability. Ju Lin, Yiteng Huang, Ming Sun, Frank Seide, Florian Metze |
| 2025 | Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion. Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li |
| 2025 | Discovering Directions of Uncertainty in Speech Inpainting. Kfir Cohen, Lior Wolf, Bracha Laufer-Goldshtein |
| 2025 | Discrete Audio Representations for Automated Audio Captioning. Jingguang Tian, Haoqin Sun, Xinhui Hu, Xinkang Xu |
| 2025 | Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data. Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu |
| 2025 | Disentangling Dual-Encoder Masked Autoencoder for Respiratory Sound Classification. Peidong Wei, Shiyu Miao, Lin Li |
| 2025 | Disentangling Speaker and Content in Pre-trained Speech Models with Latent Diffusion for Robust Speaker Verification. Zhe Li, Man-Wai Mak, Jen-Tzung Chien, Mert Pilanci, Zezhong Jin, Helen Meng |
| 2025 | Distilling a speech and music encoder with task arithmetic. Fabian Ritter Gutierrez, Yi-Cheng Lin, Jui-Chiang Wei, Jeremy H. M. Wong, Eng Siong Chng, Nancy F. Chen, Hung-yi Lee |
| 2025 | DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds. Takuya Hasumi, Yusuke Fujita |
| 2025 | Do you read me? - flow of speech effect on speaker recognition systems. Alicja Martinek, Joanna Gajewska, Ewelina Bartuzi-Trokielewicz |
| 2025 | Does English fish sound like French fiche? Perceptual similarity judgments versus acoustic similarity. Rory Turnbull, Elisa Kiefer, Sharon Peperkamp |
| 2025 | Does effortful speech production indicate communication difficulty caused by noise and hearing aid support? Lena-Marie Huttner, Jeppe H. Christensen, Gitte Keidser, Tobias May, Torsten Dau, Sergi Rotger-Griful |
| 2025 | Dog2vec: Self-Supervised Pre-Training for Canine Vocal Representation. Xingyuan Li, Kenny Q. Zhu, Mengyue Wu |
| 2025 | Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification. Emiliano Acevedo, Martín Rocamora, Magdalena Fuentes |
| 2025 | DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation. Prabash Reddy Male, Swayambhu Nath Ray, Harish Arsikere, Akshat Jaiswal, Prakhar Swarup, Prantik Sen, Debmalya Chakrabarty, K. V. Vijay Girish, Nikhil Bhave, Frederick Weber, Sambuddha Bhattacharya, Sri Garimella |
| 2025 | Dual Orthogonality Sub-center Loss for Enhanced Anomalous Sound Detection. Dong Wang, Jiqing Han, Tieran Zheng, Guibin Zheng, Yongjun He |
| 2025 | DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation. Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu |
| 2025 | Dynamic Acoustic Model Architecture Optimization in Training for ASR. Jingjing Xu, Zijian Yang, Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney |
| 2025 | Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization. Luong Ho, Khanh Le, Vinh Pham, Bao Nguyen, Tan Tran, Duc Chau |
| 2025 | Dynamic Layer Gating for Speech Enhancement. Venkatesh Parvathala, K. Sri Rama Murty |
| 2025 | Dysarthric Speech Recognition Using Curriculum Learning and Multi-stream Architecture. I-Ting Hsieh, Chung-Hsien Wu |
| 2025 | Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection. Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Peter Park, Anaisha Das, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli |
| 2025 | E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning. Yihan Liu, Zhengyang Chen, Leying Zhang, Yanmin Qian |
| 2025 | EAA: Emotion-Aware Audio Large Language Models with Dual Cross-Attention and Context-Aware Instruction Tuning. Hongfei Du, Sidi Lu, Gang Zhou, Ye Gao |
| 2025 | EASY: Emotion-aware Speaker Anonymization via Factorized Distillation. Jixun Yao, Hexin Liu, Eng Siong Chng, Lei Xie |
| 2025 | EATS-Speech: Emotion-Adaptive Transformation and Priority Synthesis for Zero-Shot Text-to-Speech. Jingyuan Xing, Zhipeng Li, Shuaiqi Chen, Xiaofen Xing, Xiangmin Xu |
| 2025 | EEG-based Speech Decoding Based on Multi-mode Joint Modeling. Peiran Li, Fei Chen, Xixin Wu |
| 2025 | EEG-based Voice Conversion: Hearing the Voice of Your Brain. Yizhong Geng, Wenxin Fu, Qihang Lu, Bingsong Bai, Cong Wang, Yingming Gao, Ya Li |
| 2025 | EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis. Haoxun Li, Leyuan Qu, Jiaxi Hu, Taihao Li |
| 2025 | Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution. Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli |
| 2025 | Effect of Loudspeaker Emitted Speech on ASR performance. Vikram C. M, Sanjoy Pal, Nidhi Mantri, Gopal Kumar Agrawal |
| 2025 | Effect of Noise Floor in Room Impulse Response on Speech Perception Under Spherical Harmonics-based Spatial Sound Reproduction. Yunqi C. Zhang, Dhruv Jagmohan, Hong Kit Li, C. T. Justine Hui, Yusuke Hioka |
| 2025 | Effect of physical exercise on voice in people living with COPD. Lauren G. Reinders, Loes van Bemmel, Alexander Mackay, David Nobbs, Frits M. E. Franssen, Hester Gietema, Simona Schäfer, Sami O. Simons |
| 2025 | Effective Context in Neural Speech Models. Yen Meng, Sharon Goldwater, Hao Tang |
| 2025 | Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates. Haoning Xu, Zhaoqing Li, Youjun Chen, Huimeng Wang, Guinan Li, Mengzhe Geng, Chengxi Deng, Xunying Liu |
| 2025 | Effects of Prosodic Information on Dialect Classification Using Whisper Features. Phoebe Parsons, Heming Strømholt Bremnes, Knut Kvale, Torbjørn Svendsen, Giampiero Salvi |
| 2025 | Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR. Zheng Xin Yong, Vineel Pratap, Michael Auli, Jean Maillard |
| 2025 | Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering. Pradeep Rangappa, Andrés Carofilis, Jeena J. Prakash, Shashi Kumar, Sergio Burdisso, Srikanth R. Madikeri, Esaú Villatoro-Tello, Bidisha Sharma, Petr Motlícek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke |
| 2025 | Efficient Multilingual ASR Finetuning via LoRA Language Experts. Jiahong Li, Yiwen Shao, Jianheng Zhuo, Chenda Li, Liliang Tang, Dong Yu, Yanmin Qian |
| 2025 | Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem. Andres Fernandez, Juan Azcarreta Ortiz, Çagdas Bilen, Jesus Monge-Alvarez |
| 2025 | Efficient Noise-Robust Hybrid Audiovisual Encoder with Joint Distillation and Pruning for Audiovisual Speech Recognition. Zhengyang Li, Pascal Reichert, Thomas Graave, Patrick Blumenberg, Tim Fingscheidt |
| 2025 | Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders. Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan |
| 2025 | Efficient Streaming Speech Quality Prediction with Spiking Neural Networks. Mattias Nilsson, Riccardo Miccini, Julian Rossbroich, Clément Laroche, Tobias Piechowiak, Friedemann Zenke |
| 2025 | Efficient Streaming TTS Acoustic Model with Depthwise RVQ Decoding Strategies in a Mamba Framework. Joun Yeop Lee, Sangjun Park, Byoung Jin Choi, Ji-Hyun Lee, Min-Kyung Kim, Hoon-Young Cho |
| 2025 | Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition. Kwok Chin Yuen, Jia Qi Yip |
| 2025 | Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model. Ke Hu, Ehsan Hosseini-Asl, Chen Chen, Edresson Casanova, Subhankar Ghosh, Piotr Zelasko, Zhehuai Chen, Jason Li, Jagadeesh Balam, Boris Ginsburg |
| 2025 | Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization. Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham |
| 2025 | Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling. Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan |
| 2025 | Eigenvoice Synthesis based on Model Editing for Speaker Generation. Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda |
| 2025 | EmbedAug: An Augmentation Scheme for End-to-End Automatic Speech Recognition. Ashish Panda, Sunil Kumar Kopparapu |
| 2025 | EmoDB 2.0: A Database of Emotional Speech in a World that is not Black or White but Grey. Felix Burkhardt, Oliver Schrüfer, Uwe D. Reichel, Hagen Wierstorf, Anna Derington, Florian Eyben, Björn W. Schuller |
| 2025 | EmoJudge: LLM Based Post-Hoc Refinement for Multimodal Speech Emotion Recognition. Prabhav Singh, Jesús Villalba |
| 2025 | EmoSpeechAuth: Emotion-Aware Speaker Verification. Magdalena Golebiowska, Piotr Syga |
| 2025 | EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification. Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee |
| 2025 | Emotion-Guided Graph Attention Networks for Speech-Based Depression Detection under Emotion-Inducting Tasks. Yuqiu Zhou, Yongjie Zhou, Yudong Yang, Yang Liu, Jun Huang, Shuzhi Zhao, Rongfeng Su, Lan Wang, Nan Yan |
| 2025 | EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast. Shreeram Suresh Chandra, Lucas Goncalves, Junchen Lu, Carlos Busso, Berrak Sisman |
| 2025 | Employing self-supervised learning models for cross-linguistic child speech maturity classification. Theo Zhang, Madurya Suresh, Anne Warlaumont, Kasia Hitczenko, Alejandrina Cristià, Margaret Cychosz |
| 2025 | Empowering Large Language Models for End-to-End Speech Translation Leveraging Synthetic Data. Yu Pu, Xiaoqian Liu, Guangyu Zhang, Zheng Yan, Wei-Qiang Zhang, Xie Chen |
| 2025 | EnCodecMAE: leveraging neural codecs for universal audio representation learning. Leonardo Pepino, Pablo Riera, Luciana Ferrer |
| 2025 | Enabling the replicability of speech synthesis perceptual evaluations. Sébastien Le Maguer, Gwénolé Lecorvé, Damien Lolive, Naomi Harte, Juraj Simko |
| 2025 | End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios. Kangqi Jing, Wenbin Zhang, Yu Gao |
| 2025 | End-to-End Diarization utilizing Attractor Deep Clustering. David Palzer, Matthew Maciejewski, Eric Fosler-Lussier |
| 2025 | End-to-End Indian Language Dubbing with Zero-Shot Speaker Preservation. Giri Raju, Sandeep Konam |
| 2025 | End-to-End Speech Translation Guided by Robust Translation Capability of Large Language Model. Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi |
| 2025 | End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data. Aishwarya Pothula, Bhavana Akkiraju, Srihari Bandarupalli, Charan Devarkonda, Santosh Kesiraju, Anil Kumar Vuppala |
| 2025 | Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data. Yun Tang, Eesung Kim, Vijendra Raj Apsingekar |
| 2025 | Enhancing Acoustic-to-Articulatory Inversion with Multi-Target Pretraining for Low-Resource Settings. Jesuraj Bandekar, Prasanta Kumar Ghosh |
| 2025 | Enhancing Acoustic-to-Articulatory Speech Inversion by Incorporating Nasality. Saba Tabatabaee, Suzanne Boyce, Liran Oren, Mark Tiede, Carol Y. Espy-Wilson |
| 2025 | Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech. Seung-Bin Kim, Hyun-seo Shin, Jungwoo Heo, Chan-yeong Lim, Kyo-Won Koo, Jisoo Son, Sanghyun Hong, Souhwan Jung, Ha-Jin Yu |
| 2025 | Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge. Aditya Kamlesh Parikh, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik |
| 2025 | Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving. Jingran Xie, Xiang Li, Hui Wang, Yue Yu, Yang Xiang, Xixin Wu, Zhiyong Wu |
| 2025 | Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models. Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong, Kasima Tharnpipitchai, Kunat Pipatanakul |
| 2025 | Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss. Jiawen Huang, Felipe Sousa, Emir Demirel, Emmanouil Benetos, Igor Gadelha |
| 2025 | Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning. Changin Choi, Sungjun Lim, Wonjong Rhee |
| 2025 | Enhancing Serialized Output Training for Multi-Talker ASR with Soft Monotonic Alignment and Utterance-level Timestamp. Fengyun Tan, Tao Wei, Kun Zou, Ning Cheng, Shaojun Wang, Jing Xiao |
| 2025 | Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion. Honghong Wang, Jing Deng, Fanqin Meng, Rong Zheng |
| 2025 | Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody. David Sasu, Benedict Quartey, Kweku Andoh Yamoah, Natalie Schluter |
| 2025 | Enhancing Syllabic Recognition via Speech-EEG Phase Analysis and Non-Activity State Modeling. Rini A. Sharon, Hema A. Murthy |
| 2025 | Enhancing Target-speaker Automatic Speech Recognition Using Multiple Speaker Embedding Extractors with Virtual Speaker Embedding. Ju-Seok Seong, Jeong-Hwan Choi, Ye-Rin Jeoung, Ilseok Kim, Joon-Hyuk Chang |
| 2025 | Enhancing Transcripts of Open-Source Automatic Speech Recognition Models Through Fine-Tuning with Laughter and Speech-Laugh. Phuoc Hoang Ho, Dragos Alexandru Balan, Dirk K. J. Heylen, Khiet P. Truong |
| 2025 | EnvSDD: Benchmarking Environmental Sound Deepfake Detection. Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Haohe Liu, Wenwu Wang, Mark D. Plumbley |
| 2025 | Equivalence and differences: Formant patterns of labialization and pharyngealization in Tashlhiyt. Philipp Buech, Anne Hermes, Rachid Ridouane |
| 2025 | Evaluating ASR Robustness to Spontaneous Speech Errors: A Study of WhisperX Using a Speech Error Database. John Alderete, Macarious Kin Fung Hui, Aanchan Mohan |
| 2025 | Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth. Hongchen Wu, Yao Du, Zirong Li, Yixin Gu, Disha Thotappala Jayaprakash, Li Sheng |
| 2025 | Evaluating Deep Speaker Embedding Robustness to Domain, Sampling Rate, and Codec Variations. Alexandre Ferro Filho, Diogo Fernandes Costa Silva, Pedro Elias Engelberg Silva Borges, Arlindo Rodrigues Galvão Filho |
| 2025 | Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering. Ebru Arisoy, Merve Ünlü Menevse, Yusufcan Manav, Arzucan Özgür |
| 2025 | Evaluating Logit-Based GOP Scores for Mispronunciation Detection. Aditya Kamlesh Parikh, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik |
| 2025 | Evaluating Parameter Sharing for Spoofing-Aware Speaker Verification: A Case Study on the ASVspoof 5 Dataset. Aykut Büker, Oguzhan Kurnaz, Sule Bekiryazici, Selim Can Demirtas, Cemal Hanilçi |
| 2025 | Evaluating Progress of CALL System Users on Accentedness and Comprehensibility: An Acoustic and ASR-Based Approach. Wenwei Dong, Catia Cucchiarini, Roeland van Hout, Helmer Strik |
| 2025 | Evaluating Speech Enhancement Performance Across Demographics and Language. José Giraldo, Alex Peiró Lilja, Carme Armentano-Oller, Rodolfo Zevallos, Cristina España-Bonet |
| 2025 | Evaluating Speech Foundation Models for Automatic Speech Recognition in the Low-Resource Kanyen'kéha Language. Mengzhe Geng, Patrick Littell, Aidan Pine, Robbie Jimerson, Gilles Boulianne, Vishwa Gupta, Rolando Coto-Solano, Anna Kazantseva, Marc Tessier, Delaney Lothian, Akwiratékha' Martin, Eric Joanis, Samuel Larkin, Roland Kuhn |
| 2025 | Evaluating Wav2Vec2-Bert for Computer-Assisted Pronunciation Training for isiZulu. Alexandra Fort, Francis Tyers |
| 2025 | Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data. Emmy Postma, Cristian Tejedor García |
| 2025 | Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers. Terry Yi Zhong, Esther Janse, Cristian Tejedor García, Louis ten Bosch, Martha A. Larson |
| 2025 | Evaluating the suitability of acoustic parameters for capturing breathy voice in non-pathological female speakers. Chloe Patman, Paul Foulkes, Kirsty McDougall |
| 2025 | Evaluation of Three Automatic Alignment Tools for the Processing of Non-native French. Qian Zhou, Mathilde Hutin |
| 2025 | Evaluation of a model for sound radiation from the vocal tract wall. Peter Birkholz, Tianyi Zhang |
| 2025 | ExagTTS: An Approach Towards Controllable Word Stress Incorporated TTS for Exaggerated Synthesized Speech Aiding Second Language Learners. Anindita Mondal, Monica Surtani, Anil Kumar Vuppala, Parameswari Krishnamurthy, Chiranjeevi Yarra |
| 2025 | Examining Test-Time Adaptation for Personalized Child Speech Recognition. Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Mataric |
| 2025 | Explainable Depression Detection using Masked Hard Instance Mining. Patawee Prakrankamanant, Shinji Watanabe, Ekapol Chuangsuwanich |
| 2025 | Explainable Speech Emotion Recognition Through Attentive Pooling: Insights from Attention-Based Temporal Localization. Tahitoa Leygue, Astrid Sabourin, Christian Bolzmacher, Sylvain Bouchigny, Margarita Anastassova, Quoc-Cuong Pham |
| 2025 | Exploiting Bispectral Features for Single-Channel Speech Enhancement. Venkatesh Parvathala, Ramesh Gundluru, Sreekanth Sankala, K. Sri Rama Murty |
| 2025 | Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems. Natalia A. Tomashenko, Emmanuel Vincent, Marc Tommasi |
| 2025 | Exploiting Echo Path Priors for Enhanced Stereo Acoustic Echo Cancellation. Jinfu Wang, Ziteng Wang, Xin Liu, Yang Liu, Qing Shi, Zhengqiang Luo, Feiran Yang |
| 2025 | Exploratory Analysis of Brainstem fMRI Data During Sustained Phonation. Carey Smith, Hu Cheng, Pertti Palo, Daniel Aalto, Steven M. Lulich |
| 2025 | Exploratory Study of Filled Pauses in Ukrainian Language: Phonetic Properties of Filled Pauses. Anna Havras, Carlos Mendes, Helena Moniz, Gueorgui Hristovsky, João Miranda |
| 2025 | Exploring Efficient Directional and Distance Cues for Regional Speech Separation. Yiheng Jiang, Haoxu Wang, Yafeng Chen, Gang Qiao, Biao Tian |
| 2025 | Exploring Generative Error Correction for Dysarthric Speech Recognition. Moreno La Quatra, Alkis Koudounas, Valerio Mario Salerno, Sabato Marco Siniscalchi |
| 2025 | Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR. Carlos Carvalho, Jinchuan Tian, William Chen, Yifan Peng, Alberto Abad, Shinji Watanabe |
| 2025 | Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring. Yuchen Song, Yucong Zhang, Ming Li |
| 2025 | Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR. Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu |
| 2025 | Exploring Shared-Weight Mechanisms in Transformer and Conformer Architectures for Automatic Speech Recognition. Thomas Rolland, Alberto Abad |
| 2025 | Exploring auditory feedback mechanisms in speech recognition. Louise Coppieters de Gibson, Philip N. Garner |
| 2025 | Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models. Shunsuke Kando, Yusuke Miyao, Shinnosuke Takamichi |
| 2025 | Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models. Edmilson Da Silva Morais, Hagai Aronowitz, Aharon Satt, Ron Hoory, Avihu Dekel, Brian Kingsbury, George Saon |
| 2025 | Exploring the Power of Empirical Mode Decomposition for Sensing the Sound of Silence: A Pilot Study on Mice Autism Detection via Ultrasonic Vocalisation. Chenhao Wu, Xiangjun Cai, Haojie Zhang, Tianrui Jia, Yilu Deng, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto, Jiang Liu |
| 2025 | Extended High-frequency Cues to Phoneme Recognition: Insights from ASR. Zhe-chen Guo, Bharath Chandrasekaran |
| 2025 | Extended Loss: Incorporating Long Context into Training Models when using Short Audio Frames. Quang Minh Dinh, Hoda Rezaee Kaviani, Mehrdad Hosseinzadeh, Yuanhao Yu |
| 2025 | Extending the Fongbe to French Speech Translation Corpus: resources, models and benchmark. D. Fortune Kponou, Salima Mdhaffar, Fréjus A. A. Laleye, Eugène C. Ezin, Yannick Estève |
| 2025 | EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer. Jiarui Hai, Yong Xu, Hao Zhang, Chenxing Li, Helin Wang, Mounya Elhilali, Dong Yu |
| 2025 | FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems. Yizhou Peng, Yi-Wen Chao, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng |
| 2025 | FFD: Fine-Finger Diffusion Model for Music to Fine-grained Finger Dance Generation. Boya Dong, Wentao Lei, Li Liu |
| 2025 | FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer. Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian |
| 2025 | FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents. Satu Hopponen, Tomi Kinnunen, Alexandre Nikolaev, Rosa González Hautamäki, Lauri Tavi, Einar Meister |
| 2025 | FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments. Saba Tabatabaee, Jing Liu, Carol Y. Espy-Wilson |
| 2025 | FUSE-MOS: Fusion of Speech Embeddings for MOS Prediction with Uncertainty Quantification. Enjamamul Hoq, Nikhil Gupta, Danielle Omondi, Ifeoma Nwogu |
| 2025 | FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge. Nabarun Goswami, Tatsuya Harada |
| 2025 | FaVC: A Validated, Transcribed, Parallel Farsi Speech Dataset for Voice Conversion. Mina Serajian, Saeed Najafzadeh Rahaghi, Hadi Veisi, Saman Haratizadeh |
| 2025 | Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation. Fang Kang, Yin Cao, Haoyu Chen |
| 2025 | Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning. Yejin Jeon, Solee Im, Youngjae Kim, Gary Geunbae Lee |
| 2025 | Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization. Suhas BN, Han-Chin Shing, Lei Xu, Mitch Strong, Jon Burnsky, Jessica Ofor, Jordan R. Mason, Susan Chen, Sundararajan Srinivasan, Chaitanya Shivade, Jack Moriarty, Joseph Paul Cohen |
| 2025 | Factorized RVQ-GAN For Disentangled Speech Tokenization. Sameer Khurana, Dominik Klement, Antoine Laurent, Dominik Bobos, Juraj Novosad, Peter Gazdik, Ellen Zhang, Zili Huang, Amir Hussein, Ricard Marxer, Yoshiki Masuyama, Ryo Aihara, Chiori Hori, François G. Germain, Gordon Wichern, Jonathan Le Roux |
| 2025 | Factors affecting the in-context learning abilities of LLMs for dialogue state tracking. Pradyoth Hegde, Santosh Kesiraju, Jan Svec, Simon Sedlácek, Bolaji Yusuf, Oldrich Plchot, Deepak K. T, Jan Cernocký |
| 2025 | FaiST: A Benchmark Dataset for Fairness in Speech Technology. Maliha Jahan, Yinglun Sun, Priyam Mazumdar, Zsuzsanna Fagyal, Thomas Thebaud, Jesús Villalba, Mark Hasegawa-Johnson, Najim Dehak, Laureano Moro-Velázquez |
| 2025 | FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition. Jongsuk Kim, Jaemyung Yu, Minchan Kwon, Junmo Kim |
| 2025 | Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS. Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala |
| 2025 | FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo |
| 2025 | Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids. Ryandhimas E. Zezario, Sabato Marco Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao |
| 2025 | Federated Learning with Feature Space Separation for Speaker Recognition. Ying Meng, Zhihua Fang, Liang He |
| 2025 | Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes. Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya |
| 2025 | Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement. Seungu Han, Sungho Lee, Juheon Lee, Kyogu Lee |
| 2025 | Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity. Loann Peurey, Marvin Lavechin, Tarek Kunze, Manel Khentout, Lucas Gautheron, Emmanuel Dupoux, Alejandrina Cristià |
| 2025 | Finding the Human Voice in AI: Insights on the Perception of AI-Voice Clones from Naturalness and Similarity Ratings. Linda Bakkouche, Charles McGhee, Emily Lau, Stephanie Cooper, Xinbing Luo, Madeleine Rees, Kai Alter, Brechtje Post, Julia Schwarz |
| 2025 | Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches. Dena F. Mujtaba, Nihar R. Mahapatra |
| 2025 | Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback. Jingyi Chen, Ju-Seung Byun, Micha Elsner, Pichao Wang, Andrew Perrault |
| 2025 | Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization. Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Díez, Jan Cernocký, Lukás Burget |
| 2025 | Fine-tuning Parakeet-TDT for Dysarthric Speech Recognition in the Speech Accessibility Project Challenge. Kaito Takahashi, Keigo Hojo, Toshimitsu Sakai, Yukoh Wakabayashi, Norihide Kitaoka |
| 2025 | Fine-tuning Strategies for Automatic Speech Recognition of Low-Resource Speech with Autism Spectrum Disorder. Yeseul Park, Bowon Lee |
| 2025 | Finetune Large Pre-Trained Model Based on Frequency-Wise Multi-Query Attention Pooling for Anomalous Sound Detection. Nan Jiang, Yan Song, Qing Gu, Haoyu Song, Lirong Dai, Ian McLoughlin |
| 2025 | First Analyze Then Enhance: A Task-Aware System for Speech Separation, Denoising, and Dereverberation. Shaoxiang Dang, Li Li, Shogo Seki, Hiroaki Kudo |
| 2025 | First Steps Towards Voice Anonymization for Code-Switching Speech. Sarina Meyer, Ekaterina Kolos, Ngoc Thang Vu |
| 2025 | Flexible VAD-PVAD Transition: A Detachable PVAD Module for Dynamic Encoder RNN VAD. En-Lun Yu, Chien-Chun Wang, Jeih-weih Hung, Shih-Chieh Huang, Berlin Chen |
| 2025 | FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching. Ziqian Wang, Zikai Liu, Xinfa Zhu, Yike Zhu, Mingshuai Liu, Jun Chen, Longshuai Xiao, Chao Weng, Lei Xie |
| 2025 | FlowTSE: Target Speaker Extraction with Flow Matching. Aviv Navon, Aviv Shamsian, Yael Segal-Feldman, Neta Glazer, Gil Hetz, Joseph Keshet |
| 2025 | Focal Modulation Network: A Novel Solution for Polyphonic Music Instrument Recognition without Attention and Aggregation Strategy. Lekshmi Chandrika Reghunath, Rajeev Rajan |
| 2025 | FoleyMaster: High-Quality Video-to-Audio Synthesis via MLLM-Augmented Prompt Tuning and Joint Semantic-Temporal Adaptation. Liming Liang, Luo Chen, Yuehan Jin, Xianwei Zhuang, Yuxin Xie, Yongkang Yin, Yuexian Zou |
| 2025 | Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation. Jingping Nie, Tien Dung Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendaño, Erdrin Azemi, Vikramjit Mitra |
| 2025 | FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens. Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma |
| 2025 | French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement. Thomas Joubaud, Julien Hauret, Véronique Zimpfer, Éric Bavu |
| 2025 | French schwa is not acoustically distinct from its two lexical neighbors /ø/ and /œ/. Mathilde Hutin, Mélanie Lancien, Noam Faust |
| 2025 | Frequency-Domain Enhanced Extreme Bandwidth Extension Network with ICCRN for Superior Speech Quality. Hongtao Bao, Xueliang Zhang |
| 2025 | From Context to Code-switching: Examining the Interplay of Language Proficiency and Multilingualism in Speech. Debasmita Bhattacharya, Aanya Tolat, Julia Hirschberg |
| 2025 | From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology. Haoyang Li, Yuchen Hu, Chen Chen, Sabato Marco Siniscalchi, Songting Liu, Eng Siong Chng |
| 2025 | From Pretraining to Performance: Benchmarking Self-Supervised Speech Models for Interspeech-25 SER Challenge. Drishya Uniyal, Vinayak Abrol |
| 2025 | From Scarcity to Sufficiency: Speech Recognition Pipeline for Zero-resource Language. Nikolay Karpov, Sofia Kostandian, Nune Tadevosyan, Alexan Ayrapetyan, Andrei Andrusenko, Ara Yeroyan, Mher Yerznkanyan, Vitaly Lavrukhin |
| 2025 | From Sharpness to Better Generalization for Speech Deepfake Detection. Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian |
| 2025 | From Speech Science to Language Transparence. Alexander Waibel |
| 2025 | From Static to Dynamic: Enhancing AAC with Generative Imagery and Zero-Shot TTS. Juliana Francis, Joakim Gustafsson, Éva Székely |
| 2025 | From Talking and Listening Devices to Intelligent Communicative Machines. Roger K. Moore |
| 2025 | From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data. Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Y. Espy-Wilson |
| 2025 | From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models. Asim Ersoy, Basel Ahmad Mousi, Shammur Absar Chowdhury, Firoj Alam, Fahim Dalvi, Nadir Durrani |
| 2025 | Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech. Wonjune Kang, Junteng Jia, Chunyang Wu, Wei Zhou, Egor Lakomkin, Yashesh Gaur, Leda Sari, Suyoun Kim, Ke Li, Jay Mahadeokar, Ozlem Kalinli |
| 2025 | Fully End-to-end Streaming Open-vocabulary Keyword Spotting with W-CTC Forced Alignment. Dohyun Kim, Jiwook Hwang |
| 2025 | Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier. Yongjie Si, Yanxiong Li, Jiaxin Tan, Qianhua He, Il-Youp Kwak |
| 2025 | Functional Connectivity and Hilbert-Based Features for Covert Speech EEG Variability Analysis and Classification. Saravanakumar Duraisamy, Maurice Rekrut, Luis A. Leiva |
| 2025 | GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning. Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, Shinji Watanabe |
| 2025 | GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints. Jiajun He, Jinyi Mi, Tomoki Toda |
| 2025 | GLCLAP: A Novel Contrastive Learning Pre-trained Model for Contextual Biasing in ASR. Yuxiang Kong, Fan Cui, Liyong Guo, Heinrich Dinkel, Lichun Fan, Junbo Zhang, Jian Luan |
| 2025 | GST-BERT-TTS: Prosody Prediction Without Accentual Labels For Multi-Speaker TTS Using BERT With Global Style Tokens. Tadashi Ogura, Takuma Okamoto, Yamato Ohtani, Erica Cooper, Tomoki Toda, Hisashi Kawai |
| 2025 | GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer. Minghui Fang, Shengpeng Ji, Jialong Zuo, Xize Cheng, Wenrui Liu, Xiaoda Yang, Ruofan Hu, Jieming Zhu, Zhou Zhao |
| 2025 | GTAnet: Geometry-Guided Temporal Attention for EEG-Based Sound Source Tracking in Cocktail Party Scenarios. Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Dashanka De Silva, Haizhou Li, Tanja Schultz |
| 2025 | Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations. Seongsil Heo, Christi Miller, Calvin Murdock, Michael J. Proulx |
| 2025 | GenECA: A General-Purpose Framework for Real-Time Adaptive Multimodal Embodied Conversational Agents. Santosh V. Patapati, Aashrith Tatineni, Trisanth Srinivasan |
| 2025 | Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere. Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang |
| 2025 | Generalizable Audio Spoofing Detection using Non-Semantic Representations. Arnab Das, Yassine El Kheir, Carlos Franzreb, Tim Herzig, Tim Polzehl, Sebastian Möller |
| 2025 | Generating Consistent Prosodic Patterns from Open-Source TTS Systems. Ha Eun Shim, Olivia Yung, Paige Tuttösí, Boey Kwan, Angelica Lim, Yue Wang, H. Henny Yeung |
| 2025 | GigaAM: Efficient Self-Supervised Learner for Speech Recognition. Aleksandr Kutsakov, Alexandr Maximenko, Georgii Gospodinov, Pavel Bogomolov, Fyodor Minkin |
| 2025 | GoP2Vec: A few shot learning for pronunciation assessment with goodness of pronunciation (GoP) based representations from an i-vector framework and augmentation. Meenakshi Sirigiraju, Chiranjeevi Yarra |
| 2025 | Gradual modeling of the Lombard effect by modifying speaker embeddings from a Text-To-Speech model. Thiago Henrique Gomes Lobato, Magnus Schäfer |
| 2025 | Grammatical Error Detection on Spontaneous Children's Speech Using Iterative Pseudo Labeling. Christopher Gebauer, Lars Rumberg, Lars Köhn, Hanna Ehlert, Edith Beaulac, Jörn Ostermann |
| 2025 | Granary: Speech Recognition and Translation Dataset in 25 European Languages. Nithin Rao Koluguri, Monica Sekoyan, George Zelenfroynd, Sasha Meister, Shuoyang Ding, Sofia Kostandian, He Huang, Nikolay Karpov, Jagadeesh Balam, Vitaly Lavrukhin, Yifan Peng, Sara Papi, Marco Gaido, Alessio Brutti, Boris Ginsburg |
| 2025 | Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning. Hien Ohnaka, Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto |
| 2025 | GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples. Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang |
| 2025 | H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing. Akanksha Singh, Yi-Ping Phoebe Chen, Vipul Arora |
| 2025 | HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement. Amir Hussein, Sameer Khurana, Gordon Wichern, François G. Germain, Jonathan Le Roux |
| 2025 | HK-GenSpeech: A Generative AI Scene Creation Framework for Speech Based Cognitive Assessment. Vi Jun Sean Yong, Serkan Kumyol, Pau Le Lisa Low, Winnie Suk Wai Leung, Tristan Braud |
| 2025 | HWB-Net: A Novel High-Performance and Efficient Hybrid Waveform Bandwidth Extension Method. Xin Liu, Shulin He, Xueliang Zhang |
| 2025 | HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition. Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | Harnessing Text-to-Speech Voice Cloning Models for Improved Audiological Speech Assessment. Lidea Shahidi, Erdem Baha Topbas, Thu Ngan Dang, Tobias Goehring |
| 2025 | Hear Me Out: Interactive evaluation and bias discovery platform for speech-to-speech conversational AI. Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely |
| 2025 | Hearing deficits of transformer-based ASR for anechoic and spatial signals. Dirk Eike Hoffner, Simon Weihe, Thomas Brand, Bernd T. Meyer |
| 2025 | Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model. Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu |
| 2025 | Heart Rate as a Proxy Measure to Assess Human Confidence in Spoken Speech. Harish Battula, Gauri Deshpande, Yagna Gudipalli, Sachin Patel |
| 2025 | HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset. Ryan Langman, Xuesong Yang, Paarth Neekhara, Shehzeen Hussain, Edresson Casanova, Evelina Bakhturina, Jason Li |
| 2025 | How do both phonological and syntactic complexity influence speech planning? Ivan Yuen, Katherine Demuth, Stefanie Shattuck-Hufnagel |
| 2025 | How sibilant spectra shape gender perception in prepubertal children: A voice morphing study. Riccarda Funk, Melanie Weirich, Adrian P. Simpson |
| 2025 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not. Francesco Verdini, Pierfrancesco Melucci, Stefano Perna, Francesco Cariaggi, Marco Gaido, Sara Papi, Szymon Mazurek, Marek Kasztelnik, Luisa Bentivogli, Sébastien Bratières, Paolo Merialdo, Simone Scardapane |
| 2025 | How to Recover Long Audio Sequences Through Gradient Inversion Attack With Dynamic Segment-based Reconstruction. Xijie Zeng, Frank Rudzicz |
| 2025 | HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization. Hyebin Ahn, Kangwook Jang, Hoirin Kim |
| 2025 | Hybrid Data Sampling for ASR: Integrating Acoustic Diversity and Transcription Uncertainty. Komei Hiruta, Yosuke Yamano, Hideaki Tamori |
| 2025 | Hybrid Expert Knowledge and Self-Supervised Learning for Diagnostic Modeling of Adductor Spasmodic and Primary Myotonic Dysphonia. Zhou Du, Hang Chen, Huijun Ding, Jun Du, Zhen Chen |
| 2025 | Hybrid HMM-SVM classifier using frication-based features for detection of non-normative sibilant articulation patterns in Polish children's speech. Zuzanna Miodonska |
| 2025 | I want a horror - comedy - movie: Slips-of-the-Tongue Impact Conversational Recommender System Performance. Maria Teleki, Lingfeng Shi, Chengkai Liu, James Caverlee |
| 2025 | IDIR: Identifying and Distilling Informative Relations for Speaker Verification. Chong-Xin Gan, Zhe Li, Zezhong Jin, Zilong Huang, Man-Wai Mak, Kong Aik Lee |
| 2025 | Identification of Pathological Pronunciation Profiles in ASR Transcription Errors. Margot Masson, Isabelle Ferrané, Julie Mauclair |
| 2025 | Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models. Nikola Ljubesic, Ivan Porupski, Peter Rupnik |
| 2025 | Identifying Vocal and Facial Biomarkers of Depression in Large-Scale Remote Recordings: A Multimodal Study Using Mixed-Effects Modeling. Nelson Hidalgo Julia, Robert Lewis, Craig Ferguson, Simon Goldberg, Wendy Lau, Caroline Swords, Gabriela Valdivia, Christine D. Wilson-Mendenhall, Raquel Tartar, Rosalind W. Picard, Richard Davidson |
| 2025 | Impact of Background Noise on Turn-Taking Dynamics in Triadic Conversations. Valeska Slomianka, Tobias May, Torsten Dau |
| 2025 | Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching. Shoutrik Das, Nishant Singh, Arjun Gangwar, S. Umesh |
| 2025 | Improving Audio Classification by Transitioning from Zero- to Few-Shot. James Taylor, Wolfgang Mack |
| 2025 | Improving Automatic Speech Recognition for Children's Reading Assessment with Disfluency-aware Language Models. Jazmín Vidal, Luciana Ferrer, Juan Esteban Kamienkowski, Pablo Riera |
| 2025 | Improving Bird Classification with Primary Color Additives. Ezhini Rasendiran R, Chandresh Kumar Maurya |
| 2025 | Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts. Lingyun Gao, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik |
| 2025 | Improving Cross-Attention based on Positional Alignment during Inference for Robust Long-form Speech Recognition. Changhan Oh, Kiyoung Park, Jeom-ja Kang, Woo Yong Choi, Hwa Jeon Song |
| 2025 | Improving End-to-end Mixed-case ASR with Knowledge Distillation and Integration of Voice Activity Cues. Sashi Novitasari, Takashi Fukuda, Gakuto Kurata |
| 2025 | Improving Generalization of End-to-End ASR through Diversity and Independence Regularization. Ye-Eun Ko, Mun-Hak Lee, Dong-Hyun Kim, Joon-Hyuk Chang |
| 2025 | Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning. Long Mai, Julie Carson-Berndsen |
| 2025 | Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion. Lea Fischbach, Akbar Karimi, Caroline Kleen, Alfred Lameli, Lucie Flek |
| 2025 | Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC. Qingzheng Wang, Jiancheng Sun, Yifan Peng, Shinji Watanabe |
| 2025 | Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising. Ye-Xin Lu, Hui-Peng Du, Fei Liu, Yang Ai, Zhen-Hua Ling |
| 2025 | Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios. Aswin Shanmugam Subramanian, Amit Das, Naoyuki Kanda, Jinyu Li, Xiaofei Wang, Yifan Gong |
| 2025 | Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles. Miika Toikkanen, June-Woo Kim |
| 2025 | Improving Speech Emotion Recognition Through Cross Modal Attention Alignment and Balanced Stacking Model. Lucas H. Ueda, João Lima, Leonardo Marques, Paula Dornhofer Paro Costa |
| 2025 | Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function. Kwok Chin Yuen, Jia Qi Yip, Eng Siong Chng |
| 2025 | Improving User Impression of Spoken Dialogue Systems by Controlling Para-linguistic Expression Based on Intimacy. Shoki Kawanishi, Akinori Ito, Yuya Chiba, Takashi Nose |
| 2025 | In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion. Jiawei Jin, Zhihan Yang, Yixuan Zhou, Zhiyong Wu |
| 2025 | In-context Language Learning for Endangered Languages in Speech Recognition. Zhaolin Li, Jan Niehues |
| 2025 | In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts. Filomene Roquefort, Alexandre Ducorroy, Rachid Riad |
| 2025 | Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction. Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li |
| 2025 | Individualized speech enhancement for hearing-impaired listeners. Chuan Wen, Sarah Verhulst |
| 2025 | Infant Cry Emotion Recognition Using Improved ECAPA-TDNN with Multi-scale Feature Fusion and Attention Enhancement. Junyu Zhou, Yanxiong Li, Haolin Yu |
| 2025 | InfiniteAudio: Infinite-Length Audio Generation with Consistency. Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung |
| 2025 | Influence of Proficiency and L2 Experience on Dynamic Spectral Cue Utilization in L2 Vowel Perception and Production. Linda Bakkouche, Brechtje Post |
| 2025 | Influence of Room Acoustics on Objective Voice Assessment Methods in the Context of Speech and Language Therapy. Sven Franz, Tanja Grewe, Bernd T. Meyer, Jörg Bitzer |
| 2025 | Influence of wall coverings of 3D-printed vocal tract models on measured transfer functions. Peter Birkholz, Dominik Schäfer, Patrick Häsner, Jihyeon Yun, Iris Kruppke, Rémi Blandin |
| 2025 | Instantaneous changes in acoustic signals reflect syllable progression and cross-linguistic syllable variation. Haley Hsu, Dani Byrd, Khalil Iskarous, Louis Goldstein |
| 2025 | Intelligibility Prediction for Time-Modified Speech Signals Using Spectro-Temporal Modulation Features. Aymen Bashir, Haolan Wang, Amin Edraki, Wai-Yip Chan, Jesper Jensen |
| 2025 | Intelligibility of Text-to-Speech Systems for Mathematical Expressions. Sujoy Roychowdhury, Ranjani H. G., Sumit Soman, Nishtha Paul, Subhadip Bandyopadhyay, Siddhanth Iyengar |
| 2025 | Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction. Wang Dai, Archontis Politis, Tuomas Virtanen |
| 2025 | Interactive Fusion of Multi-View Speech Embeddings via Pretrained Large-Scale Speech Models for Speech Emotional Attribute Prediction in Naturalistic Conditions. Yuyun Liu, Yujia Gu, Jiahao Luo, Wenming Zheng, Cheng Lu, Yuan Zong |
| 2025 | Interspeech 2025 URGENT Speech Enhancement Challenge. Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, Shinji Watanabe |
| 2025 | Intrasentential English in Swedish TTS: perceived English-accentedness. Christina Tånnander, David House, Jonas Beskow, Jens Edlund |
| 2025 | Introducing EMOPARKNZ: the Emotional Speech Database from New Zealand English Speakers with Parkinson's Disease. Itay Ben-Dom, Catherine I. Watson, Clare M. McCann |
| 2025 | Investigating Affect Mining Techniques for Annotation Sample Selection in the Creation of Finnish Affective Speech Corpus. Kalle Lahtinen, Einari Vaaras, Liisa Mustanoja, Okko Räsänen |
| 2025 | Investigating Gender Bias in Text-to-Audio Generation Models. Aarish Shah Mohsin, Mohammad Nadeem, Shahab Saquib Sohail, Tughrul Arslan, Mandar Gogate, Nasir Saleem, Amir Hussain |
| 2025 | Investigating Glottal Stop Coda Loss During Sound Change of Checked Syllables Based on Speech-EGG Voice Offset Alignment. Bingliang Zhao, Xiyu Wu |
| 2025 | Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis. Paul Mayer, Florian Lux, Alejandro Pérez González de Martos, Angelina Elizarova, Lindsey Vanderlyn, Dirk Väth, Ngoc Thang Vu |
| 2025 | Investigating continuous autoregressive generative speech enhancement. Haici Yang, Gordon Wichern, Ryo Aihara, Yoshiki Masuyama, Sameer Khurana, François G. Germain, Jonathan Le Roux |
| 2025 | Investigating effects of sex hormones, cycle phases and age on female fundamental frequency. Melanie Weirich, Adrian P. Simpson |
| 2025 | Investigating the Impact of Word Informativeness on Speech Emotion Recognition. Sofoklis Kakouros |
| 2025 | Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction. Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | Investigating the Reasoning Abilities of Large Language Models for Understanding Spoken Language in Interpersonal Interactions. Pranjal Aggarwal, Ghritachi Mahajani, Pavan Kumar Malasani, Vaibhav Jamadagni, Caroline J. Wendt, Ehsanul Haque Nirjhar, Theodora Chaspari |
| 2025 | Is Synthetic Data Truly Effective for Training Speech Language Models? Tomoya Mizumoto, Atsushi Kojima, Yusuke Fujita, Lianbo Liu, Yui Sudo |
| 2025 | Is it all about race?: A Cross-examination of /s/ in a Multilingual (Nigerian) Context. Oluwasegun Amoniyan |
| 2025 | Is your model big enough? Training and interpreting large-scale monolingual speech foundation models. Yaroslav Getman, Tamás Grósz, Tommi Lehtonen, Mikko Kurimo |
| 2025 | Iterative Refinement, Not Training Objective, Makes HuBERT Behave Differently from wav2vec 2.0. Robin Huo, Ewan Dunbar |
| 2025 | J-SPAW: Japanese speaker verification and spoofing attacks recorded in-the-wild dataset. Sayaka Shiota, Suzuka Horie, Kouta Kanno, Shinnosuke Takamichi |
| 2025 | J-j-j-just Stutter: Benchmarking Whisper's Performance Disparities on Different Stuttering Patterns. Charan Sridhar, Shaomei Wu |
| 2025 | JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles. Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko |
| 2025 | Joint Rate Allocation and Sensor Selection for Speech Enhancement in Wireless Acoustic Sensor Networks. De Hu, Qilong Li |
| 2025 | Joint Reference Microphone Selection and Filter Order Determination in Multi-channel Active Noise Control. De Hu, Shuyao Liu, Yanrong He |
| 2025 | Joint Target-Speaker ASR and Activity Detection. Chikara Maeda, Muhammad Shakeel, Yui Sudo |
| 2025 | Jointly Improving Dialect Identification and ASR in Indian Languages using Multimodal Feature Fusion. Saurabh Kumar, Amartyaveer, Prasanta Kumar Ghosh |
| 2025 | Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages. Utkarsh Pathak, Chandra Sai Krishna Gunda, Anusha Prakash, Keshav Agarwal, Hema A. Murthy |
| 2025 | Knowledge Distillation Method for Pruned RNN-T Models via Pruning Bounds Sharing and Losses Confusion. Xiaocan Zhang, Weiwei Jiang, Guibin Zheng, Chenhao Jing, Jiqing Han, Tieran Zheng |
| 2025 | L3C-DeepMFC: Low-Latency Low-Complexity Deep Marginal Feedback Cancellation with Closed-Loop Fine Tuning for Hearing Aids. Fengyuan Hao, Brian C. J. Moore, Huiyong Zhang, Xiaodong Li, Chengshi Zheng |
| 2025 | LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention. Aditya Srinivas Menon, Raj Prakash Gohil, Kumud Tripathi, Pankaj Wasnik |
| 2025 | LATE: Open Source Toolkit for Latvian and Latgalian Speech Transcription. Arturs Znotins, Didzis Gosko, Normunds Gruzitis |
| 2025 | LHCP-ASR: An English Speech Corpus of High-Energy Particle Physics Talks for Narrow-Domain ASR Benchmarking. Jaume Santamaria-Jorda, Pablo Segovia-Martínez, Gonçal V. Garcés Díaz-Munío, Joan Albert Silvestre-Cerdà, Adrià Giménez, Rubén Gaspar Aparicio, René Fernández Sánchez, Jorge Civera, Albert Sanchís, Alfons Juan |
| 2025 | LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech. Niyati Bafna, Matthew Wiesner |
| 2025 | LIST: Language-Independent Speech Token for Multilingual Speech Synthesis with Language Models. Chang Liu, Zhen-Hua Ling, Yu Gu |
| 2025 | LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting. Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge |
| 2025 | LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context. Natsuo Yamashita, Masaaki Yamamoto, Hiroaki Kokubo, Yohei Kawaguchi |
| 2025 | LLM-based phoneme-to-grapheme for phoneme-based speech recognition. Te Ma, Min Bi, Saierdaer Yusuyin, Hao Huang, Zhijian Ou |
| 2025 | LRBA: Stealthy Backdoor Attacks on Speech Classification via Latent Rearrangement in VITS. Zexin Li, Wenhan Yao, Ye Xiao, Jinsu Yang, Fen Xiao, Weiping Wen |
| 2025 | LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec. Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu |
| 2025 | LSPnet: an ultra-low bitrate hybrid neural codec. Bowen Zhang, Ian McLoughlin, Xiaoxiao Miao, A. S. Madhukumar |
| 2025 | Label Semantic-Driven Contrastive Learning for Speech Emotion Recognition. Jiaxi Hu, Leyuan Qu, Haoxun Li, Taihao Li |
| 2025 | Label-Context-Dependent Internal Language Model Estimation for CTC. Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney |
| 2025 | Language and Accent Familiarity Effects on the Use of Acoustic Cues in Talker Identification. Shengyue Xiong, Zhe-chen Guo, Bharath Chandrasekaran |
| 2025 | Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval. Anup Singh, Kris Demuynck, Vipul Arora |
| 2025 | Language-Agnostic Suicidal Risk Detection Using Large Language Models. June-Woo Kim, Wonkyo Oh, Haram Yoon, Sung-Hoon Yoon, Dae-Jin Kim, Dong-Ho Lee, Sang-Yeol Lee, Chan-Mo Yang |
| 2025 | Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR. Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng |
| 2025 | Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos. Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki |
| 2025 | Large Language Models based ASR Error Correction for Child Conversations. Anfeng Xu, Tiantian Feng, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan |
| 2025 | Lateral Channel Formation in Australian English /l/: Insights from Magnetic Resonance Imaging. Tünde Szalay, Michael Proctor, Amelia Gully, Tharinda Piyadasa, Craig T. Jin, David Waddington, Naeim Sanaei, Sheryl Foster, Kirrie J. Ballard |
| 2025 | Layer-Wise Decision Fusion for Fake Audio Detection Using XLS-R. Yixuan Xiao, Ngoc Thang Vu |
| 2025 | Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition. Ziwei Gong, Pengyuan Shi, Kaan Donbekci, Lin Ai, Run Chen, David Sasu, Zehui Wu, Julia Hirschberg |
| 2025 | Learning Optimal Prosody Embedding Codebook based on F0 and Energy. David Portes, Ales Horák |
| 2025 | Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation. Hyung Kyu Kim, Hak Gu Kim |
| 2025 | Legally validated evaluation framework for voice anonymization. Nathalie Vauquier, Brij Mohan Lal Srivastava, Seyed Ahmad Hosseini, Emmanuel Vincent |
| 2025 | Length Aware Speech Translation for Video Dubbing. Aswin Shanmugam Subramanian, Harveen Singh Chadha, Vikas Joshi, Shubham Bansal, Jian Xue, Rupeshkumar Mehta, Jinyu Li |
| 2025 | Lessons Learned from the URGENT 2024 Speech Enhancement Challenge. Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian |
| 2025 | Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild. Jing-Tong Tzeng, Bo-Hao Su, Ya-Tse Wu, Hsing-Hang Chou, Chi-Chun Lee |
| 2025 | Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment. Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md. Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna |
| 2025 | Leveraging Cascaded Binary Classification and Multimodal Fusion for Dementia Detection through Spontaneous Speech. Yin-Long Liu, Yuanchao Li, Rui Feng, Liu He, Jia-Xin Chen, Yi-Ming Wang, Yu-Ang Chen, Yan-Han Peng, Jia-Hong Yuan, Zhen-Hua Ling |
| 2025 | Leveraging Geographic Metadata for Dialect-Aware Speech Recognition. Pouya Mehralian, Hugo Van hamme |
| 2025 | Leveraging Information Retrieval to Enhance Spoken Language Understanding Prompts in Few-Shot Learning. Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset |
| 2025 | Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis. Tianyi Xu, Hongjie Chen, Qing Wang, Hang Lv, Jian Kang, Jie Li, Zhennan Lin, Yongxiang Li, Lei Xie |
| 2025 | Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection. Shangkun Huang, Jing Deng, Jintao Kang, Rong Zheng |
| 2025 | Leveraging LLMs for Written to Spoken Style Data Transformation to Enhance Spoken Dialog State Tracking. Haris Gulzar, Monikka Roslianna Busto, Akiko Masaki, Takeharu Eda, Ryo Masumura |
| 2025 | Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection. Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler |
| 2025 | Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection. Yifan Gao, Jiao Fu, Long Guo, Hong Liu |
| 2025 | Leveraging Multi-Level Features of ATST with Conformer-Based Dual-Branch Network for Sound Event Detection. Lipeng Dai, Qing Wang, Jie Zhang, Shengyu Peng, Yu Guan, Wu Guo |
| 2025 | Leveraging Ordinal Information for Speech-based Depression Classification. Lishi Zuo, Man-Wai Mak |
| 2025 | Leveraging SSL Speech Features and Mamba for Enhanced DeepFake Detection. Hoan My Tran, Damien Lolive, David Guennec, Aghilas Sini, Arnaud Delhay, Pierre-François Marteau |
| 2025 | Leveraging Self-Supervised Learning Based Speaker Diarization for MISP 2025 AVSD Challenge. Zeyan Song, Tianchi Sun, Ronghui Hu, Kai Chen, Jing Lu |
| 2025 | Leveraging Text and Speech Processing for Suicide Risk Classification in Chinese Adolescents. Justyna Krzywdziak, Bartlomiej Eljasiak, Joanna Stepien, Michal Swiatek, Agnieszka Pruszek |
| 2025 | Leveraging Unlabeled Audio for Audio-Text Contrastive Learning via Audio-Composed Text Features. Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa |
| 2025 | Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation. Varsha Pendyala, Pedro Morgado, William A. Sethares |
| 2025 | Lexical competition in the process of Cantonese tone merging: Diverse Impact Mechanisms Across Different Individuals and Tone Pairs. Lishan Li, Yaolin Zhou, Xiaoying Xu |
| 2025 | Lexical stress affects lenition: The case of Italian palato-alveolar affricates. Bowei Shao, Philipp Buech, Anne Hermes, Maria Giavazzi |
| 2025 | LiRI Corpus Platform: Demonstration of a Web-Based Infrastructure for Multimodal Corpus Analysis. Teodora Vukovic, Jérémy Zehr, Jonathan Schaber, Igor Mustac, Nikolina Rajovic, Daniel McDonald, Johannes Graën, Noah Bubenhofer |
| 2025 | LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs. Pooneh Mousavi, Shubham Gupta, Cem Subakan, Mirco Ravanelli |
| 2025 | LightL2S: Ultra-Low Complexity Lip-to-Speech Synthesis for Multi-Speaker Scenarios. Yifan Liang, Kang Yang, Fangkun Liu, Andong Li, Xiaodong Li, Chengshi Zheng |
| 2025 | Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning. Siyi Zhao, Wei Wang, Yanmin Qian |
| 2025 | Lightweight Speech Enhancement Model Based on Harmonic Attention and Phase Estimation with Skin-Attachable Accelerometer. Yonghun Song, Yeeun Kim, Yoonyoung Chung |
| 2025 | Lightweight Speech Enhancement for Mandarin Esophageal Speech. Jia-Jyu Su, Yen-Ting Lin, Wu-Hao Li, Chao-Kai Chang, Yan-Zhi Chen, Chen-Yu Chiang |
| 2025 | Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform. Xiangzhu Kong, Hao Huang, Zhijian Ou |
| 2025 | LinearVC: Linear Transformations of Self-Supervised Features Through the Lens of Voice Conversion. Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau |
| 2025 | Linguistic Masking and Its Release in Simulated Electric-acoustic Hearing. Yuting Ding, Xuefei Wang, Fei Chen |
| 2025 | Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation. Soo-Whan Chung, Min-Seok Choi |
| 2025 | Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing. Yang Xiao, Rohan Kumar Das |
| 2025 | LitMAS: A Lightweight and Generalized Multi-Modal Anti-Spoofing Framework for Biometric Security. Nidheesh Gorthi, Kartik Thakral, Rishabh Ranjan, Richa Singh, Mayank Vatsa |
| 2025 | Location-Aware Target Speaker Extraction for Hearing Aids. Daniel-José Alcala Padilla, Nils L. Westhausen, Swati Vivekananthan, Bernd T. Meyer |
| 2025 | LombardTokenizer: Disentanglement and Control of Vocal Effort in a Neural Speech Codec. Maxime Jacquelin, Maëva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin |
| 2025 | Long-Context Speech Synthesis with Context-Aware Memory. Zhipeng Li, Xiaofen Xing, Jingyuan Xing, Hangrui Hu, Heng Lu, Xiangmin Xu |
| 2025 | Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use. Titouan Parcollet, Yuan Tseng, Shucong Zhang, Rogier C. van Dalen |
| 2025 | Low Complex IIR Adaptive Hear-Through Ambient Filtering for Overcoming Practical Constraints in Earbuds. Rishabh Gupta, MLNS Karthik, Yughendaran Palanivel |
| 2025 | M3L: A Multi-Modal and Multi-Lingual Depression Detection Framework. Jiajun You, Shuai Wang, Xun Gong, Xiang Wan |
| 2025 | MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge. Zijiang Yang, Meishu Song, Xin Jing, Haojie Zhang, Kun Qian, Bin Hu, Kota Tamada, Toru Takumi, Björn W. Schuller, Yoshiharu Yamamoto |
| 2025 | MASV: Speaker Verification with Global and Local Context Mamba. Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Xinhao Mei, Xubo Liu, Yangyang Shi, Florian Metze |
| 2025 | MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition. Hyo Jin Jon, Longbin Jin, Hyuntaek Jung, Hyunseo Kim, Donghun Min, Eun Yi Kim |
| 2025 | MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement. Nan Xu, Zhaolong Huang, Xiaonan Zhi |
| 2025 | MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition. Yinfeng Xia, Huiyan Li, Chenyang Le, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian |
| 2025 | MIKU-PAL: An Automated and Standardized Multimodal Method for Speech Paralinguistic and Affect Labeling. Yifan Cheng, Ruoyi Zhang, Jiatong Shi |
| 2025 | MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing. Junjie Zheng, Zihao Chen, Chaofan Ding, Yunming Liang, Yihan Fan, Huan Yang, Lei Xie, Xinhan Di |
| 2025 | MMLoRA: Multitask Memory Parameter-Efficient Fine-Tuning for Multimodal SER. Yuanbo Fang, Xiaofen Xing, Xueru Li, Weibin Zhang, Xiangmin Xu |
| 2025 | MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition. Chengxi Deng, Xurong Xie, Shujie Hu, Mengzhe Geng, Yicong Jiang, Jiankun Zhao, Jiajun Deng, Guinan Li, Youjun Chen, Huimeng Wang, Haoning Xu, Mingyu Cui, Xunying Liu |
| 2025 | MOVER: Combining Multiple Meeting Recognition Systems. Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani |
| 2025 | MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt. Zhichao Wu, Yueteng Kang, Songjun Cao, Long Ma, Qiulin Li, Qun Yang |
| 2025 | MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR. Dimitrios Damianos, Georgios Paraskevopoulos, Alexandros Potamianos |
| 2025 | MSFNet: A Nested Model for Multi-Sampling-Frequency Speech Enhancement. Venkatesh Parvathala, K. Sri Rama Murty |
| 2025 | MTSE: Multi-Target Speaker Extraction for Conversation Scenarios. Thomas Serre, Mathieu Fontaine, Eric Benhaim, Slim Essid |
| 2025 | MVP: Multi-source Voice Pathology detection. Alkis Koudounas, Moreno La Quatra, Gabriele Ciravegna, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli, Sabato Marco Siniscalchi, Elena Baralis |
| 2025 | Mamba-based Hybrid Model for Speech Enhancement. Se-Ha Kim, Tae-Gyeong Kim, Chang-Jae Chun |
| 2025 | Medusa: A Multimodal Deep Fusion Multi-Stage Training Framework for Speech Emotion Recognition in Naturalistic Conditions. Georgios Chatzichristodoulou, Despoina Kosmopoulou, Antonios Kritikos, Anastasia Poulopoulou, Efthymios Georgiou, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos |
| 2025 | Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement. Yujie Yang, Bing Yang, Xiaofei Li |
| 2025 | MelRe: Vision-Based Mel-Spectrogram Restoration. Kaixuan Luan, Xiaoda Yang, Shile Cai, Ruofan Hu, Minghui Fang, Wenrui Liu, Jialong Zuo, Jiaqi Duan, Yuhang Ma, Junyu Lu |
| 2025 | Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models. Roseline Polle, Agnes Norbury, Alexandra Livia Georgescu, Nicholas Cummins, Stefano Goria |
| 2025 | Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning. Shi-Xin Fang, Liang-Yeh Shen, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee |
| 2025 | MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction. Mohammed Salah Al-Radhi, Géza Németh, Branislav Gerazov |
| 2025 | Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors. Gwangyeol Yu, Junhyeok Lee, Seoryeong Kim, Jimin Lee, Jehyuk Lee |
| 2025 | Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish. Nhan Phan, Mikko Kuronen, Maria Kautonen, Riikka Ullakonoja, Anna von Zansen, Yaroslav Getman, Ekaterina Voskoboinik, Tamás Grósz, Mikko Kurimo |
| 2025 | Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning. Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu |
| 2025 | Mitigating Language Mismatch in SSL-Based Speaker Anonymization. Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, Junichi Yamagishi |
| 2025 | Mitigating Non-Target Speaker Bias in Guided Speaker Embedding. Shota Horiguchi, Takanori Ashihara, Marc Delcroix, Atsushi Ando, Naohiro Tawara |
| 2025 | Mitigating Overfitting During Speech Foundation Model Fine-tuning: Applications to Dysarthric Speech Detection. Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti |
| 2025 | Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach. Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee |
| 2025 | Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition. Raphaël Bagat, Irina Illina, Emmanuel Vincent |
| 2025 | Modality-Agnostic Multimodal Emotion Recognition using a Contrastive Masked Autoencoder. Georgios Chochlakis, Turab Iqbal, Woo Hyun Kang, Zhaocheng Huang |
| 2025 | Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework. Yunsik Kim, Yoonyoung Chung |
| 2025 | Model as Loss: A Self-Consistent Training Paradigm. Saisamarth Rajesh Phaye, Milos Cernak, Andrew Harper |
| 2025 | Modeling Formant Dynamics in Mandarin /ai/: Effects of Speech Style and Speech Rate. Yunzhuo Xiang, Jingyi Sun |
| 2025 | Modeling Multi-Turn Spoken Language Understanding with Dynamic Graph Convolutional Networks. Yi Huang, Si Chen, Jingyu Yao, Junlan Feng |
| 2025 | Modeling Probabilistic Reduction using Information Theory and Naive Discriminative Learning. Anna Stein, Kevin Tang |
| 2025 | Modeling Vowel System Typology Using Iterated Confusion Minimization. John McGahay |
| 2025 | Monotonic Attention for Robust Text-to-Speech Synthesis in Large Language Model Frameworks. Yike Zhang, Yiming Li, Jie Chen, Qinghua Wu, Songjun Cao, Long Ma |
| 2025 | Multi-Channel Acoustic Echo Cancellation Based on Direction-of-Arrival Estimation. Fei Zhao, Xueliang Zhang, Zhong-Qiu Wang |
| 2025 | Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge. Ming Cheng, Fei Su, Cancan Li, Juan Liu, Ming Li |
| 2025 | Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion. Maxim Markitantov, Elena Ryumina, Heysem Kaya, Alexey Karpov |
| 2025 | Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition. Mehedi Hasan Bijoy, Dejan Porjazovski, Tamás Grósz, Mikko Kurimo |
| 2025 | Multi-lingual and Zero-Shot Speech Recognition by Incorporating Classification of Language-Independent Articulatory Features. Ryo Magoshi, Shinsuke Sakai, Jaeyoung Lee, Tatsuya Kawahara |
| 2025 | Multi-task learning for speech emotion recognition in naturalistic conditions. Bartlomiej Zgórzynski, Juliusz Wójtowicz-Kruk, Piotr Masztalski, Wladyslaw Sredniawa |
| 2025 | Multi-view Fusion and Parameter Perturbation for Few-Shot Class-Incremental Audio Classification. Yulu Fang, Mingyue He, Qisheng Xu, Jianqiao Zhao, Cheng Yang, Kele Xu, Yong Dou |
| 2025 | MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers. Kyeongman Park, Seongho Joo, Kyomin Jung |
| 2025 | Multichannel Keyword Spotting for Noisy Conditions. Dzmitry Saladukha, Ivan Koriabkin, Kanstantsin Artsiom, Aliaksei Rak, Nikita Ryzhikov |
| 2025 | Multilingual Query-by-Example KWS for Indian Languages using Transliteration. Kirandevraj R, Vinod K. Kurmi, Vinay P. Namboodiri, C. V. Jawahar |
| 2025 | Multilingual Speech Assessment Using Cross-Attention and Multitask Learning. Sehyun Oh, Minhwa Chung, Sunhee Kim |
| 2025 | Multimodal Assessment of Speech Impairment in Amyotrophic Lateral Sclerosis Using Audio-Visual and Machine Learning Approaches. Francesco Pierotti, Andrea Bandini |
| 2025 | Multimodal Biomarkers for Schizophrenia: Towards Individual Symptom Severity Estimation. Gowtham Premananth, Philip Resnik, Sonia Bansal, Deanna L. Kelly, Carol Y. Espy-Wilson |
| 2025 | Multimodal Dynamics of Hand Gestures and Pauses in Multiparty Interactions. Delphine Charuau, Naomi Harte |
| 2025 | Multimodal Emotion Diarization: Frame-Wise Integration of Text and Audio Representations. Ziv Tamir, Thomas Thebaud, Jesús Villalba, Najim Dehak, Oren Kurland |
| 2025 | Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience. Andrew Chang, Chenkai Hu, Ji Qi, Zhuojian Wei, Kexin Zhang, Viswadruth Akkaraju, David Poeppel, Dustin Freeman |
| 2025 | Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction. Bogdan Vlasenko, Mathew Magimai-Doss |
| 2025 | Multimodal Silent Recognition of Phonemes Using Radar and Optopalatographic Silent Speech Interfaces. João Menezes, Aubin Mouras, Arne-Lukas Fietkau, Dani Kazzy, Peter Birkholz |
| 2025 | Multimodal Speech, Language and Orofacial Analysis for Remote Assessment of Positive, Negative and Cognitive Symptoms in Schizophrenia. Michael Neumann, Hardik Kothare, Beverly Insel, Anzalee Khan, Danyah Nadim, Jean-Pierre Lindenmayer, Vikram Ramanarayanan |
| 2025 | Multimodal Speech-Based Biomarkers Outperform the ALS Functional Rating Scale in Predicting Individual Disease Progression in ALS. Hardik Kothare, Michael Neumann, Vikram Ramanarayanan |
| 2025 | Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages. Rishabh Ranjan, Ayinala Likhith, Mayank Vatsa, Richa Singh |
| 2025 | Multimodal and Multitask Learning for Predicting Multiple Scores in L2 English Speech. Sehyun Oh, Sunhee Kim, Minhwa Chung |
| 2025 | Multistage Universal Speech Enhancement System for URGENT Challenge. Xiaohuai Le, Zhuangqi Chen, Siyu Sun, Xianjun Xia, Chuanzeng Huang |
| 2025 | Multitalker Babble in English Vowel Perception Training: A Comparison between Humans and Neural Models. Wenwei Dong, Alif Silpachai, Catia Cucchiarini, Helmer Strik |
| 2025 | Multitask Learning with Fused Attention for Improved ASR and Mispronunciation Detection in Children's Speech Sound Disorders. Selina S. Sung, Seunghee Ha, Tae-Jin Yoon, Jungmin So |
| 2025 | Multivariate Probabilistic Assessment of Speech Quality. Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee |
| 2025 | NAM-to-Speech Conversion with Multitask-Enhanced Autoregressive Models. Neil Shah, Shirish Karande, Vineet Gandhi |
| 2025 | NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding. Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg |
| 2025 | NIRANTAR: Continual Learning with New Languages and Domains on Real-world Speech Data. Tahir Javed, Kaushal Santosh Bhogale, Mitesh M. Khapra |
| 2025 | NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference. Edresson Casanova, Paarth Neekhara, Ryan Langman, Shehzeen Hussain, Subhankar Ghosh, Xuesong Yang, Ante Jukic, Jason Li, Boris Ginsburg |
| 2025 | Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection. Taewoo Kim, Guisik Kim, Choongsang Cho, Young Han Lee |
| 2025 | Network of acoustic characteristics for the automatic detection of suicide risk from speech. Contribution to the 2025 SpeechWellness challenge by the Semawave team. Vincent P. Martin, Charles Brazier, Maxime Amblard, Michel Musiol, Jean-Luc Rouas |
| 2025 | Neural Spectral Band Generation for Audio Coding. Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang |
| 2025 | Neural Speech Extraction with Human Feedback. Malek Itani, Ashton Graves, Sefik Emre Eskimez, Shyamnath Gollakota |
| 2025 | Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG. Siavash Shams, Richard J. Antonello, Gavin Mischler, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani |
| 2025 | NeuroSpex+: Dual-Task Training of Neuro-Guided Speaker Extraction with Speech Envelope and Waveform. Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li |
| 2025 | Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN. Yicheng Gu, Chaoren Wang, Zhizheng Wu, Lauri Juvela |
| 2025 | Neutral Tone Variation in Beijing Mandarin: Is Neutral Tone Toneless? Xiao Dong, Fengming Liu, Chien-Jer Charles Lin, Monica Nesbitt, Shuju Shi |
| 2025 | No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction. Haoshuai Zhou, Changgeng Mo, Boxuan Cao, Linkai Li, Shan Xiang Wang |
| 2025 | Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners. Katsuhiko Yamamoto, Koichi Miyazaki |
| 2025 | Non-Standard Accent TTS Support via Large Multi-Accent Frontend Pronunciation Knowledge Transfer. Noe Berger, Siqi Sun, Korin Richmond |
| 2025 | Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech. Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann |
| 2025 | Nosey: Open-Source Hardware for Acoustic Nasalance. Maya Dewhurst, Jack Collins, Justin J. H. Lo, Roy Alderton, Sam Kirkham |
| 2025 | Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy. Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Rogov |
| 2025 | Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation. Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian |
| 2025 | OMPAL: Bridging Speech and Learning with an Open-Source Mandarin Pronunciation Assessment Corpus for Global Learners. Wen-Wei Hsieh, Hao-Wei Chi, Kuan-Chen Wang, Ping-Cheng Yeh, Te-Hsin Liu, Chen-Yu Chiang |
| 2025 | OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning. Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, Shinji Watanabe |
| 2025 | OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary. Yui Sudo, Yusuke Fujita, Atsushi Kojima, Tomoya Mizumoto, Lianbo Liu |
| 2025 | Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech. Dimme de Groot, Tanvina Patel, Devendra Kayande, Odette Scharenborg, Zhengjun Yue |
| 2025 | On Apical Vowels in Eastern Zhenjiang Mandarin. Xuying Wang, Fang Hu |
| 2025 | On Enhancing the Performance of Children's ASR Task in Limited Data Scenario. Ankita, Shambhavi, Syed Shahnawazuddin |
| 2025 | On Retrieval of Long Audios with Complex Text Queries. Ruochu Yang, Milind Rao, Harshavardhan Sundar, Anirudh Raju, Aparna Khare, Srinath Tankasala, Di He, Venkatesh Ravichandran |
| 2025 | On the Design of a Robust Superdirective Beamformer and Topology Parameter Optimization with Frustum-Shaped Microphone Arrays Featuring Multiple Rings. Kunlong Zhao, Gongping Huang, Xudong Zhao, Jingdong Chen, Jacob Benesty, Zoran Cvetkovic |
| 2025 | On the Language and Gender Biases in PSTN, VoIP and Neural Audio Codecs. Kemal Altwlkany, Amar Kuric, Emanuel Lacic |
| 2025 | On the Production and Perception of a Single Speaker's Gender. Robin Netzorg, Naomi Carvalho, Andrea Guzman, Lydia Wang, Juliana Francis, Klo Vivienne Garoute, Keith Johnson, Gopala Anumanchipalli |
| 2025 | On the Relationship between Accent Strength and Articulatory Features. Kevin Huang, Sean Foley, Jihwan Lee, Yoonjeong Lee, Dani Byrd, Shrikanth Narayanan |
| 2025 | On the Relevance of Clinical Assessment Tasks for the Automatic Detection of Parkinson's Disease Medication State from Speech. David Gimeno-Gómez, Rubén Solera-Ureña, Anna Pompili, Carlos D. Martínez-Hinarejos, Rita Cardoso, Isabel Guimarães, Joaquim J. Ferreira, Alberto Abad |
| 2025 | On the Within-class Variation Issue in Alzheimer's Disease Detection. Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng |
| 2025 | On the cross-modal makeup of charisma: Insights from a field-data analysis. Oliver Niebuhr |
| 2025 | On the influence of language similarity in non-target speaker verification trials. Paul M. Reuter, Michael Jessen |
| 2025 | On the reliability of feature attribution methods for speech classification. Gaofei Shen, Hosein Mohebbi, Arianna Bisazza, Afra Alishahi, Grzegorz Chrupala |
| 2025 | On-device Streaming Discrete Speech Units. Kwanghee Choi, Masao Someki, Emma Strubell, Shinji Watanabe |
| 2025 | On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition. Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu |
| 2025 | Online AV-CrossNet: a Causal and Efficient Audiovisual System for Speech Enhancement and Target Speaker Extraction. Cheng Yu, Vahid Ahmadi Kalkhorani, Buye Xu, DeLiang Wang |
| 2025 | Online Audio-Visual Autoregressive Speaker Extraction. Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma |
| 2025 | Open Universal Arabic ASR Leaderboard. Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi |
| 2025 | Open-Set Source Tracing of Audio Deepfake Systems. Nicholas Klein, Hemlata Tak, Elie Khoury |
| 2025 | Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio. Yan Ru Pei, Ritik Shrivastava, Sidharth |
| 2025 | Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies. Carlos Mena, Pol Serra, Jacobo Romero, Abir Messaoudi, José Giraldo, Carme Armentano-Oller, Rodolfo Zevallos, Iván Meza, Javier Hernando |
| 2025 | Optimizing CLAP Reward with LLM Feedback for Semantically Aligned and Diverse Automated Audio Captioning. Seyun Ahn, Pil Moo Byun, Won-Gook Choi, Joon-Hyuk Chang |
| 2025 | Optimizing Pause Context in Fine-Tuning Pre-trained Large Language Models for Dementia Detection. Xiaoquan Ke, Man-Wai Mak, Helen Meng |
| 2025 | OpusLM: A Family of Open Unified Speech Language Models. Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Huck Yang, Shinji Watanabe |
| 2025 | Oral Reading Errors by Grade 3 Children in Indian Schools: A Hindi-English Perspective. Sneha Raman, Preeti Rao |
| 2025 | Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning. Ömer Tarik Özyilmaz, Matt Coler, Matias Valdenegro-Toro |
| 2025 | Overestimated performance of auditory attention decoding caused by experimental design in EEG recordings. Yujie Yan, Xiran Xu, Haolin Zhu, Songyi Li, Bo Wang, Xihong Wu, Jing Chen |
| 2025 | Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge. Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng |
| 2025 | PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association. Abdul Hannan, Muhammad Arslan Manzoor, Shah Nawaz, Muhammad Irzam Liaqat, Markus Schedl, Mubashir Noman |
| 2025 | PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition. Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | PAST: Phonetic-Acoustic Speech Tokenizer. Nadav Har-Tuv, Or Tal, Yossi Adi |
| 2025 | PERCEPT-US: A Multimodal American English Child Speech Corpus Specialized for Articulatory Feedback. Amanda Eads, Heather Kabakoff, Nina Benway, Elaine Hitchcock, Jonathan L. Preston, Tara McAllister |
| 2025 | PPGs-BERT: Leveraging Phoneme Sequence and BERT for Alzheimer's Disease Detection from Spontaneous Speech. Qi Sun, Ziyue Qiu, Yu Pu, Jinpeng Li, Xuchu Chen, Wei-Qiang Zhang |
| 2025 | Pairwise Evaluation of Accent Similarity in Speech Synthesis. Jinzuomu Zhong, Suyuan Liu, Dan Wells, Korin Richmond |
| 2025 | ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction. Minu Kim, Kangwook Jang, Hoirin Kim |
| 2025 | Parameter-Efficient Fine-Tuning for Low-Resource Text-to-Speech via Cross-Lingual Continual Learning. Ki-Joong Kwon, Jun-Ho So, Sang-Hoon Lee |
| 2025 | Parameter-Efficient Fine-tuning with Instance-Aware Prompt and Parallel Adapters for Speaker Verification. Shengyu Peng, Wu Guo, Jie Zhang, Yu Guan, Lipeng Dai, Zuoliang Li |
| 2025 | Parameter-efficient Fine-tuning of Conformer-based Streaming Speech Recognition into Non-streaming Models. Yunjae Nam, Jeong U. Han, Kiyeon Kim, Jaemin Lim |
| 2025 | PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing. You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan |
| 2025 | Pathology-Aware Speech Encoding and Data Augmentation for Dysarthric Speech Recognition. Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Tobias Bocklet |
| 2025 | Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses. Seung Gyu Jeong, Seong Eun Kim |
| 2025 | Perception of Emotional Speech by Individuals with High Borderline Personality Features. Yizhou Chen, Xiyu Wu |
| 2025 | Perception of Long and Short Vowel Contrast in Te Reo Māori in Clean and Everyday Listening Environments. C. T. Justine Hui, Jenice Kuzhikombil, Isabella Shields, Hiraia Haami-Wells, Catherine I. Watson, Peter J. Keegan |
| 2025 | Performance of Montreal Forced Aligner on Cantonese Spontaneous Speech. Ka Ki SO, Chenzi Xu, Grace Wenling Cao, Peggy Mok |
| 2025 | PeriodCodec: A Pitch-Controllable Neural Audio Codec Using Periodic Signals for Singing Voice Synthesis. Masato Takagi, Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda |
| 2025 | PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs. Sho Inoue, Shuai Wang, Haizhou Li |
| 2025 | Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition. Dominik Wagner, Ilja Baumann, Natalie Engert, Seanie Lee, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet |
| 2025 | PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection. Oguzhan Baser, Ahmet Ege Tanriverdi, Sriram Vishwanath, Sandeep Chinchali |
| 2025 | Phonetic Posteriorgram-Based Phoneme Selection for Vocal Cord Disorder Classification in Continuous Mandarin Speech. Chih-Ning Chen, Yu-Lan Chuang, Ming-Jhang Yang, Wei-Cheng Hsu, Yung-An Tsou, Yi-Wen Liu |
| 2025 | Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction. Christophe Van Gysel, Maggie Wu, Lyan Verwimp, Caglar Tirkaz, Marco Bertola, Zhihong Lei, Youssef Oualil |
| 2025 | Physiologically-Informed Feature Analysis of Acquired Speech Disorders for Stroke Assessment. Giulia Sanguedolce, Jón Guðnason, Dragos-Cristian Gruia, Emilie D'Olne, Fatemeh Geranmayeh, Patrick A. Naylor |
| 2025 | Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Ryo Fukuda, William Chen, Shinji Watanabe |
| 2025 | Pinyin-Guided Chinese Speech Recognition with Large Language Model. Jie Zhengjie, Gaofeng Cheng |
| 2025 | Pitch Accent Detection improves Pretrained Automatic Speech Recognition. David Sasu, Natalie Schluter |
| 2025 | Pitch Contour Model (PCM) with Transformer Cross-Attention for Speech Emotion Recognition. Minji Ryu, Ji-Hyeon Hur, Sung Heuk Kim, Gahgene Gweon |
| 2025 | Pitch Target Realization in Putonghua Tone Production of Children from Dialect-Speaking Regions. Mengxue Cao, Tianxin Zheng, Jiewen Zheng |
| 2025 | Pitfalls and Limits in Automatic Dementia Assessment. Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer |
| 2025 | Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction. Zexu Pan, Shengkui Zhao, Tingting Wang, Kun Zhou, Yukun Ma, Chong Zhang, Bin Ma |
| 2025 | Position also matters! Separating Same Instruments in String Quartet using Timbral and Positional Cues. Yuetonghui Xu, Yiwen Wang, Xihong Wu, Xiaobing Li, Feng Yu |
| 2025 | Power Spectral Density Estimation for Acoustic Source Separation Using A Spherical Microphone Array. Liang Tao, Maoshen Jia, Yonggang Hu |
| 2025 | Pre-aspiration in Iceland Is Conditioned by Gender/Sex. Meike Rommel, Mísa Hejná, Nicole Dehé |
| 2025 | PredTrAD - Prediction-based Transformer for Anomaly Detection in Multivariate Time Series Data. Jan Schuster, Alexander Wölfel, Fabian Brunner, Christian Bergler |
| 2025 | Predicting Adolescent Suicidal Risk from Multi-task-based Speech: An Ensemble Learning Approach. Xi Chen, Renzhe Yu, Yanshen Tan, Yiyi Li, Quan Qian, Ying Lin |
| 2025 | Prediction of listening effort ratings for habitual and clear-Lombard speech presented in noise. Esther Janse, Chen Shen, Martin Cooke |
| 2025 | Pretraining Multi-Speaker Identification for Neural Speaker Diarization. Shota Horiguchi, Atsushi Ando, Naohiro Tawara, Marc Delcroix |
| 2025 | Privacy-Preserving Speaker Verification via End-to-End Secure Representation Learning. Chenguang Hu, Yaqian Hao, Fulin Zhang, Xiaoxue Luo, Yao Shen, Yingying Gao, Chao Deng, Shilei Zhang, Junlan Feng |
| 2025 | Private kNN-VC: Interpretable Anonymization of Converted Speech. Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller |
| 2025 | ProBiEM: Acoustic and Lexical Correlates of Prosodic Prominence in English-Malayalam Bilingual Speech. Anindita Mondal, Rahul Biju, Anil Kumar Vuppala, Reni K. Cherian, Chiranjeevi Yarra |
| 2025 | ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs. Eray Eren, Qingju Liu, Hyeongwoo Kim, Pablo Garrido, Abeer Alwan |
| 2025 | Probing Prosodic Differences Between Two Regional Varieties of Brazilian Portuguese. Gustavo Silveira, Aviad Albert, Martine Grice |
| 2025 | Probing the Robustness Properties of Neural Speech Codecs. Wei-Cheng Tseng, David Harwath |
| 2025 | Processing of grammatical information in cochlear implant simulated speech by German adult listeners. Atty Schouwenaars, Esther Ruigendijk |
| 2025 | Prolongation in Romanian. Oana Niculescu, Monica Vasileanu |
| 2025 | PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts. Tianhua Qi, Shiyan Wang, Cheng Lu, Tengfei Song, Hao Yang, Zhanglin Wu, Wenming Zheng |
| 2025 | Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection. Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater |
| 2025 | Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning. Sarenne Wallbridge, Christoph Minixhofer, Catherine Lai, Peter Bell |
| 2025 | Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora. Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu |
| 2025 | Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning. Junchuan Zhao, Xintong Wang, Ye Wang |
| 2025 | PruneSLU: Efficient On-device Spoken Language Understanding through Vocabulary and Structural Pruning. Truong Do, Phuong Minh Nguyen, Le-Minh Nguyen |
| 2025 | Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge. Longjie Luo, Shenghui Lu, Lin Li, Qingyang Hong |
| 2025 | Pull It Together: Reducing the Modality Gap in Contrastive Learning. Amit Sofer, Yoav Goldman, Shlomo E. Chazan |
| 2025 | Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization. Yafeng Chen, Chong Deng, Hui Wang, Yiheng Jiang, Han Yin, Qian Chen, Wen Wang |
| 2025 | Pushing the Limits of Beam Search Decoding for Transducer-based ASR models. Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg |
| 2025 | Pushing the Limits of End-to-End Diarization. Samuel J. Broughton, Lahiru Samarakoon |
| 2025 | Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models. Tuan Dat Phuong, Long-Vu Hoang, Huy Dat Tran |
| 2025 | QUADS: Quantized Distillation Framework for Efficient Speech Language Understanding. Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam |
| 2025 | Quadruple Path Modeling with Latent Feature Transfer for Permutation-free Continuous Speech Separation. Jihyun Kim, Doyeon Kim, Hyewon Han, Jinyoung Lee, Jonguk Yoo, Chang Woo Han, Jeongook Song, Hoon-Young Cho, Hong-Goo Kang |
| 2025 | Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis. Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff |
| 2025 | Queer Waves: A German Speech Dataset Capturing Gender and Sexual Diversity from Podcasts and YouTube. Ingo Siegert, Jan Marquenie, Sven Grawunder |
| 2025 | R2S: Real-to-Synthetic Representation Learning for Training Speech Recognition Models on Synthetic Data. Minh Tran, Debjyoti Paul, Yutong Pang, Laxmi Pandey, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei |
| 2025 | RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval. Haoqin Sun, Jingguang Tian, Jiaming Zhou, Hui Wang, Jiabei He, Shiwan Zhao, Xiangyu Kong, Desheng Hu, Xinkang Xu, Xinhui Hu, Yong Qin |
| 2025 | REAL-T: Real Conversational Mixtures for Target Speaker Extraction. Shaole Li, Shuai Wang, Jiangyu Han, Ke Zhang, Wupeng Wang, Haizhou Li |
| 2025 | REB-former: RWKV-enhanced E-branchformer for Speech Recognition. Jie Song, Wang Xiang, Jian Zhou, Cunhang Fan, Zhao Lv |
| 2025 | RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio. Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari |
| 2025 | RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling. Long-Khanh Pham, Thanh V. T. Tran, Minh-Tan Pham, Van Nguyen |
| 2025 | REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion. Ishan D. Biyani, Nirmesh J. Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah |
| 2025 | Ranking and Selection of Bias Words for Contextual Bias Speech Recognition. Haoxiang Hou, Xun Gong, Wangyou Zhang, Wei Wang, Yanmin Qian |
| 2025 | RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching. Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song |
| 2025 | Rapport-Building Dialogue Strategies for Deeper Connection: Integrating Proactive Behavior, Personalization, and Aizuchi Backchannels. Muhammad Yeza Baihaqi, Angel F. Garcia Contreras, Seiya Kawano, Koichiro Yoshino |
| 2025 | Rasmalai: Resources for Adaptive Speech Modeling in IndiAn Languages with Accents and Intonations. Ashwin Sankar, Yoach Lacombe, Sherry Thomas, Praveen Srinivasa Varadhan, Sanchit Gandhi, Mitesh M. Khapra |
| 2025 | ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization. Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li |
| 2025 | ReSepNet: A Unified-Light Model for Recursive Speech Separation with Unknown Speaker Count. Hadi Alizadeh, Rahil Mahdian Toroghi, Hassan Zareian |
| 2025 | Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations. Teng Aleksandra Ma, Sile Yin, Li-Chia Yang, Shuo Zhang |
| 2025 | Real-Time Diffusion Buffer for Speech Enhancement On A Laptop. Bunlong Lay, Rostislav Makarov, Timo Gerkmann |
| 2025 | Real-time TSE demonstration via SoundBeam with KD. Keigo Wakayama, Tomoko Kawase, Takafumi Moriya, Marc Delcroix, Hiroshi Sato, Tsubasa Ochiai, Masahiro Yasuda, Shoko Araki |
| 2025 | Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models. Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim |
| 2025 | Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women. Sakshi Joshi, Eldho Ittan George, Tahir Javed, Kaushal Santosh Bhogale, Nikhil Narasimhan, Mitesh M. Khapra |
| 2025 | Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data. Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie |
| 2025 | Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings. Owais Mujtaba Khanday, Pablo Rodríguez San Esteban, Zubair Ahmad Lone, Marc Ouellet, Jose A. Gonzalez-Lopez |
| 2025 | Reddit FlairShare: A Human-Annotated Dataset of Gender-Progressive Online Discourse. Carlos Hartmann |
| 2025 | Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition. Tao Zhong, Mengzhe Geng, Shujie Hu, Guinan Li, Xunying Liu |
| 2025 | Regularizing Learnable Feature Extraction for Automatic Speech Recognition. Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney |
| 2025 | Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection. Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia |
| 2025 | Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer. Bence Mark Halpern, Thomas Tienkamp, Teja Rebernik, Rob J. J. H. van Son, Martijn Wieling, Defne Abur, Tomoki Toda |
| 2025 | Relative cue weighting in multilingual stop voicing production. Le Xuan Chan, Annika Heuser |
| 2025 | Replay Attacks Against Audio Deepfake Detection. Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl |
| 2025 | Representation of Perceived Prosodic Similarity of Conversational Feedback. Livia Qian, Carol Figueroa, Gabriel Skantze |
| 2025 | Representing Speech Through Autoregressive Prediction of Cochlear Tokens. Greta Tuckute, Klemen Kotar, Evelina Fedorenko, Daniel Yamins |
| 2025 | Restoring Harmonics: Enhancing Speech Quality with Deep Mask and Harmonic Restoration Network. Yu Zhao, Zengqiang Shang, Mou Wang, Xin Liu, Pengyuan Zhang |
| 2025 | Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification. Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han |
| 2025 | Revisiting WFST-based Hybrid Japanese Speech Recognition System for Individuals with Organic Speech Disorders. Naoki Hojo, Ryoichi Takashima, Chihiro Sugiyama, Nobukazu Tanaka, Kanji Nohara, Kazunori Nozaki, Tetsuya Takiguchi |
| 2025 | Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis. Minsu Kim, Pingchuan Ma, Honglie Chen, Stavros Petridis, Maja Pantic |
| 2025 | Rhotic Articulation in Australian English: Insights from MRI. Michael Proctor, Tünde Szalay, Tharinda Piyadasa, Craig T. Jin, Naeim Sanaei, Amelia Gully, David Waddington, Sheryl Foster, Kirrie J. Ballard |
| 2025 | Robot-assisted Recognition of Vocal Emotions in Pseudospeech for Cochlear Implanted Adolescents. Gloria Araiza-Illan, Luke Meyer, Bert Maat, Deniz Baskent |
| 2025 | Robust Neural Codec Language Modeling with Phoneme Position Prediction for Zero-Shot TTS. Chunhui Lu, Xue Wen, Liming Song, Junkwang Oh |
| 2025 | Robust Personal Voice Activity Detection for Mitigating Domain Mismatch and False Acceptance Scenarios. Yuke Lin, Jun Chen, Wenjie Li, Longshuai Xiao, Chao Weng |
| 2025 | Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling. Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasios Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong |
| 2025 | Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes. Rogier C. van Dalen, Shucong Zhang, Titouan Parcollet, Sourav Bhattacharya |
| 2025 | Robust Vocal Intensity Prediction: Overcoming Dataset Bias with Pretrained Deep Models. Quentin Le Tellier, Marc Evrard, Albert Rilliard, Jean-Sylvain Liénard |
| 2025 | Robust fine-tuning of speech recognition models via model merging: application to disordered speech. Alexandre Ducorroy, Rachid Riad |
| 2025 | Robustness of F0 Ratio as a Diagnostic: Comparing Creaky Voice in Danish and Seoul Korean. Michaela Watkins, Rasmus Puggaard-Rode, Paul Boersma, Silke Hamann |
| 2025 | Rollback Speech: Smart Feedback Prompts for Lost Utterances in Unstable Online Calls. Yuni Amaloa Quintero Villalobos, Wafaa Wardah, Sebastian Möller, Robert P. Spang |
| 2025 | Room Impulse Response as a Prompt for Acoustic Echo Cancellation. Fei Zhao, Shulin He, Xueliang Zhang |
| 2025 | Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach. Nick Rossenbach, Benedikt Hilmes, Leon Brackmann, Moritz Gunz, Ralf Schlüter |
| 2025 | SA-RAS: Speaker-Aware Style Retrieval Augmented Generation for Expressive Zero-Shot Text-to-Speech Synthesis. Xueru Li, Jingyuan Xing, Xiaofen Xing, Zhipeng Li, Xiangmin Xu |
| 2025 | SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information. Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee |
| 2025 | SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition. Yuta Hirano, Sakriani Sakti |
| 2025 | SCD-Conformer: Semantic Content Disentanglement for Text-Independent Speaker Verification. Shanshan Yao, Dianlong Liu, Tian Li |
| 2025 | SCRIBAL: A Digital Transcription Tool in Higher Education. Javier Román, Pol Pastells, Mauro Vázquez Chas, Clara Puigventós, Montserrat Nofre, Mariona Taulé, Mireia Farrús |
| 2025 | SDBench: A Comprehensive Benchmark Suite for Speaker Diarization. Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov |
| 2025 | SEED: Speaker Embedding Enhancement Diffusion Model. Kihyun Nam, Jungwoo Heo, Jee-weon Jung, Gangin Park, Chaeyoung Jung, Ha-Jin Yu, Joon Son Chung |
| 2025 | SGED-Probe: Probing E2E ASR decoder and aligner for spoken grammar error detection under three speaking practice conditions. Chowdam Venkata Thirumala Kumar, Chiranjeevi Yarra |
| 2025 | SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit. Wen-Chin Huang, Erica Cooper, Tomoki Toda |
| 2025 | SIDC-KWS: Efficient Spiking Inception-Dilated Conformer with Self-Attention for Keyword Spotting. Jin-Gyo Lim, Seong-Eun Kim |
| 2025 | SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch. Ryo Terashima, Yuma Shirahata, Masaya Kawamura |
| 2025 | SMARTMOS: Modeling Subjective Audio Quality Evaluation for Real-Time Applications. Sivakumar Balasubramanian, Jose Antonio Jimenez Amador, Kaustubh Kalgaonkar, King-Wei Hor, Sriram Srinivasan |
| 2025 | SNIFR: Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer. Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | SNR-Aligned Consistent Diffusion for Adaptive Speech Enhancement. Yonghyeon Jun, Beomjun Woo, Myeonghun Jeong, Nam Soo Kim |
| 2025 | SOMSRED-SVC: Sequential Output Modeling with Speaker Vector Constraints for Joint Multi-Talker Overlapped ASR and Speaker Diarization. Naoki Makishima, Naotaka Kawata, Taiga Yamane, Mana Ihori, Tomohiro Tanaka, Satoshi Suzuki, Shota Orihashi, Ryo Masumura |
| 2025 | SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant. Yixuan Hou, Heyang Liu, Yuhao Wang, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang |
| 2025 | SPCODEC: Split and Prediction for Neural Speech Codec. Liang Wen, Lizhong Wang, Yuxing Zheng, Weijing Shi, Kwang Pyo Choi |
| 2025 | SPEAKtoCOPD: a flashmob study to collect COPD speech. Loes van Bemmel, Lauren G. Reinders, Folkert Brijker, Bas Holverda, Frits M. E. Franssen, Hanneke van Helvoort, Visara Urovi, Marieke Spreeuwenberg, Sami O. Simons |
| 2025 | SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription. Raymond Grossman, Taejin Park, Kunal Dhawan, Andrew Titus, Sophia Zhi, Yulia Shchadilova, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg |
| 2025 | SQ-AST: A Transformer-Based Model for Speech Quality Prediction. Wafaa Wardah, Robert P. Spang, Vincent Barriac, Jan Reimes, Anna Llagostera, Jens Berger, Sebastian Möller |
| 2025 | SSF-DST: A Spectro-Spatial Features Enhanced Deep Spatiotemporal Network for EEG-Based Auditory Attention Detection. Tong Zhu, Xiaoke Yang, Jian Zhou, Lu Li, Zhao Lv, Cunhang Fan |
| 2025 | SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification. Théo Lepage, Réda Dehak |
| 2025 | STCON NIST SRE24 System: Composite Speaker Recognition Solution for Challenging Scenarios. Stepan Malykh, Alexander Anikin, Nikita Khmelev, Anastasia Korenevskaya, Anastasia Zorkina, Sergey Novoselov, Vladislav Marchevskiy, Vladimir Volokhov, Andrey Shulipa, Alexander Kozlov, Alexander Melnikov, Vasiliy Galyuk, Timur Pekhovskiy |
| 2025 | STOPA: A Dataset of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution. Anton Firc, Manasi Chhibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, Kamil Malinka |
| 2025 | SaD: A Scenario-Aware Discriminator for Speech Enhancement. Xihao Yuan, Siqi Liu, Yan Chen, Hang Zhou, Chang Liu, Hanting Chen, Jie Hu |
| 2025 | SardinianVoxes: A Speech Recognition Dataset for the Sardinian Languages. Salvatore Carta, Alessandro Giuliani, Marco Manolo Manca, Mirko Marras, Leonardo Piano |
| 2025 | SawtArabi: A Benchmark Corpus for Arabic TTS. Standard, Dialectal and Code-Switching. Vasista Sai Lodagala, Lamya Alkanhal, Daniel Izham, Shivam Mehta, Shammur Absar Chowdhury, Aqeelah Makki, Hamdy S. Hussein, Gustav Eje Henter, Ahmed Ali |
| 2025 | Scalable Offline ASR for Command-Style Dictation in Courtrooms. Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar |
| 2025 | Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research. Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, Shinji Watanabe |
| 2025 | Scaling Laws for Synthetic Speech for Model Training. Christoph Minixhofer, Ondrej Klejch, Peter Bell |
| 2025 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach. Umberto Cappellazzo, Minsu Kim, Stavros Petridis, Daniele Falavigna, Alessio Brutti |
| 2025 | Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction. Mengjie Qian, Rao Ma, Stefano Bannò, Kate M. Knill, Mark J. F. Gales |
| 2025 | Scaling beyond Denoising: Submitted System and Findings in URGENT Challenge 2025. Zhihang Sun, Andong Li, Tong Lei, Rilin Chen, Meng Yu, Chengshi Zheng, Yi Zhou, Dong Yu |
| 2025 | Scaling pseudo-labeling data for end-to-end low-resource speech translation (the case of Kurdish language). Mohammad MohammadAmini, Aghilas Sini, Marie Tahon, Antoine Laurent |
| 2025 | Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs. Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Yuki Ito, Hassan Shahmohammadi, Siddhant Arora, Shinji Watanabe |
| 2025 | Score-Based Training for Energy-Based TTS Models. Wanli Sun, Anton Ragni |
| 2025 | Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis. Zongli Ye, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Haodong Li, Shuhe Li, Chenxu Guo, Anaisha Das, Peter Park, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli |
| 2025 | Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information. Nicholas Sanders, Yuanchao Li, Korin Richmond, Simon King |
| 2025 | Selective Auditory Attention Decoding in Naturalistic Conversations Using EEG-Based Speech Envelope Tracking in Multi-Speaker Environments. Gabriel Ivucic, Saurav Pahuja, Dashanka De Silva, Tanja Schultz |
| 2025 | Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. Hongyu Zhang, Ming Cheng, Jing Feng, Ming Li |
| 2025 | Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty. Hongfei Xue, Yufeng Tang, Jun Zhang, Xuelong Geng, Lei Xie |
| 2025 | Self-Improvement for Audio Large Language Model using Unlabeled Speech. Shaowen Wang, Xinyuan Chen, Yao Xu |
| 2025 | Self-Supervised Models of Speech Processing for Haitian Creole. William N. Havard, Renauld Govain, Benjamin Lecouteux, Emmanuel Schang |
| 2025 | Self-supervised Optimality-Guided Learning of Speech Articulation. Juraj Simko, Benjamin Elie, Alice Turk |
| 2025 | Self-supervised learning of speech representations with Dutch archival data. Nik Vaessen, Roeland Ordelman, David A. van Leeuwen |
| 2025 | Semantic Processing During Spoken Word Production by Children with Cochlear Implants. Man Wang, Yixin Ding, Niels O. Schiller |
| 2025 | Semantic-Aware Interpretable Multimodal Music Auto-Tagging. Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou |
| 2025 | Semi-Supervised Learning for Automatic Speech Recognition with Word Error Rate Estimation and Targeted Domain Data Selection. Chanho Park, Thomas Hain |
| 2025 | Sentence-Final Particles in Mandarin Child-Directed Speech: Frequency and Impact on Speech Rate. Yizhi Liu, Luyuan Geng, Yan Gu, Mengru Han |
| 2025 | SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions. Roland Hartanto, Sakriani Sakti, Koichi Shinoda |
| 2025 | SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment. SooHwan Eom, Mark Hasegawa-Johnson, Chang D. Yoo |
| 2025 | Significance of Time-Frequency preprocessing for automatic Ultrasonic Vocalization classification in Autism Spectrum Disorder model detection. Szymon Szmajdzinski, Juliusz Wójtowicz-Kruk, Ivan Ryzhankow, Lukasz Lazarski, Jakub Zak, Wladyslaw Sredniawa |
| 2025 | Simple and Effective Content Encoder for Singing Voice Conversion via SSL-Embedding Dimension Reduction. Wangjin Zhou, Tianjiao Du, Chenglin Xu, Sheng Li, Yi Zhao, Tatsuya Kawahara |
| 2025 | Simultaneous Masked and Unmasked Decoding with Speculative Decoding Masking for Fast ASR without Accuracy Loss. Koji Okabe, Hitoshi Yamamoto |
| 2025 | Simultaneous Speech Translation Integrated Compact Multiple Sound Spot Synthesis System On A Laptop Carried Out With A Backpack. Takuma Okamoto, Michiyo Kono |
| 2025 | Skip-Salsa: Skip Synchronous Fusion of ASR LLM Decoders. Ashish R. Mittal, Darshan Prabhu, Sunita Sarawagi, Preethi Jyothi |
| 2025 | SonarGuard2: Ultrasonic Face Liveness Detection Based on Adaptive Doppler Effect Feature Extraction. Xiaoming Zhang, Ke-Yue Zhang, Taiping Yao, Songjun Cao, Shouhong Ding, Long Ma |
| 2025 | Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control. Yunkee Chae, Eunsik Shin, Suntae Hwang, Seungryeol Paik, Kyogu Lee |
| 2025 | SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction. Tuochao Chen, D. Shin, Hakan Erdogan, Sinan Hersek |
| 2025 | Sounding Like a Winner? Prosodic Differences in Post-Match Interviews. Sofoklis Kakouros, Haoyu Chen |
| 2025 | Source Verification for Speech Deepfakes. Viola Negroni, Davide Salvi, Paolo Bestagini, Stefano Tubaro |
| 2025 | Spatially Weighted Contrastive Learning for Robust Sound Source Localization. Hyun-Soo Kim, Da-Hee Yang, Joon-Hyuk Chang |
| 2025 | Spatio-Spectral Diarization of Meetings by Combining TDOA-based Segmentation and Speaker Embedding-based Clustering. Tobias Cord-Landwehr, Tobias Gburrek, Marc Deegen, Reinhold Haeb-Umbach |
| 2025 | Speaker Conditioning of Voice Activity Detection via Implicit Separation. Matthew Maciejewski |
| 2025 | Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm. Zhaoyang Li, Jie Wang, Xiaoxiao Li, Wangjie Li, Longjie Luo, Lin Li, Qingyang Hong |
| 2025 | Speaker Normalization and Content Restoration for Zero-Shot Voice Conversion with Attention-Enhanced Discriminator. Desheng Hu, Yang Xiang, Jian Lu, Xinhui Hu, Xinkang Xu |
| 2025 | Speaker Separation for an Unknown Number of Speakers with Encoder-Decoder-Based Contextual Information Module. Xue Yang, Guiru Shen, Yu Yang |
| 2025 | Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR. Weiqing Wang, Taejin Park, Ivan Medennikov, Jinhan Wang, Kunal Dhawan, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg |
| 2025 | Speaker-Aware Multi-Task Learning for Speech Emotion Recognition. Xiaohan Shi, Xingfeng Li, Tomoki Toda |
| 2025 | Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition. Asahi Sakuma, Hiroaki Sato, Ryuga Sugano, Tadashi Kumano, Yoshihiko Kawai, Tetsuji Ogawa |
| 2025 | Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control. Masato Murata, Koichi Miyazaki, Tomoki Koriyama |
| 2025 | Speaker-specific Patterns of Phonetic Covariation in Korean Word-medial Stops and the Role of Phonological and Morphological Contexts. Chloe D. Kwon |
| 2025 | SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain. Zixiang Wan, Guochang Zhang, Yifeng He, Jianqiang Wei |
| 2025 | Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds. Andrew Chang, Yike Li, Iran R. Roman, David Poeppel |
| 2025 | Speech Annotation for A: Accuracy, Access, and Application. Zirong Li, Hongchen Wu, Yixin Gu, Yao Du, Yang Yue |
| 2025 | Speech Enhancement based on cascaded two flows. Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, Jong Won Shin |
| 2025 | Speech Enhancement with Dual-path Multi-Channel Linear Prediction Filter and Multi-norm Beamforming. Chengyuan Qin, Wenmeng Xiong, Jing Zhou, Maoshen Jia, Changchun Bao |
| 2025 | Speech Kinematic Analysis from Acoustics: Scientific, Clinical and Practical Applications. Carol Y. Espy-Wilson |
| 2025 | Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages. Seraphina Fong, Marco Matassoni, Alessio Brutti |
| 2025 | Speech Multi-label Emotion Recognition Using Asymmetric Class Loss Function Based on Effective Samples. Shanshan Xiang, Hankiz Yilahun, Askar Hamdulla |
| 2025 | Speech Reduction in French: The Relationship Between Vowel Space and Articulation Dynamics. Kübra Bodur, Corinne Fredouille, Christine Meunier |
| 2025 | Speech Reference Intervals: An Assessment of Feasibility in Depression Symptom Severity Prediction. Lauren L. White, Ewan Carr, Judith Dineley, Catarina Botelho, Pauline Conde, Faith Matcham, Carolin Oetzmann, Amos Folarin, George Fairs, Agnes Norbury, Stefano Goria, Srinivasan Vairavan, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Alberto Abad, Isabel Trancoso, Nicholas Cummins |
| 2025 | Speech Unlearning. Jiali Cheng, Hadi Amiri |
| 2025 | Speech and Text Foundation Models for Depression Detection: Cross-Task and Cross-Language Evaluation. Lucía Gómez-Zaragozá, Javier Marín-Morales, Mariano Alcañiz, Mohammad Soleymani |
| 2025 | Speech power spectra: a window into neural oscillations in Parkinson's disease. Sevada Hovsepyan, Mathew Magimai-Doss |
| 2025 | Speech stimulus design to study the neural coding of speech and the impact of cochlear synaptopathy. Etienne Gaudrain, Sarah Verhulst, Deniz Baskent |
| 2025 | Speech transcription from South Tyrolean Dialect to Standard German with Whisper. Luca Ducceschi, Greta H. Franzini |
| 2025 | Speech-Based Automatic Chronic Kidney Disease Diagnosis via Transformer Fusion of Glottal and Spectrogram Features. Jihyun Mun, Minhwa Chung, Sunhee Kim |
| 2025 | Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models. Ke-Han Lu, Chun-Yi Kuan, Hung-yi Lee |
| 2025 | Speech-guided Grapheme-to-Phoneme Conversion for Cantonese Text-to-Speech. Timothy Shin Heng Mak, King Yiu Suen, Albert Y. S. Lam |
| 2025 | Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios. Gerard I. Gállego, Oriol Pareras, Martí Cortada Garcia, Lucas Takanori, Javier Hernando |
| 2025 | SpeechDialogueFactory: A Framework for Natural Speech Dialogue Generation. Minghan Wang, Ye Bai, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari |
| 2025 | SpeechMLC: Speech Multi-label Classification. Miseul Kim, Seyun Um, Hyeonjin Cha, Hong-Goo Kang |
| 2025 | SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms. Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li |
| 2025 | SpeechSEC: A Unified Multi-Task Framework for Speech Synthesis, Editing, and Continuation. Liming Liang, Dongchao Yang, Xianwei Zhuang, Yuxin Xie, Luo Chen, Yuehan Jin, Yuexian Zou |
| 2025 | Speechless: Speech Instruction Training Without Speech for Low Resource Languages. Alan Dao, Dinh Bach Vu, Huy Hoang Ha, Tuan Le Duc Anh, Shreyas Gopal, Yue Heng Yeo, Warren Keng Hoong Low, Eng Siong Chng, Jia Qi Yip |
| 2025 | Spoken Language Modeling with Duration-Penalized Self-Supervised Units. Nicol Visser, Herman Kamper |
| 2025 | Spoken Language Understanding on Unseen Tasks With In-Context Learning. Neeraj Agrawal, Sriram Ganapathy |
| 2025 | Spoken Question Answering for Visual Queries. Nimrod Shabtay, Zvi Kons, Avihu Dekel, Hagai Aronowitz, Ron Hoory, Assaf Arbelle |
| 2025 | SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs. Firoj Alam, Md. Arid Hasan, Shammur Absar Chowdhury |
| 2025 | Spot and Merge: A Hybrid Context Biasing Approach for Rare Word and Out of Vocabulary Recognition. Jatin Agrawal, Bramhendra Koilakuntla, Srikanth Konjeti |
| 2025 | Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech. Nam-Gyu Kim, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee |
| 2025 | Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement. Jangyeon Kim, Ui-Hyeop Shin, Jaehyun Ko, Hyung-Min Park |
| 2025 | StarGAN-Aug: A Cross-domain Fault Audio Generation Method for High-performance Fault Diagnosis of Power Transformers. Ben Niu, Yangjie Wei, Gang Yang, Yuqiao Wang, Shengling Yu |
| 2025 | StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion. Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu |
| 2025 | Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios. Jakob Kienegger, Timo Gerkmann |
| 2025 | Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement. Tuan-Nam Nguyen, Ngoc-Quan Pham, Seymanur Akti, Alexander Waibel |
| 2025 | Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering. Ivan Medennikov, Taejin Park, Weiqing Wang, He Huang, Kunal Dhawan, Jinhan Wang, Jagadeesh Balam, Boris Ginsburg |
| 2025 | Stress in Spoken and Whistled Greek. Andre Batchelder-Schwab, Vasileios Michos, Jonathan Barnes |
| 2025 | Structured Codebook Based Hierarchical Framework for DNN for Computationally Efficient Speech Enhancement. Chidambar B, Hanumanth Rao Naidu |
| 2025 | Structured pruning for efficient systolic array accelerated cascade Speech-to-Text Translation. Jean-Luc Rouas, Charles Brazier, Leila Ben Letaifa, Rafael Medina, Pedro Palacios, David Atienza, Giovanni Ansaloni |
| 2025 | Study of vocal fold vibration using M-mode ultrasound: a proof of concept. Juliette Dindart, Agnès Rouxel, Crystal Lin, Trung Kien Bui, Muriel Lefort, Claire Pillot-Loiseau, Christophe Trésallet, Frédérique Frouin |
| 2025 | StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation. Suhita Ghosh, Mélanie Jouaiti, Jan-Ole Perschewski, Sebastian Stober |
| 2025 | Stuttering Detection Based on Self-Attention Weights of Temporal Acoustic Vector Sequence. Genzo Miyahara, Tsuneo Kato, Akihiro Tamura |
| 2025 | SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition. Longjie Luo, Lin Li, Qingyang Hong |
| 2025 | Sub-band based Adaptive IIR Algorithm with Biquad Filter Stability Constraints for Feedforward Hear-Through Equalization. Rishabh Gupta, MLNS Karthik, Omsrinath Chelamkuri |
| 2025 | Subtyping Speech Errors in Childhood Speech Sound Disorders with Acoustic-to-Articulatory Speech Inversion. Nina R. Benway, Saba Tabatabaee, Benjamin Munson, Jonathan Preston, Carol Y. Espy-Wilson |
| 2025 | SupraDoRAL: Automatic Word Prominence Detection Using Suprasegmental Dependencies of Representations with Acoustic and Linguistic Context. Jhansi Mallela, Upendra Vishwanath Y. S., Sankara Bharadwaj Rangavajjala, Bhaskar Bhatt, Chiranjeevi Yarra |
| 2025 | Supralaryngeal Kinematics of Implosives in Central Vietnamese: An EMA Study. Paul McGuire, Kye Shibata, Thanh Viet Cao, Feng-fan Hsieh, Yueh-Chin Chang |
| 2025 | Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition. Leonora Vesterbacka, Faton Rekathati, Robin Kurtz, Justyna Sikora, Agnes Toftgård |
| 2025 | Switch Conformer with Universal Phonetic Experts for Multilingual ASR. Masato Mimura, Jaeyoung Lee, Tatsuya Kawahara |
| 2025 | SynHate: Detecting Hate Speech in Synthetic Deepfake Audio. Rishabh Ranjan, Kishan Pipariya, Mayank Vatsa, Richa Singh |
| 2025 | Synchronous analysis of abnormal acoustic and linguistic production in Parkinson's speech. Daniel Escobar-Grisales, Cristian David Ríos-Urrego, Sabato Marco Siniscalchi, Adolfo M. García, Yamile Bocanegra, Leonardo Moreno, Elmar Nöth, Juan Rafael Orozco-Arroyave |
| 2025 | Synonymity-Based Semantic Coding for Efficient Speech Compression. Shanhui Gan, Zijian Liang, Kai Niu, Ping Zhang |
| 2025 | Synthesizing Speech with Selected Perceptual Voice Qualities - A Case Study with Creaky Voice. Frederik Rautenberg, Fritz Seebauer, Jana Wiechmann, Michael Kuhlmann, Petra Wagner, Reinhold Haeb-Umbach |
| 2025 | Synthetic Data Generation for Phrase Break Prediction with Large Language Model. Hoyeon Lee, Sejung Son, Ye-Eun Kang, Jong-Hwan Kim |
| 2025 | Synthetic Dysarthric Speech: A Supplement, Not a Substitute for Authentic Data in Dysarthric Speech Recognition. Jingting Li, Keyi Feng, Xinran Zhao, Yan Wang, Su-Jing Wang |
| 2025 | Synthetic Speech Source Tracing using Metric Learning. Dimitrios Koutsianos, Stavros Zacharopoulos, Yannis Panagakis, Themos Stafylakis |
| 2025 | TA-RIR: Topology-Aware Neural Modeling of Acoustic Propagation for Room Impulse Response Synthesis. Junhui Zhao, Hang Chen, Qing Wang, Jun Du, Yanhui Tu, Feng Ma |
| 2025 | TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes. Adriana Stan, David Combei, Dan Oneata, Horia Cucu |
| 2025 | TELVID: A Multilingual Multi-modal Corpus for Speaker Recognition. Karen Jones, Kevin Walker, Christopher Caruso, Elliot Singer, Trang Nguyen, Robert B. Dunn, Stephanie M. Strassel |
| 2025 | TF-Mamba: A Time-Frequency Network for Sound Source Localization. Yang Xiao, Rohan Kumar Das |
| 2025 | TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain. Zixuan Li, Shulin He, Jinglin Bai, Xueliang Zhang |
| 2025 | TS-URGENet: A Three-stage Universal Robust and Generalizable Speech Enhancement Network. Xiaobin Rong, Dahan Wang, Qinwen Hu, Yushi Wang, Yuxiang Hu, Jing Lu |
| 2025 | TS3-Codec: Transformer-Based Simple Streaming Single Codec. Haibin Wu, Naoyuki Kanda, Sefik Emre Eskimez, Jinyu Li |
| 2025 | TSDT-Net: Ultra-Low-Complexity Two-Stage Model Combining Dual-Path-Transformer and Transform-Average-Concatenate Network for Speech Enhancement. Yi Gao, Hangting Chen, Siyu Zhang, Qingshan Yang, Jingcong Chen |
| 2025 | TTMBA: Towards Text To Multiple Sources Binaural Audio Generation. Yuxuan He, Xiaoran Yang, Ningning Pan, Gongping Huang |
| 2025 | TVC-MusicGen: Time-Varying Structure Control for Background Music Generation via Self-Supervised Training. Chenyu Yang, Hangting Chen, Shuai Wang, Haina Zhu, Haizhou Li |
| 2025 | TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge. Tanel Alumäe, Artem Fedorchenko |
| 2025 | Talker Normalization in Chinese Bilinguals: A Comparative Study. Mingxi Lu, Ran Tao, Yujia Tian |
| 2025 | TargetVoice: Single Channel Low-Latency Target Speaker Extraction. Arun Kumar Pallala, Nivedita Chennupati, Balaji Padmanaban, Rakesh Pogula, Uma Subhashini Ravuri, Naveen Ellanki, Harish Rajamani, Naveen Ambati |
| 2025 | Teacher-Free Knowledge Distillation for Improving Short-Utterance Spoken Language Identification. Spandan Dey, Hirak Mondal, Sanjay Kumar Kurmi |
| 2025 | Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples. Chun-Yi Kuan, Hung-yi Lee |
| 2025 | Temp4Cap: Temporally-aligned Automated Audio Captioning. Ho-Young Choi, Jae-Heung Cho, Pil Moo Byun, Won-Gook Choi, Joon-Hyuk Chang |
| 2025 | Temporal Convolutional Network with Smoothed and Weighted Losses for Distant Voice Activity and Overlapped Speech Detection. Shaojie Li, Qintuya Si, De Hu |
| 2025 | Temporal Modeling of Room Impulse Response Generation via Multi-Scale Autoregressive Learning. Sheng Lyu, Yuemin Yu, Chenshu Wu |
| 2025 | Temporal organization of prenuclear glides in Hefei Mandarin. Yifan Yang, Zhiheng Qian |
| 2025 | Test-Time Training for Speech Enhancement. Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty |
| 2025 | Test-Time Training for Speech-based Depression Detection. Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore |
| 2025 | Text Entry for All: Towards Speech-based Multimodal Interaction for Inclusion, Accessibility and the Preservation of the World's Linguistic Heritage. Julián Zapata, Lara Hanna |
| 2025 | Text-Enhanced Audio Encoder for Large Language Model based Speech Recognition via Cross-Modality Pre-training with Unpaired Audio-Text Data. Hang Su, Yuxiang Kong, Lichun Fan, Jian Luan |
| 2025 | Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles. Ticho Urai, Pachara Boonsarngsuk, Ekapol Chuangsuwanich |
| 2025 | The 1st SpeechWellness Challenge: Detecting Suicide Risk Among Adolescents. Wen Wu, Ziyun Cui, Chang Lei, Yinan Duan, Diyang Qu, Ji Wu, Bowen Zhou, Runsen Chen, Chao Zhang |
| 2025 | The 2024 NIST Speaker Recognition Evaluation. Craig S. Greenberg, Lukas L. Diduch, Audrey Tong, Elliot Singer, Trang Nguyen, Robert Dunn, Lisa P. Mason, Beth Matys |
| 2025 | The Development of Speech Rhythm in Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region of China. Aijun Li, Zhiwei Wang, Jun Gao, Xin Zhou |
| 2025 | The Effect of Word Predictability on Spoken Cross-Language Intelligibility. Wei Xue, Iuliia Zaitova, Bernd Möbius |
| 2025 | The Faetar Speech Recognition Benchmark. Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar |
| 2025 | The Interspeech 2025 Challenge on Speech Emotion Recognition in Naturalistic Conditions. Abinay Reddy Naini, Lucas Goncalves, Ali N. Salman, Pravin Mote, Ismail Rasim Ulgen, Thomas Thebaud, Laureano Moro-Velázquez, Leibny Paola García, Najim Dehak, Berrak Sisman, Carlos Busso |
| 2025 | The Interspeech 2025 Speech Accessibility Project Challenge. Xiuwen Zheng, Bornali Phukon, Jonghwan Na, Ed Cutrell, Kyu J. Han, Mark Hasegawa-Johnson, Pan-Pan Jiang, Aadhrik Kuila, Colin Lea, Bob MacDonald, Gautam Varma Mantena, Venkatesh Ravichandran, Leda Sari, Katrin Tomanek, Chang D. Yoo, Chris Zwilling |
| 2025 | The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties. William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe |
| 2025 | The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition. Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg |
| 2025 | The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages. Chris Emezue, NaijaVoices Community, Busayo Awobade, Abraham Toluwase Owodunni, Handel Emezue, Gloria Monica Tobechukwu Emezue, Nefertiti Nneoma Emezue, Sewade Ogun, Bunmi Akinremi, David Ifeoluwa Adelani, Chris Pal |
| 2025 | The Prosodic Characteristics of Standard Chinese Rhetorical Questions in Naturalistic Settings. Shuwen Chen, Qingke Sun, Yue Huang, Yingyi Luo |
| 2025 | The Role of Contextual Variation in Learning Cantonese Tones from Naturalistic Speech. Fengyue Lisa Zhao, Jennifer Kuo |
| 2025 | The Role of Syntactic Structures in Shaping Directionality in Trisyllabic Tone Sandhi: Evidence from Tianjin Mandarin. Siqi Lu, Hui Feng, Ziyu Xiong |
| 2025 | The Role of Voiced Consonant Duration in Sung Vowel-Consonant and Consonant-Vowel Recognition. Allan Vurma, Einar Meister, Lya Meister, Jaan Ross, Marju Raju, Veeda Kala, Tuuri Dede |
| 2025 | The Speech Accessibility Project: Best Practices for Collection and Curation of Disordered Speech. Chris Zwilling, Mark Hasegawa-Johnson, Heather Hodges, Lorraine O. Ramig, Adina Bradshaw, Clarion Mendes, Heejin Kim, Alexandria Barkhimer, Laura Mattie, Meg Dickinson, Shawnise Carter, Marie Moore Channell |
| 2025 | The State Of TTS: A Case Study with Human Fooling Rates. Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra |
| 2025 | The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus. Ruichen Zuo, Kong Aik Lee, Zilong Huang, Man-Wai Mak |
| 2025 | The Text-to-speech in the Wild (TITW) Database. Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe |
| 2025 | The function of creaky voice in South Korean: A perception study. Patrik Hrabánek, Michaela Watkins, Silke Hamann |
| 2025 | The mutual exclusivity bias of bilingual visually grounded speech models. Dan Oneata, Leanne Nortje, Yevgen Matusevych, Herman Kamper |
| 2025 | The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models. Yi Wang, Oli Danyi Liu, Peter Bell |
| 2025 | Theoretical proposal for a unified Bayesian model of adaptation in non-interactive and interactive speech production. Mélen Guillaume, Anahita Basirat, Julien Diard |
| 2025 | Thinking Fast and Slow: Robust Speech Recognition via Deep Filter-Tuning. Dianwen Ng, Kun Zhou, Bin Ma, Eng Siong Chng |
| 2025 | Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition. Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze |
| 2025 | TinyClick: Single-Turn Agent for Empowering GUI Automation. Pawel Pawlowski, Krystian Zawistowski, Wojciech Lapacz, Adam Wiacek, Marcin Skorupa, Sebastien Postansque, Jakub Hoscilowicz |
| 2025 | Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition. Jule Valendo Halim, Siyi Wang, Hong Jia, Ting Dang |
| 2025 | Tonal Contrasts in the Malipo Variety of the Mienic Language. Changhong Du, Fang Hu |
| 2025 | Tonal Perception in Changde Mandarin. Zhenrui Zhang, Fang Hu |
| 2025 | Tonal Variation and Word Meaning in Taiwanese. Yu-Ying Chuang, Sheng-Fu Wang |
| 2025 | Tonality-Based Accompaniment-Guided Automatic Singing Evaluation. Pei-Chin Hsieh, Yih-Liang Shen, Ngoc Son Tran, Tai-Shih Chi |
| 2025 | Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models. Parismita Gogoi, Sishir Kalita, Wendy Lalhminghlui, Viyazonuo Terhiija, Moakala Tzudir, Priyankoo Sarmah, S. R. M. Prasanna |
| 2025 | Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling. Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Tejas S. Prabhune, Shuhe Li, William Li, Rodrigo Ortiz, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli |
| 2025 | Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments. Lubos Marcinek, Jonas Beskow, Joakim Gustafson |
| 2025 | Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion. Seymanur Akti, Tuan-Nam Nguyen, Alexander Waibel |
| 2025 | Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ. Yunkee Chae, Kyogu Lee |
| 2025 | Towards Classification of Typical and Atypical Disfluencies: A Self Supervised Representation Approach. Priyanka Kommagouni, Pragya Khanna, Vamshiraghusimha Narasinga, Anirudh Bocha, Anil Kumar Vuppala |
| 2025 | Towards Diverse and Efficient Audio Captioning via Diffusion Models. Manjie Xu, Chenxing Li, Yong Ren, Xinyi Tu, Ruibo Fu, Wei Liang, Dong Yu |
| 2025 | Towards Domain-Specific Spoken Language Understanding for a Catalan Voice-Controlled Video Game. Alex Peiró Lilja, Rodolfo Zevallos, Carme Armentano-Oller, José Giraldo, Cristina España-Bonet, Mireia Farrús |
| 2025 | Towards Early Prediction of Self-Supervised Speech Model Performance. Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève |
| 2025 | Towards Efficiently Whisper Fine-tuning with Monotonic Alignments. Ziyang Zhuang, Tao Wei, Ming Fang, Ning Cheng, Shaojun Wang, Jing Xiao |
| 2025 | Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset. Rui Liu, Pu Gao, Jiatian Xi, Berrak Sisman, Carlos Busso, Haizhou Li |
| 2025 | Towards Few-Shot Training-Free Anomaly Sound Detection. Ho-Hsiang Wu, Wei-Cheng Lin, Abinaya Kumar, Luca Bondi, Shabnam Ghaffarzadegan, Juan Pablo Bello |
| 2025 | Towards Frame-level Quality Predictions of Synthetic Speech. Michael Kuhlmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach |
| 2025 | Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism. Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | Towards High-Quality LLM-Based Data for French Spontaneous Speech Simplification: an Exo-Refinement Approach. Lucia Ormaechea Grijalba, Nikos Tsourakis, Pierrette Bouillon, Benjamin Lecouteux, Didier Schwab |
| 2025 | Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech. Taesoo Kim, Yongsik Jo, Hyunmin Song, Taehwan Kim |
| 2025 | Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages. Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen |
| 2025 | Towards Inclusive and Fair ASR: Insights from the SAPC Challenge for Optimizing Disordered Speech Recognition. Nada Gohider, Otman Basir |
| 2025 | Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition. Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu |
| 2025 | Towards Machine Unlearning for Paralinguistic Speech Processing. Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, Swarup Ranjan Behera, Vandana Rajan, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation. Steffen Freisinger, Philipp Seeberger, Thomas Ranzenberger, Tobias Bocklet, Korbinian Riedhammer |
| 2025 | Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision. Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu |
| 2025 | Towards Personalised Audio Visual Speech Enhancement. Mandar Gogate, Kia Dashtipour, Amir Hussain |
| 2025 | Towards Pre-training an Effective Respiratory Audio Foundation Model. Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada |
| 2025 | Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM. Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie |
| 2025 | Towards Robust Speaker Recognition against Intrinsic Variation with Foundation Model Few-shot Tuning and Effective Speech Synthesis. Zhiyong Chen, Shuhang Wu, Xinnuo Li, Zhiqi Ai, Shugong Xu |
| 2025 | Towards Secure User Authentication for Headphones via In-Ear or In-Earcup Microphones. N. Shashaank, Xiao Quan, Andrew Kaluzny, Leonard Varghese, Marko Stamenovic, Chuan-Che Huang |
| 2025 | Towards Sentence Level Imagined Speech Generation from EEG signals. Sparsh Rastogi, Harsh Dadwal, Khushboo Modi, Jatin Bedi, Jasmeet Singh |
| 2025 | Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models. Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma |
| 2025 | Towards Temporally Explainable Dysarthric Speech Clarity Assessment. Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara |
| 2025 | Towards a Japanese Full-duplex Spoken Dialogue System. Atsumoto Ohashi, Shinya Iizuka, Jingjing Jiang, Ryuichiro Higashinaka |
| 2025 | Towards a Unified Benchmark for Arabic Pronunciation Assessment: Qur'anic Recitation as Case Study. Yassine El Kheir, Omnia Ibrahim, Amit Meghanani, Nada AlMarwani, Hawau Olamide Toyin, Sadeen Alharbi, Modar Alfadly, Lamya Alkanhal, Ibrahim Selim, Shehab Elbatal, Salima Mdhaffar, Thomas Hain, Yasser Hifny, Mostafa Shahin, Ahmed Ali |
| 2025 | Towards a dynamical model of transitions between fluent and stuttered speech. Yijing Lu, Khalil Iskarous, Louis Goldstein |
| 2025 | Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency. Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang |
| 2025 | Towards atypical speech transcription using LLM-based ASR. Jinda Zhang, Aanchan Mohan |
| 2025 | Towards the Objective Characterisation of Major Depressive Disorder Using Speech Data from a 12-week Observational Study with Daily Measurements. Robert Lewis, Szymon Fedor, Nelson Hidalgo Julia, Joshua Curtiss, Jiyeon Kim, Noah Jones, David Mischoulon, Thomas F. Quatieri, Nicholas Cummins, Paola Pedrelli, Rosalind W. Picard |
| 2025 | ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality. Yu-Xiang Luo, Yi-Cheng Lin, Ming-To Chuang, Jia-Hung Chen, I-Ning Tsai, Pei Xing Kiew, Yueh-Hsuan Huang, Chien-Feng Liu, Yu-Chen Chen, Bo-Han Feng, Wenze Ren, Hung-yi Lee |
| 2025 | Tracking /r/ Deletion: Forced Alignment of Pronunciation Variants and Sociophonetic Insights into Post-Obstruent Final /r/ in French. Anisia Popescu, Lori Lamel, Marc Evrard, Ioana Vasilescu |
| 2025 | Training Articulatory Inversion Models for Interspeaker Consistency. Charles McGhee, Mark J. F. Gales, Kate M. Knill |
| 2025 | Training Onset-and-Offset-Aware Sound Event Detection on a Heterogeneous Dataset via Probabilistic Sequential Modeling. Tomoya Yoshinaga, Yoshiaki Bando, Keitaro Tanaka, Keisuke Imoto, Masaki Onishi, Shigeo Morishima |
| 2025 | Training-Free Voice Conversion with Factorized Optimal Transport. Alexander Lobashev, Assel Yermekova, Maria A. Larchenko |
| 2025 | Transcribing Diverse Voices: Using Whisper for ICE corpora. Andreas Weilinghoff |
| 2025 | Transcribing Oral History Recordings Using the Transcription Portal. Christoph Draxler, Julian Pömp, Henk van den Heuvel, Fabio Ardolino, Arjan van Hessen |
| 2025 | Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation. Rui Hu, Xiaolong Lin, Jiawang Liu, Shixi Huang, Zhenpeng Zhan |
| 2025 | Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems. Mikey Elmers, Koji Inoue, Divesh Lala, Tatsuya Kawahara |
| 2025 | Tungnaá In Live Performance: An Implementation Of Interactive Artistic Text-To-Voice. Victor Shepardson, Jonathan Reus, Thor Magnusson |
| 2025 | Turing's Echo: Investigating Linguistic Sensitivity of Deepfake Voice Detection via Gamification. Binh Nguyen, Thai Le |
| 2025 | U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding. Ziqian Wang, Xianjun Xia, Xinfa Zhu, Lei Xie |
| 2025 | Ultra-Low Bit Post-Training Quantization of Large Speech Models via K-Means Clustering and Mixed Precision Allocation. Tianteng Gu, Bei Liu, Haoyu Wang, Yanmin Qian |
| 2025 | Understanding Dementia Speech Alignment with Diffusion-Based Image Generation. Mansi, Anastasios Lepipas, Dominika C. Woszczyk, Yiying Guan, Soteris Demetriou |
| 2025 | Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models. Zhaoqing Li, Haoning Xu, Xurong Xie, Zengrui Jin, Tianzi Wang, Xunying Liu |
| 2025 | Uni-VERSA: Versatile Speech Assessment with a Unified Network. Jiatong Shi, Hye-jin Shim, Shinji Watanabe |
| 2025 | Unified Audio-Visual Modeling for Recognizing Which Face Spoke When and What in Multi-Talker Overlapped Speech and Video. Naoki Makishima, Naotaka Kawata, Taiga Yamane, Mana Ihori, Tomohiro Tanaka, Satoshi Suzuki, Shota Orihashi, Ryo Masumura |
| 2025 | Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation. Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park |
| 2025 | Unified Semi-Supervised Pipeline for Automatic Speech Recognition. Nune Tadevosyan, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Ante Jukic |
| 2025 | Unified Text and Speaker Verification using SSL model for Text-Dependent Speaker Verification. Nathan Griot, Driss Matrouf, Raphaël Blouet, Jean-François Bonastre, Ana Mantecon |
| 2025 | Unified Variational and Physics-aware Model for Room Impulse Response Estimation. Louis Lalay, Mathieu Fontaine, Roland Badeau |
| 2025 | Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition. Cheng-Hung Hu, Yusuke Yasuda, Akifumi Yoshimoto, Tomoki Toda |
| 2025 | Universal Preference-Score-based Pairwise Speech Quality Assessment. Yu-Fei Shi, Yang Ai, Zhen-Hua Ling |
| 2025 | Universal Semantic Disentangled Privacy-preserving Speech Representation Learning. Biel Tura Vecino, Subhadeep Maji, Aravind Varier, Antonio Bonafonte, Ivan Valles, Michael Owen, Constantinos Papayiannis, Leif Rädel, Grant P. Strimel, Oluwaseyi Feyisetan, Roberto Barra-Chicote, Ariya Rastrow, Volker Leutnant, Trevor Wood |
| 2025 | Universal Speech Enhancement with Regression and Generative Mamba. Rong Chao, Rauf Nasretdinov, Yu-Chiang Frank Wang, Ante Jukic, Szu-Wei Fu, Yu Tsao |
| 2025 | Unlearning LLM-Based Speech Recognition Models. Zhe Liu |
| 2025 | Unleashing the Inner Monster: Demonstrating High-Fidelity Human to Non-Human Voice Conversion. Namhyun Cho, Sunmin Kim, Minsu Kang, Seolhee Lee, Choonghyeon Lee, Yangsun Lee |
| 2025 | Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate. Hanglei Zhang, Yiwei Guo, Zhihan Li, Xiang Hao, Xie Chen, Kai Yu |
| 2025 | Unmasking real-world audio deepfakes: A data-centric approach. David Combei, Adriana Stan, Dan Oneata, Nicolas M. Müller, Horia Cucu |
| 2025 | Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech. Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai-Doss |
| 2025 | Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion. Ajinkya Kulkarni, Sandipana Dowerah, Tanel Alumäe, Mathew Magimai-Doss |
| 2025 | Using Neurogram Similarity Index Measure (NSIM) to Model Hearing Loss and Cochlear Neural Degeneration. Ahsan J. Cheema, Sunil Puria |
| 2025 | Using and comprehending language in face-to-face conversation. Judith Holler |
| 2025 | Using gender, phonation and age to interpret automatically discovered speech attributes for explainable speaker recognition. Carole Millot, Clara Ponchard, Cédric Gendrot, Jean-François Bonastre, Orane Dufour |
| 2025 | VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset. Yuxi Wang, Yikang Wang, Qishan Zhang, Hiromitsu Nishizaki, Ming Li |
| 2025 | VIB-based Real Pre-emphasis Audio Deepfake Source Tracing. Thien-Phuc Doan, Kihun Hong, Souhwan Jung |
| 2025 | VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge. Zijing Zhao, Kai Wang, Hao Huang, Ying Hu, Liang He, Jichen Yang |
| 2025 | Variability in Intervocalic /t/ and Community Diversity in Australian English. Hannah White, Joshua Penney, Felicity Cox |
| 2025 | Variability in performance across four generations of automatic speaker recognition systems. Lauren Harrington, Vincent Hughes, Philip Harrison, Paul Foulkes, Jessica Wormald, Finnian Kelly, David van der Vloed |
| 2025 | Vector Quantized Cross-lingual Unsupervised Domain Adaptation for Speech Emotion Recognition. Pravin Mote, Donita Robinson, Elizabeth Richerson, Carlos Busso |
| 2025 | Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval. Ruofan Hu, Yan Xia, Minjie Hong, Jieming Zhu, Bo Chen, Xiaoda Yang, Minghui Fang, Tao Jin |
| 2025 | ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition. Thai-Binh Nguyen, Thi Van Nguyen, Quoc Truong Do, Chi Mai Luong |
| 2025 | ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances. Huy Ba Do, Vy Le-Phuong Huynh, Luan Thanh Nguyen |
| 2025 | VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion. Joon-Seung Choi, Dong-Min Byun, Hyung-Seok Oh, Seong-Whan Lee |
| 2025 | Video-to-Audio Generation with Fine-grained Temporal Semantics. Yuchen Hu, Yu Gu, Chenxing Li, Rilin Chen, Dong Yu |
| 2025 | VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining. Jianheng Zhuo, Yifan Yang, Yiwen Shao, Yong Xu, Dong Yu, Kai Yu, Xie Chen |
| 2025 | Vision-Integrated High-Quality Neural Speech Coding. Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling |
| 2025 | Visual Cues Support Robust Turn-taking Prediction in Noise. Sam O'Connor Russell, Naomi Harte |
| 2025 | Visual features of the oral region in Polish sibilants produced by children with various sibilance patterns. Agata Sage, Zuzanna Miodonska, Michal Krecichwost, Ewa Kwasniok, Pawel Badura |
| 2025 | VisualSpeech: Enhancing Prosody Modeling in TTS Using Video. Shumin Que, Anton Ragni |
| 2025 | Visually-Adaptive Guided Robust Speech Recognition with Parameter-Efficient Adaptation. Zhao Yang, Rui Jiang, Yue Heng Yeo, Xiao Fu, Wei Xi, Jizhong Zhao |
| 2025 | Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation. Jaejun Lee, Kyogu Lee |
| 2025 | Vocal-tract model with two directions: Static design for a dummy head and dynamic design for a speaking machine. Takayuki Arai |
| 2025 | VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation. Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park |
| 2025 | Vocoder-Projected Feature Discriminator. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo |
| 2025 | Voice Activity-based Text Segmentation for ASR Text Denormalization. Sashi Novitasari, Takashi Fukuda, Gakuto Kurata |
| 2025 | Voice Adaptation for Swiss German. Samuel Stucki, Jan Deriu, Mark Cieliebak |
| 2025 | Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification. Badr M. Abdullah, Matthew Baas, Bernd Möbius, Dietrich Klakow |
| 2025 | Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora. Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama |
| 2025 | Voice Impression Control in Zero-Shot TTS. Kenichi Fujita, Shota Horiguchi, Yusuke Ijima |
| 2025 | Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect. Jaya Narain, Vasudha Kowtha, Colin Lea, Lauren Tooley, Dianna Yee, Vikramjit Mitra, Zifang Huang, Miquel Espi Marques, Jon Huang, Carlos Avendaño, Shirley Ren |
| 2025 | Voice Reconstruction through Large-Scale TTS Models: Comparing Zero-Shot and Fine-tuning Approaches to Personalise TTS in Assistive Communication. Éva Székely, Péter Mihajlik, Máté Soma Kádár, László Tóth |
| 2025 | Voice-Based Dysphagia Detection: Leveraging Self-Supervised Speech Representation. Injune Hwang, Jung-Min Kim, Ju Seok Ryu, Kyogu Lee |
| 2025 | Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework. Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon |
| 2025 | VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents. Haiyun Li, Zhiyong Wu, Xiaofeng Xie, Jingran Xie, Yaoxun Xu, Hanyang Peng |
| 2025 | VoiceNet: Multilingual On-Device Phoneme-To-Audio Alignment. Kun Jin, Siva Penke, Srinivasa Algubelli |
| 2025 | VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations. Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukic, Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu |
| 2025 | VoiceQualityVC: A Voice Conversion System for Studying the Perceptual Effects of Voice Quality in Speech. Harm Lameris, Joakim Gustafsson, Éva Székely |
| 2025 | Voices of 'cyborg awesomeness': Posthuman embodiment of nonbinary gender expression in AI speech technologies. Maxwell Hope, Éva Székely |
| 2025 | VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin. Zhiqi Ai, Meixuan Bao, Zhiyong Chen, Zhi Yang, Xinnuo Li, Shugong Xu |
| 2025 | Voxplorer: Voice data exploration and projection in an interactive dashboard. Alessandro De Luca, Srikanth R. Madikeri, Volker Dellwo |
| 2025 | WAKE: Watermarking Audio with Key Enrichment. Yaoxun Xu, Jianwei Yu, Hangting Chen, Zhiyong Wu, Xixin Wu, Dong Yu, Rongzhi Gu, Yi Luo |
| 2025 | WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing. Yu Nakagome, Michael Hentschel |
| 2025 | WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection. Hainan Xu, Vladimir Bataev, Lilit Grigoryan, Boris Ginsburg |
| 2025 | WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Preservation. Lu Han, Junqi Zhao, Renhua Peng |
| 2025 | WavShape: Information-Theoretic Speech Representation Learning for Fair and Privacy-Aware Audio Processing. Oguzhan Baser, Ahmet Ege Tanriverdi, Kaan Kale, Sandeep Chinchali, Sriram Vishwanath |
| 2025 | Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR. Mingchen Shao, Xinfa Zhu, Chengyou Wang, Bingshen Mu, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie |
| 2025 | Web-Based Application for Real-Time Biofeedback of Vocal Resonance in Gender-Affirming Voice Training: Design and Usability Evaluation. Tara McAllister, Collin Eagen, Yi Shan, Peter Traver, Daphna Harel, Tae Hong Park, Vesna D. Novak |
| 2025 | Weight Factorization and Centralization for Continual Learning in Speech Recognition. Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel |
| 2025 | What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems. Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel F. Garcia Contreras, Koichiro Yoshino |
| 2025 | What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training. Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem H. Zuidema, Martijn Bentum |
| 2025 | What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles. Saskia Wepner, Lucas Eckert, Gernot Kubin, Barbara Schuppler |
| 2025 | When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds. Minsu Kang, Seolhee Lee, Choonghyeon Lee, Namhyun Cho |
| 2025 | When The MOS Predictor Asks For Training Annotation In Cross Lingual/Domain Adaptation. Natacha Miniconi, Meysam Shamsi, Anthony Larcher |
| 2025 | When focus shapes the flow: prosodic restructuring in Mandarin complex nominals. Anqi Xu, Yu-Yin Hsu |
| 2025 | WhiStress: Enriching Transcriptions with Sentence Stress Detection. Iddo Yosha, Dorin Shteyman, Yossi Adi |
| 2025 | Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification. William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong |
| 2025 | Whisper-Based Multilingual Alzheimer's Disease Detection and Improvements for Low-Resource Language. Kaichen Jia, Jinpeng Li, Ke Li, Wei-Qiang Zhang |
| 2025 | WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper. Emmanuel Akinrintoyo, Nadine Abdelhalim, Nicole Salomons |
| 2025 | WhisperMSS: A Two-Stage Framework for Mandarin Singing Transcription and Segmentation Using Pretrained Models. Ruoxuan Liang, Xiangjian Zeng, Zhen Liu, Qingqiang Wu, Ruichen Zhang, Le Ren |
| 2025 | Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM. Dariia Puhach, Amir H. Payberah, Éva Székely |
| 2025 | Who knows best? Effects of speech disfluencies on incentivized decision-making. Ambika Kirkland, Jens Edlund |
| 2025 | Who, When, and What: Leveraging the "Three Ws" Concept for Emotion Recognition in Conversation. Xiaohan Shi, Xingfeng Li, Tomoki Toda |
| 2025 | Why is children's ASR so difficult? Analyzing children's phonological error patterns using SSL-based phoneme recognizers. Koharu Horii, Naohiro Tawara, Atsunori Ogawa, Shoko Araki |
| 2025 | Word Level Timestamp Generation for Automatic Speech Recognition and Translation. Ke Hu, Krishna C. Puvvada, Elena Rastorgueva, Zhehuai Chen, He Huang, Shuoyang Ding, Kunal Dhawan, Hainan Xu, Jagadeesh Balam, Boris Ginsburg |
| 2025 | Word stress in self-supervised speech models: A cross-linguistic comparison. Martijn Bentum, Louis ten Bosch, Tomas O. Lentz |
| 2025 | Word-Level Error Analysis in Decoding Systems: From Speech Recognition to Brain-Computer Interfaces. Jingya Huang, Aashish N. Patel, Sowmya Manojna Narasimha, Gal Mishne, Vikash Gilja |
| 2025 | X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance. Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan |
| 2025 | You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks. Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters |
| 2025 | ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism. Hsing-Hang Chou, Yun-Shao Lin, Ching-Chin Sung, Yu Tsao, Chi-Chun Lee |
| 2025 | Zero-Shot Learning for Acoustic Event Classification Using an Attribute Vector and Conditional GAN. Kohei Uehara, Ryoichi Takashima, Tetsuya Takiguchi |
| 2025 | Zero-Shot Mono-to-Binaural Speech Synthesis. Alon Levkovitch, Julian Salazar, Soroosh Mariooryad, R. J. Skerry-Ryan, Nadav Bar, W. Bastiaan Kleijn, Eliya Nachmani |
| 2025 | Zero-Shot Speech-Based Depression and Anxiety Assessment with LLMs. Erfan Loweimi, Sofia de la Fuente Garcia, Saturnino Luz |
| 2025 | xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement. Nikolai Lund Kühne, Jan Østergaard, Jesper Jensen, Zheng-Hua Tan |