INTERSPEECH A

1181 papers

Year Title / Authors
2025"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding.
Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis
2025"Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking.
Julio Cesar Cavalcanti, Gabriel Skantze
2025"KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding.
Alkis Koudounas, Moreno La Quatra, Eliana Pastor, Sabato Marco Siniscalchi, Elena Baralis
2025 26th Annual Conference of the International Speech Communication Association, Interspeech 2025, Rotterdam, The Netherlands, 17-21 August 2025.
Odette Scharenborg, Catharine Oertel, Khiet Truong
2025 2D Immersed Boundary Method in Vocal Tract Acoustics: An Eulerian-Lagrangian Model for Simulation of Diphthongs.
Rongshuai Wu, Debasish Ray Mohapatra, Sidney Fels
2025 75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment.
Xuan Shi, Yubin Zhang, Yijing Lu, Marcus Ma, Tiantian Feng, Asterios Toutios, Haley Hsu, Louis Goldstein, Shrikanth Narayanan
2025A Bayesian Approach to L2 Fluency Ratings by Native and Nonnative Listeners.
Kakeru Yazawa, Takayuki Konishi
2025A Cascaded Multimodal Framework for Automatic Social Communication Severity Assessment in Children with Autism Spectrum Disorder.
Jihyun Mun, Sunhee Kim, Minhwa Chung
2025A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification.
Yue Pan, Liwei Liu, Changxin Li, Xingyao Wang, Yili Xia, Hanyue Zhang, Ming Chu
2025A Comparative Study on Proactive and Passive Detection of Deepfake Speech.
Chia-Hua Wu, Wanying Ge, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
2025A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh, Wei-Hsiang Liao, Yuki Mitsufuji
2025A Cookbook for Community-driven Data Collection of Impaired Speech in Low-Resource Languages.
Sumaya Ahmed Salihs, Isaac Wiafe, Jamal-Deen Abdulai, Elikem Doe Atsakpo, Gifty Ayoka, Richard Cave, Akon Obu Ekpezu, Catherine Holloway, Katrin Tomanek, Fiifi Baffoe Payin Winful
2025A Copula-Based Generative Score-Level Fusion Model for Speaker Verification.
Sandro Cumani
2025A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations.
Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj
2025A Dataset for Automatic Assessment of TTS Quality in Spanish.
Alejandro Sosa Welford, Leonardo Pepino
2025A Deformable Convolution GAN Approach for Speech Dereverberation in Cochlear Implant Users.
Hsin-Tien Chiang, John H. L. Hansen
2025A Domain Robust Pre-Training Method with Local Prototypes for Speaker Verification.
Qing Gu, Yan Song, Haoyu Song, Nan Jiang, Lirong Dai, Ian McLoughlin
2025A Gradient Effect of Hand Beat Timing on Spoken Word Recognition.
Chengjia Ye, James M. McQueen, Hans Rutger Bosker
2025A Hybrid Approach to Combining Role Diarization with ASR for Professional Conversations.
Bongjun Kim, Arindam Ghosh, Mark C. Fuhs, Anurag Chowdhury, Deblin Bagchi, Monika Woszczyna
2025A Joint Network for Singing Melody Extraction from Polyphonic Music with Attention Aggregation and Self-Consistency Training.
Jiabo Jing, Ying Hu, Hao Huang, Liang He, Zhijian Ou
2025A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions.
Zheng Wang, Xiaobin Rong, Yu Sun, Tianchi Sun, Zhibin Lin, Jing Lu
2025A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation.
Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser, Barbara Plank
2025A Multi-Stream Framework Utilizing 3D Human Reconstruction for Cued Speech Recognition.
Katerina Papadimitriou, Gerasimos Potamianos
2025A Multimodal Chinese Dataset for Cross-lingual Sarcasm Detection.
Xiyuan Gao, Bruce Xiao Wang, Meiling Zhang, Shuming Huang, Zhu Li, Shekhar Nayak, Matt Coler
2025A Naturally Elicited Multimodal Stress Database and Speech Breathing Based Stress Detection.
Karumannil Mohamed Ismail Yasar Arafath, Mohammed Abeer K. C., Aurobinda Routray
2025A Neural Codec Approach for Noise-Robust Bandwidth Expansion.
Xi Liu, Mu Yang, Szu-Jui Chen, John H. L. Hansen
2025A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control.
Yuan-Kuei Wu, Juan Azcarreta Ortiz, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey
2025A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater's Shadowing and Sequence-to-sequence Voice Conversion.
Haopeng Geng, Daisuke Saito, Nobuaki Minematsu
2025A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic.
Ondrej Klejch, William Lamb, Peter Bell
2025A Robust Hybrid ACC-PM Approach for Personal Sound Zones.
Yaqi Zhu, Lei Zhou, Hongqing Liu, Liming Shi, Lu Gan
2025A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition.
Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin
2025A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model.
Yang Xiang, Canan Huang, Desheng Hu, Jingguang Tian, Xinhui Hu, Chao Zhang
2025A Siamese Network-Based Framework for Voice Mimicry Proficiency Assessment Using X-Vector Embeddings.
Bhasi K. C., Rajeev Rajan
2025A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations.
Masakazu Inoue, Motoshige Sato, Kenichi Tomeoka, Nathania Nah, Eri Hatakeyama, Kai Arulkumaran, Ilya Horiguchi, Shuntaro Sasai
2025A Simple-Yet-Effective Data Augmentation Method for Speaker Identification in Novels.
Wenjie Zhong, Jason Naradowsky, Yusuke Miyao
2025A Study of Real-world Audio-Visual Corpus Design and Production: A Perspective from MISP Challenges.
Hang Chen, Jun Du, Qing Wang, Juan Xie, Shi-Fu Xiong
2025A Study of Speech Embedding Similarities Between Australian Aboriginal and High-Resource Languages.
Eliathamby Ambikairajah, Jingyao Wu, Ting Dang, Vidhyasaharan Sethu
2025A Study on Speech Assessment with Visual Cues.
Shafique Ahmed, Ryandhimas E. Zezario, Nasir Saleem, Amir Hussain, Hsin-Min Wang, Yu Tsao
2025A Study on The Impact of Foundation Models on Automatic Depression Detection from Speech Signals.
Bubai Maji, Monorama Swain, Shazia Nasreen, Debabrata Majumdar, Rajlakshmi Guha, Aurobinda Routray, Anders Søgaard
2025A Three-Stage Beamforming with Harmonic Guidance for Multi-Channel Speech Enhancement.
Nurali Alip, Tianrui Wang, Rui Cao, Meng Ge, Jingru Lin, Longbiao Wang, Jianwu Dang
2025A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement.
Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li
2025A Watermark for Auto-Regressive Speech Generation Models.
Yihan Wu, Ruibo Chen, Georgios Milis, Junfeng Guo, Heng Huang
2025A real-time MRI study on asymmetry in velum dynamics during VCV production with nasal sounds.
Chetan Sharma, Vaishnavi Chandwanshi, Shreya Shrikant Karkun, Aditya Anand Gupta, Prasanta Kumar Ghosh
2025A semi-automatic pipeline for transcribing and segmenting child speech.
Polychronia Christodoulidou, James Tanner, Jane Stuart-Smith, Michael McAuliffe, Mridhula Murali, Amy Smith, Lauren Taylor, Joanne Cleland, Anja Kuschmann
2025A simple method for predicting Clinical Scores in Huntington's Disease by leveraging ASR's uncertainty on spontaneous speech.
Hadrien Titeux, Quang Tuan Rémy Nguyen, Andres Gil-Salcedo, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux
2025A-SMiLE: Affective Sparse Mixture-of-Experts Adapter with Multi-Task Learning for Spoken Dialogue Models.
Yi-Wen Chao, Yizhou Peng, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng
2025AA-SLLM: An Acoustically Augmented Speech Large Language Model for Speech Emotion Recognition.
Jialong Mai, Xiaofen Xing, Weidong Chen, Yuanbo Fang, Xiangmin Xu
2025ABHINAYA - A System for Speech Emotion Recognition In Naturalistic Conditions Challenge.
Soumya Dutta, Smruthi Balaji, Varada R, Viveka Salinamakki, Sriram Ganapathy
2025AC/DC: LLM-based Audio Comprehension via Dialogue Continuation.
Yusuke Fujita, Tomoya Mizumoto, Atsushi Kojima, Lianbo Liu, Yui Sudo
2025ADCeleb: A Longitudinal Speech Dataset from Public Figures for Early Detection of Alzheimer's Disease.
Kunxiao Gao, Anna Favaro, Najim Dehak, Laureano Moro-Velázquez
2025ADI-20: Arabic Dialect Identification dataset and models.
Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares
2025AF-Vocoder: Artifact-Free Neural Vocoder with Global Artifact Filter.
Zhuangqi Chen, Xianjun Xia, Xiaohuai Le, Siyu Sun, Chuanzeng Huang
2025AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition.
Yuhang Dai, He Wang, Xingchen Li, Zihan Zhang, Shuiyuan Wang, Lei Xie, Xin Xu, Hongxiao Guo, Shaoji Zhang, Hui Bu, Wei Chen
2025APTTS: Adversarial Post-training in Latent Flow Matching for Fast and High-fidelity Text-to-Speech.
Hyungchan Yoon, Chanwoo Lee, Hoodong Lee, Stanley Jungkyu Choi
2025ARiSE: Auto-Regressive Multi-Channel Speech Enhancement.
Pengjie Shen, Xueliang Zhang, Zhong-Qiu Wang
2025ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning.
Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang
2025ASR Confidence Estimation using True Class Lexical Similarity Score.
Nagarathna Ravi, Thishyan Raj T, Ravi Teja Chaganti, Vipul Arora
2025ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems.
Anand Kumar Rai, Satyam Rahangdale, Utkarsh Anand, Animesh Mukherjee
2025ASR-based segmentation for the analysis of larger child-speech datasets: Performance evaluation on vowels from Australian-English speaking children aged 4 to 11 years.
Rui Cai, Titia Benders
2025ASVspoof2019 vs. ASVspoof5: Assessment and Comparison.
Avishai Weizman, Yehuda Ben-Shimol, Itshak Lapidot
2025ATMM-SAGA: Alternating Training for Multi-Module with Score-Aware Gated Attention SASV system.
Amro Asali, Yehuda Ben-Shimol, Itshak Lapidot
2025Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding.
Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu
2025Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.
Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen
2025Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling.
Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen
2025Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data.
Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li
2025Accessible Delivery of Visual-Acoustic Biofeedback for Speech Sound Disorder.
Tara McAllister, Peter Traver, Amanda Eads, William Haack, Helen Carey, Yi Shan, Wendy Liang, Tae Hong Park
2025Accessible Real-time Eye-gaze Tracking for Neurocognitive Health Assessment: A Multimodal Web-based Approach.
Daniel Tisdale, Jackson Liscombe, David Pautler, Michael Neumann, Vikram Ramanarayanan
2025Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR.
Martin Ratajczak, Jean-Philippe Robichaud, Jennifer Drexler Fox
2025Acoustic Detection of UAV Abnormality Using One Ground-Based Acoustic Vector Sensor.
Dengjian Zhou, Jianghan Hai, Sijia Liao, Yue Ivan Wu, Kainam Thomas Wong, Xiujuan Zheng
2025Acoustic Features of Mandarin Tone Production in Noise: A Comparison Between Chinese Native Speakers and Korean L2 Learners.
Jinxin Ji, Yiying Hu, Xiaohu Yang, Gang Peng
2025Acoustic Representation and Realization of Weak Elements Subcategories: In the Case of Tianjin Mandarin.
Zhijie Li, Hui Feng
2025Acoustic and Linguistic Biomarkers for Cognitive Impairment Detection from Speech.
Catarina Botelho, David Gimeno-Gómez, Francisco Teixeira, John Mendonça, Patrícia Pereira, Diogo A. P. Nunes, Thomas Rolland, Anna Pompili, Rubén Solera-Ureña, Maria Ponte, David Martins de Matos, Carlos D. Martínez-Hinarejos, Isabel Trancoso, Alberto Abad
2025Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment.
Long-Vu Hoang, Tuan Nguyen, Huy Dat Tran
2025Acoustic similarities, articulatory uniqueness: Speech production mechanisms in individuals with congenital lip paralysis.
Anne Hermes, Ivana Didirková, Philipp Buech, Gilles Vannuscorps
2025Acquiring Pronunciation from Speech Audio via Multi-task Learning.
Siqi Sun, Korin Richmond
2025AdaKWS: Towards Robust Keyword Spotting with Test-Time Adaptation.
Yang Xiao, Tianyi Peng, Yanghao Zhou, Rohan Kumar Das
2025Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning.
Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li
2025Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding.
Haoran Zhou, Xingchen Song, Brendan Fahy, Qiaochu Song, Binbin Zhang, Zhendong Peng, Anshul Wadhawan, Denglin Jiang, Apurv Verma, Vinay Ramesh, Srivas Prasad, Michele M. Franceschini
2025Adapting Whisper for low-resource Hindi-English Code-Mix speech with on-the-fly Augmentation & LLM-Synthesised Data.
Astik Biswas, Oleg Shevelev, Amine Abdaoui, Vivek Tyagi, Abdelmoumene Boumadane
2025Adaptive Across-Subcenter Representation Learning for Imbalanced Anomalous Sound Detection.
Dong Wang, Jiqing Han, Guibin Zheng, Tieran Zheng, Yongjun He
2025Adaptive Differential Denoising for Respiratory Sounds Classification.
Gaoyang Dong, Zhicheng Zhang, Ping Sun, Minghui Zhang
2025Adaptive Knowledge Distillation for Device-Directed Speech Detection.
Hyung-Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz
2025Addressing Task Conflicts in Stuttering Detection via MMoE-Based Multi-Task Learning.
Xiaokang Liu, Xingfeng Li, Yudong Yang, Lan Wang, Nan Yan
2025Advancing Emotion Recognition via Ensemble Learning: Integrating Speech, Context, and Text Representations.
Xiaohan Shi, Jinyi Mi, Xingfeng Li, Tomoki Toda
2025Advancing Pediatric ASR: The Role of Voice Generation in Disordered Speech.
Karen Rosero, Ali N. Salman, Shreeram Suresh Chandra, Berrak Sisman, Cortney Van't Slot, Alex A. Kane, Rami R. Hallac, Carlos Busso
2025Adversarial Attacks on Text-dependent Speaker Verification System.
Sreekanth Sankala, Venkatesh Parvathala, Ramesh Gundluru, K. Sri Rama Murty
2025Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting.
Youngmoon Jung, Yong-Hyeok Lee, Myunghun Jung, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho
2025AfriHuBERT: A self-supervised speech representation model for African languages.
Jesujoba O. Alabi, Xuechen Liu, Dietrich Klakow, Junichi Yamagishi
2025Age-related changes in multisensory integration of emotions in an audiovisual face-prosody-semantics Stroop task.
Yi Lin, Shumeng Ni, Yangfan Lu
2025Agent-based modelling, sound change, and metaphony in Southern Italian varieties of Italo-Romance.
Lilian von Bressensdorf, Pia Greca, Jonathan Harrington
2025Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches.
Bornali Phukon, Xiuwen Zheng, Mark Hasegawa-Johnson
2025Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models.
Yao Xiao, Heidi Christensen, Stefan Goetze
2025Alzheimer's Disease Detection Using Co-Attention Mechanism for Acoustic and ASR-Transcribed Text Features.
Yongqi Shao, Tao Fang
2025Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing.
Thanapat Trachu, Thanathai Lertpetchpun, Ekapol Chuangsuwanich
2025An Effective Anomalous Sound Detection Method Based on Global and Local Attribute Mining.
Nan Jiang, Yan Song, Qing Gu, Haoyu Song, Lirong Dai, Ian McLoughlin
2025An Effective Training Framework for Light-Weight Automatic Speech Recognition Models.
Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman
2025An Exploration of Interpretable Deep Learning Models for the Assessment of Mild Cognitive Impairment.
Emma Cathrine Liisborg Leschly, Oliver Roesler, Michael Neumann, Jackson Liscombe, Abhishek Hosamath, Lakshmi Arbatti, Line H. Clemmensen, Melanie Ganz, Vikram Ramanarayanan
2025An Exploratory Framework for LLM-assisted Human Annotation of Speech Datasets.
Alexander Johnson, Harsh Deshpande, Emmy Phung, Ahmad Emami
2025An Investigative Study on Recent Sharpness- and Flatness-Based Optimizers for Enhanced Self-Supervised Speaker Verification.
Abderrahim Fathan, Jahangir Alam, Xiaolin Zhu
2025An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications.
Sujith Pulikodan, Sahapthan K, Prasanta Kumar Ghosh, Visruth Sanka, Nihar Desai
2025An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech.
Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia
2025Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection.
Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary A. Miller, Jet Vonk, Brittany Morin, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
2025Analysis and Extension of a Near-End Listening Enhancement Method Based on Long-Term Fractile Noise Statistics.
Filippo Villani, Wai-Yip Chan, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen
2025Analysis of ABC Frontend Audio Systems for the NIST-SRE24.
Sara Barahona, Anna Silnova, Ladislav Mosner, Junyi Peng, Oldrich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Pálka, Federico Landini, Lukás Burget, Themos Stafylakis, Sandro Cumani, Dominik Bobos, Miroslav Hlavácek, Martin Kodovsky, Tomás Pavlícek
2025Analysis of Avian Biphonic Vocalization Using Computational Modelling.
Noumida A, Rajeev Rajan
2025Analysis of Phonetic Level Similarities Across Languages in Emotional Speech.
Pravin Mote, Abinay Reddy Naini, Donita Robinson, Elizabeth Richerson, Carlos Busso
2025Analysis of Semantic and Acoustic Token Variability Across Speech, Music, and Audio Domains.
Takanori Ashihara, Marc Delcroix, Tsubasa Ochiai, Kohei Matsuura, Shota Horiguchi
2025Analysis of the ABC Classification Backends for NIST SRE24.
Sandro Cumani, Anna Silnova, Sara Barahona, Ladislav Mosner, Oldrich Plchot, Johan Rohdin
2025Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models.
Chi-Yuan Hsiao, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Wei-Chih Chen, Hung-yi Lee
2025Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives.
Gowtham Premananth, Vinith Kugathasan, Carol Y. Espy-Wilson
2025Analyzing the Importance of Blank for CTC-Based Knowledge Distillation.
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter
2025Anne Rowling Neurological Speech Corpus: clinically annotated longitudinal dataset for developing speech biomarkers in neurodegenerative disorders.
Johnny Tam, Christine Weaver, Oliver Watts, Siddharthan Chandran, Suvankar Pal, Rowling Speech Consortium
2025Anomalous Sound Detection Based Feature Fusion and Dual-path Non-linear Independent Components Estimation.
Yawei Wang, Qiaoling Zhang, Yi Zhang, Junyao Hu
2025Apical vs. Regular Vowel Duration: A Corpus-based Analysis of Contextual Influences in Standard Mandarin.
Jingyi Sun, Bowei Shao, Martine Adda-Decker
2025Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs.
Simon Sedlácek, Bolaji Yusuf, Jan Svec, Pradyoth Hegde, Santosh Kesiraju, Oldrich Plchot, Jan Cernocký
2025ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis.
Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki
2025Are You Being Sarcastic? Prosodic Cues to Irony Perception in German.
Sophia Fünfgeld, Angelika Braun, Katharina Zahner-Ritter
2025Are loan sequences different from foreign sequences? A perception study with Japanese listeners on coronal obstruent - high front vowel sequences.
Silke Hamann, Andrea Alicehajic
2025ArticulateX: End-to-End Monolingual Speech Translation in Articulator Space.
Vishal Kumar, Vinayak Abrol
2025Articulatory Feature Prediction from Surface EMG during Speech Production.
Jihwan Lee, Kevin Huang, Kleanthis Avramidis, Simon Pistrosch, Monica González Machorro, Yoonjeong Lee, Björn W. Schuller, Louis Goldstein, Shrikanth Narayanan
2025Articulatory Strategy in Vowel Production as a Basis for Speaker Discrimination.
Justin J. H. Lo, Patrycja Strycharczuk, Sam Kirkham
2025Articulatory Vowel Distinctiveness in Spanish.
Kristin Teplansky, Emily Rangel, Mimi LaValley, Jinuk Kwon, Beiming Cao, Jun Wang
2025Articulatory clarity and variability before and after surgery for tongue cancer.
Thomas Tienkamp, Fleur van Ast, Roos van der Veen, Teja Rebernik, Raoul Buurke, Nikki Hoekzema, Katharina Polsterer, Hedwig Sekeres, Rob van Son, Martijn Wieling, Max J. H. Witjes, Sebastiaan A. H. J. de Visscher, Defne Abur
2025Articulatory modeling of the S-shaped F2 trajectories observed in Öhman's spectrographic analysis of VCV syllables.
Frédéric Berthommier
2025Articulatory variations in Apical Vowels in Southwestern Mandarin.
Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang
2025Assessing the Performance and Efficiency of Mamba ASR in Low-Resource Scenarios.
Rodolfo Zevallos, Martí Cortada Garcia, Sarah Solito, Carlos Mena, Alex Peiró Lilja, Javier Hernando
2025Assessing the feasibility of Large Language Models for detecting micro-behaviors in team interactions during space missions.
Ankush Raut, Projna Paromita, Sydney R. Begerowski, Suzanne T. Bell, Theodora Chaspari
2025Assessment of L2 Oral Proficiency using Speech Large Language Models.
Rao Ma, Mengjie Qian, Siyuan Tang, Stefano Bannò, Kate M. Knill, Mark J. F. Gales
2025Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis.
Ryo Setoguchi, Yoshiko Arimoto
2025Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion.
Kumud Tripathi, Chowdam Venkata Kumar, Pankaj Wasnik
2025Attention Models and Auditory Transduction Features for Noise Robustness.
Cathal Ó Faoláin, Andrew Hines
2025Attention-Free Dual-Mode ASR with Latency-Controlled Selective State Spaces.
Takafumi Moriya, Masato Mimura, Kiyoaki Matsui, Hiroshi Sato, Kohei Matsuura
2025AttentiveMOS: A Lightweight Attention-Only Model for Speech Quality Prediction.
Imran E. Kibria, Donald S. Williamson
2025Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers.
Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen
2025Audio Deepfake Source Tracing using Multi-Attribute Open-Set Identification and Verification.
Pierre Falez, Tony Marteau, Damien Lolive, Arnaud Delhay
2025Audio-Based Classification and Geographic Regression of Austrian Dialects.
Lorenz Gutscher, Michael Pucher
2025Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation.
Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra
2025Augment Mandarin to Cantonese Speech Databases via Retrieval-Augmented Generation and Speech Synthesis.
Fan Liu, Cheng Gong, Boyu Zhu, Ruihao Jing, Chunyu Qiang, Tianrui Wang, Xiao-Lei Zhang, Xuelong Li
2025AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers.
Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong
2025AusKidTalk: Using Strategic Data Collection and Out-of-Domain Tools to Semi-Automate Novel Corpora Annotation.
Tünde Szalay, Mostafa Shahin, Tharmakulasingam Sirojan, Zheng Nan, Renata Huang, Kirrie J. Ballard, Beena Ahmed
2025Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction.
Xiangyu Zhang, Daijiao Liu, Tianyi Xiao, Cihan Xiao, Tünde Szalay, Mostafa Shahin, Beena Ahmed, Julien Epps
2025Automated evaluation of children's speech fluency for low-resource languages.
Bowen Zhang, Nur Afiqah Abdul Latiff, Justin Kan, Rong Tong, Donny Soh, Xiaoxiao Miao, Ian McLoughlin
2025Automatic Detection and Sub-typing of Primary Progressive Aphasia from Speech: Integrating Task-Specific Features and Spatio-Semantic Graphs.
Fritz Peters, W. Richard Bevan-Jones, Grace Threlfall, Jenny M. Harris, Julie S. Snowden, Matthew Jones, Jennifer C. Thompson, Daniel J. Blackburn, Heidi Christensen
2025Automatic Dialectal Transcription: An Evaluation on Finnish and Norwegian.
Olli Kuparinen
2025Automatic Labeling and Correction of Noisy Labels for Robust Self-Supervised Speaker Verification.
Abderrahim Fathan, Jahangir Alam
2025Automatic Speech Recognition Biases in Newcastle English: an Error Analysis.
Dana Serditova, Kevin Tang, Jochen Steffens
2025Automatic Speech Recognition for Low-Resourced Middle Eastern Languages.
Razhan Hameed, Sina Ahmadi, Hanah Hadi, Rico Sennrich
2025Automatic Speech Recognition of African American English: Lexical and Contextual Effects.
Hamid Mojarad, Kevin Tang
2025Automatic classification of stop realisation with wav2vec2.0.
James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Jeff Mielke, Tyler Kendall
2025Automatic detection of speech sound disorders in German-speaking children: augmenting the data with typically developed speech.
Darline Monika Marx, Marco Matassoni, Alessio Brutti
2025AxLSTMs: learning self-supervised audio representations with xLSTMs.
Sarthak Yadav, Sergios Theodoridis, Zheng-Hua Tan
2025BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM.
Xun Gong, Anqi Lv, Wangyou Zhang, Zhiming Wang, Huijia Zhu, Yanmin Qian
2025Backchannel prediction for natural spoken dialog systems using general speaker and listener information.
Yoshinori Fukunaga, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka
2025Band-SCNet: A Causal, Lightweight Model for High-Performance Real-Time Music Source Separation.
Junqi Yang, Yuhong Yang, Weiping Tu, Xin Zhao, Cedar Lin
2025Band-Split Self-supervised Mamba for Infant-centered Audio Analysis.
Xulin Fan, Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
2025Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing.
Jin Li, Man-Wai Mak, Johan Rohdin, Kong Aik Lee, Hynek Hermansky
2025Beat gestures made by human-like avatars affect speech perception.
Matteo Maran, Renske Rötjes, Anna R. E. Schreurs, Hans Rutger Bosker
2025Benchmarking Neural Speech Codec Intelligibility with SITool.
Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets
2025Benchmarking Time-localized Explanations for Audio Classification Models.
Cecilia Bolaños, Leonardo Pepino, Martín Meza, Luciana Ferrer
2025Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning.
Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy
2025Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM.
Jeena J. Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke
2025Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering.
Andrés Carofilis, Pradeep Rangappa, Srikanth R. Madikeri, Shashi Kumar, Sergio Burdisso, Jeena J. Prakash, Esaú Villatoro-Tello, Petr Motlícek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke
2025Beyond Attacks: Advancing Fake Speech Detection with Attack-Agnostic Methods.
Shilpa Chandra, Akansha Tyagi, Shiven Patel, Padmanabhan Rajan
2025Beyond Conventional Metrics: using Entropic Triangles to Explain Balancing Methods in Acoustic Scene Classification.
Claudia Montero-Ramírez, Alba Martínez-Serrano, Jorge Garcelán-Gómez, Francisco J. Valverde-Albacete, Carmen Peláez-Moreno
2025Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts.
Hojun Jin, Eunsoo Hong, Ziwon Hyung, Sungjun Lim, Seungjin Lee, Keunseok Cho
2025Beyond Manual Transcripts: The Potential of Automated Speech Recognition Errors in Improving Alzheimer's Disease Detection.
Yin-Long Liu, Rui Feng, Jia-Xin Chen, Yi-Ming Wang, Jia-Hong Yuan, Zhen-Hua Ling
2025Beyond Similarity Scoring: Detecting Entailment and Contradiction in Multilingual and Multimodal Contexts.
Othman Istaiteh, Salima Mdhaffar, Yannick Estève
2025Beyond Traditional Speech Modifications: Utilizing Self Supervised Features for Enhanced Zero-Shot Children ASR.
Abhijit Sinha, Hemant Kumar Kathania, Mikko Kurimo
2025BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention.
Yassine El Kheir, Tim Polzehl, Sebastian Möller
2025Bidirectional Spoken-Written Text Conversion with Large Language Models.
Muyeol Choi, HyunJung Choi, Yohan Lim, Jeong-Uk Bang, Minkyu Lee, Seon Hui Kim, Seung Yun, Donghyun Kim, Minsoo Kim, Sanghun Kim
2025Bilingual Speakers Exhibit Cognitive Fatigue: A Speech Disfluencies Case Study on Research Talks.
Ashwin Ram, Marisol Muñoz, Zoi Gkalitsiou, Alexandros G. Dimakis
2025BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing.
Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
2025Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems.
Kwok Chin Yuen, Jia Qi Yip, Zhen Qiu, Chi-Hung Chi, Kwok-Yan Lam
2025Boosting StoRM Convergence with Metric Guidance and Non-uniform State-Sampling for Optimal Dereverberation.
Chandra Mohan Sharma, Arnab Kumar Roy, Anupam Mandal, Prasanta Kumar Ghosh, Prasanna Kumar Kr
2025Boundary-Conscious Pruning: Hard Set-Aware Model Compression for Efficient Speaker Recognition.
Seongkyu Mun, Jubum Han
2025Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain.
Omer Moussa, Mariya Toneva
2025Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation.
Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller
2025Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches.
Ahmed Aboeitta, Ahmed Sharshar, Youssef Nafea, Shady Shehata
2025Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models.
Seung-jae Lee, Paul Hongsuck Seo
2025Bridging Speech and Singing: Multi-stage Speech-Prompted Singing Voice Conversion with Speaker Embedding Adaptation.
Mingda Liu, Jiatong Shi
2025Bridging the Training-Inference Gap in TTS: Training Strategies for Robust Generative Postprocessing for Low-Resource Speakers.
Frank Zalkow, Paolo Sani, Kishor Kayyar Lakshminarayana, Emanuël A. P. Habets, Nicola Pia, Christian Dittmar
2025Bringing Interpretability to Neural Audio Codecs.
Samir Sadok, Julien Hauret, Éric Bavu
2025Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing.
Yanir Marmor, Yair Lifshitz, Yoad Snapir, Kinneret Misgav
2025CAGCRN: Real-Time Speech Enhancement with a Lightweight Model for Joint Acoustic Echo Cancellation and Noise Suppression.
Yuyang Wang, Yonghui Liu, Jianbing Liu, Kai Niu, Zhiqiang He
2025CAMER: Contribution-Aware Multimodal Emotion Recognition.
Sun-Kyung Lee, Jong-Hwan Kim
2025CAPR: Confidence-Aware Prompt Refinement in Large Language Models.
Jen-Tzung Chien, Po-Chun Huang
2025CBA-Whisper: Curriculum Learning-Based AdaLoRA Fine-Tuning on Whisper for Low-Resource Dysarthric Speech Recognition.
Tianyi Tan, Xin'an Chen, Xiaohuai Le, Wenzhi Fan, Xianjun Xia, Chuanzeng Huang, Jing Lu
2025CBA: Backdoor Attack on Deep Speech Classification via Audio Compression.
Yuheng Huang, Ying Ren, Wenjie Zhang, Diqun Yan
2025CEREALES : a new dataset of Quebec French accented speech with applications to speech recognition.
Lucas Maison, Thomas Soulas, Marie-Jean Meurs
2025CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR.
Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan
2025CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer.
Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada
2025CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning.
Jiacheng Shi, Yanfu Zhang, Ye Gao
2025CMSP-ST: Cross-modal Mixup with Speech Purification for End-to-End Speech Translation.
Jiale Ou, Hongying Zan
2025CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models.
Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda
2025CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge.
Zehua Liu, Xiaolou Li, Chen Chen, Lantian Li, Dong Wang
2025CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset.
Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Sai Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe
2025CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-car Speech Separation with Distributed Heterogeneous Arrays.
Runduo Han, Yanxin Hu, Yihui Fu, Zihan Zhang, Yukai Jv, Li Chen, Lei Xie
2025Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down.
Yingzhi Wang, Anas Alhmoud, Saad Alsahly, Muhammad Alqurishi, Mirco Ravanelli
2025Can AI Understand Mandarin Speech Prosody? A Framework and Benchmark Showcase.
Zilong Wang, Xiaoxue Zhang, Xinyang Jiang, Kaitao Song, Jue Yu
2025Can ASR generate valid measures of child reading fluency?
Wieke Harmsen, Roeland van Hout, Catia Cucchiarini, Helmer Strik
2025Can Emotion Fool Anti-spoofing?
Aurosweta Mahapatra, Ismail Rasim Ulgen, Abinay Reddy Naini, Carlos Busso, Berrak Sisman
2025Can Multimodal Foundation Models Help Analyze Child-Inclusive Autism Diagnostic Videos?
Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan
2025Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
Bikash Dutta, Rishabh Ranjan, Shyam Sathvik, Mayank Vatsa, Richa Singh
2025Can Speech Accurately Detect Depression in Patients With Comorbid Dementia? An Approach for Mitigating Confounding Effects of Depression and Dementia.
Sophie Young, Fuxiang Tao, Bahman Mirheidari, Madhurananda Pahar, Markus Reuber, Heidi Christensen
2025Can We Reconstruct a Dysarthric Voice with the Large Speech Model Parler TTS?
Ariadna Sanchez, Simon King
2025Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling.
Tahiya Chowdhury, Verónica Romero
2025Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages.
Tuan Nguyen, Huy Dat Tran
2025Cantonese Punctuation Restoration using LLM Annotated Data.
King Yiu Suen, Rudolf Chow, Albert Y. S. Lam
2025Causal Structure Discovery for Error Diagnostics of Children's ASR.
Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen
2025Chain-of-Thought Distillation with Fine-Grained Acoustic Cues for Speech Emotion Recognition.
Jialong Mai, Xiaofen Xing, Yangbiao Li, Xiangmin Xu
2025Chain-of-Thought Training for Open E2E Spoken Dialogue Systems.
Siddhant Arora, Jinchuan Tian, Hayato Futami, Jee-weon Jung, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
2025Challenges and practical guidelines for atypical speech data collection, annotation, usage and sharing: A multi-project perspective.
Zhengjun Yue, Mara Barberis, Tanvina Patel, Judith Dineley, Willemijn Doedens, Lottie Stipdonk, Yuanyuan Zhang, Elke De Witte, Erfan Loweimi, Hugo Van hamme, Djaina Satoer, Marina B. Ruiter, Laureano Moro-Velázquez, Nicholas Cummins, Odette Scharenborg
2025Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier.
Tarek Kunze, Marianne Métais, Hadrien Titeux, Lucas Elbert, Joseph Coffey, Emmanuel Dupoux, Alejandrina Cristià, Marvin Lavechin
2025Character Error Rate Estimation for Semi-Supervised Training of Speech Recognition for Arabic Dialects.
Chanho Park, Oscar Saz
2025Characterization of voice cue sensitivity and vocal emotion recognition across the adult lifespan.
Laura Rachman, Deniz Baskent
2025Children's Voice Privacy: First Steps and Emerging Challenges.
Ajinkya Kulkarni, Francisco Teixeira, Enno Hermann, Thomas Rolland, Isabel Trancoso, Mathew Magimai-Doss
2025ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech.
Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Jianhao Ye, Hongbin Zhou, Lei Ma, Jianjun Zhao
2025ClaritySpeech: Dementia Obfuscation in Speech.
Dominika C. Woszczyk, Ranya Aloufi, Soteris Demetriou
2025ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment.
Shengkui Zhao, Zexu Pan, Bin Ma
2025Clinical Annotations for Automatic Stuttering Severity Assessment.
Ana Rita Valente, Rufael Marew, Hawau Olamide Toyin, Hamdan Al-Ali, Anelise Bohnen, Inma Becerra, Elsa Marta Soares, Gonçalo Leal, Hanan Aldarmaki
2025Clustering-based Hard Negative Sampling for Supervised Contrastive Speaker Verification.
Piotr Masztalski, Michal Romaniuk, Jakub Zak, Mateusz Matuszewski, Konrad Kowalczyk
2025Co-Speech Motion for Virtual Agents in Dialogue Using LLM-Driven Primitive Action Selection.
Muhammad Yeza Baihaqi, Angel F. Garcia Contreras, Seiya Kawano, Koichiro Yoshino
2025Co-registration of real-time MRI and respiration for speech research.
Yubin Zhang, Prakash Kumar, Ye Tian, Ziwei Zhao, Xuan Shi, Kevin Huang, Kevin Lee, Haley Hsu, Shrikanth Narayanan, Krishna S. Nayak, Louis Goldstein
2025Cocktail-Party Audio-Visual Speech Recognition.
Thai-Binh Nguyen, Ngoc-Quan Pham, Alexander Waibel
2025Code Mix TTS: An Approach to Infer Human Like Speech for Multi-Lingual Input Texts.
Vishal Gourav, Phanindra Mankale
2025Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy.
Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
2025Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges.
Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Sai Adupa, Lekha Bollinani, Hafiz Malik
2025CommissionsQC: a Québec French Speech Corpus for Automatic Speech Recognition.
Coralie Serrand, Amira Morsli, Gilles Boulianne
2025Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments.
Reo Yoneyama, Masaya Kawamura, Ryo Terashima, Ryuichi Yamamoto, Tomoki Toda
2025Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis.
Anna Seo Gyeong Choi, Alexander Richardson, Ryan Partlan, Sunny X. Tang, Sunghye Cho
2025Comparison of Acoustic and Textual Features for Dysarthria Severity Classification in Amyotrophic Lateral Sclerosis.
Y. S. Upendra Vishwanath, Tanuka Bhattacharjee, Deekshitha G, Sathvik Udupa, Chowdam Venkata Thirumala Kumar, Madassu Keerthipriya, Darshan Chikktimmegowda, Dipti Baskar, Yamini Belur, Seena Vengalil, Atchayaram Nalini, Prasanta Kumar Ghosh
2025Comparison-Based Automatic Evaluation for Meeting Summarization.
Ziwei Gong, Lin Ai, Harsh Deshpande, Alexander Johnson, Emmy Phung, Zehui Wu, Ahmad Emami, Julia Hirschberg
2025Concurrent Speech and Auditory Tag Clouds for Non-Visual Web Interaction.
Dhia Eddine Merzougui, Nilesh Tete, Fabrice Maurel, Gaël Dias, Mohammed Hasanuzzaman, Aurélien Bournonville, Edgar Madelaine, Thomas Berthelin Le Tellier, François Ledoyen, Laure Poutrain-Lejeune, François Rioult, Jérémie Pantin
2025Conformer-based Ultrasound-to-Speech Conversion.
Ibrahim Ibrahimov, Csaba Zainkó, Gábor Gosztolya
2025Constrained LDDMM for Dynamic Vocal Tract Morphing: Integrating Volumetric and Real-Time MRI.
Tharinda Piyadasa, Joan Glaunès, Amelia Gully, Michael Proctor, Kirrie J. Ballard, Tünde Szalay, Naeim Sanaei, Sheryl Foster, David Waddington, Craig T. Jin
2025Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker.
Julian Linke, Jana Winkler, Barbara Schuppler
2025Context-Driven Dynamic Pruning for Large Speech Foundation Models.
Masao Someki, Shikhar Bharadwaj, Atharva Anand Joshi, Chyi-Jiunn Lin, Jinchuan Tian, Jee-weon Jung, Markus Müller, Nathan Susanj, Jing Liu, Shinji Watanabe
2025Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation.
Qiongqiong Wang, Hardik B. Sailor, Tianchi Liu, Ai Ti Aw
2025Contextual predictability effects on acoustic distinctiveness in read Polish speech.
Zofia Malisz, Jan Foremski, Malgorzata Kul
2025Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation.
Zhennan Lin, Kaixun Huang, Wei Ren, Linju Yang, Lei Xie
2025Continual Speech Learning with Fused Speech Features.
Guitao Wang, Jinming Zhao, Hao Yang, Guilin Qi, Tongtong Wu, Gholamreza Haffari
2025Continuous Learning for Children's ASR: Overcoming Catastrophic Forgetting with Elastic Weight Consolidation and Synaptic Intelligence.
Edem Ahadzi, Vishwanath Pratap Singh, Tomi Kinnunen, Ville Hautamäki
2025Continuous prediction of backchannel timing for human-robot interaction.
Michael Paierl, Martin Hagmüller, Barbara Schuppler
2025Contrastive Learning-based Syllable-Level Mispronunciation Detection and Diagnosis for Speech Audiometry.
Longbin Jin, Donghun Min, Jung Eun Shin, Eun Yi Kim
2025Conveying Gender Through Speech: Insights from Trans Men.
Alice Ross, Cliodhna Hughes, Eddie L. Ungless, Catherine Lai
2025Coping with segmental-prosodic incongruity in spoken word recognition in Japanese.
Terumichi Ariga
2025Corpus-Based Insights into Mandarin Neutral Tone: Effects of Tonal Context and Structural Patterns in Spontaneous Speech.
Jingyi Sun, Nicolas Audibert, Yaru Wu, Martine Adda-Decker
2025Count Your Speakers! Multitask Learning for Multimodal Speaker Diarization.
Prabhav Singh, Jesús Villalba, Najim Dehak
2025Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models.
Kyowoon Lee, Artyom Stitsyuk, Gunu Jho, Inchul Hwang, Jaesik Choi
2025Creaky Voice Facilitates More Efficient Phonological Processing of Mandarin Tone 3.
Zixia Fan, Ronny Ibrahim, Joshua Penney, Felicity Cox
2025Cross-Attention-Based Target Sound Extraction by Fully Leveraging Enrollment in a Shared Latent Space.
Xue Yang, Guiru Shen, Yu Yang
2025Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries.
Minyoung Kim, Sehwan Park, Sungmin Cha, Paul Hongsuck Seo
2025Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge.
Zhaoyang Li, Haodong Zhou, Longjie Luo, Xiaoxiao Li, Yongxin Chen, Lin Li, Qingyang Hong
2025Cross-corpus open-set Speech Emotion Recognition Method Based on Spatiotemporal Features with Inverse-Entropy Regularization.
Zhaohui Zhou, Hui Luo
2025Cross-lingual Data Selection Using Clip-level Acoustic Similarity for Enhancing Low-resource Automatic Speech Recognition.
Shunsuke Mitsumori, Sara Kashiwagi, Keitaro Tanaka, Shigeo Morishima
2025Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR.
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
2025CrossPhon: An Auto Phone Mapping Tool to Streamline Cross-language Modeling for Phone Alignment of Low-resource Languages.
Hongchen Wu, Yixin Gu
2025Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods.
Laura Lechler, Chamran Moradi, Ivana Balic
2025Cryfish: On deep audio analysis with Large Language Models.
Anton Mitrofanov, Sergey Novoselov, Tatiana Prisyach, Vladislav Marchevskiy, Arseniy Karelin, Nikita Khmelev, Dmitry Dutov, Stepan Malykh, Igor Agafonov, Aleksandr Nikitin, Oleg Petrov
2025D-GAT: Dual Graph Attention Network for Global HRTF Interpolation.
Junsheng Hu, Shaojie Li, Qintuya Si, De Hu
2025DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching.
Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu
2025DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models.
Heng-Jui Chang, Hongyu Gong, Changhan Wang, James R. Glass, Yu-An Chung
2025DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization.
Geonyoung Lee, Geonhee Han, Paul Hongsuck Seo
2025DLF-EEND: Dynamic Layer Fusion for End-to-End Speaker Diarization.
Wooil Kim, Bongsu Jung
2025DRI-GAN: A Novel Dual Real Input GAN with Triplet Loss for Cross-Lingual and Noisy SLU.
Ankit Kumar, Munir Georges
2025DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec.
Peijie Chen, Wenhao Guan, Kaidi Wang, Weijie Wu, Hukai Huang, Qingyang Hong, Lin Li
2025DYNAC: Dynamic Vocabulary-based Non-Autoregressive Contextualization for Speech Recognition.
Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, Shinji Watanabe
2025Data Augmentation using Speech Synthesis for Speaker-Independent Dysarthria Severity Classification.
Minseop Kim, Minsu Han, Seokyoung Hong, Myoung-Wan Koo
2025Data-driven approaches to pitch modelling in two Mexican Spanish ethnolects: K-means Clustering & GAMMs.
Gilly Marchini, Jeremy Steffman
2025Decoding Alzheimer's: Interpretable Visual and Logical Attention in Picture Description Tasks.
Ning Wang, Bingyang Wen, Minghui Wu, Yang Sun, Zongru Shao, Haojie Zhou, K. P. Subbalakshmi
2025Decoding Listener's Identity: Person Identification from EEG Signals Using a Lightweight Spiking Transformer.
Zheyuan Lin, Siqi Cai, Haizhou Li
2025Decoding Speaker-Normalized Pitch from EEG for Mandarin Perception.
Jia-Xin Chen, Yi-Ming Wang, Ziyu Zhang, Jiayang Han, Yin-Long Liu, Rui Feng, Xiuyuan Liang, Zhen-Hua Ling, Jia-Hong Yuan
2025Deep learning based spatial aliasing reduction in beamforming for audio capture.
Mateusz Guzik, Giulio Cengarle, Daniel Arteaga
2025Deep-Simplex Multichannel Speech Separation.
Tzlil Avidan, Bracha Laufer-Goldshtein
2025DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration.
Sanberk Serbest, Tijana Stojkovic, Milos Cernak, Andrew Harper
2025Defend for Self-Vocoding: A Novel Enhanced Decoder Network for Watermark Recovery.
Yu-Sheng Lin, Ching-Yu Yang, Hsing-Hang Chou, Ya-Tse Wu, Bo-Hao Su, Chi-Chun Lee
2025Defending Speech-enabled LLMs Against Adversarial Jailbreak Threats.
Antonios Alexos, Raghuveer Peri, Sai Muralidhar Jayanthi, Metehan Cekic, Srikanth Vishnubhotla, Kyu J. Han, Srikanth Ronanki
2025Defending Unauthorized Voice Cloning with Watermark-Aware Codecs.
Jiankun Zhao, Lingwei Meng, Chengxi Deng, Helen Meng, Xixin Wu
2025Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR.
Longhao Li, Yangze Li, Hongfei Xue, Jie Liu, Shuai Fang, Kai Wang, Lei Xie
2025DepressGEN: Synthetic Data Generation Framework for Depression Detection.
Wenrui Liang, Rong Zhang, Xuezhen Zhang, Ying Ma, Wei-Qiang Zhang
2025Developing High-Quality TTS for Punjabi and Urdu: Benchmarking against MMS Models.
Fatima Naseem, Maham Sajid, Farah Adeeba, Sahar Rauf, Asad Mustafa, Sarmad Hussain
2025Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction.
Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, Shrikanth Narayanan
2025Developing a LeFF Transformer Model for Exacerbated Speech Detection in COPD and Asthma.
Yuyang Yan, Sami O. Simons, Visara Urovi
2025Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices.
Tiantian Feng, Thanathai Lertpetchpun, Dani Byrd, Shrikanth Narayanan
2025Development and Validation of a Wav2Vec 2.0-Based Cross-Language Methodology for Measurement of Articulatory Precision.
Tanya Talkar, Kan Kawabata, Connor Higgins, Sean Tobyne
2025Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi.
Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth
2025DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech.
Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee
2025Dialogue Response Prefetching Based on Semantic Similarity and Prediction Confidence of Language Model.
Kiyotada Mori, Seiya Kawano, Angel F. Garcia Contreras, Koichiro Yoshino
2025Diarization-Guided Multi-Speaker Embeddings.
Joonas Kalda, Clément Pagés, Tanel Alumäe, Hervé Bredin
2025DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective.
Hyung-Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, Ahmed Hussen Abdelaziz
2025DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model.
Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng
2025DiffEmotionVC: A Dual-Granularity Disentangled Diffusion Framework for Any-to-Any Emotional Voice Conversion.
Xiaosu Su, Bowen Yang, Xiaowei Yi, Yun Cao
2025DiffMV-ETS: Diffusion-based Multi-Voice Electromyography-to-Speech Conversion using Speaker-Independent Speech Training Targets.
Kevin Scheck, Tom Dombeck, Zhao Ren, Peter Wu, Michael Wand, Tanja Schultz
2025DiffStereo: End-to-End Mono-to-Stereo Audio Generation with Diffusion Transformer.
Suqi Zhang, Zheqi Dai, Yongyi Zang, Yin Cao, Qiuqiang Kong
2025Differentiable K-means for Fully-optimized Discrete Token-based ASR.
Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, Shinji Watanabe
2025Differentiable Reward Optimization for LLM based TTS system.
Changfeng Gao, Zhihao Du, Shiliang Zhang
2025Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency.
Bunlong Lay, Rostilav Makarov, Timo Gerkmann
2025Direct-path Relative Harmonic Coefficients Detection for Multi-source Direction-of-Arrival Estimation in Reverberant Environments.
Liang Tao, Maoshen Jia, Yonggang Hu
2025Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses.
Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux
2025Directional Speech Recognition with Full-Duplex Capability.
Ju Lin, Yiteng Huang, Ming Sun, Frank Seide, Florian Metze
2025Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion.
Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li
2025Discovering Directions of Uncertainty in Speech Inpainting.
Kfir Cohen, Lior Wolf, Bracha Laufer-Goldshtein
2025Discrete Audio Representations for Automated Audio Captioning.
Jingguang Tian, Haoqin Sun, Xinhui Hu, Xinkang Xu
2025Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu
2025Disentangling Dual-Encoder Masked Autoencoder for Respiratory Sound Classification.
Peidong Wei, Shiyu Miao, Lin Li
2025Disentangling Speaker and Content in Pre-trained Speech Models with Latent Diffusion for Robust Speaker Verification.
Zhe Li, Man-Wai Mak, Jen-Tzung Chien, Mert Pilanci, Zezhong Jin, Helen Meng
2025Distilling a speech and music encoder with task arithmetic.
Fabian Ritter Gutierrez, Yi-Cheng Lin, Jui-Chiang Wei, Jeremy H. M. Wong, Eng Siong Chng, Nancy F. Chen, Hung-yi Lee
2025DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds.
Takuya Hasumi, Yusuke Fujita
2025Do you read me? - flow of speech effect on speaker recognition systems.
Alicja Martinek, Joanna Gajewska, Ewelina Bartuzi-Trokielewicz
2025Does English fish sound like French fiche? Perceptual similarity judgments versus acoustic similarity.
Rory Turnbull, Elisa Kiefer, Sharon Peperkamp
2025Does effortful speech production indicate communication difficulty caused by noise and hearing aid support?
Lena-Marie Huttner, Jeppe H. Christensen, Gitte Keidser, Tobias May, Torsten Dau, Sergi Rotger-Griful
2025Dog2vec: Self-Supervised Pre-Training for Canine Vocal Representation.
Xingyuan Li, Kenny Q. Zhu, Mengyue Wu
2025Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification.
Emiliano Acevedo, Martín Rocamora, Magdalena Fuentes
2025DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation.
Prabash Reddy Male, Swayambhu Nath Ray, Harish Arsikere, Akshat Jaiswal, Prakhar Swarup, Prantik Sen, Debmalya Chakrabarty, K. V. Vijay Girish, Nikhil Bhave, Frederick Weber, Sambuddha Bhattacharya, Sri Garimella
2025Dual Orthogonality Sub-center Loss for Enhanced Anomalous Sound Detection.
Dong Wang, Jiqing Han, Tieran Zheng, Guibin Zheng, Yongjun He
2025DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation.
Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu
2025Dynamic Acoustic Model Architecture Optimization in Training for ASR.
Jingjing Xu, Zijian Yang, Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney
2025Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization.
Luong Ho, Khanh Le, Vinh Pham, Bao Nguyen, Tan Tran, Duc Chau
2025Dynamic Layer Gating for Speech Enhancement.
Venkatesh Parvathala, K. Sri Rama Murty
2025Dysarthric Speech Recognition Using Curriculum Learning and Multi-stream Architecture.
I-Ting Hsieh, Chung-Hsien Wu
2025Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection.
Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Peter Park, Anaisha Das, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
2025E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning.
Yihan Liu, Zhengyang Chen, Leying Zhang, Yanmin Qian
2025EAA: Emotion-Aware Audio Large Language Models with Dual Cross-Attention and Context-Aware Instruction Tuning.
Hongfei Du, Sidi Lu, Gang Zhou, Ye Gao
2025EASY: Emotion-aware Speaker Anonymization via Factorized Distillation.
Jixun Yao, Hexin Liu, Eng Siong Chng, Lei Xie
2025EATS-Speech: Emotion-Adaptive Transformation and Priority Synthesis for Zero-Shot Text-to-Speech.
Jingyuan Xing, Zhipeng Li, Shuaiqi Chen, Xiaofen Xing, Xiangmin Xu
2025EEG-based Speech Decoding Based on Multi-mode Joint Modeling.
Peiran Li, Fei Chen, Xixin Wu
2025EEG-based Voice Conversion : Hearing the Voice of Your Brain.
Yizhong Geng, Wenxin Fu, Qihang Lu, Bingsong Bai, Cong Wang, Yingming Gao, Ya Li
2025EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis.
Haoxun Li, Leyuan Qu, Jiaxi Hu, Taihao Li
2025Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution.
Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli
2025Effect of Loudspeaker Emitted Speech on ASR performance.
Vikram C. M, Sanjoy Pal, Nidhi Mantri, Gopal Kumar Agrawal
2025Effect of Noise Floor in Room Impulse Response on Speech Perception Under Spherical Harmonics-based Spatial Sound Reproduction.
Yunqi C. Zhang, Dhruv Jagmohan, Hong Kit Li, C. T. Justine Hui, Yusuke Hioka
2025Effect of physical exercise on voice in people living with COPD.
Lauren G. Reinders, Loes van Bemmel, Alexander Mackay, David Nobbs, Frits M. E. Franssen, Hester Gietema, Simona Schäfer, Sami O. Simons
2025Effective Context in Neural Speech Models.
Yen Meng, Sharon Goldwater, Hao Tang
2025Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates.
Haoning Xu, Zhaoqing Li, Youjun Chen, Huimeng Wang, Guinan Li, Mengzhe Geng, Chengxi Deng, Xunying Liu
2025Effects of Prosodic Information on Dialect Classification Using Whisper Features.
Phoebe Parsons, Heming Strømholt Bremnes, Knut Kvale, Torbjørn Svendsen, Giampiero Salvi
2025Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR.
Zheng Xin Yong, Vineel Pratap, Michael Auli, Jean Maillard
2025Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering.
Pradeep Rangappa, Andrés Carofilis, Jeena J. Prakash, Shashi Kumar, Sergio Burdisso, Srikanth R. Madikeri, Esaú Villatoro-Tello, Bidisha Sharma, Petr Motlícek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke
2025Efficient Multilingual ASR Finetuning via LoRA Language Experts.
Jiahong Li, Yiwen Shao, Jianheng Zhuo, Chenda Li, Liliang Tang, Dong Yu, Yanmin Qian
2025Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem.
Andres Fernandez, Juan Azcarreta Ortiz, Çagdas Bilen, Jesus Monge-Alvarez
2025Efficient Noise-Robust Hybrid Audiovisual Encoder with Joint Distillation and Pruning for Audiovisual Speech Recognition.
Zhengyang Li, Pascal Reichert, Thomas Graave, Patrick Blumenberg, Tim Fingscheidt
2025Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders.
Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan
2025Efficient Streaming Speech Quality Prediction with Spiking Neural Networks.
Mattias Nilsson, Riccardo Miccini, Julian Rossbroich, Clément Laroche, Tobias Piechowiak, Friedemann Zenke
2025Efficient Streaming TTS Acoustic Model with Depthwise RVQ Decoding Strategies in a Mamba Framework.
Joun Yeop Lee, Sangjun Park, Byoung Jin Choi, Ji-Hyun Lee, Min-Kyung Kim, Hoon-Young Cho
2025Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition.
Kwok Chin Yuen, Jia Qi Yip
2025Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model.
Ke Hu, Ehsan Hosseini-Asl, Chen Chen, Edresson Casanova, Subhankar Ghosh, Piotr Zelasko, Zhehuai Chen, Jason Li, Jagadeesh Balam, Boris Ginsburg
2025Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization.
Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham
2025Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling.
Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan
2025Eigenvoice Synthesis based on Model Editing for Speaker Generation.
Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda
2025EmbedAug: An Augmentation Scheme for End-to-End Automatic Speech Recognition.
Ashish Panda, Sunil Kumar Kopparapu
2025EmoDB 2.0: A Database of Emotional Speech in a World that is not Black or White but Grey.
Felix Burkhardt, Oliver Schrüfer, Uwe D. Reichel, Hagen Wierstorf, Anna Derington, Florian Eyben, Björn W. Schuller
2025EmoJudge: LLM Based Post-Hoc Refinement for Multimodal Speech Emotion Recognition.
Prabhav Singh, Jesús Villalba
2025EmoSpeechAuth: Emotion-Aware Speaker Verification.
Magdalena Golebiowska, Piotr Syga
2025EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification.
Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee
2025Emotion-Guided Graph Attention Networks for Speech-Based Depression Detection under Emotion-Inducting Tasks.
Yuqiu Zhou, Yongjie Zhou, Yudong Yang, Yang Liu, Jun Huang, Shuzhi Zhao, Rongfeng Su, Lan Wang, Nan Yan
2025EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast.
Shreeram Suresh Chandra, Lucas Goncalves, Junchen Lu, Carlos Busso, Berrak Sisman
2025Employing self-supervised learning models for cross-linguistic child speech maturity classification.
Theo Zhang, Madurya Suresh, Anne Warluamont, Kasia Hitczenko, Alejandrina Cristià, Margaret Cychosz
2025Empowering Large Language Models for End-to-End Speech Translation Leveraging Synthetic Data.
Yu Pu, Xiaoqian Liu, Guangyu Zhang, Zheng Yan, Wei-Qiang Zhang, Xie Chen
2025EnCodecMAE: leveraging neural codecs for universal audio representation learning.
Leonardo Pepino, Pablo Riera, Luciana Ferrer
2025Enabling the replicability of speech synthesis perceptual evaluations.
Sébastien Le Maguer, Gwénolé Lecorvé, Damien Lolive, Naomi Harte, Juraj Simko
2025End-to-End DOA-Guided Speech Extraction in Noisy Multi-Talker Scenarios.
Kangqi Jing, Wenbin Zhang, Yu Gao
2025End-to-End Diarization utilizing Attractor Deep Clustering.
David Palzer, Matthew Maciejewski, Eric Fosler-Lussier
2025End-to-End Indian Language Dubbing with Zero-Shot Speaker Preservation.
Giri Raju, Sandeep Konam
2025End-to-End Speech Translation Guided by Robust Translation Capability of Large Language Model.
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
2025End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data.
Aishwarya Pothula, Bhavana Akkiraju, Srihari Bandarupalli, Charan Devarkonda, Santosh Kesiraju, Anil Kumar Vuppala
2025Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data.
Yun Tang, Eesung Kim, Vijendra Raj Apsingekar
2025Enhancing Acoustic-to-Articulatory Inversion with Multi-Target Pretraining for Low-Resource Settings.
Jesuraj Bandekar, Prasanta Kumar Ghosh
2025Enhancing Acoustic-to-Articulatory Speech Inversion by Incorporating Nasality.
Saba Tabatabaee, Suzanne Boyce, Liran Oren, Mark Tiede, Carol Y. Espy-Wilson
2025Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech.
Seung-Bin Kim, Hyun-seo Shin, Jungwoo Heo, Chan-yeong Lim, Kyo-Won Koo, Jisoo Son, Sanghyun Hong, Souhwan Jung, Ha-Jin Yu
2025Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge.
Aditya Kamlesh Parikh, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik
2025Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving.
Jingran Xie, Xiang Li, Hui Wang, Yue Yu, Yang Xiang, Xixin Wu, Zhiyong Wu
2025Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models.
Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong, Kasima Tharnpipitchai, Kunat Pipatanakul
2025Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss.
Jiawen Huang, Felipe Sousa, Emir Demirel, Emmanouil Benetos, Igor Gadelha
2025Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning.
Changin Choi, Sungjun Lim, Wonjong Rhee
2025Enhancing Serialized Output Training for Multi-Talker ASR with Soft Monotonic Alignment and Utterance-level Timestamp.
Fengyun Tan, Tao Wei, Kun Zou, Ning Cheng, Shaojun Wang, Jing Xiao
2025Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion.
Honghong Wang, Jing Deng, Fanqin Meng, Rong Zheng
2025Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody.
David Sasu, Benedict Quartey, Kweku Andoh Yamoah, Natalie Schluter
2025Enhancing Syllabic Recognition via Speech-EEG Phase Analysis and Non-Activity State Modeling.
Rini A. Sharon, Hema A. Murthy
2025Enhancing Target-speaker Automatic Speech Recognition Using Multiple Speaker Embedding Extractors with Virtual Speaker Embedding.
Ju-Seok Seong, Jeong-Hwan Choi, Ye-Rin Jeoung, Ilseok Kim, Joon-Hyuk Chang
2025Enhancing Transcripts of Open-Source Automatic Speech Recognition Models Through Fine-Tuning with Laughter and Speech-Laugh.
Phuoc Hoang Ho, Dragos Alexandru Balan, Dirk K. J. Heylen, Khiet P. Truong
2025EnvSDD: Benchmarking Environmental Sound Deepfake Detection.
Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Haohe Liu, Wenwu Wang, Mark D. Plumbley
2025Equivalence and differences: Formant patterns of labialization and pharyngealization in Tashlhiyt.
Philipp Buech, Anne Hermes, Rachid Ridouane
2025Evaluating ASR Robustness to Spontaneous Speech Errors: A Study of WhisperX Using a Speech Error Database.
John Alderete, Macarious Kin Fung Hui, Aanchan Mohan
2025Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth.
Hongchen Wu, Yao Du, Zirong Li, Yixin Gu, Disha Thotappala Jayaprakash, Li Sheng
2025Evaluating Deep Speaker Embedding Robustness to Domain, Sampling Rate, and Codec Variations.
Alexandre Ferro Filho, Diogo Fernandes Costa Silva, Pedro Elias Engelberg Silva Borges, Arlindo Rodrigues Galvão Filho
2025Evaluating Large Language Models in Data Generation for Low-Resource Scenarios: A Case Study on Question Answering.
Ebru Arisoy, Merve Ünlü Menevse, Yusufcan Manav, Arzucan Özgür
2025Evaluating Logit-Based GOP Scores for Mispronunciation Detection.
Aditya Kamlesh Parikh, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik
2025Evaluating Parameter Sharing for Spoofing-Aware Speaker Verification: A Case Study on the ASVspoof 5 Dataset.
Aykut Büker, Oguzhan Kurnaz, Sule Bekiryazici, Selim Can Demirtas, Cemal Hanilçi
2025Evaluating Progress of CALL System Users on Accentedness and Comprehensibility: An Acoustic and ASR-Based Approach.
Wenwei Dong, Catia Cucchiarini, Roeland van Hout, Helmer Strik
2025Evaluating Speech Enhancement Performance Across Demographics and Language.
José Giraldo, Alex Peiró Lilja, Carme Armentano-Oller, Rodolfo Zevallos, Cristina España-Bonet
2025Evaluating Speech Foundation Models for Automatic Speech Recognition in the Low-Resource Kanyen'kéha Language.
Mengzhe Geng, Patrick Littell, Aidan Pine, Robbie Jimerson, Gilles Boulianne, Vishwa Gupta, Rolando Coto-Solano, Anna Kazantseva, Marc Tessier, Delaney Lothian, Akwiratékha' Martin, Eric Joanis, Samuel Larkin, Roland Kuhn
2025Evaluating Wav2Vec2-Bert for Computer-Assisted Pronunciation Training for isiZulu.
Alexandra Fort, Francis Tyers
2025Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data.
Emmy Postma, Cristian Tejedor García
2025Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers.
Terry Yi Zhong, Esther Janse, Cristian Tejedor García, Louis ten Bosch, Martha A. Larson
2025Evaluating the suitability of acoustic parameters for capturing breathy voice in non-pathological female speakers.
Chloe Patman, Paul Foulkes, Kirsty McDougall
2025Evaluation of Three Automatic Alignment Tools for the Processing of Non-native French.
Qian Zhou, Mathilde Hutin
2025Evaluation of a model for sound radiation from the vocal tract wall.
Peter Birkholz, Tianyi Zhang
2025ExagTTS: An Approach Towards Controllable Word Stress Incorporated TTS for Exaggerated Synthesized Speech Aiding Second Language Learners.
Anindita Mondal, Monica Surtani, Anil Kumar Vuppala, Parameswari Krishnamurthy, Chiranjeevi Yarra
2025Examining Test-Time Adaptation for Personalized Child Speech Recognition.
Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Mataric
2025Explainable Depression Detection using Masked Hard Instance Mining.
Patawee Prakrankamanant, Shinji Watanabe, Ekapol Chuangsuwanich
2025Explainable Speech Emotion Recognition Through Attentive Pooling: Insights from Attention-Based Temporal Localization.
Tahitoa Leygue, Astrid Sabourin, Christian Bolzmacher, Sylvain Bouchigny, Margarita Anastassova, Quoc-Cuong Pham
2025Exploiting Bispectral Features for Single-Channel Speech Enhancement.
Venkatesh Parvathala, Ramesh Gundluru, Sreekanth Sankala, K. Sri Rama Murty
2025Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems.
Natalia A. Tomashenko, Emmanuel Vincent, Marc Tommasi
2025Exploiting Echo Path Priors for Enhanced Stereo Acoustic Echo Cancellation.
Jinfu Wang, Ziteng Wang, Xin Liu, Yang Liu, Qing Shi, Zhengqiang Luo, Feiran Yang
2025Exploratory Analysis of Brainstem fMRI Data During Sustained Phonation.
Carey Smith, Hu Cheng, Pertti Palo, Daniel Aalto, Steven M. Lulich
2025Exploratory Study of Filled Pauses in Ukrainian Language: Phonetic Properties of Filled Pauses.
Anna Havras, Carlos Mendes, Helena Moniz, Gueorgui Hristovsky, João Miranda
2025Exploring Efficient Directional and Distance Cues for Regional Speech Separation.
Yiheng Jiang, Haoxu Wang, Yafeng Chen, Gang Qiao, Biao Tian
2025Exploring Generative Error Correction for Dysarthric Speech Recognition.
Moreno La Quatra, Alkis Koudounas, Valerio Mario Salerno, Sabato Marco Siniscalchi
2025Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR.
Carlos Carvalho, Jinchuan Tian, William Chen, Yifan Peng, Alberto Abad, Shinji Watanabe
2025Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring.
Yuchen Song, Yucong Zhang, Ming Li
2025Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.
Mingyu Cui, Yifan Yang, Jiajun Deng, Jiawen Kang, Shujie Hu, Tianzi Wang, Zhaoqing Li, Shiliang Zhang, Xie Chen, Xunying Liu
2025Exploring Shared-Weight Mechanisms in Transformer and Conformer Architectures for Automatic Speech Recognition.
Thomas Rolland, Alberto Abad
2025Exploring auditory feedback mechanisms in speech recognition.
Louise Coppieters de Gibson, Philip N. Garner
2025Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models.
Shunsuke Kando, Yusuke Miyao, Shinnosuke Takamichi
2025Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models.
Edmilson Da Silva Morais, Hagai Aronowitz, Aharon Satt, Ron Hoory, Avihu Dekel, Brian Kingsbury, George Saon
2025Exploring the Power of Empirical Mode Decomposition for Sensing the Sound of Silence: A Pilot Study on Mice Autism Detection via Ultrasonic Vocalisation.
Chenhao Wu, Xiangjun Cai, Haojie Zhang, Tianrui Jia, Yilu Deng, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto, Jiang Liu
2025Extended High-frequency Cues to Phoneme Recognition: Insights from ASR.
Zhe-chen Guo, Bharath Chandrasekaran
2025Extended Loss: Incorporating Long Context into Training Models when using Short Audio Frames.
Quang Minh Dinh, Hoda Rezaee Kaviani, Mehrdad Hosseinzadeh, Yuanhao Yu
2025Extending the Fongbe to French Speech Translation Corpus: resources, models and benchmark.
D. Fortune Kponou, Salima Mdhaffar, Fréjus A. A. Laleye, Eugène C. Ezin, Yannick Estève
2025EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer.
Jiarui Hai, Yong Xu, Hao Zhang, Chenxing Li, Helin Wang, Mounya Elhilali, Dong Yu
2025FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems.
Yizhou Peng, Yi-Wen Chao, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng
2025FFD: Fine-Finger Diffusion Model for Music to Fine-grained Finger Dance Generation.
Boya Dong, Wentao Lei, Li Liu
2025FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer.
Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian
2025FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents.
Satu Hopponen, Tomi Kinnunen, Alexandre Nikolaev, Rosa González Hautamäki, Lauri Tavi, Einar Meister
2025FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments.
Saba Tabatabaee, Jing Liu, Carol Y. Espy-Wilson
2025FUSE-MOS: Fusion of Speech Embeddings for MOS Prediction with Uncertainty Quantification.
Enjamamul Hoq, Nikhil Gupta, Danielle Omondi, Ifeoma Nwogu
2025FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge.
Nabarun Goswami, Tatsuya Harada
2025FaVC: A Validated, Transcribed, Parallel Farsi Speech Dataset for Voice Conversion.
Mina Serajian, Saeed Najafzadeh Rahaghi, Hadi Veisi, Saman Haratizadeh
2025Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation.
Fang Kang, Yin Cao, Haoyu Chen
2025Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning.
Yejin Jeon, Solee Im, Youngjae Kim, Gary Geunbae Lee
2025Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization.
Suhas BN, Han-Chin Shing, Lei Xu, Mitch Strong, Jon Burnsky, Jessica Ofor, Jordan R. Mason, Susan Chen, Sundararajan Srinivasan, Chaitanya Shivade, Jack Moriarty, Joseph Paul Cohen
2025Factorized RVQ-GAN For Disentangled Speech Tokenization.
Sameer Khurana, Dominik Klement, Antoine Laurent, Dominik Bobos, Juraj Novosad, Peter Gazdik, Ellen Zhang, Zili Huang, Amir Hussein, Ricard Marxer, Yoshiki Masuyama, Ryo Aihara, Chiori Hori, François G. Germain, Gordon Wichern, Jonathan Le Roux
2025Factors affecting the in-context learning abilities of LLMs for dialogue state tracking.
Pradyoth Hegde, Santosh Kesiraju, Jan Svec, Simon Sedlácek, Bolaji Yusuf, Oldrich Plchot, Deepak K. T, Jan Cernocký
2025FaiST: A Benchmark Dataset for Fairness in Speech Technology.
Maliha Jahan, Yinglun Sun, Priyam Mazumdar, Zsuzsanna Fagyal, Thomas Thebaud, Jesús Villalba, Mark Hasegawa-Johnson, Najim Dehak, Laureano Moro-Velázquez
2025FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition.
Jongsuk Kim, Jaemyung Yu, Minchan Kwon, Junmo Kim
2025Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS.
Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala
2025FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation.
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
2025Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids.
Ryandhimas E. Zezario, Sabato Marco Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao
2025Federated Learning with Feature Space Separation for Speaker Recognition.
Ying Meng, Zhihua Fang, Liang He
2025Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes.
Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya
2025Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement.
Seungu Han, Sungho Lee, Juheon Lee, Kyogu Lee
2025Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity.
Loann Peurey, Marvin Lavechin, Tarek Kunze, Manel Khentout, Lucas Gautheron, Emmanuel Dupoux, Alejandrina Cristià
2025Finding the Human Voice in AI: Insights on the Perception of AI-Voice Clones from Naturalness and Similarity Ratings.
Linda Bakkouche, Charles McGhee, Emily Lau, Stephanie Cooper, Xinbing Luo, Madeleine Rees, Kai Alter, Brechtje Post, Julia Schwarz
2025Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches.
Dena F. Mujtaba, Nihar R. Mahapatra
2025Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback.
Jingyi Chen, Ju-Seung Byun, Micha Elsner, Pichao Wang, Andrew Perrault
2025Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization.
Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Díez, Jan Cernocký, Lukás Burget
2025Fine-tuning Parakeet-TDT for Dysarthric Speech Recognition in the Speech Accessibility Project Challenge.
Kaito Takahashi, Keigo Hojo, Toshimitsu Sakai, Yukoh Wakabayashi, Norihide Kitaoka
2025Fine-tuning Strategies for Automatic Speech Recognition of Low-Resource Speech with Autism Spectrum Disorder.
Yeseul Park, Bowon Lee
2025Finetune Large Pre-Trained Model Based on Frequency-Wise Multi-Query Attention Pooling for Anomalous Sound Detection.
Nan Jiang, Yan Song, Qing Gu, Haoyu Song, Lirong Dai, Ian McLoughlin
2025First Analyze Then Enhance: A Task-Aware System for Speech Separation, Denoising, and Dereverberation.
Shaoxiang Dang, Li Li, Shogo Seki, Hiroaki Kudo
2025First Steps Towards Voice Anonymization for Code-Switching Speech.
Sarina Meyer, Ekaterina Kolos, Ngoc Thang Vu
2025Flexible VAD-PVAD Transition: A Detachable PVAD Module for Dynamic Encoder RNN VAD.
En-Lun Yu, Chien-Chun Wang, Jeih-weih Hung, Shih-Chieh Huang, Berlin Chen
2025FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching.
Ziqian Wang, Zikai Liu, Xinfa Zhu, Yike Zhu, Mingshuai Liu, Jun Chen, Longshuai Xiao, Chao Weng, Lei Xie
2025FlowTSE: Target Speaker Extraction with Flow Matching.
Aviv Navon, Aviv Shamsian, Yael Segal-Feldman, Neta Glazer, Gil Hetz, Joseph Keshet
2025Focal Modulation Network: A Novel Solution for Polyphonic Music Instrument Recognition without Attention and Aggregation Strategy.
Lekshmi Chandrika Reghunath, Rajeev Rajan
2025FoleyMaster: High-Quality Video-to-Audio Synthesis via MLLM-Augmented Prompt Tuning and Joint Semantic-Temporal Adaptation.
Liming Liang, Luo Chen, Yuehan Jin, Xianwei Zhuang, Yuxin Xie, Yongkang Yin, Yuexian Zou
2025Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation.
Jingping Nie, Tien Dung Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendaño, Erdrin Azemi, Vikramjit Mitra
2025FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens.
Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma
2025French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement.
Thomas Joubaud, Julien Hauret, Véronique Zimpfer, Éric Bavu
2025French schwa is not acoustically distinct from its two lexical neighbors /ø/ and /œ/.
Mathilde Hutin, Mélanie Lancien, Noam Faust
2025Frequency-Domain Enhanced Extreme Bandwidth Extension Network with ICCRN for Superior Speech Quality.
Hongtao Bao, Xueliang Zhang
2025From Context to Code-switching: Examining the Interplay of Language Proficiency and Multilingualism in Speech.
Debasmita Bhattacharya, Aanya Tolat, Julia Hirschberg
2025From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology.
Haoyang Li, Yuchen Hu, Chen Chen, Sabato Marco Siniscalchi, Songting Liu, Eng Siong Chng
2025From Pretraining to Performance: Benchmarking Self-Supervised Speech Models for Interspeech-25 SER Challenge.
Drishya Uniyal, Vinayak Abrol
2025From Scarcity to Sufficiency: Speech Recognition Pipeline for Zero-resource Language.
Nikolay Karpov, Sofia Kostandian, Nune Tadevosyan, Alexan Ayrapetyan, Andrei Andrusenko, Ara Yeroyan, Mher Yerznkanyan, Vitaly Lavrukhin
2025From Sharpness to Better Generalization for Speech Deepfake Detection.
Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian
2025From Speech Science to Language Transparence.
Alexander Waibel
2025From Static to Dynamic: Enhancing AAC with Generative Imagery and Zero-Shot TTS.
Juliana Francis, Joakim Gustafsson, Éva Székely
2025From Talking and Listening Devices to Intelligent Communicative Machines.
Roger K. Moore
2025From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data.
Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Y. Espy-Wilson
2025From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models.
Asim Ersoy, Basel Ahmad Mousi, Shammur Absar Chowdhury, Firoj Alam, Fahim Dalvi, Nadir Durrani
2025Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech.
Wonjune Kang, Junteng Jia, Chunyang Wu, Wei Zhou, Egor Lakomkin, Yashesh Gaur, Leda Sari, Suyoun Kim, Ke Li, Jay Mahadeokar, Ozlem Kalinli
2025Fully End-to-end Streaming Open-vocabulary Keyword Spotting with W-CTC Forced Alignment.
Dohyun Kim, Jiwook Hwang
2025Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier.
Yongjie Si, Yanxiong Li, Jiaxin Tan, Qianhua He, Il-Youp Kwak
2025Functional Connectivity and Hilbert-Based Features for Covert Speech EEG Variability Analysis and Classification.
Saravanakumar Duraisamy, Maurice Rekrut, Luis A. Leiva
2025GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning.
Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, Shinji Watanabe
2025GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints.
Jiajun He, Jinyi Mi, Tomoki Toda
2025GLCLAP: A Novel Contrastive Learning Pre-trained Model for Contextual Biasing in ASR.
Yuxiang Kong, Fan Cui, Liyong Guo, Heinrich Dinkel, Lichun Fan, Junbo Zhang, Jian Luan
2025GST-BERT-TTS: Prosody Prediction Without Accentual Labels For Multi-Speaker TTS Using BERT With Global Style Tokens.
Tadashi Ogura, Takuma Okamoto, Yamato Ohtani, Erica Cooper, Tomoki Toda, Hisashi Kawai
2025GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer.
Minghui Fang, Shengpeng Ji, Jialong Zuo, Xize Cheng, Wenrui Liu, Xiaoda Yang, Ruofan Hu, Jieming Zhu, Zhou Zhao
2025GTAnet: Geometry-Guided Temporal Attention for EEG-Based Sound Source Tracking in Cocktail Party Scenarios.
Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Dashanka De Silva, Haizhou Li, Tanja Schultz
2025Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations.
Seongsil Heo, Christi Miller, Calvin Murdock, Michael J. Proulx
2025GenECA: A General-Purpose Framework for Real-Time Adaptive Multimodal Embodied Conversational Agents.
Santosh V. Patapati, Aashrith Tatineni, Trisanth Srinivasan
2025Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere.
Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang
2025Generalizable Audio Spoofing Detection using Non-Semantic Representations.
Arnab Das, Yassine El Kheir, Carlos Franzreb, Tim Herzig, Tim Polzehl, Sebastian Möller
2025Generating Consistent Prosodic Patterns from Open-Source TTS Systems.
Ha Eun Shim, Olivia Yung, Paige Tuttösí, Boey Kwan, Angelica Lim, Yue Wang, H. Henny Yeung
2025GigaAM: Efficient Self-Supervised Learner for Speech Recognition.
Aleksandr Kutsakov, Alexandr Maximenko, Georgii Gospodinov, Pavel Bogomolov, Fyodor Minkin
2025GoP2Vec: A few shot learning for pronunciation assessment with goodness of pronunciation (GoP) based representations from an i-vector framework and augmentation.
Meenakshi Sirigiraju, Chiranjeevi Yarra
2025Gradual modeling of the Lombard effect by modifying speaker embeddings from a Text-To-Speech model.
Thiago Henrique Gomes Lobato, Magnus Schäfer
2025Grammatical Error Detection on Spontaneous Children's Speech Using Iterative Pseudo Labeling.
Christopher Gebauer, Lars Rumberg, Lars Köhn, Hanna Ehlert, Edith Beaulac, Jörn Ostermann
2025Granary: Speech Recognition and Translation Dataset in 25 European Languages.
Nithin Rao Koluguri, Monica Sekoyan, George Zelenfroynd, Sasha Meister, Shuoyang Ding, Sofia Kostandian, He Huang, Nikolay Karpov, Jagadeesh Balam, Vitaly Lavrukhin, Yifan Peng, Sara Papi, Marco Gaido, Alessio Brutti, Boris Ginsburg
2025Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning.
Hien Ohnaka, Yuma Shirahata, Byeongseon Park, Ryuichi Yamamoto
2025GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples.
Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang
2025H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing.
Akanksha Singh, Yi-Ping Phoebe Chen, Vipul Arora
2025HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement.
Amir Hussein, Sameer Khurana, Gordon Wichern, François G. Germain, Jonathan Le Roux
2025HK-GenSpeech: A Generative AI Scene Creation Framework for Speech Based Cognitive Assessment.
Vi Jun Sean Yong, Serkan Kumyol, Pau Le Lisa Low, Winnie Suk Wai Leung, Tristan Braud
2025HWB-Net: A Novel High-Performance and Efficient Hybrid Waveform Bandwidth Extension Method.
Xin Liu, Shulin He, Xueliang Zhang
2025HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
2025Harnessing Text-to-Speech Voice Cloning Models for Improved Audiological Speech Assessment.
Lidea Shahidi, Erdem Baha Topbas, Thu Ngan Dang, Tobias Goehring
2025Hear Me Out: Interactive evaluation and bias discovery platform for speech-to-speech conversational AI.
Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely
2025Hearing deficits of transformer-based ASR for anechoic and spatial signals.
Dirk Eike Hoffner, Simon Weihe, Thomas Brand, Bernd T. Meyer
2025Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model.
Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu
2025Heart Rate as a Proxy Measure to Assess Human Confidence in Spoken Speech.
Harish Battula, Gauri Deshpande, Yagna Gudipalli, Sachin Patel
2025HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset.
Ryan Langman, Xuesong Yang, Paarth Neekhara, Shehzeen Hussain, Edresson Casanova, Evelina Bakhturina, Jason Li
2025How do both phonological and syntactic complexity influence speech planning?
Ivan Yuen, Katherine Demuth, Stefanie Shattuck-Hufnagel
2025How sibilant spectra shape gender perception in prepubertal children: A voice morphing study.
Riccarda Funk, Melanie Weirich, Adrian P. Simpson
2025How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not.
Francesco Verdini, Pierfrancesco Melucci, Stefano Perna, Francesco Cariaggi, Marco Gaido, Sara Papi, Szymon Mazurek, Marek Kasztelnik, Luisa Bentivogli, Sébastien Bratières, Paolo Merialdo, Simone Scardapane
2025How to Recover Long Audio Sequences Through Gradient Inversion Attack With Dynamic Segment-based Reconstruction.
Xijie Zeng, Frank Rudzicz
2025HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization.
Hyebin Ahn, Kangwook Jang, Hoirin Kim
2025Hybrid Data Sampling for ASR: Integrating Acoustic Diversity and Transcription Uncertainty.
Komei Hiruta, Yosuke Yamano, Hideaki Tamori
2025Hybrid Expert Knowledge and Self-Supervised Learning for Diagnostic Modeling of Adductor Spasmodic and Primary Myotonic Dysphonia.
Zhou Du, Hang Chen, Huijun Ding, Jun Du, Zhen Chen
2025Hybrid HMM-SVM classifier using frication-based features for detection of non-normative sibilant articulation patterns in Polish children's speech.
Zuzanna Miodonska
2025I want a horror - comedy - movie: Slips-of-the-Tongue Impact Conversational Recommender System Performance.
Maria Teleki, Lingfeng Shi, Chengkai Liu, James Caverlee
2025IDIR: Identifying and Distilling Informative Relations for Speaker Verification.
Chong-Xin Gan, Zhe Li, Zezhong Jin, Zilong Huang, Man-Wai Mak, Kong Aik Lee
2025Identification of Pathological Pronunciation Profiles in ASR Transcription Errors.
Margot Masson, Isabelle Ferrané, Julie Mauclair
2025Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models.
Nikola Ljubesic, Ivan Porupski, Peter Rupnik
2025Identifying Vocal and Facial Biomarkers of Depression in Large-Scale Remote Recordings: A Multimodal Study Using Mixed-Effects Modeling.
Nelson Hidalgo Julia, Robert Lewis, Craig Ferguson, Simon Goldberg, Wendy Lau, Caroline Swords, Gabriela Valdivia, Christine D. Wilson-Mendenhall, Raquel Tartar, Rosalind W. Picard, Richard Davidson
2025Impact of Background Noise on Turn-Taking Dynamics in Triadic Conversations.
Valeska Slomianka, Tobias May, Torsten Dau
2025Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching.
Shoutrik Das, Nishant Singh, Arjun Gangwar, S. Umesh
2025Improving Audio Classification by Transitioning from Zero- to Few-Shot.
James Taylor, Wolfgang Mack
2025Improving Automatic Speech Recognition for Children's Reading Assessment with Disfluency-aware Language Models.
Jazmín Vidal, Luciana Ferrer, Juan Esteban Kamienkowski, Pablo Riera
2025Improving Bird Classification with Primary Color Additives.
Ezhini Rasendiran R, Chandresh Kumar Maurya
2025Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts.
Lingyun Gao, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik
2025Improving Cross-Attention based on Positional Alignment during Inference for Robust Long-form Speech Recognition.
Changhan Oh, Kiyoung Park, Jeom-ja Kang, Woo Yong Choi, Hwa Jeon Song
2025Improving End-to-end Mixed-case ASR with Knowledge Distillation and Integration of Voice Activity Cues.
Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
2025Improving Generalization of End-to-End ASR through Diversity and Independence Regularization.
Ye-Eun Ko, Mun-Hak Lee, Dong-Hyun Kim, Joon-Hyuk Chang
2025Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning.
Long Mai, Julie Carson-Berndsen
2025Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion.
Lea Fischbach, Akbar Karimi, Caroline Kleen, Alfred Lameli, Lucie Flek
2025Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC.
Qingzheng Wang, Jiancheng Sun, Yifan Peng, Shinji Watanabe
2025Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising.
Ye-Xin Lu, Hui-Peng Du, Fei Liu, Yang Ai, Zhen-Hua Ling
2025Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios.
Aswin Shanmugam Subramanian, Amit Das, Naoyuki Kanda, Jinyu Li, Xiaofei Wang, Yifan Gong
2025Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles.
Miika Toikkanen, June-Woo Kim
2025Improving Speech Emotion Recognition Through Cross Modal Attention Alignment and Balanced Stacking Model.
Lucas H. Ueda, João Lima, Leonardo Marques, Paula Dornhofer Paro Costa
2025Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function.
Kwok Chin Yuen, Jia Qi Yip, Eng Siong Chng
2025Improving User Impression of Spoken Dialogue Systems by Controlling Para-linguistic Expression Based on Intimacy.
Shoki Kawanishi, Akinori Ito, Yuya Chiba, Takashi Nose
2025In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion.
Jiawei Jin, Zhihan Yang, Yixuan Zhou, Zhiyong Wu
2025In-context Language Learning for Endangered Languages in Speech Recognition.
Zhaolin Li, Jan Niehues
2025In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts.
Filomene Roquefort, Alexandre Ducorroy, Rachid Riad
2025Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction.
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
2025Individualized speech enhancement for hearing-impaired listeners.
Chuan Wen, Sarah Verhulst
2025Infant Cry Emotion Recognition Using Improved ECAPA-TDNN with Multi-scale Feature Fusion and Attention Enhancement.
Junyu Zhou, Yanxiong Li, Haolin Yu
2025InfiniteAudio: Infinite-Length Audio Generation with Consistency.
Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung
2025Influence of Proficiency and L2 Experience on Dynamic Spectral Cue Utilization in L2 Vowel Perception and Production.
Linda Bakkouche, Brechtje Post
2025Influence of Room Acoustics on Objective Voice Assessment Methods in the Context of Speech and Language Therapy.
Sven Franz, Tanja Grewe, Bernd T. Meyer, Jörg Bitzer
2025Influence of wall coverings of 3D-printed vocal tract models on measured transfer functions.
Peter Birkholz, Dominik Schäfer, Patrick Häsner, Jihyeon Yun, Iris Kruppke, Rémi Blandin
2025Instantaneous changes in acoustic signals reflect syllable progression and cross-linguistic syllable variation.
Haley Hsu, Dani Byrd, Khalil Iskarous, Louis Goldstein
2025Intelligibility Prediction for Time-Modified Speech Signals Using Spectro-Temporal Modulation Features.
Aymen Bashir, Haolan Wang, Amin Edraki, Wai-Yip Chan, Jesper Jensen
2025Intelligibility of Text-to-Speech Systems for Mathematical Expressions.
Sujoy Roychowdhury, Ranjani H. G., Sumit Soman, Nishtha Paul, Subhadip Bandyopadhyay, Siddhanth Iyengar
2025Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction.
Wang Dai, Archontis Politis, Tuomas Virtanen
2025Interactive Fusion of Multi-View Speech Embeddings via Pretrained Large-Scale Speech Models for Speech Emotional Attribute Prediction in Naturalistic Conditions.
Yuyun Liu, Yujia Gu, Jiahao Luo, Wenming Zheng, Cheng Lu, Yuan Zong
2025Interspeech 2025 URGENT Speech Enhancement Challenge.
Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, Shinji Watanabe
2025Intrasentential English in Swedish TTS: perceived English-accentedness.
Christina Tånnander, David House, Jonas Beskow, Jens Edlund
2025Introducing EMOPARKNZ: the Emotional Speech Database from New Zealand English Speakers with Parkinson's Disease.
Itay Ben-Dom, Catherine I. Watson, Clare M. McCann
2025Investigating Affect Mining Techniques for Annotation Sample Selection in the Creation of Finnish Affective Speech Corpus.
Kalle Lahtinen, Einari Vaaras, Liisa Mustanoja, Okko Räsänen
2025Investigating Gender Bias in Text-to-Audio Generation Models.
Aarish Shah Mohsin, Mohammad Nadeem, Shahab Saquib Sohail, Tughrul Arslan, Mandar Gogate, Nasir Saleem, Amir Hussain
2025Investigating Glottal Stop Coda Loss During Sound Change of Checked Syllables Based on Speech-EGG Voice Offset Alignment.
Bingliang Zhao, Xiyu Wu
2025Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis.
Paul Mayer, Florian Lux, Alejandro Pérez González de Martos, Angelina Elizarova, Lindsey Vanderlyn, Dirk Väth, Ngoc Thang Vu
2025Investigating continuous autoregressive generative speech enhancement.
Haici Yang, Gordon Wichern, Ryo Aihara, Yoshiki Masuyama, Sameer Khurana, François G. Germain, Jonathan Le Roux
2025Investigating effects of sex hormones, cycle phases and age on female fundamental frequency.
Melanie Weirich, Adrian P. Simpson
2025Investigating the Impact of Word Informativeness on Speech Emotion Recognition.
Sofoklis Kakouros
2025Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
2025Investigating the Reasoning Abilities of Large Language Models for Understanding Spoken Language in Interpersonal Interactions.
Pranjal Aggarwal, Ghritachi Mahajani, Pavan Kumar Malasani, Vaibhav Jamadagni, Caroline J. Wendt, Ehsanul Haque Nirjhar, Theodora Chaspari
2025Is Synthetic Data Truly Effective for Training Speech Language Models?
Tomoya Mizumoto, Atsushi Kojima, Yusuke Fujita, Lianbo Liu, Yui Sudo
2025Is it all about race?: A Cross-examination of /s/ in a Multilingual (Nigerian) Context.
Oluwasegun Amoniyan
2025Is your model big enough? Training and interpreting large-scale monolingual speech foundation models.
Yaroslav Getman, Tamás Grósz, Tommi Lehtonen, Mikko Kurimo
2025Iterative Refinement, Not Training Objective, Makes HuBERT Behave Differently from wav2vec 2.0.
Robin Huo, Ewan Dunbar
2025J-SPAW: Japanese speaker verification and spoofing attacks recorded in-the-wild dataset.
Sayaka Shiota, Suzuka Horie, Kouta Kanno, Shinnosuke Takamichi
2025J-j-j-just Stutter: Benchmarking Whisper's Performance Disparities on Different Stuttering Patterns.
Charan Sridhar, Shaomei Wu
2025JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles.
Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko
2025Joint Rate Allocation and Sensor Selection for Speech Enhancement in Wireless Acoustic Sensor Networks.
De Hu, Qilong Li
2025Joint Reference Microphone Selection and Filter Order Determination in Multi-channel Active Noise Control.
De Hu, Shuyao Liu, Yanrong He
2025Joint Target-Speaker ASR and Activity Detection.
Chikara Maeda, Muhammad Shakeel, Yui Sudo
2025Jointly Improving Dialect Identification and ASR in Indian Languages using Multimodal Feature Fusion.
Saurabh Kumar, Amartyaveer, Prasanta Kumar Ghosh
2025Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages.
Utkarsh Pathak, Chandra Sai Krishna Gunda, Anusha Prakash, Keshav Agarwal, Hema A. Murthy
2025Knowledge Distillation Method for Pruned RNN-T Models via Pruning Bounds Sharing and Losses Confusion.
Xiaocan Zhang, Weiwei Jiang, Guibin Zheng, Chenhao Jing, Jiqing Han, Tieran Zheng
2025L3C-DeepMFC: Low-Latency Low-Complexity Deep Marginal Feedback Cancellation with Closed-Loop Fine Tuning for Hearing Aids.
Fengyuan Hao, Brian C. J. Moore, Huiyong Zhang, Xiaodong Li, Chengshi Zheng
2025LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention.
Aditya Srinivas Menon, Raj Prakash Gohil, Kumud Tripathi, Pankaj Wasnik
2025LATE: Open Source Toolkit for Latvian and Latgalian Speech Transcription.
Arturs Znotins, Didzis Gosko, Normunds Gruzitis
2025LHCP-ASR: An English Speech Corpus of High-Energy Particle Physics Talks for Narrow-Domain ASR Benchmarking.
Jaume Santamaria-Jorda, Pablo Segovia-Martínez, Gonçal V. Garcés Díaz-Munío, Joan Albert Silvestre-Cerdà, Adrià Giménez, Rubén Gaspar Aparicio, René Fernández Sánchez, Jorge Civera, Albert Sanchís, Alfons Juan
2025LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech.
Niyati Bafna, Matthew Wiesner
2025LIST: Language-Independent Speech Token for Multilingual Speech Synthesis with Language Models.
Chang Liu, Zhen-Hua Ling, Yu Gu
2025LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting.
Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge
2025LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context.
Natsuo Yamashita, Masaaki Yamamoto, Hiroaki Kokubo, Yohei Kawaguchi
2025LLM-based phoneme-to-grapheme for phoneme-based speech recognition.
Te Ma, Min Bi, Saierdaer Yusuyin, Hao Huang, Zhijian Ou
2025LRBA: Stealthy Backdoor Attacks on Speech Classification via Latent Rearrangement in VITS.
Zexin Li, Wenhan Yao, Ye Xiao, Jinsu Yang, Fen Xiao, Weiping Wen
2025LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec.
Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu
2025LSPnet: an ultra-low bitrate hybrid neural codec.
Bowen Zhang, Ian McLoughlin, Xiaoxiao Miao, A. S. Madhukumar
2025Label Semantic-Driven Contrastive Learning for Speech Emotion Recognition.
Jiaxi Hu, Leyuan Qu, Haoxun Li, Taihao Li
2025Label-Context-Dependent Internal Language Model Estimation for CTC.
Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney
2025Language and Accent Familiarity Effects on the Use of Acoustic Cues in Talker Identification.
Shengyue Xiong, Zhe-chen Guo, Bharath Chandrasekaran
2025Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval.
Anup Singh, Kris Demuynck, Vipul Arora
2025Language-Agnostic Suicidal Risk Detection Using Large Language Models.
June-Woo Kim, Wonkyo Oh, Haram Yoon, Sung-Hoon Yoon, Dae-Jin Kim, Dong-Ho Lee, Sang-Yeol Lee, Chan-Mo Yang
2025Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR.
Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng
2025Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos.
Yuchi Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, Yoshimitsu Aoki
2025Large Language Models based ASR Error Correction for Child Conversations.
Anfeng Xu, Tiantian Feng, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan
2025Lateral Channel Formation in Australian English /l/: Insights from Magnetic Resonance Imaging.
Tünde Szalay, Michael Proctor, Amelia Gully, Tharinda Piyadasa, Craig T. Jin, David Waddington, Naeim Sanaei, Sheryl Foster, Kirrie J. Ballard
2025Layer-Wise Decision Fusion for Fake Audio Detection Using XLS-R.
Yixuan Xiao, Ngoc Thang Vu
2025Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition.
Ziwei Gong, Pengyuan Shi, Kaan Donbekci, Lin Ai, Run Chen, David Sasu, Zehui Wu, Julia Hirschberg
2025Learning Optimal Prosody Embedding Codebook based on F0 and Energy.
David Portes, Ales Horák
2025Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation.
Hyung Kyu Kim, Hak Gu Kim
2025Legally validated evaluation framework for voice anonymization.
Nathalie Vauquier, Brij Mohan Lal Srivastava, Seyed Ahmad Hosseini, Emmanuel Vincent
2025Length Aware Speech Translation for Video Dubbing.
Aswin Shanmugam Subramanian, Harveen Singh Chadha, Vikas Joshi, Shubham Bansal, Jian Xue, Rupeshkumar Mehta, Jinyu Li
2025Lessons Learned from the URGENT 2024 Speech Enhancement Challenge.
Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian
2025Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild.
Jing-Tong Tzeng, Bo-Hao Su, Ya-Tse Wu, Hsing-Hang Chou, Chi-Chun Lee
2025Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment.
Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md. Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna
2025Leveraging Cascaded Binary Classification and Multimodal Fusion for Dementia Detection through Spontaneous Speech.
Yin-Long Liu, Yuanchao Li, Rui Feng, Liu He, Jia-Xin Chen, Yi-Ming Wang, Yu-Ang Chen, Yan-Han Peng, Jia-Hong Yuan, Zhen-Hua Ling
2025Leveraging Geographic Metadata for Dialect-Aware Speech Recognition.
Pouya Mehralian, Hugo Van hamme
2025Leveraging Information Retrieval to Enhance Spoken Language Understanding Prompts in Few-Shot Learning.
Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset
2025Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis.
Tianyi Xu, Hongjie Chen, Qing Wang, Hang Lv, Jian Kang, Jie Li, Zhennan Lin, Yongxiang Li, Lei Xie
2025Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection.
Shangkun Huang, Jing Deng, Jintao Kang, Rong Zheng
2025Leveraging LLMs for Written to Spoken Style Data Transformation to Enhance Spoken Dialog State Tracking.
Haris Gulzar, Monikka Roslianna Busto, Akiko Masaki, Takeharu Eda, Ryo Masumura
2025Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection.
Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler
2025Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection.
Yifan Gao, Jiao Fu, Long Guo, Hong Liu
2025Leveraging Multi-Level Features of ATST with Conformer-Based Dual-Branch Network for Sound Event Detection.
Lipeng Dai, Qing Wang, Jie Zhang, Shengyu Peng, Yu Guan, Wu Guo
2025Leveraging Ordinal Information for Speech-based Depression Classification.
Lishi Zuo, Man-Wai Mak
2025Leveraging SSL Speech Features and Mamba for Enhanced DeepFake Detection.
Hoan My Tran, Damien Lolive, David Guennec, Aghilas Sini, Arnaud Delhay, Pierre-François Marteau
2025Leveraging Self-Supervised Learning Based Speaker Diarization for MISP 2025 AVSD Challenge.
Zeyan Song, Tianchi Sun, Ronghui Hu, Kai Chen, Jing Lu
2025Leveraging Text and Speech Processing for Suicide Risk Classification in Chinese Adolescents.
Justyna Krzywdziak, Bartlomiej Eljasiak, Joanna Stepien, Michal Swiatek, Agnieszka Pruszek
2025Leveraging Unlabeled Audio for Audio-Text Contrastive Learning via Audio-Composed Text Features.
Tatsuya Komatsu, Hokuto Munakata, Yuchi Ishikawa
2025Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation.
Varsha Pendyala, Pedro Morgado, William A. Sethares
2025Lexical competition in the process of Cantonese tone merging: Diverse Impact Mechanisms Across Different Individuals and Tone Pairs.
Lishan Li, Yaolin Zhou, Xiaoying Xu
2025Lexical stress affects lenition: The case of Italian palato-alveolar affricates.
Bowei Shao, Philipp Buech, Anne Hermes, Maria Giavazzi
2025LiRI Corpus Platform: Demonstration of a Web-Based Infrastructure for Multimodal Corpus Analysis.
Teodora Vukovic, Jérémy Zehr, Jonathan Schaber, Igor Mustac, Nikolina Rajovic, Daniel McDonald, Johannes Graën, Noah Bubenhofer
2025LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs.
Pooneh Mousavi, Shubham Gupta, Cem Subakan, Mirco Ravanelli
2025LightL2S: Ultra-Low Complexity Lip-to-Speech Synthesis for Multi-Speaker Scenarios.
Yifan Liang, Kang Yang, Fangkun Liu, Andong Li, Xiaodong Li, Chengshi Zheng
2025Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning.
Siyi Zhao, Wei Wang, Yanmin Qian
2025Lightweight Speech Enhancement Model Based on Harmonic Attention and Phase Estimation with Skin-Attachable Accelerometer.
Yonghun Song, Yeeun Kim, Yoonyoung Chung
2025Lightweight Speech Enhancement for Mandarin Esophageal Speech.
Jia-Jyu Su, Yen-Ting Lin, Wu-Hao Li, Chao-Kai Chang, Yan-Zhi Chen, Chen-Yu Chiang
2025Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform.
Xiangzhu Kong, Hao Huang, Zhijian Ou
2025LinearVC: Linear Transformations of Self-Supervised Features Through the Lens of Voice Conversion.
Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau
2025Linguistic Masking and Its Release in Simulated Electric-acoustic Hearing.
Yuting Ding, Xuefei Wang, Fei Chen
2025Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation.
Soo-Whan Chung, Min-Seok Choi
2025Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing.
Yang Xiao, Rohan Kumar Das
2025LitMAS: A Lightweight and Generalized Multi-Modal Anti-Spoofing Framework for Biometric Security.
Nidheesh Gorthi, Kartik Thakral, Rishabh Ranjan, Richa Singh, Mayank Vatsa
2025Location-Aware Target Speaker Extraction for Hearing Aids.
Daniel-José Alcala Padilla, Nils L. Westhausen, Swati Vivekananthan, Bernd T. Meyer
2025LombardTokenizer: Disentanglement and Control of Vocal Effort in a Neural Speech Codec.
Maxime Jacquelin, Maëva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin
2025Long-Context Speech Synthesis with Context-Aware Memory.
Zhipeng Li, Xiaofen Xing, Jingyuan Xing, Hangrui Hu, Heng Lu, Xiangmin Xu
2025Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use.
Titouan Parcollet, Yuan Tseng, Shucong Zhang, Rogier C. van Dalen
2025Low Complex IIR Adaptive Hear-Through Ambient Filtering for Overcoming Practical Constraints in Earbuds.
Rishabh Gupta, MLNS Karthik, Yughendaran Palanivel
2025M3L: A Multi-Modal and Multi-Lingual Depression Detection Framework.
Jiajun You, Shuai Wang, Xun Gong, Xiang Wan
2025MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge.
Zijiang Yang, Meishu Song, Xin Jing, Haojie Zhang, Kun Qian, Bin Hu, Kota Tamada, Toru Takumi, Björn W. Schuller, Yoshiharu Yamamoto
2025MASV: Speaker Verification with Global and Local Context Mamba.
Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Xinhao Mei, Xubo Liu, Yangyang Shi, Florian Metze
2025MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition.
Hyo Jin Jon, Longbin Jin, Hyuntaek Jung, Hyunseo Kim, Donghun Min, Eun Yi Kim
2025MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement.
Nan Xu, Zhaolong Huang, Xiaonan Zhi
2025MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition.
Yinfeng Xia, Huiyan Li, Chenyang Le, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian
2025MIKU-PAL: An Automated and Standardized Multimodal Method for Speech Paralinguistic and Affect Labeling.
Yifan Cheng, Ruoyi Zhang, Jiatong Shi
2025MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing.
Junjie Zheng, Zihao Chen, Chaofan Ding, Yunming Liang, Yihan Fan, Huan Yang, Lei Xie, Xinhan Di
2025MMLoRA: Multitask Memory Parameter-Efficient Fine-Tuning for Multimodal SER.
Yuanbo Fang, Xiaofen Xing, Xueru Li, Weibin Zhang, Xiangmin Xu
2025MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition.
Chengxi Deng, Xurong Xie, Shujie Hu, Mengzhe Geng, Yicong Jiang, Jiankun Zhao, Jiajun Deng, Guinan Li, Youjun Chen, Huimeng Wang, Haoning Xu, Mingyu Cui, Xunying Liu
2025MOVER: Combining Multiple Meeting Recognition Systems.
Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani
2025MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt.
Zhichao Wu, Yueteng Kang, Songjun Cao, Long Ma, Qiulin Li, Qun Yang
2025MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR.
Dimitrios Damianos, Georgios Paraskevopoulos, Alexandros Potamianos
2025MSFNet: A Nested Model for Multi-Sampling-Frequency Speech Enhancement.
Venkatesh Parvathala, K. Sri Rama Murty
2025MTSE: Multi-Target Speaker Extraction for Conversation Scenarios.
Thomas Serre, Mathieu Fontaine, Eric Benhaim, Slim Essid
2025MVP: Multi-source Voice Pathology detection.
Alkis Koudounas, Moreno La Quatra, Gabriele Ciravegna, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli, Sabato Marco Siniscalchi, Elena Baralis
2025Mamba-based Hybrid Model for Speech Enhancement.
Se-Ha Kim, Tae-Gyeong Kim, Chang-Jae Chun
2025Medusa: A Multimodal Deep Fusion Multi-Stage Training Framework for Speech Emotion Recognition in Naturalistic Conditions.
Georgios Chatzichristodoulou, Despoina Kosmopoulou, Antonios Kritikos, Anastasia Poulopoulou, Efthymios Georgiou, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos
2025Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement.
Yujie Yang, Bing Yang, Xiaofei Li
2025MelRe: Vision-Based Mel-Spectrogram Restoration.
Kaixuan Luan, Xiaoda Yang, Shile Cai, Ruofan Hu, Minghui Fang, Wenrui Liu, Jialong Zuo, Jiaqi Duan, Yuhang Ma, Junyu Lu
2025Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models.
Roseline Polle, Agnes Norbury, Alexandra Livia Georgescu, Nicholas Cummins, Stefano Goria
2025Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning.
Shi-Xin Fang, Liang-Yeh Shen, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee
2025MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction.
Mohammed Salah Al-Radhi, Géza Németh, Branislav Gerazov
2025Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors.
Gwangyeol Yu, Junhyeok Lee, Seoryeong Kim, Jimin Lee, Jehyuk Lee
2025Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish.
Nhan Phan, Mikko Kuronen, Maria Kautonen, Riikka Ullakonoja, Anna von Zansen, Yaroslav Getman, Ekaterina Voskoboinik, Tamás Grósz, Mikko Kurimo
2025Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning.
Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu
2025Mitigating Language Mismatch in SSL-Based Speaker Anonymization.
Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, Junichi Yamagishi
2025Mitigating Non-Target Speaker Bias in Guided Speaker Embedding.
Shota Horiguchi, Takanori Ashihara, Marc Delcroix, Atsushi Ando, Naohiro Tawara
2025Mitigating Overfitting During Speech Foundation Model Fine-tuning: Applications to Dysarthric Speech Detection.
Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti
2025Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach.
Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee
2025Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition.
Raphaël Bagat, Irina Illina, Emmanuel Vincent
2025Modality-Agnostic Multimodal Emotion Recognition using a Contrastive Masked Autoencoder.
Georgios Chochlakis, Turab Iqbal, Woo Hyun Kang, Zhaocheng Huang
2025Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework.
Yunsik Kim, Yoonyoung Chung
2025Model as Loss: A Self-Consistent Training Paradigm.
Saisamarth Rajesh Phaye, Milos Cernak, Andrew Harper
2025Modeling Formant Dynamics in Mandarin /ai/: Effects of Speech Style and Speech Rate.
Yunzhuo Xiang, Jingyi Sun
2025Modeling Multi-Turn Spoken Language Understanding with Dynamic Graph Convolutional Networks.
Yi Huang, Si Chen, Jingyu Yao, Junlan Feng
2025Modeling Probabilistic Reduction using Information Theory and Naive Discriminative Learning.
Anna Stein, Kevin Tang
2025Modeling Vowel System Typology Using Iterated Confusion Minimization.
John McGahay
2025Monotonic Attention for Robust Text-to-Speech Synthesis in Large Language Model Frameworks.
Yike Zhang, Yiming Li, Jie Chen, Qinghua Wu, Songjun Cao, Long Ma
2025Multi-Channel Acoustic Echo Cancellation Based on Direction-of-Arrival Estimation.
Fei Zhao, Xueliang Zhang, Zhong-Qiu Wang
2025Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge.
Ming Cheng, Fei Su, Cancan Li, Juan Liu, Ming Li
2025Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion.
Maxim Markitantov, Elena Ryumina, Heysem Kaya, Alexey Karpov
2025Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition.
Mehedi Hasan Bijoy, Dejan Porjazovski, Tamás Grósz, Mikko Kurimo
2025Multi-lingual and Zero-Shot Speech Recognition by Incorporating Classification of Language-Independent Articulatory Features.
Ryo Magoshi, Shinsuke Sakai, Jaeyoung Lee, Tatsuya Kawahara
2025Multi-task learning for speech emotion recognition in naturalistic conditions.
Bartlomiej Zgórzynski, Juliusz Wójtowicz-Kruk, Piotr Masztalski, Wladyslaw Sredniawa
2025Multi-view Fusion and Parameter Perturbation for Few-Shot Class-Incremental Audio Classification.
Yulu Fang, Mingyue He, Qisheng Xu, Jianqiao Zhao, Cheng Yang, Kele Xu, Yong Dou
2025MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers.
Kyeongman Park, Seongho Joo, Kyomin Jung
2025Multichannel Keyword Spotting for Noisy Conditions.
Dzmitry Saladukha, Ivan Koriabkin, Kanstantsin Artsiom, Aliaksei Rak, Nikita Ryzhikov
2025Multilingual Query-by-Example KWS for Indian Languages using Transliteration.
Kirandevraj R, Vinod K. Kurmi, Vinay P. Namboodiri, C. V. Jawahar
2025Multilingual Speech Assessment Using Cross-Attention and Multitask Learning.
Sehyun Oh, Minhwa Chung, Sunhee Kim
2025Multimodal Assessment of Speech Impairment in Amyotrophic Lateral Sclerosis Using Audio-Visual and Machine Learning Approaches.
Francesco Pierotti, Andrea Bandini
2025Multimodal Biomarkers for Schizophrenia: Towards Individual Symptom Severity Estimation.
Gowtham Premananth, Philip Resnik, Sonia Bansal, Deanna L. Kelly, Carol Y. Espy-Wilson
2025Multimodal Dynamics of Hand Gestures and Pauses in Multiparty Interactions.
Delphine Charuau, Naomi Harte
2025Multimodal Emotion Diarization: Frame-Wise Integration of Text and Audio Representations.
Ziv Tamir, Thomas Thebaud, Jesús Villalba, Najim Dehak, Oren Kurland
2025Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience.
Andrew Chang, Chenkai Hu, Ji Qi, Zhuojian Wei, Kexin Zhang, Viswadruth Akkaraju, David Poeppel, Dustin Freeman
2025Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction.
Bogdan Vlasenko, Mathew Magimai-Doss
2025Multimodal Silent Recognition of Phonemes Using Radar and Optopalatographic Silent Speech Interfaces.
João Menezes, Aubin Mouras, Arne-Lukas Fietkau, Dani Kazzy, Peter Birkholz
2025Multimodal Speech, Language and Orofacial Analysis for Remote Assessment of Positive, Negative and Cognitive Symptoms in Schizophrenia.
Michael Neumann, Hardik Kothare, Beverly Insel, Anzalee Khan, Danyah Nadim, Jean-Pierre Lindenmayer, Vikram Ramanarayanan
2025Multimodal Speech-Based Biomarkers Outperform the ALS Functional Rating Scale in Predicting Individual Disease Progression in ALS.
Hardik Kothare, Michael Neumann, Vikram Ramanarayanan
2025Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages.
Rishabh Ranjan, Ayinala Likhith, Mayank Vatsa, Richa Singh
2025Multimodal and Multitask Learning for Predicting Multiple Scores in L2 English Speech.
Sehyun Oh, Sunhee Kim, Minhwa Chung
2025Multistage Universal Speech Enhancement System for URGENT Challenge.
Xiaohuai Le, Zhuangqi Chen, Siyu Sun, Xianjun Xia, Chuanzeng Huang
2025Multitalker Babble in English Vowel Perception Training: A Comparison between Humans and Neural Models.
Wenwei Dong, Alif Silpachai, Catia Cucchiarini, Helmer Strik
2025Multitask Learning with Fused Attention for Improved ASR and Mispronunciation Detection in Children's Speech Sound Disorders.
Selina S. Sung, Seunghee Ha, Tae-Jin Yoon, Jungmin So
2025Multivariate Probabilistic Assessment of Speech Quality.
Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee
2025NAM-to-Speech Conversion with Multitask-Enhanced Autoregressive Models.
Neil Shah, Shirish Karande, Vineet Gandhi
2025NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding.
Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg
2025NIRANTAR: Continual Learning with New Languages and Domains on Real-world Speech Data.
Tahir Javed, Kaushal Santosh Bhogale, Mitesh M. Khapra
2025NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference.
Edresson Casanova, Paarth Neekhara, Ryan Langman, Shehzeen Hussain, Subhankar Ghosh, Xuesong Yang, Ante Jukic, Jason Li, Boris Ginsburg
2025Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection.
Taewoo Kim, Guisik Kim, Choongsang Cho, Young Han Lee
2025Network of acoustic characteristics for the automatic detection of suicide risk from speech. Contribution to the 2025 SpeechWellness challenge by the Semawave team.
Vincent P. Martin, Charles Brazier, Maxime Amblard, Michel Musiol, Jean-Luc Rouas
2025Neural Spectral Band Generation for Audio Coding.
Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang
2025Neural Speech Extraction with Human Feedback.
Malek Itani, Ashton Graves, Sefik Emre Eskimez, Shyamnath Gollakota
2025Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG.
Siavash Shams, Richard J. Antonello, Gavin Mischler, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani
2025NeuroSpex+: Dual-Task Training of Neuro-Guided Speaker Extraction with Speech Envelope and Waveform.
Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li
2025Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN.
Yicheng Gu, Chaoren Wang, Zhizheng Wu, Lauri Juvela
2025Neutral Tone Variation in Beijing Mandarin: Is Neutral Tone Toneless?
Xiao Dong, Fengming Liu, Chien-Jer Charles Lin, Monica Nesbitt, Shuju Shi
2025No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction.
Haoshuai Zhou, Changgeng Mo, Boxuan Cao, Linkai Li, Shan Xiang Wang
2025Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners.
Katsuhiko Yamamoto, Koichi Miyazaki
2025Non-Standard Accent TTS Support via Large Multi-Accent Frontend Pronunciation Knowledge Transfer.
Noe Berger, Siqi Sun, Korin Richmond
2025Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech.
Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann
2025Nosey: Open-Source Hardware for Acoustic Nasalance.
Maya Dewhurst, Jack Collins, Justin J. H. Lo, Roy Alderton, Sam Kirkham
2025Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy.
Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Rogov
2025Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation.
Chenyang Le, Yinfeng Xia, Huiyan Li, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian
2025OMPAL: Bridging Speech and Learning with an Open-Source Mandarin Pronunciation Assessment Corpus for Global Learners.
Wen-Wei Hsieh, Hao-Wei Chi, Kuan-Chen Wang, Ping-Cheng Yeh, Te-Hsin Liu, Chen-Yu Chiang
2025OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.
Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, Chyi-Jiunn Lin, Shinji Watanabe
2025OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary.
Yui Sudo, Yusuke Fujita, Atsushi Kojima, Tomoya Mizumoto, Lianbo Liu
2025Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech.
Dimme de Groot, Tanvina Patel, Devendra Kayande, Odette Scharenborg, Zhengjun Yue
2025On Apical Vowels in Eastern Zhenjiang Mandarin.
Xuying Wang, Fang Hu
2025On Enhancing the Performance of Children's ASR Task in Limited Data Scenario.
Ankita, Shambhavi, Syed Shahnawazuddin
2025On Retrieval of Long Audios with Complex Text Queries.
Ruochu Yang, Milind Rao, Harshavardhan Sundar, Anirudh Raju, Aparna Khare, Srinath Tankasala, Di He, Venkatesh Ravichandran
2025On the Design of a Robust Superdirective Beamformer and Topology Parameter Optimization with Frustum-Shaped Microphone Arrays Featuring Multiple Rings.
Kunlong Zhao, Gongping Huang, Xudong Zhao, Jingdong Chen, Jacob Benesty, Zoran Cvetkovic
2025On the Language and Gender Biases in PSTN, VoIP and Neural Audio Codecs.
Kemal Altwlkany, Amar Kuric, Emanuel Lacic
2025On the Production and Perception of a Single Speaker's Gender.
Robin Netzorg, Naomi Carvalho, Andrea Guzman, Lydia Wang, Juliana Francis, Klo Vivienne Garoute, Keith Johnson, Gopala Anumanchipalli
2025On the Relationship between Accent Strength and Articulatory Features.
Kevin Huang, Sean Foley, Jihwan Lee, Yoonjeong Lee, Dani Byrd, Shrikanth Narayanan
2025On the Relevance of Clinical Assessment Tasks for the Automatic Detection of Parkinson's Disease Medication State from Speech.
David Gimeno-Gómez, Rubén Solera-Ureña, Anna Pompili, Carlos D. Martínez-Hinarejos, Rita Cardoso, Isabel Guimarães, Joaquim J. Ferreira, Alberto Abad
2025On the Within-class Variation Issue in Alzheimer's Disease Detection.
Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng
2025On the cross-modal makeup of charisma: Insights from a field-data analysis.
Oliver Niebuhr
2025On the influence of language similarity in non-target speaker verification trials.
Paul M. Reuter, Michael Jessen
2025On the reliability of feature attribution methods for speech classification.
Gaofei Shen, Hosein Mohebbi, Arianna Bisazza, Afra Alishahi, Grzegorz Chrupala
2025On-device Streaming Discrete Speech Units.
Kwanghee Choi, Masao Someki, Emma Strubell, Shinji Watanabe
2025On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition.
Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu
2025Online AV-CrossNet: a Causal and Efficient Audiovisual System for Speech Enhancement and Target Speaker Extraction.
Cheng Yu, Vahid Ahmadi Kalkhorani, Buye Xu, DeLiang Wang
2025Online Audio-Visual Autoregressive Speaker Extraction.
Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma
2025Open Universal Arabic ASR Leaderboard.
Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi
2025Open-Set Source Tracing of Audio Deepfake Systems.
Nicholas Klein, Hemlata Tak, Elie Khoury
2025Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio.
Yan Ru Pei, Ritik Shrivastava, Sidharth
2025Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies.
Carlos Mena, Pol Serra, Jacobo Romero, Abir Messaoudi, José Giraldo, Carme Armentano-Oller, Rodolfo Zevallos, Iván Meza, Javier Hernando
2025Optimizing CLAP Reward with LLM Feedback for Semantically Aligned and Diverse Automated Audio Captioning.
Seyun Ahn, Pil Moo Byun, Won-Gook Choi, Joon-Hyuk Chang
2025Optimizing Pause Context in Fine-Tuning Pre-trained Large Language Models for Dementia Detection.
Xiaoquan Ke, Man-Wai Mak, Helen Meng
2025OpusLM: A Family of Open Unified Speech Language Models.
Jinchuan Tian, William Chen, Yifan Peng, Jiatong Shi, Siddhant Arora, Shikhar Bharadwaj, Takashi Maekaku, Yusuke Shinohara, Keita Goto, Xiang Yue, Huck Yang, Shinji Watanabe
2025Oral Reading Errors by Grade 3 Children in Indian Schools: A Hindi-English Perspective.
Sneha Raman, Preeti Rao
2025Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning.
Ömer Tarik Özyilmaz, Matt Coler, Matias Valdenegro-Toro
2025Overestimated performance of auditory attention decoding caused by experimental design in EEG recordings.
Yujie Yan, Xiran Xu, Haolin Zhu, Songyi Li, Bo Wang, Xihong Wu, Jing Chen
2025Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge.
Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng
2025PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association.
Abdul Hannan, Muhammad Arslan Manzoor, Shah Nawaz, Muhammad Irzam Liaqat, Markus Schedl, Mubashir Noman
2025PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition.
Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma
2025PAST: Phonetic-Acoustic Speech Tokenizer.
Nadav Har-Tuv, Or Tal, Yossi Adi
2025PERCEPT-US: A Multimodal American English Child Speech Corpus Specialized for Articulatory Feedback.
Amanda Eads, Heather Kabakoff, Nina Benway, Elaine Hitchcock, Jonathan L. Preston, Tara McAllister
2025PPGs-BERT: Leveraging Phoneme Sequence and BERT for Alzheimer's Disease Detection from Spontaneous Speech.
Qi Sun, Ziyue Qiu, Yu Pu, Jinpeng Li, Xuchu Chen, Wei-Qiang Zhang
2025Pairwise Evaluation of Accent Similarity in Speech Synthesis.
Jinzuomu Zhong, Suyuan Liu, Dan Wells, Korin Richmond
2025ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction.
Minu Kim, Kangwook Jang, Hoirin Kim
2025Parameter-Efficient Fine-Tuning for Low-Resource Text-to-Speech via Cross-Lingual Continual Learning.
Ki-Joong Kwon, Jun-Ho So, Sang-Hoon Lee
2025Parameter-Efficient Fine-tuning with Instance-Aware Prompt and Parallel Adapters for Speaker Verification.
Shengyu Peng, Wu Guo, Jie Zhang, Yu Guan, Lipeng Dai, Zuoliang Li
2025Parameter-efficient Fine-tuning of Conformer-based Streaming Speech Recognition into Non-streaming Models.
Yunjae Nam, Jeong U. Han, Kiyeon Kim, Jaemin Lim
2025PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing.
You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan
2025Pathology-Aware Speech Encoding and Data Augmentation for Dysarthric Speech Recognition.
Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Tobias Bocklet
2025Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses.
Seung Gyu Jeong, Seong Eun Kim
2025Perception of Emotional Speech by Individuals with High Borderline Personality Features.
Yizhou Chen, Xiyu Wu
2025Perception of Long and Short Vowel Contrast in Te Reo Māori in Clean and Everyday Listening Environments.
C. T. Justine Hui, Jenice Kuzhikombil, Isabella Shields, Hiraia Haami-Wells, Catherine I. Watson, Peter J. Keegan
2025Performance of Montreal Forced Aligner on Cantonese Spontaneous Speech.
Ka Ki SO, Chenzi Xu, Grace Wenling Cao, Peggy Mok
2025PeriodCodec: A Pitch-Controllable Neural Audio Codec Using Periodic Signals for Singing Voice Synthesis.
Masato Takagi, Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
2025PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs.
Sho Inoue, Shuai Wang, Haizhou Li
2025Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition.
Dominik Wagner, Ilja Baumann, Natalie Engert, Seanie Lee, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet
2025PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection.
Oguzhan Baser, Ahmet Ege Tanriverdi, Sriram Vishwanath, Sandeep Chinchali
2025Phonetic Posteriorgram-Based Phoneme Selection for Vocal Cord Disorder Classification in Continuous Mandarin Speech.
Chih-Ning Chen, Yu-Lan Chuang, Ming-Jhang Yang, Wei-Cheng Hsu, Yung-An Tsou, Yi-Wen Liu
2025Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction.
Christophe Van Gysel, Maggie Wu, Lyan Verwimp, Caglar Tirkaz, Marco Bertola, Zhihong Lei, Youssef Oualil
2025Physiologically-Informed Feature Analysis of Acquired Speech Disorders for Stroke Assessment.
Giulia Sanguedolce, Jón Guðnason, Dragos-Cristian Gruia, Emilie D'Olne, Fatemeh Geranmayeh, Patrick A. Naylor
2025Pick and Summarize: Integrating Extractive and Abstractive Speech Summarization.
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Ryo Fukuda, William Chen, Shinji Watanabe
2025Pinyin-Guided Chinese Speech Recognition with Large Language Model.
Jie Zhengjie, Gaofeng Cheng
2025Pitch Accent Detection improves Pretrained Automatic Speech Recognition.
David Sasu, Natalie Schluter
2025Pitch Contour Model (PCM) with Transformer Cross-Attention for Speech Emotion Recognition.
Minji Ryu, Ji-Hyeon Hur, Sung Heuk Kim, Gahgene Gweon
2025Pitch Target Realization in Putonghua Tone Production of Children from Dialect-Speaking Regions.
Mengxue Cao, Tianxin Zheng, Jiewen Zheng
2025Pitfalls and Limits in Automatic Dementia Assessment.
Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer
2025Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction.
Zexu Pan, Shengkui Zhao, Tingting Wang, Kun Zhou, Yukun Ma, Chong Zhang, Bin Ma
2025Position also matters! Separating Same Instruments in String Quartet using Timbral and Positional Cues.
Yuetonghui Xu, Yiwen Wang, Xihong Wu, Xiaobing Li, Feng Yu
2025Power Spectral Density Estimation for Acoustic Source Separation Using A Spherical Microphone Array.
Liang Tao, Maoshen Jia, Yonggang Hu
2025Pre-aspiration in Iceland Is Conditioned by Gender/Sex.
Meike Rommel, Mísa Hejná, Nicole Dehé
2025PredTrAD - Prediction-based Transformer for Anomaly Detection in Multivariate Time Series Data.
Jan Schuster, Alexander Wölfel, Fabian Brunner, Christian Bergler
2025Predicting Adolescent Suicidal Risk from Multi-task-based Speech: An Ensemble Learning Approach.
Xi Chen, Renzhe Yu, Yanshen Tan, Yiyi Li, Quan Qian, Ying Lin
2025Prediction of listening effort ratings for habitual and clear-Lombard speech presented in noise.
Esther Janse, Chen Shen, Martin Cooke
2025Pretraining Multi-Speaker Identification for Neural Speaker Diarization.
Shota Horiguchi, Atsushi Ando, Naohiro Tawara, Marc Delcroix
2025Privacy-Preserving Speaker Verification via End-to-End Secure Representation Learning.
Chenguang Hu, Yaqian Hao, Fulin Zhang, Xiaoxue Luo, Yao Shen, Yingying Gao, Chao Deng, Shilei Zhang, Junlan Feng
2025Private kNN-VC: Interpretable Anonymization of Converted Speech.
Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller
2025ProBiEM: Acoustic and Lexical Correlates of Prosodic Prominence in English-Malayalam Bilingual Speech.
Anindita Mondal, Rahul Biju, Anil Kumar Vuppala, Reni K. Cherian, Chiranjeevi Yarra
2025ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs.
Eray Eren, Qingju Liu, Hyeongwoo Kim, Pablo Garrido, Abeer Alwan
2025Probing Prosodic Differences Between Two Regional Varieties of Brazilian Portuguese.
Gustavo Silveira, Aviad Albert, Martine Grice
2025Probing the Robustness Properties of Neural Speech Codecs.
Wei-Cheng Tseng, David Harwath
2025Processing of grammatical information in cochlear implant simulated speech by German adult listeners.
Atty Schouwenaars, Esther Ruigendijk
2025Prolongation in Romanian.
Oana Niculescu, Monica Vasileanu
2025PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts.
Tianhua Qi, Shiyan Wang, Cheng Lu, Tengfei Song, Hao Yang, Zhanglin Wu, Wenming Zheng
2025Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection.
Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater
2025Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning.
Sarenne Wallbridge, Christoph Minixhofer, Catherine Lai, Peter Bell
2025Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu
2025Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning.
Junchuan Zhao, Xintong Wang, Ye Wang
2025PruneSLU: Efficient On-device Spoken Language Understanding through Vocabulary and Structural Pruning.
Truong Do, Phuong Minh Nguyen, Le-Minh Nguyen
2025Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge.
Longjie Luo, Shenghui Lu, Lin Li, Qingyang Hong
2025Pull It Together: Reducing the Modality Gap in Contrastive Learning.
Amit Sofer, Yoav Goldman, Shlomo E. Chazan
2025Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
Yafeng Chen, Chong Deng, Hui Wang, Yiheng Jiang, Han Yin, Qian Chen, Wen Wang
2025Pushing the Limits of Beam Search Decoding for Transducer-based ASR models.
Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg
2025Pushing the Limits of End-to-End Diarization.
Samuel J. Broughton, Lahiru Samarakoon
2025Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models.
Tuan Dat Phuong, Long-Vu Hoang, Huy Dat Tran
2025QUADS: Quantized Distillation Framework for Efficient Speech Language Understanding.
Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam
2025Quadruple Path Modeling with Latent Feature Transfer for Permutation-free Continuous Speech Separation.
Jihyun Kim, Doyeon Kim, Hyewon Han, Jinyoung Lee, Jonguk Yoo, Chang Woo Han, Jeongook Song, Hoon-Young Cho, Hong-Goo Kang
2025Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis.
Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff
2025Queer Waves: A German Speech Dataset Capturing Gender and Sexual Diversity from Podcasts and YouTube.
Ingo Siegert, Jan Marquenie, Sven Grawunder
2025R2S: Real-to-Synthetic Representation Learning for Training Speech Recognition Models on Synthetic Data.
Minh Tran, Debjyoti Paul, Yutong Pang, Laxmi Pandey, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei
2025RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.
Haoqin Sun, Jingguang Tian, Jiaming Zhou, Hui Wang, Jiabei He, Shiwan Zhao, Xiangyu Kong, Desheng Hu, Xinkang Xu, Xinhui Hu, Yong Qin
2025REAL-T: Real Conversational Mixtures for Target Speaker Extraction.
Shaole Li, Shuai Wang, Jiangyu Han, Ke Zhang, Wupeng Wang, Haizhou Li
2025REB-former: RWKV-enhanced E-branchformer for Speech Recognition.
Jie Song, Wang Xiang, Jian Zhou, Cunhang Fan, Zhao Lv
2025RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio.
Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari
2025RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling.
Long-Khanh Pham, Thanh V. T. Tran, Minh-Tan Pham, Van Nguyen
2025REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion.
Ishan D. Biyani, Nirmesh J. Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah
2025Ranking and Selection of Bias Words for Contextual Bias Speech Recognition.
Haoxiang Hou, Xun Gong, Wangyou Zhang, Wei Wang, Yanmin Qian
2025RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching.
Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song
2025Rapport-Building Dialogue Strategies for Deeper Connection: Integrating Proactive Behavior, Personalization, and Aizuchi Backchannels.
Muhammad Yeza Baihaqi, Angel F. Garcia Contreras, Seiya Kawano, Koichiro Yoshino
2025Rasmalai : Resources for Adaptive Speech Modeling in IndiAn Languages with Accents and Intonations.
Ashwin Sankar, Yoach Lacombe, Sherry Thomas, Praveen Srinivasa Varadhan, Sanchit Gandhi, Mitesh M. Khapra
2025ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization.
Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li
2025ReSepNet: A Unified-Light Model for Recursive Speech Separation with Unknown Speaker Count.
Hadi Alizadeh, Rahil Mahdian Toroghi, Hassan Zareian
2025Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations.
Teng Aleksandra Ma, Sile Yin, Li-Chia Yang, Shuo Zhang
2025Real-Time Diffusion Buffer for Speech Enhancement On A Laptop.
Bunlong Lay, Rostislav Makarov, Timo Gerkmann
2025Real-time TSE demonstration via SoundBeam with KD.
Keigo Wakayama, Tomoko Kawase, Takafumi Moriya, Marc Delcroix, Hiroshi Sato, Tsubasa Ochiai, Masahiro Yasuda, Shoko Araki
2025Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models.
Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim
2025Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women.
Sakshi Joshi, Eldho Ittan George, Tahir Javed, Kaushal Santosh Bhogale, Nikhil Narasimhan, Mitesh M. Khapra
2025Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data.
Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie
2025Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings.
Owais Mujtaba Khanday, Pablo Rodríguez San Esteban, Zubair Ahmad Lone, Marc Ouellet, Jose A. Gonzalez-Lopez
2025Reddit FlairShare: A Human-Annotated Dataset of Gender-Progressive Online Discourse.
Carlos Hartmann
2025Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition.
Tao Zhong, Mengzhe Geng, Shujie Hu, Guinan Li, Xunying Liu
2025Regularizing Learnable Feature Extraction for Automatic Speech Recognition.
Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
2025Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection.
Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia
2025Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer.
Bence Mark Halpern, Thomas Tienkamp, Teja Rebernik, Rob J. J. H. van Son, Martijn Wieling, Defne Abur, Tomoki Toda
2025Relative cue weighting in multilingual stop voicing production.
Le Xuan Chan, Annika Heuser
2025Replay Attacks Against Audio Deepfake Detection.
Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl
2025Representation of Perceived Prosodic Similarity of Conversational Feedback.
Livia Qian, Carol Figueroa, Gabriel Skantze
2025Representing Speech Through Autoregressive Prediction of Cochlear Tokens.
Greta Tuckute, Klemen Kotar, Evelina Fedorenko, Daniel Yamins
2025Restoring Harmonics: Enhancing Speech Quality with Deep Mask and Harmonic Restoration Network.
Yu Zhao, Zengqiang Shang, Mou Wang, Xin Liu, Pengyuan Zhang
2025Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification.
Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han
2025Revisiting WFST-based Hybrid Japanese Speech Recognition System for Individuals with Organic Speech Disorders.
Naoki Hojo, Ryoichi Takashima, Chihiro Sugiyama, Nobukazu Tanaka, Kanji Nohara, Kazunori Nozaki, Tetsuya Takiguchi
2025Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis.
Minsu Kim, Pingchuan Ma, Honglie Chen, Stavros Petridis, Maja Pantic
2025Rhotic Articulation in Australian English: Insights from MRI.
Michael Proctor, Tünde Szalay, Tharinda Piyadasa, Craig T. Jin, Naeim Sanaei, Amelia Gully, David Waddington, Sheryl Foster, Kirrie J. Ballard
2025Robot-assisted Recognition of Vocal Emotions in Pseudospeech for Cochlear Implanted Adolescents.
Gloria Araiza-Illan, Luke Meyer, Bert Maat, Deniz Baskent
2025Robust Neural Codec Language Modeling with Phoneme Position Prediction for Zero-Shot TTS.
Chunhui Lu, Xue Wen, Liming Song, Junkwang Oh
2025Robust Personal Voice Activity Detection for Mitigating Domain Mismatch and False Acceptance Scenarios.
Yuke Lin, Jun Chen, Wenjie Li, Longshuai Xiao, Chao Weng
2025Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling.
Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasios Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong
2025Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes.
Rogier C. van Dalen, Shucong Zhang, Titouan Parcollet, Sourav Bhattacharya
2025Robust Vocal Intensity Prediction: Overcoming Dataset Bias with Pretrained Deep Models.
Quentin Le Tellier, Marc Evrard, Albert Rilliard, Jean-Sylvain Liénard
2025Robust fine-tuning of speech recognition models via model merging: application to disordered speech.
Alexandre Ducorroy, Rachid Riad
2025Robustness of F0 Ratio as a Diagnostic: Comparing Creaky Voice in Danish and Seoul Korean.
Michaela Watkins, Rasmus Puggaard-Rode, Paul Boersma, Silke Hamann
2025Rollback Speech: Smart Feedback Prompts for Lost Utterances in Unstable Online Calls.
Yuni Amaloa Quintero Villalobos, Wafaa Wardah, Sebastian Möller, Robert P. Spang
2025Room Impulse Response as a Prompt for Acoustic Echo Cancellation.
Fei Zhao, Shulin He, Xueliang Zhang
2025Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach.
Nick Rossenbach, Benedikt Hilmes, Leon Brackmann, Moritz Gunz, Ralf Schlüter
2025SA-RAS: Speaker-Aware Style Retrieval Augmented Generation for Expressive Zero-Shot Text-to-Speech Synthesis.
Xueru Li, Jingyuan Xing, Xiaofen Xing, Zhipeng Li, Xiangmin Xu
2025SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information.
Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee
2025SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition.
Yuta Hirano, Sakriani Sakti
2025SCD-Conformer: Semantic Content Disentanglement for Text-Independent Speaker Verification.
Shanshan Yao, Dianlong Liu, Tian Li
2025SCRIBAL: A Digital Transcription Tool in Higher Education.
Javier Román, Pol Pastells, Mauro Vázquez Chas, Clara Puigventós, Montserrat Nofre, Mariona Taulé, Mireia Farrús
2025SDBench: A Comprehensive Benchmark Suite for Speaker Diarization.
Berkin Durmus, Blaise Munyampirwa, Eduardo Pacheco, Atila Orhon, Andrey Leonov
2025SEED: Speaker Embedding Enhancement Diffusion Model.
Kihyun Nam, Jungwoo Heo, Jee-weon Jung, Gangin Park, Chaeyoung Jung, Ha-Jin Yu, Joon Son Chung
2025SGED-Probe: Probing E2E ASR decoder and aligner for spoken grammar error detection under three speaking practice conditions.
Chowdam Venkata Thirumala Kumar, Chiranjeevi Yarra
2025SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit.
Wen-Chin Huang, Erica Cooper, Tomoki Toda
2025SIDC-KWS: Efficient Spiking Inception-Dilated Conformer with Self-Attention for Keyword Spotting.
Jin-Gyo Lim, Seong-Eun Kim
2025SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch.
Ryo Terashima, Yuma Shirahata, Masaya Kawamura
2025SMARTMOS: Modeling Subjective Audio Quality Evaluation for Real-Time Applications.
Sivakumar Balasubramanian, Jose Antonio Jimenez Amador, Kaustubh Kalgaonkar, King-Wei Hor, Sriram Srinivasan
2025SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer.
Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
2025SNR-Aligned Consistent Diffusion for Adaptive Speech Enhancement.
Yonghyeon Jun, Beomjun Woo, Myeonghun Jeong, Nam Soo Kim
2025SOMSRED-SVC: Sequential Output Modeling with Speaker Vector Constraints for Joint Multi-Talker Overlapped ASR and Speaker Diarization.
Naoki Makishima, Naotaka Kawata, Taiga Yamane, Mana Ihori, Tomohiro Tanaka, Satoshi Suzuki, Shota Orihashi, Ryo Masumura
2025SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant.
Yixuan Hou, Heyang Liu, Yuhao Wang, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang
2025SPCODEC: Split and Prediction for Neural Speech Codec.
Liang Wen, Lizhong Wang, Yuxing Zheng, Weijing Shi, Kwang Pyo Choi
2025SPEAKtoCOPD: a flashmob study to collect COPD speech.
Loes van Bemmel, Lauren G. Reinders, Folkert Brijker, Bas Holverda, Frits M. E. Franssen, Hanneke van Helvoort, Visara Urovi, Marieke Spreeuwenberg, Sami O. Simons
2025SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription.
Raymond Grossman, Taejin Park, Kunal Dhawan, Andrew Titus, Sophia Zhi, Yulia Shchadilova, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg
2025SQ-AST: A Transformer-Based Model for Speech Quality Prediction.
Wafaa Wardah, Robert P. Spang, Vincent Barriac, Jan Reimes, Anna Llagostera, Jens Berger, Sebastian Möller
2025SSF-DST: A Spectro-Spatial Features Enhanced Deep Spatiotemporal Network for EEG-Based Auditory Attention Detection.
Tong Zhu, Xiaoke Yang, Jian Zhou, Lu Li, Zhao Lv, Cunhang Fan
2025SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification.
Théo Lepage, Réda Dehak
2025STCON NIST SRE24 System: Composite Speaker Recognition Solution for Challenging Scenarios.
Stepan Malykh, Alexander Anikin, Nikita Khmelev, Anastasia Korenevskaya, Anastasia Zorkina, Sergey Novoselov, Vladislav Marchevskiy, Vladimir Volokhov, Andrey Shulipa, Alexander Kozlov, Alexander Melnikov, Vasiliy Galyuk, Timur Pekhovskiy
2025STOPA: A Dataset of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution.
Anton Firc, Manasi Chhibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, Kamil Malinka
2025SaD: A Scenario-Aware Discriminator for Speech Enhancement.
Xihao Yuan, Siqi Liu, Yan Chen, Hang Zhou, Chang Liu, Hanting Chen, Jie Hu
2025SardinianVoxes: A Speech Recognition Dataset for the Sardinian Languages.
Salvatore Carta, Alessandro Giuliani, Marco Manolo Manca, Mirko Marras, Leonardo Piano
2025SawtArabi: A Benchmark Corpus for Arabic TTS. Standard, Dialectal and Code-Switching.
Vasista Sai Lodagala, Lamya Alkanhal, Daniel Izham, Shivam Mehta, Shammur Absar Chowdhury, Aqeelah Makki, Hamdy S. Hussein, Gustav Eje Henter, Ahmed Ali
2025Scalable Offline ASR for Command-Style Dictation in Courtrooms.
Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar
2025Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research.
Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, Shinji Watanabe
2025Scaling Laws for Synthetic Speech for Model Training.
Christoph Minixhofer, Ondrej Klejch, Peter Bell
2025Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach.
Umberto Cappellazzo, Minsu Kim, Stavros Petridis, Daniele Falavigna, Alessio Brutti
2025Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction.
Mengjie Qian, Rao Ma, Stefano Bannò, Kate M. Knill, Mark J. F. Gales
2025Scaling beyond Denoising: Submitted System and Findings in URGENT Challenge 2025.
Zhihang Sun, Andong Li, Tong Lei, Rilin Chen, Meng Yu, Chengshi Zheng, Yi Zhou, Dong Yu
2025Scaling pseudo-labeling data for end-to-end low-resource speech translation (the case of Kurdish language).
Mohammad MohammadAmini, Aghilas Sini, Marie Tahon, Antoine Laurent
2025Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs.
Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Yuki Ito, Hassan Shahmohammadi, Siddhant Arora, Shinji Watanabe
2025Score-Based Training for Energy-Based TTS Models.
Wanli Sun, Anton Ragni
2025Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis.
Zongli Ye, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Haodong Li, Shuhe Li, Chenxu Guo, Anaisha Das, Peter Park, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
2025Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information.
Nicholas Sanders, Yuanchao Li, Korin Richmond, Simon King
2025Selective Auditory Attention Decoding in Naturalistic Conversations Using EEG-Based Speech Envelope Tracking in Multi-Speaker Environments.
Gabriel Ivucic, Saurav Pahuja, Dashanka De Silva, Tanja Schultz
2025Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings.
Hongyu Zhang, Ming Cheng, Jing Feng, Ming Li
2025Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty.
Hongfei Xue, Yufeng Tang, Jun Zhang, Xuelong Geng, Lei Xie
2025Self-Improvement for Audio Large Language Model using Unlabeled Speech.
Shaowen Wang, Xinyuan Chen, Yao Xu
2025Self-Supervised Models of Speech Processing for Haitian Creole.
William N. Havard, Renauld Govain, Benjamin Lecouteux, Emmanuel Schang
2025Self-supervised Optimality-Guided Learning of Speech Articulation.
Juraj Simko, Benjamin Elie, Alice Turk
2025Self-supervised learning of speech representations with Dutch archival data.
Nik Vaessen, Roeland Ordelman, David A. van Leeuwen
2025Semantic Processing During Spoken Word Production by Children with Cochlear Implants.
Man Wang, Yixin Ding, Niels O. Schiller
2025Semantic-Aware Interpretable Multimodal Music Auto-Tagging.
Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
2025Semi-Supervised Learning for Automatic Speech Recognition with Word Error Rate Estimation and Targeted Domain Data Selection.
Chanho Park, Thomas Hain
2025Sentence-Final Particles in Mandarin Child-Directed Speech: Frequency and Impact on Speech Rate.
Yizhi Liu, Luyuan Geng, Yan Gu, Mengru Han
2025SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions.
Roland Hartanto, Sakriani Sakti, Koichi Shinoda
2025SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment.
SooHwan Eom, Mark Hasegawa-Johnson, Chang D. Yoo
2025Significance of Time-Frequency preprocessing for automatic Ultrasonic Vocalization classification in Autism Spectrum Disorder model detection.
Szymon Szmajdzinski, Juliusz Wójtowicz-Kruk, Ivan Ryzhankow, Lukasz Lazarski, Jakub Zak, Wladyslaw Sredniawa
2025Simple and Effective Content Encoder for Singing Voice Conversion via SSL-Embedding Dimension Reduction.
Wangjin Zhou, Tianjiao Du, Chenglin Xu, Sheng Li, Yi Zhao, Tatsuya Kawahara
2025Simultaneous Masked and Unmasked Decoding with Speculative Decoding Masking for Fast ASR without Accuracy Loss.
Koji Okabe, Hitoshi Yamamoto
2025Simultaneous Speech Translation Integrated Compact Multiple Sound Spot Synthesis System On A Laptop Carried Out With A Backpack.
Takuma Okamoto, Michiyo Kono
2025Skip-Salsa: Skip Synchronous Fusion of ASR LLM Decoders.
Ashish R. Mittal, Darshan Prabhu, Sunita Sarawagi, Preethi Jyothi
2025SonarGuard2: Ultrasonic Face Liveness Detection Based on Adaptive Doppler Effect Feature Extraction.
Xiaoming Zhang, Ke-Yue Zhang, Taiping Yao, Songjun Cao, Shouhong Ding, Long Ma
2025Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control.
Yunkee Chae, Eunsik Shin, Suntae Hwang, Seungryeol Paik, Kyogu Lee
2025SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction.
Tuochao Chen, D. Shin, Hakan Erdogan, Sinan Hersek
2025Sounding Like a Winner? Prosodic Differences in Post-Match Interviews.
Sofoklis Kakouros, Haoyu Chen
2025Source Verification for Speech Deepfakes.
Viola Negroni, Davide Salvi, Paolo Bestagini, Stefano Tubaro
2025Spatially Weighted Contrastive Learning for Robust Sound Source Localization.
Hyun-Soo Kim, Da-Hee Yang, Joon-Hyuk Chang
2025Spatio-Spectral Diarization of Meetings by Combining TDOA-based Segmentation and Speaker Embedding-based Clustering.
Tobias Cord-Landwehr, Tobias Gburrek, Marc Deegen, Reinhold Haeb-Umbach
2025Speaker Conditioning of Voice Activity Detection via Implicit Separation.
Matthew Maciejewski
2025Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm.
Zhaoyang Li, Jie Wang, Xiaoxiao Li, Wangjie Li, Longjie Luo, Lin Li, Qingyang Hong
2025Speaker Normalization and Content Restoration for Zero-Shot Voice Conversion with Attention-Enhanced Discriminator.
Desheng Hu, Yang Xiang, Jian Lu, Xinhui Hu, Xinkang Xu
2025Speaker Separation for an Unknown Number of Speakers with Encoder-Decoder-Based Contextual Information Module.
Xue Yang, Guiru Shen, Yu Yang
2025Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR.
Weiqing Wang, Taejin Park, Ivan Medennikov, Jinhan Wang, Kunal Dhawan, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
2025Speaker-Aware Multi-Task Learning for Speech Emotion Recognition.
Xiaohan Shi, Xingfeng Li, Tomoki Toda
2025Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition.
Asahi Sakuma, Hiroaki Sato, Ryuga Sugano, Tadashi Kumano, Yoshihiko Kawai, Tetsuji Ogawa
2025Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control.
Masato Murata, Koichi Miyazaki, Tomoki Koriyama
2025Speaker-specific Patterns of Phonetic Covariation in Korean Word-medial Stops and the Role of Phonological and Morphological Contexts.
Chloe D. Kwon
2025SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain.
Zixiang Wan, Guochang Zhang, Yifeng He, Jianqiang Wei
2025Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds.
Andrew Chang, Yike Li, Iran R. Roman, David Poeppel
2025Speech Annotation for A: Accuracy, Access, and Application.
Zirong Li, Hongchen Wu, Yixin Gu, Yao Du, Yang Yue
2025Speech Enhancement based on cascaded two flows.
Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, Jong Won Shin
2025Speech Enhancement with Dual-path Multi-Channel Linear Prediction Filter and Multi-norm Beamforming.
Chengyuan Qin, Wenmeng Xiong, Jing Zhou, Maoshen Jia, Changchun Bao
2025Speech Kinematic Analysis from Acoustics: Scientific, Clinical and Practical Applications.
Carol Y. Espy-Wilson
2025Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages.
Seraphina Fong, Marco Matassoni, Alessio Brutti
2025Speech Multi-label Emotion Recognition Using Asymmetric Class Loss Function Based on Effective Samples.
Shanshan Xiang, Hankiz Yilahun, Askar Hamdulla
2025Speech Reduction in French: The Relationship Between Vowel Space and Articulation Dynamics.
Kübra Bodur, Corinne Fredouille, Christine Meunier
2025Speech Reference Intervals: An Assessment of Feasibility in Depression Symptom Severity Prediction.
Lauren L. White, Ewan Carr, Judith Dineley, Catarina Botelho, Pauline Conde, Faith Matcham, Carolin Oetzmann, Amos Folarin, George Fairs, Agnes Norbury, Stefano Goria, Srinivasan Vairavan, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Alberto Abad, Isabel Trancoso, Nicholas Cummins
2025Speech Unlearning.
Jiali Cheng, Hadi Amiri
2025Speech and Text Foundation Models for Depression Detection: Cross-Task and Cross-Language Evaluation.
Lucía Gómez-Zaragozá, Javier Marín-Morales, Mariano Alcañiz, Mohammad Soleymani
2025Speech power spectra: a window into neural oscillations in Parkinson's disease.
Sevada Hovsepyan, Mathew Magimai-Doss
2025Speech stimulus design to study the neural coding of speech and the impact of cochlear synaptopathy.
Etienne Gaudrain, Sarah Verhulst, Deniz Baskent
2025Speech transcription from South Tyrolean Dialect to Standard German with Whisper.
Luca Ducceschi, Greta H. Franzini
2025Speech-Based Automatic Chronic Kidney Disease Diagnosis via Transformer Fusion of Glottal and Spectrogram Features.
Jihyun Mun, Minhwa Chung, Sunhee Kim
2025Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models.
Ke-Han Lu, Chun-Yi Kuan, Hung-yi Lee
2025Speech-guided Grapheme-to-Phoneme Conversion for Cantonese Text-to-Speech.
Timothy Shin Heng Mak, King Yiu Suen, Albert Y. S. Lam
2025Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios.
Gerard I. Gállego, Oriol Pareras, Martí Cortada Garcia, Lucas Takanori, Javier Hernando
2025SpeechDialogueFactory: A Framework for Natural Speech Dialogue Generation.
Minghan Wang, Ye Bai, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari
2025SpeechMLC: Speech Multi-label Classification.
Miseul Kim, Seyun Um, Hyeonjin Cha, Hong-Goo Kang
2025SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms.
Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li
2025SpeechSEC: A Unified Multi-Task Framework for Speech Synthesis, Editing, and Continuation.
Liming Liang, Dongchao Yang, Xianwei Zhuang, Yuxin Xie, Luo Chen, Yuehan Jin, Yuexian Zou
2025Speechless: Speech Instruction Training Without Speech for Low Resource Languages.
Alan Dao, Dinh Bach Vu, Huy Hoang Ha, Tuan Le Duc Anh, Shreyas Gopal, Yue Heng Yeo, Warren Keng Hoong Low, Eng Siong Chng, Jia Qi Yip
2025Spoken Language Modeling with Duration-Penalized Self-Supervised Units.
Nicol Visser, Herman Kamper
2025Spoken Language Understanding on Unseen Tasks With In-Context Learning.
Neeraj Agrawal, Sriram Ganapathy
2025Spoken Question Answering for Visual Queries.
Nimrod Shabtay, Zvi Kons, Avihu Dekel, Hagai Aronowitz, Ron Hoory, Assaf Arbelle
2025SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs.
Firoj Alam, Md. Arid Hasan, Shammur Absar Chowdhury
2025Spot and Merge: A Hybrid Context Biasing Approach for Rare Word and Out of Vocabulary Recognition.
Jatin Agrawal, Bramhendra Koilakuntla, Srikanth Konjeti
2025Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech.
Nam-Gyu Kim, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
2025Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement.
Jangyeon Kim, Ui-Hyeop Shin, Jaehyun Ko, Hyung-Min Park
2025StarGAN-Aug: A Cross-domain Fault Audio Generation Method for High-performance Fault Diagnosis of Power Transformers.
Ben Niu, Yangjie Wei, Gang Yang, Yuqiao Wang, Shengling Yu
2025StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion.
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu
2025Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios.
Jakob Kienegger, Timo Gerkmann
2025Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement.
Tuan-Nam Nguyen, Ngoc-Quan Pham, Seymanur Akti, Alexander Waibel
2025Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering.
Ivan Medennikov, Taejin Park, Weiqing Wang, He Huang, Kunal Dhawan, Jinhan Wang, Jagadeesh Balam, Boris Ginsburg
2025Stress in Spoken and Whistled Greek.
Andre Batchelder-Schwab, Vasileios Michos, Jonathan Barnes
2025Structured Codebook Based Hierarchical Framework for DNN for Computationally Efficient Speech Enhancement.
Chidambar B, Hanumanth Rao Naidu
2025Structured pruning for efficient systolic array accelerated cascade Speech-to-Text Translation.
Jean-Luc Rouas, Charles Brazier, Leila Ben Letaifa, Rafael Medina, Pedro Palacios, David Atienza, Giovanni Ansaloni
2025Study of vocal fold vibration using M-mode ultrasound: a proof of concept.
Juliette Dindart, Agnès Rouxel, Crystal Lin, Trung Kien Bui, Muriel Lefort, Claire Pillot-Loiseau, Christophe Trésallet, Frédérique Frouin
2025StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation.
Suhita Ghosh, Mélanie Jouaiti, Jan-Ole Perschewski, Sebastian Stober
2025Stuttering Detection Based on Self-Attention Weights of Temporal Acoustic Vector Sequence.
Genzo Miyahara, Tsuneo Kato, Akihiro Tamura
2025SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition.
Longjie Luo, Lin Li, Qingyang Hong
2025Sub-band based Adaptive IIR Algorithm with Biquad Filter Stability Constraints for Feedforward Hear-Through Equalization.
Rishabh Gupta, MLNS Karthik, Omsrinath Chelamkuri
2025Subtyping Speech Errors in Childhood Speech Sound Disorders with Acoustic-to-Articulatory Speech Inversion.
Nina R. Benway, Saba Tabatabaee, Benjamin Munson, Jonathan Preston, Carol Y. Espy-Wilson
2025SupraDoRAL: Automatic Word Prominence Detection Using Suprasegmental Dependencies of Representations with Acoustic and Linguistic Context.
Jhansi Mallela, Upendra Vishwanath Y. S., Sankara Bharadwaj Rangavajjala, Bhaskar Bhatt, Chiranjeevi Yarra
2025Supralaryngeal Kinematics of Implosives in Central Vietnamese: An EMA Study.
Paul McGuire, Kye Shibata, Thanh Viet Cao, Feng-fan Hsieh, Yueh-Chin Chang
2025Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition.
Leonora Vesterbacka, Faton Rekathati, Robin Kurtz, Justyna Sikora, Agnes Toftgård
2025Switch Conformer with Universal Phonetic Experts for Multilingual ASR.
Masato Mimura, Jaeyoung Lee, Tatsuya Kawahara
2025SynHate: Detecting Hate Speech in Synthetic Deepfake Audio.
Rishabh Ranjan, Kishan Pipariya, Mayank Vatsa, Richa Singh
2025Synchronous analysis of abnormal acoustic and linguistic production in Parkinson's speech.
Daniel Escobar-Grisales, Cristian David Ríos-Urrego, Sabato Marco Siniscalchi, Adolfo M. García, Yamile Bocanegra, Leonardo Moreno, Elmar Nöth, Juan Rafael Orozco-Arroyave
2025Synonymity-Based Semantic Coding for Efficient Speech Compression.
Shanhui Gan, Zijian Liang, Kai Niu, Ping Zhang
2025Synthesizing Speech with Selected Perceptual Voice Qualities - A Case Study with Creaky Voice.
Frederik Rautenberg, Fritz Seebauer, Jana Wiechmann, Michael Kuhlmann, Petra Wagner, Reinhold Haeb-Umbach
2025Synthetic Data Generation for Phrase Break Prediction with Large Language Model.
Hoyeon Lee, Sejung Son, Ye-Eun Kang, Jong-Hwan Kim
2025Synthetic Dysarthric Speech: A Supplement, Not a Substitute for Authentic Data in Dysarthric Speech Recognition.
Jingting Li, Keyi Feng, Xinran Zhao, Yan Wang, Su-Jing Wang
2025Synthetic Speech Source Tracing using Metric Learning.
Dimitrios Koutsianos, Stavros Zacharopoulos, Yannis Panagakis, Themos Stafylakis
2025TA-RIR: Topology-Aware Neural Modeling of Acoustic Propagation for Room Impulse Response Synthesis.
Junhui Zhao, Hang Chen, Qing Wang, Jun Du, Yanhui Tu, Feng Ma
2025TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes.
Adriana Stan, David Combei, Dan Oneata, Horia Cucu
2025TELVID: A Multilingual Multi-modal Corpus for Speaker Recognition.
Karen Jones, Kevin Walker, Christopher Caruso, Elliot Singer, Trang Nguyen, Robert B. Dunn, Stephanie M. Strassel
2025TF-Mamba: A Time-Frequency Network for Sound Source Localization.
Yang Xiao, Rohan Kumar Das
2025TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain.
Zixuan Li, Shulin He, Jinglin Bai, Xueliang Zhang
2025TS-URGENet: A Three-stage Universal Robust and Generalizable Speech Enhancement Network.
Xiaobin Rong, Dahan Wang, Qinwen Hu, Yushi Wang, Yuxiang Hu, Jing Lu
2025TS3-Codec: Transformer-Based Simple Streaming Single Codec.
Haibin Wu, Naoyuki Kanda, Sefik Emre Eskimez, Jinyu Li
2025TSDT-Net: Ultra-Low-Complexity Two-Stage Model Combining Dual-Path-Transformer and Transform-Average-Concatenate Network for Speech Enhancement.
Yi Gao, Hangting Chen, Siyu Zhang, Qingshan Yang, Jingcong Chen
2025TTMBA: Towards Text To Multiple Sources Binaural Audio Generation.
Yuxuan He, Xiaoran Yang, Ningning Pan, Gongping Huang
2025TVC-MusicGen: Time-Varying Structure Control for Background Music Generation via Self-Supervised Training.
Chenyu Yang, Hangting Chen, Shuai Wang, Haina Zhu, Haizhou Li
2025TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge.
Tanel Alumäe, Artem Fedorchenko
2025Talker Normalization in Chinese Bilinguals: A Comparative Study.
Mingxi Lu, Ran Tao, Yujia Tian
2025TargetVoice: Single Channel Low-Latency Target Speaker Extraction.
Arun Kumar Pallala, Nivedita Chennupati, Balaji Padmanaban, Rakesh Pogula, Uma Subhashini Ravuri, Naveen Ellanki, Harish Rajamani, Naveen Ambati
2025Teacher-Free Knowledge Distillation for Improving Short-Utterance Spoken Language Identification.
Spandan Dey, Hirak Mondal, Sanjay Kumar Kurmi
2025Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples.
Chun-Yi Kuan, Hung-yi Lee
2025Temp4Cap: Temporally-aligned Automated Audio Captioning.
Ho-Young Choi, Jae-Heung Cho, Pil Moo Byun, Won-Gook Choi, Joon-Hyuk Chang
2025Temporal Convolutional Network with Smoothed and Weighted Losses for Distant Voice Activity and Overlapped Speech Detection.
Shaojie Li, Qintuya Si, De Hu
2025Temporal Modeling of Room Impulse Response Generation via Multi-Scale Autoregressive Learning.
Sheng Lyu, Yuemin Yu, Chenshu Wu
2025Temporal organization of prenuclear glides in Hefei Mandarin.
Yifan Yang, Zhiheng Qian
2025Test-Time Training for Speech Enhancement.
Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty
2025Test-Time Training for Speech-based Depression Detection.
Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore
2025Text Entry for All: Towards Speech-based Multimodal Interaction for Inclusion, Accessibility and the Preservation of the World's Linguistic Heritage.
Julián Zapata, Lara Hanna
2025Text-Enhanced Audio Encoder for Large Language Model based Speech Recognition via Cross-Modality Pre-training with Unpaired Audio-Text Data.
Hang Su, Yuxiang Kong, Lichun Fan, Jian Luan
2025Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles.
Ticho Urai, Pachara Boonsarngsuk, Ekapol Chuangsuwanich
2025The 1st SpeechWellness Challenge: Detecting Suicide Risk Among Adolescents.
Wen Wu, Ziyun Cui, Chang Lei, Yinan Duan, Diyang Qu, Ji Wu, Bowen Zhou, Runsen Chen, Chao Zhang
2025The 2024 NIST Speaker Recognition Evaluation.
Craig S. Greenberg, Lukas L. Diduch, Audrey Tong, Elliot Singer, Trang Nguyen, Robert Dunn, Lisa P. Mason, Beth Matys
2025The Development of Speech Rhythm in Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region of China.
Aijun Li, Zhiwei Wang, Jun Gao, Xin Zhou
2025The Effect of Word Predictability on Spoken Cross-Language Intelligibility.
Wei Xue, Iuliia Zaitova, Bernd Möbius
2025The Faetar Speech Recognition Benchmark.
Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar
2025The Interspeech 2025 Challenge on Speech Emotion Recognition in Naturalistic Conditions.
Abinay Reddy Naini, Lucas Goncalves, Ali N. Salman, Pravin Mote, Ismail Rasim Ulgen, Thomas Thebaud, Laureano Moro-Velázquez, Leibny Paola García, Najim Dehak, Berrak Sisman, Carlos Busso
2025The Interspeech 2025 Speech Accessibility Project Challenge.
Xiuwen Zheng, Bornali Phukon, Jonghwan Na, Ed Cutrell, Kyu J. Han, Mark Hasegawa-Johnson, Pan-Pan Jiang, Aadhrik Kuila, Colin Lea, Bob MacDonald, Gautam Varma Mantena, Venkatesh Ravichandran, Leda Sari, Katrin Tomanek, Chang D. Yoo, Chris Zwilling
2025The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties.
William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe
2025The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition.
Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Sabato Marco Siniscalchi, Odette Scharenborg
2025The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages.
Chris Emezue, NaijaVoices Community, Busayo Awobade, Abraham Toluwase Owodunni, Handel Emezue, Gloria Monica Tobechukwu Emezue, Nefertiti Nneoma Emezue, Sewade Ogun, Bunmi Akinremi, David Ifeoluwa Adelani, Chris Pal
2025The Prosodic Characteristics of Standard Chinese Rhetorical Questions in Naturalistic Settings.
Shuwen Chen, Qingke Sun, Yue Huang, Yingyi Luo
2025The Role of Contextual Variation in Learning Cantonese Tones from Naturalistic Speech.
Fengyue Lisa Zhao, Jennifer Kuo
2025The Role of Syntactic Structures in Shaping Directionality in Trisyllabic Tone Sandhi: Evidence from Tianjin Mandarin.
Siqi Lu, Hui Feng, Ziyu Xiong
2025The Role of Voiced Consonant Duration in Sung Vowel-Consonant and Consonant-Vowel Recognition.
Allan Vurma, Einar Meister, Lya Meister, Jaan Ross, Marju Raju, Veeda Kala, Tuuri Dede
2025The Speech Accessibility Project: Best Practices for Collection and Curation of Disordered Speech.
Chris Zwilling, Mark Hasegawa-Johnson, Heather Hodges, Lorraine O. Ramig, Adina Bradshaw, Clarion Mendes, Heejin Kim, Alexandria Barkhimer, Laura Mattie, Meg Dickinson, Shawnise Carter, Marie Moore Channell
2025The State Of TTS: A Case Study with Human Fooling Rates.
Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra
2025The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus.
Ruichen Zuo, Kong Aik Lee, Zilong Huang, Man-Wai Mak
2025The Text-to-speech in the Wild (TITW) Database.
Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe
2025The function of creaky voice in South Korean: A perception study.
Patrik Hrabánek, Michaela Watkins, Silke Hamann
2025The mutual exclusivity bias of bilingual visually grounded speech models.
Dan Oneata, Leanne Nortje, Yevgen Matusevych, Herman Kamper
2025The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models.
Yi Wang, Oli Danyi Liu, Peter Bell
2025Theoretical proposal for a unified Bayesian model of adaptation in non-interactive and interactive speech production.
Mélen Guillaume, Anahita Basirat, Julien Diard
2025Thinking Fast and Slow: Robust Speech Recognition via Deep Filter-Tuning.
Dianwen Ng, Kun Zhou, Bin Ma, Eng Siong Chng
2025Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition.
Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze
2025TinyClick: Single-Turn Agent for Empowering GUI Automation.
Pawel Pawlowski, Krystian Zawistowski, Wojciech Lapacz, Adam Wiacek, Marcin Skorupa, Sebastien Postansque, Jakub Hoscilowicz
2025Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition.
Jule Valendo Halim, Siyi Wang, Hong Jia, Ting Dang
2025Tonal Contrasts in the Malipo Variety of the Mienic Language.
Changhong Du, Fang Hu
2025Tonal Perception in Changde Mandarin.
Zhenrui Zhang, Fang Hu
2025Tonal Variation and Word Meaning in Taiwanese.
Yu-Ying Chuang, Sheng-Fu Wang
2025Tonality-Based Accompaniment-Guided Automatic Singing Evaluation.
Pei-Chin Hsieh, Yih-Liang Shen, Ngoc Son Tran, Tai-Shih Chi
2025Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models.
Parismita Gogoi, Sishir Kalita, Wendy Lalhminghlui, Viyazonuo Terhiija, Moakala Tzudir, Priyankoo Sarmah, S. R. M. Prasanna
2025Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling.
Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Tejas S. Prabhune, Shuhe Li, William Li, Rodrigo Ortiz, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
2025Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments.
Lubos Marcinek, Jonas Beskow, Joakim Gustafson
2025Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion.
Seymanur Akti, Tuan-Nam Nguyen, Alexander Waibel
2025Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ.
Yunkee Chae, Kyogu Lee
2025Towards Classification of Typical and Atypical Disfluencies: A Self Supervised Representation Approach.
Priyanka Kommagouni, Pragya Khanna, Vamshiraghusimha Narasinga, Anirudh Bocha, Anil Kumar Vuppala
2025Towards Diverse and Efficient Audio Captioning via Diffusion Models.
Manjie Xu, Chenxing Li, Yong Ren, Xinyi Tu, Ruibo Fu, Wei Liang, Dong Yu
2025Towards Domain-Specific Spoken Language Understanding for a Catalan Voice-Controlled Video Game.
Alex Peiró Lilja, Rodolfo Zevallos, Carme Armentano-Oller, José Giraldo, Cristina España-Bonet, Mireia Farrús
2025Towards Early Prediction of Self-Supervised Speech Model Performance.
Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève
2025Towards Efficiently Whisper Fine-tuning with Monotonic Alignments.
Ziyang Zhuang, Tao Wei, Ming Fang, Ning Cheng, Shaojun Wang, Jing Xiao
2025Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset.
Rui Liu, Pu Gao, Jiatian Xi, Berrak Sisman, Carlos Busso, Haizhou Li
2025Towards Few-Shot Training-Free Anomaly Sound Detection.
Ho-Hsiang Wu, Wei-Cheng Lin, Abinaya Kumar, Luca Bondi, Shabnam Ghaffarzadegan, Juan Pablo Bello
2025Towards Frame-level Quality Predictions of Synthetic Speech.
Michael Kuhlmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach
2025Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma
2025Towards High-Quality LLM-Based Data for French Spontaneous Speech Simplification: an Exo-Refinement Approach.
Lucia Ormaechea Grijalba, Nikos Tsourakis, Pierrette Bouillon, Benjamin Lecouteux, Didier Schwab
2025Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech.
Taesoo Kim, Yongsik Jo, Hyunmin Song, Taehwan Kim
2025Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages.
Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen
2025Towards Inclusive and Fair ASR: Insights from the SAPC Challenge for Optimizing Disordered Speech Recognition.
Nada Gohider, Otman Basir
2025Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition.
Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu
2025Towards Machine Unlearning for Paralinguistic Speech Processing.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, Swarup Ranjan Behera, Vandana Rajan, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma
2025Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation.
Steffen Freisinger, Philipp Seeberger, Thomas Ranzenberger, Tobias Bocklet, Korbinian Riedhammer
2025Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision.
Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu
2025Towards Personalised Audio Visual Speech Enhancement.
Mandar Gogate, Kia Dashtipour, Amir Hussain
2025Towards Pre-training an Effective Respiratory Audio Foundation Model.
Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada
2025Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM.
Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie
2025Towards Robust Speaker Recognition against Intrinsic Variation with Foundation Model Few-shot Tuning and Effective Speech Synthesis.
Zhiyong Chen, Shuhang Wu, Xinnuo Li, Zhiqi Ai, Shugong Xu
2025Towards Secure User Authentication for Headphones via In-Ear or In-Earcup Microphones.
N. Shashaank, Xiao Quan, Andrew Kaluzny, Leonard Varghese, Marko Stamenovic, Chuan-Che Huang
2025Towards Sentence Level Imagined Speech Generation from EEG signals.
Sparsh Rastogi, Harsh Dadwal, Khushboo Modi, Jatin Bedi, Jasmeet Singh
2025Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
2025Towards Temporally Explainable Dysarthric Speech Clarity Assessment.
Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara
2025Towards a Japanese Full-duplex Spoken Dialogue System.
Atsumoto Ohashi, Shinya Iizuka, Jingjing Jiang, Ryuichiro Higashinaka
2025Towards a Unified Benchmark for Arabic Pronunciation Assessment: Qur'anic Recitation as Case Study.
Yassine El Kheir, Omnia Ibrahim, Amit Meghanani, Nada AlMarwani, Hawau Olamide Toyin, Sadeen Alharbi, Modar Alfadly, Lamya Alkanhal, Ibrahim Selim, Shehab Elbatal, Salima Mdhaffar, Thomas Hain, Yasser Hifny, Mostafa Shahin, Ahmed Ali
2025Towards a dynamical model of transitions between fluent and stuttered speech.
Yijing Lu, Khalil Iskarous, Louis Goldstein
2025Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency.
Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang
2025Towards atypical speech transcription using LLM-based ASR.
Jinda Zhang, Aanchan Mohan
2025Towards the Objective Characterisation of Major Depressive Disorder Using Speech Data from a 12-week Observational Study with Daily Measurements.
Robert Lewis, Szymon Fedor, Nelson Hidalgo Julia, Joshua Curtiss, Jiyeon Kim, Noah Jones, David Mischoulon, Thomas F. Quatieri, Nicholas Cummins, Paola Pedrelli, Rosalind W. Picard
2025ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality.
Yu-Xiang Luo, Yi-Cheng Lin, Ming-To Chuang, Jia-Hung Chen, I-Ning Tsai, Pei Xing Kiew, Yueh-Hsuan Huang, Chien-Feng Liu, Yu-Chen Chen, Bo-Han Feng, Wenze Ren, Hung-yi Lee
2025Tracking /r/ Deletion: Forced Alignment of Pronunciation Variants and Sociophonetic Insights into Post-Obstruent Final /r/ in French.
Anisia Popescu, Lori Lamel, Marc Evrard, Ioana Vasilescu
2025Training Articulatory Inversion Models for Interspeaker Consistency.
Charles McGhee, Mark J. F. Gales, Kate M. Knill
2025Training Onset-and-Offset-Aware Sound Event Detection on a Heterogeneous Dataset via Probabilistic Sequential Modeling.
Tomoya Yoshinaga, Yoshiaki Bando, Keitaro Tanaka, Keisuke Imoto, Masaki Onishi, Shigeo Morishima
2025Training-Free Voice Conversion with Factorized Optimal Transport.
Alexander Lobashev, Assel Yermekova, Maria A. Larchenko
2025Transcribing Diverse Voices: Using Whisper for ICE corpora.
Andreas Weilinghoff
2025Transcribing Oral History Recordings Using the Transcription Portal.
Christoph Draxler, Julian Pömp, Henk van den Heuvel, Fabio Ardolino, Arjan van Hessen
2025Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation.
Rui Hu, Xiaolong Lin, Jiawang Liu, Shixi Huang, Zhenpeng Zhan
2025Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems.
Mikey Elmers, Koji Inoue, Divesh Lala, Tatsuya Kawahara
2025Tungnaá In Live Performance: An Implementation Of Interactive Artistic Text-To-Voice.
Victor Shepardson, Jonathan Reus, Thor Magnusson
2025Turing's Echo: Investigating Linguistic Sensitivity of Deepfake Voice Detection via Gamification.
Binh Nguyen, Thai Le
2025U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding.
Ziqian Wang, Xianjun Xia, Xinfa Zhu, Lei Xie
2025Ultra-Low Bit Post-Training Quantization of Large Speech Models via K-Means Clustering and Mixed Precision Allocation.
Tianteng Gu, Bei Liu, Haoyu Wang, Yanmin Qian
2025Understanding Dementia Speech Alignment with Diffusion-Based Image Generation.
Mansi, Anastasios Lepipas, Dominika C. Woszczyk, Yiying Guan, Soteris Demetriou
2025Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models.
Zhaoqing Li, Haoning Xu, Xurong Xie, Zengrui Jin, Tianzi Wang, Xunying Liu
2025Uni-VERSA: Versatile Speech Assessment with a Unified Network.
Jiatong Shi, Hye-jin Shim, Shinji Watanabe
2025Unified Audio-Visual Modeling for Recognizing Which Face Spoke When and What in Multi-Talker Overlapped Speech and Video.
Naoki Makishima, Naotaka Kawata, Taiga Yamane, Mana Ihori, Tomohiro Tanaka, Satoshi Suzuki, Shota Orihashi, Ryo Masumura
2025Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation.
Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park
2025Unified Semi-Supervised Pipeline for Automatic Speech Recognition.
Nune Tadevosyan, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Ante Jukic
2025Unified Text and Speaker Verification using SSL model for Text-Dependent Speaker Verification.
Nathan Griot, Driss Matrouf, Raphaël Blouet, Jean-François Bonastre, Ana Mantecon
2025Unified Variational and Physics-aware Model for Room Impulse Response Estimation.
Louis Lalay, Mathieu Fontaine, Roland Badeau
2025Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition.
Cheng-Hung Hu, Yusuke Yasuda, Akifumi Yoshimoto, Tomoki Toda
2025Universal Preference-Score-based Pairwise Speech Quality Assessment.
Yu-Fei Shi, Yang Ai, Zhen-Hua Ling
2025Universal Semantic Disentangled Privacy-preserving Speech Representation Learning.
Biel Tura Vecino, Subhadeep Maji, Aravind Varier, Antonio Bonafonte, Ivan Valles, Michael Owen, Constantinos Papayiannis, Leif Rädel, Grant P. Strimel, Oluwaseyi Feyisetan, Roberto Barra-Chicote, Ariya Rastrow, Volker Leutnant, Trevor Wood
2025Universal Speech Enhancement with Regression and Generative Mamba.
Rong Chao, Rauf Nasretdinov, Yu-Chiang Frank Wang, Ante Jukic, Szu-Wei Fu, Yu Tsao
2025Unlearning LLM-Based Speech Recognition Models.
Zhe Liu
2025Unleashing the Inner Monster: Demonstrating High-Fidelity Human to Non-Human Voice Conversion.
Namhyun Cho, Sunmin Kim, Minsu Kang, Seolhee Lee, Choonghyeon Lee, Yangsun Lee
2025Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate.
Hanglei Zhang, Yiwei Guo, Zhihan Li, Xiang Hao, Xie Chen, Kai Yu
2025Unmasking real-world audio deepfakes: A data-centric approach.
David Combei, Adriana Stan, Dan Oneata, Nicolas M. Müller, Horia Cucu
2025Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech.
Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai-Doss
2025Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion.
Ajinkya Kulkarni, Sandipana Dowerah, Tanel Alumäe, Mathew Magimai-Doss
2025Using Neurogram Similarity Index Measure (NSIM) to Model Hearing Loss and Cochlear Neural Degeneration.
Ahsan J. Cheema, Sunil Puria
2025Using and comprehending language in face-to-face conversation.
Judith Holler
2025Using gender, phonation and age to interpret automatically discovered speech attributes for explainable speaker recognition.
Carole Millot, Clara Ponchard, Cédric Gendrot, Jean-François Bonastre, Orane Dufour
2025VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset.
Yuxi Wang, Yikang Wang, Qishan Zhang, Hiromitsu Nishizaki, Ming Li
2025VIB-based Real Pre-emphasis Audio Deepfake Source Tracing.
Thien-Phuc Doan, Kihun Hong, Souhwan Jung
2025VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge.
Zijing Zhao, Kai Wang, Hao Huang, Ying Hu, Liang He, Jichen Yang
2025Variability in Intervocalic /t/ and Community Diversity in Australian English.
Hannah White, Joshua Penney, Felicity Cox
2025Variability in performance across four generations of automatic speaker recognition systems.
Lauren Harrington, Vincent Hughes, Philip Harrison, Paul Foulkes, Jessica Wormald, Finnian Kelly, David van der Vloed
2025Vector Quantized Cross-lingual Unsupervised Domain Adaptation for Speech Emotion Recognition.
Pravin Mote, Donita Robinson, Elizabeth Richerson, Carlos Busso
2025Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval.
Ruofan Hu, Yan Xia, Minjie Hong, Jieming Zhu, Bo Chen, Xiaoda Yang, Minghui Fang, Tao Jin
2025ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition.
Thai-Binh Nguyen, Thi Van Nguyen, Quoc Truong Do, Chi Mai Luong
2025ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances.
Huy Ba Do, Vy Le-Phuong Huynh, Luan Thanh Nguyen
2025VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion.
Joon-Seung Choi, Dong-Min Byun, Hyung-Seok Oh, Seong-Whan Lee
2025Video-to-Audio Generation with Fine-grained Temporal Semantics.
Yuchen Hu, Yu Gu, Chenxing Li, Rilin Chen, Dong Yu
2025VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining.
Jianheng Zhuo, Yifan Yang, Yiwen Shao, Yong Xu, Dong Yu, Kai Yu, Xie Chen
2025Vision-Integrated High-Quality Neural Speech Coding.
Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling
2025Visual Cues Support Robust Turn-taking Prediction in Noise.
Sam O'Connor Russell, Naomi Harte
2025Visual features of the oral region in Polish sibilants produced by children with various sibilance patterns.
Agata Sage, Zuzanna Miodonska, Michal Krecichwost, Ewa Kwasniok, Pawel Badura
2025VisualSpeech: Enhancing Prosody Modeling in TTS Using Video.
Shumin Que, Anton Ragni
2025Visually-Adaptive Guided Robust Speech Recognition with Parameter-Efficient Adaptation.
Zhao Yang, Rui Jiang, Yue Heng Yeo, Xiao Fu, Wei Xi, Jizhong Zhao
2025Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation.
Jaejun Lee, Kyogu Lee
2025Vocal-tract model with two directions: Static design for a dummy head and dynamic design for a speaking machine.
Takayuki Arai
2025VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation.
Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park
2025Vocoder-Projected Feature Discriminator.
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
2025Voice Activity-based Text Segmentation for ASR Text Denormalization.
Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
2025Voice Adaptation for Swiss German.
Samuel Stucki, Jan Deriu, Mark Cieliebak
2025Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification.
Badr M. Abdullah, Matthew Baas, Bernd Möbius, Dietrich Klakow
2025Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora.
Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama
2025Voice Impression Control in Zero-Shot TTS.
Kenichi Fujita, Shota Horiguchi, Yusuke Ijima
2025Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect.
Jaya Narain, Vasudha Kowtha, Colin Lea, Lauren Tooley, Dianna Yee, Vikramjit Mitra, Zifang Huang, Miquel Espi Marques, Jon Huang, Carlos Avendaño, Shirley Ren
2025Voice Reconstruction through Large-Scale TTS Models: Comparing Zero-Shot and Fine-tuning Approaches to Personalise TTS in Assistive Communication.
Éva Székely, Péter Mihajlik, Máté Soma Kádár, László Tóth
2025Voice-Based Dysphagia Detection: Leveraging Self-Supervised Speech Representation.
Injune Hwang, Jung-Min Kim, Ju Seok Ryu, Kyogu Lee
2025Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework.
Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon
2025VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents.
Haiyun Li, Zhiyong Wu, Xiaofeng Xie, Jingran Xie, Yaoxun Xu, Hanyang Peng
2025VoiceNet: Multilingual On-Device Phoneme-To-Audio Alignment.
Kun Jin, Siva Penke, Srinivasa Algubelli
2025VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations.
Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukic, Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu
2025VoiceQualityVC: A Voice Conversion System for Studying the Perceptual Effects of Voice Quality in Speech.
Harm Lameris, Joakim Gustafsson, Éva Székely
2025Voices of 'cyborg awesomeness': Posthuman embodiment of nonbinary gender expression in AI speech technologies.
Maxwell Hope, Éva Székely
2025VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin.
Zhiqi Ai, Meixuan Bao, Zhiyong Chen, Zhi Yang, Xinnuo Li, Shugong Xu
2025Voxplorer: Voice data exploration and projection in an interactive dashboard.
Alessandro De Luca, Srikanth R. Madikeri, Volker Dellwo
2025WAKE: Watermarking Audio with Key Enrichment.
Yaoxun Xu, Jianwei Yu, Hangting Chen, Zhiyong Wu, Xixin Wu, Dong Yu, Rongzhi Gu, Yi Luo
2025WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing.
Yu Nakagome, Michael Hentschel
2025WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection.
Hainan Xu, Vladimir Bataev, Lilit Grigoryan, Boris Ginsburg
2025WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Preservation.
Lu Han, Junqi Zhao, Renhua Peng
2025WavShape: Information-Theoretic Speech Representation Learning for Fair and Privacy-Aware Audio Processing.
Oguzhan Baser, Ahmet Ege Tanriverdi, Kaan Kale, Sandeep Chinchali, Sriram Vishwanath
2025Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR.
Mingchen Shao, Xinfa Zhu, Chengyou Wang, Bingshen Mu, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie
2025Web-Based Application for Real-Time Biofeedback of Vocal Resonance in Gender-Affirming Voice Training: Design and Usability Evaluation.
Tara McAllister, Collin Eagen, Yi Shan, Peter Traver, Daphna Harel, Tae Hong Park, Vesna D. Novak
2025Weight Factorization and Centralization for Continual Learning in Speech Recognition.
Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel
2025What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems.
Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel F. Garcia Contreras, Koichiro Yoshino
2025What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training.
Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem H. Zuidema, Martijn Bentum
2025What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles.
Saskia Wepner, Lucas Eckert, Gernot Kubin, Barbara Schuppler
2025When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds.
Minsu Kang, Seolhee Lee, Choonghyeon Lee, Namhyun Cho
2025When The MOS Predictor Asks For Training Annotation In Cross Lingual/Domain Adaptation.
Natacha Miniconi, Meysam Shamsi, Anthony Larcher
2025When focus shapes the flow: prosodic restructuring in Mandarin complex nominals.
Anqi Xu, Yu-Yin Hsu
2025WhiStress: Enriching Transcriptions with Sentence Stress Detection.
Iddo Yosha, Dorin Shteyman, Yossi Adi
2025Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification.
William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong
2025Whisper-Based Multilingual Alzheimer's Disease Detection and Improvements for Low-Resource Language.
Kaichen Jia, Jinpeng Li, Ke Li, Wei-Qiang Zhang
2025WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper.
Emmanuel Akinrintoyo, Nadine Abdelhalim, Nicole Salomons
2025WhisperMSS: A Two-Stage Framework for Mandarin Singing Transcription and Segmentation Using Pretrained Models.
Ruoxuan Liang, Xiangjian Zeng, Zhen Liu, Qingqiang Wu, Ruichen Zhang, Le Ren
2025Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM.
Dariia Puhach, Amir H. Payberah, Éva Székely
2025Who knows best? Effects of speech disfluencies on incentivized decision-making.
Ambika Kirkland, Jens Edlund
2025Who, When, and What: Leveraging the "Three Ws" Concept for Emotion Recognition in Conversation.
Xiaohan Shi, Xingfeng Li, Tomoki Toda
2025Why is children's ASR so difficult? Analyzing children's phonological error patterns using SSL-based phoneme recognizers.
Koharu Horii, Naohiro Tawara, Atsunori Ogawa, Shoko Araki
2025Word Level Timestamp Generation for Automatic Speech Recognition and Translation.
Ke Hu, Krishna C. Puvvada, Elena Rastorgueva, Zhehuai Chen, He Huang, Shuoyang Ding, Kunal Dhawan, Hainan Xu, Jagadeesh Balam, Boris Ginsburg
2025Word stress in self-supervised speech models: A cross-linguistic comparison.
Martijn Bentum, Louis ten Bosch, Tomas O. Lentz
2025Word-Level Error Analysis in Decoding Systems: From Speech Recognition to Brain-Computer Interfaces.
Jingya Huang, Aashish N. Patel, Sowmya Manojna Narasimha, Gal Mishne, Vikash Gilja
2025X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance.
Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan
2025You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks.
Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters
2025ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism.
Hsing-Hang Chou, Yun-Shao Lin, Ching-Chin Sung, Yu Tsao, Chi-Chun Lee
2025Zero-Shot Learning for Acoustic Event Classification Using an Attribute Vector and Conditional GAN.
Kohei Uehara, Ryoichi Takashima, Tetsuya Takiguchi
2025Zero-Shot Mono-to-Binaural Speech Synthesis.
Alon Levkovitch, Julian Salazar, Soroosh Mariooryad, R. J. Skerry-Ryan, Nadav Bar, W. Bastiaan Kleijn, Eliya Nachmani
2025Zero-Shot Speech-Based Depression and Anxiety Assessment with LLMs.
Erfan Loweimi, Sofia de la Fuente Garcia, Saturnino Luz
2025xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement.
Nikolai Lund Kühne, Jan Østergaard, Jesper Jensen, Zheng-Hua Tan