| 2023 | A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda |
| 2023 | A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa |
| 2023 | A Token-Wise Beam Search Algorithm for RNN-T. Gil Keren |
| 2023 | A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability. Jian Xue, Peidong Wang, Jinyu Li, Eric Sun |
| 2023 | AWMC: Online Test-Time Adaptation Without Mode Collapse for Continual Adaptation. Jae-Hong Lee, Do-Hee Kim, Joon-Hyuk Chang |
| 2023 | Acoustic Model Fusion For End-to-End Speech Recognition. Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu |
| 2023 | Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification. Yanmei Gu, Jing Li, Jiayi Zhou, Zhiming Wang, Huijia Zhu |
| 2023 | Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment. Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang |
| 2023 | Adversarial Augmentation For Adapter Learning. Jen-Tzung Chien, Wei-Yu Sun |
| 2023 | After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition. Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura |
| 2023 | An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation. Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie |
| 2023 | Audio-Adapterfusion: A Task-Id-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition. Hillary Ngai, Rohan Agrawal, Neeraj Gaur, W. Ronny Huang, Parisa Haghani, Pedro Moreno Mengibar |
| 2023 | Audio-Visual Neural Syntax Acquisition. Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass |
| 2023 | Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli |
| 2023 | BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition. Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie |
| 2023 | Bisinger: Bilingual Singing Voice Synthesis. Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li |
| 2023 | Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis. Jiarui Hai, Yu-Jeh Liu, Mounya Elhilali |
| 2023 | Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation. Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, Hervé Bredin |
| 2023 | Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training. Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong |
| 2023 | CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition. Abderrahim Fathan, Jahangir Alam |
| 2023 | COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari |
| 2023 | CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition. Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin |
| 2023 | Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation? Pasquale D'Alterio, Christian Hensel, Bashar Awwad Shiekh Hasan |
| 2023 | Can We Use Speaker Embeddings On Spontaneous Speech Obtained From Medical Conversations To Predict Intelligibility? Sebastião Quintas, Mathieu Balaguer, Julie Mauclair, Virginie Woisard, Julien Pinquier |
| 2023 | Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System. Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak |
| 2023 | Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech. Abinay Reddy Naini, Shruthi Subramanium, Seong-Gyun Leem, Carlos Busso |
| 2023 | Consistency Based Unsupervised Self-Training for ASR Personalisation. Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung |
| 2023 | Contextual Spelling Correction with Large Language Models. Gan Song, Zelin Wu, Golan Pundak, Angad Chandorkar, Kandarp Joshi, Xavier Velez, Diamantino Caseiro, Ben Haynor, Weiran Wang, Nikhil Siddhartha, Pat Rondon, Khe Chai Sim |
| 2023 | Cross-Modal Alignment With Optimal Transport For CTC-Based ASR. Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai |
| 2023 | Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training. Mun-Hak Lee, Sang-Eon Lee, Ji-Eun Choi, Joon-Hyuk Chang |
| 2023 | Crosssinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers. Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang |
| 2023 | Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings. Hao Zhang, Meng Yu, Dong Yu |
| 2023 | Deriving Translational Acoustic Sub-Word Embeddings. Amit Meghanani, Thomas Hain |
| 2023 | Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model. Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha |
| 2023 | Detection of Vowel Errors in Children's Speech using Synthetic Phonetic Transcripts. Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet |
| 2023 | Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge. Tanel Alumäe, Jiaming Kong, Daniil Robnikov |
| 2023 | Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data. Yusheng Tian, Wei Liu, Tan Lee |
| 2023 | Discriminative Speech Recognition Rescoring With Pre-Trained Language Models. Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko |
| 2023 | Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. Yusuke Shinohara, Shinji Watanabe |
| 2023 | E3 TTS: Easy End-to-End Diffusion-Based Text To Speech. Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen |
| 2023 | ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings. Jenthe Thienpondt, Kris Demuynck |
| 2023 | ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. Jiajun He, Zekun Yang, Tomoki Toda |
| 2023 | Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He |
| 2023 | Efficient Text-Only Domain Adaptation For CTC-Based ASR. Chang Chen, Xun Gong, Yanmin Qian |
| 2023 | Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection. Lakshmi Rajendram Bashyam, Alexander Blatt, Dietrich Klakow |
| 2023 | End-To-End Training of a Neural HMM with Label and Transition Probabilities. Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney |
| 2023 | End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis. Can Cui, Imran A. Sheikh, Mostafa Sadeghi, Emmanuel Vincent |
| 2023 | Ending the Blind Flight: Analyzing the Impact of Acoustic and Lexical Factors on WAV2VEC 2.0 in Air-Traffic Control. Alexander Blatt, Badr M. Abdullah, Dietrich Klakow |
| 2023 | Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation. Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet |
| 2023 | Enhancing Task-Oriented Dialogues With Chitchat: A Comparative Study Based on Lexical Diversity And Divergence. Armand Stricker, Patrick Paroubek |
| 2023 | ESPnet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj |
| 2023 | Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus. Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi |
| 2023 | Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech. Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg |
| 2023 | Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang |
| 2023 | Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing. Wangyou Zhang, Lei Yang, Yanmin Qian |
| 2023 | Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking. Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee |
| 2023 | Extending Self-Distilled Self-Supervised Learning For Semi-Supervised Speaker Verification. Jeong-Hwan Choi, Jehyun Kyung, Ju-Seok Seong, Ye-Rin Jeoung, Joon-Hyuk Chang |
| 2023 | FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition. Dongning Yang, Wei Wang, Yanmin Qian |
| 2023 | Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg |
| 2023 | Fast-HuBERT: an Efficient Training Framework for Self-Supervised Speech Representation Learning. Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen |
| 2023 | FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection. Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li |
| 2023 | Few-Shot Spoken Language Understanding Via Joint Speech-Text Models. Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu |
| 2023 | Findings of the 2023 ML-SUPERB Challenge: Pre-Training And Evaluation Over More Languages And Beyond. Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe |
| 2023 | Flap: Fast Language-Audio Pre-Training. Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-wen Li, Gargi Ghosh |
| 2023 | GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition. Daniel Galvez, Tim Kaldewey |
| 2023 | Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. Sathvik Udupa, Jesuraja Bandekar, Deekshitha G, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati |
| 2023 | Generalized Zero-Shot Audio-to-Intent Classification. Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki |
| 2023 | Generative Linguistic Representation for Spoken Language Identification. Peng Shen, Xugang Lu, Hisashi Kawai |
| 2023 | Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting. Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke |
| 2023 | HEVAL: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks. Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar, Supreeth Rao |
| 2023 | HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS. Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie |
| 2023 | Haha-POD: An Attempt for Laughter-Based Non-Verbal Speaker Verification. Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li |
| 2023 | Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers. Sibo Tong, Philip Harding, Simon Wiesler |
| 2023 | IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023 |
| 2023 | Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia using Speech Analysis. Bahman Mirheidari, Ronan O'Malley, Daniel Blackburn, Heidi Christensen |
| 2023 | Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-To-End ASR. Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan Honza Silovsky |
| 2023 | Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers. Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia |
| 2023 | Improved Multi-Modal Emotion Recognition Using Squeeze-and-Excitation Block in Cross-Modal Attention. Junchen Liu, Jesin James, Karan Nathwani |
| 2023 | Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer. Jason Clarke, Yoshihiko Gotoh, Stefan Goetze |
| 2023 | Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer. Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma |
| 2023 | Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text. Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu |
| 2023 | Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda |
| 2023 | Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning. Shaoxiong Lin, Chao Zhang, Yanmin Qian |
| 2023 | Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach. Jun-Kun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li |
| 2023 | Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation. Zhaofeng Lin, Tanvina Patel, Odette Scharenborg |
| 2023 | Invert-Classify: Recovering Discrete Prosody Inputs for Text-To-Speech. Nicholas Sanders, Korin Richmond |
| 2023 | Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers. Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney |
| 2023 | Joint Audio and Speech Understanding. Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James R. Glass |
| 2023 | Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks. Martin Sustek, Sonal Joshi, Henry Li, Thomas Thebaud, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak |
| 2023 | Joint Federated Learning and Personalization for on-Device ASR. Junteng Jia, Ke Li, Mani Malek, Kshitiz Malik, Jay Mahadeokar, Ozlem Kalinli, Frank Seide |
| 2023 | Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe |
| 2023 | KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning. Chenglin Xu, Xiguang Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu |
| 2023 | Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments. Ji-Hwan Mo, Jae-Jin Jeon, Mun-Hak Lee, Joon-Hyuk Chang |
| 2023 | LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR. Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu |
| 2023 | LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models. Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao |
| 2023 | LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement. Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu |
| 2023 | LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku |
| 2023 | Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition. Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur |
| 2023 | Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding. Pavel Denisov, Ngoc Thang Vu |
| 2023 | Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task. Sakriani Sakti, Benita Angela Titalim |
| 2023 | LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models. Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg |
| 2023 | Locality Enhanced Dynamic Biasing and Sampling Strategies For Contextual ASR. Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung |
| 2023 | Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition. Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko |
| 2023 | MASR: Multi-Label Aware Speech Representation. Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth |
| 2023 | MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement. Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie |
| 2023 | MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition. Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain |
| 2023 | Magnitude-and-Phase-Aware Speech Enhancement With Parallel Sequence Modeling. Yuewei Zhang, Huanbin Zou, Jie Zhu |
| 2023 | Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder. Yosuke Higuchi, Andrew Rosenberg, Yuan Wang, Murali Karthick Baskar, Bhuvana Ramabhadran |
| 2023 | Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization. Wei-Ping Huang, Sung-Feng Huang, Hung-yi Lee |
| 2023 | MelHuBERT: A Simplified HuBERT on Mel Spectrograms. Tzu-Quan Lin, Hung-yi Lee, Hao Tang |
| 2023 | Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition. Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose |
| 2023 | MiniSUPERB: Lightweight Benchmark for Self-Supervised Speech Models. Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston H. Hsu, Hung-yi Lee |
| 2023 | Model-Based Fairness Metric for Speaker Verification. Maliha Jahan, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, Jesús Villalba |
| 2023 | Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model. Yan Huang, Piyush Behre, Guoli Ye, Shawn Chang, Yifan Gong |
| 2023 | Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring. Seongjin Park, Rutuja Ubale |
| 2023 | NeuralEcho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement. Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu |
| 2023 | NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation. Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang |
| 2023 | No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation. Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli |
| 2023 | Not All Errors Are Created Equal: Evaluating The Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations. Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland Barnard, Keith A. Josephs, Jennifer L. Whitwell, David T. Jones, Hugo Botha |
| 2023 | On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration. Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu |
| 2023 | On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments. William Ravenscroft, Stefan Goetze, Thomas Hain |
| 2023 | On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition. Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter |
| 2023 | Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition And Phoneme To Grapheme Translation. Wonjun Lee, Gary Geunbae Lee, Yunsu Kim |
| 2023 | PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System. Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu |
| 2023 | Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation. Rajeev Rajan, Noumida Abdul Kareem, Sreelakshmi S |
| 2023 | Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition. Zhengyang Li, Thomas Graave, Jing Liu, Timo Lohrenz, Siegfried Kunzmann, Tim Fingscheidt |
| 2023 | Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition. Geoffroy Vanderreydt, Amrutha Prasad, Driss Khalil, Srikanth R. Madikeri, Kris Demuynck, Petr Motlícek |
| 2023 | Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation. Jerome R. Bellegarda |
| 2023 | Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting. Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah |
| 2023 | Permod: Perceptually Grounded Voice Modification With Latent Diffusion Models. Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli |
| 2023 | Preserving Phonemic Distinctions For Ordinal Regression: A Novel Loss Function For Automatic Pronunciation Assessment. Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen |
| 2023 | Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking. Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng |
| 2023 | Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition. Yuang Li, Yu Wu, Jinyu Li, Shujie Liu |
| 2023 | Prompting and Adapter Tuning For Self-Supervised Encoder-Decoder Speech Model. Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-wen Li, Hung-yi Lee |
| 2023 | Promptspeaker: Speaker Generation Based on Text Descriptions. Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li |
| 2023 | Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations. Varun Krishna, Sriram Ganapathy |
| 2023 | QUICKVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model using ISTFT for Faster Conversion. Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro |
| 2023 | Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models. Mark Lindsey, Nathaniel R. Robinson, Francis Kubala, Richard M. Stern |
| 2023 | Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe |
| 2023 | RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain. Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff-Korbayová, Josef van Genabith |
| 2023 | Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning. Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton |
| 2023 | Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing Aids. Vanitha Devi R, Vasundhara |
| 2023 | Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances. Daichi Hayakawa, Takehiko Kagoshima, Kenji Iwata, Norbert Braunschweiler, Rama Doddipatla |
| 2023 | SLM: Bridge the Thin Gap Between Speech and Text Foundation Models. Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu |
| 2023 | SQAT-LD: Speech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction. Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu |
| 2023 | SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR. Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie |
| 2023 | Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation. Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie |
| 2023 | Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction. Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux |
| 2023 | Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe |
| 2023 | Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention. Shilong Wu, Jun Du, Mao-Kui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee |
| 2023 | Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. Elaf Islam, Thomas Hain, Protima Nomo Sudro |
| 2023 | Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments. Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet |
| 2023 | Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise. Yoshiki Sato, Julián Villegas |
| 2023 | Speech Emotion Diarization: Which Emotion Appears When? Yingzhi Wang, Mirco Ravanelli, Alya Yacoubi |
| 2023 | Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie |
| 2023 | Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility. Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao |
| 2023 | Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe |
| 2023 | Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora. Artit Suwanbandit, Jaturong Chitiyaphol, Sutthinan Chuenchom, Kanyarat Kwiecien, Husen Sawal, Ruslan Uthai, Orathai Sangpetch, Ekapol Chuangsuwanich |
| 2023 | The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning. Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews |
| 2023 | The Role of Feature Correlation on Quantized Neural Networks. David Qiu, Shaojin Ding, Yanzhang He |
| 2023 | The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu |
| 2023 | The Singing Voice Conversion Challenge 2023. Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda |
| 2023 | The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains. Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi |
| 2023 | Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments. Sara Papi, Peidong Wang, Jun-Kun Chen, Jian Xue, Jinyu Li, Yashesh Gaur |
| 2023 | TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch. Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao |
| 2023 | Toward Universal Speech Enhancement For Diverse Input Conditions. Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian |
| 2023 | Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments. Anusha Prakash, Srinivasan Umesh, Hema A. Murthy |
| 2023 | Towards General-Purpose Text-Instruction-Guided Voice Conversion. Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-yi Lee |
| 2023 | Towards Matching Phones and Speech Representations. Gene-Ping Yang, Hao Tang |
| 2023 | Towards Robust Packet Loss Concealment System With ASR-Guided Representations. Da-Hee Yang, Joon-Hyuk Chang |
| 2023 | Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs. Mohan Li, Catalin Zorila, Cong-Thanh Do, Rama Doddipatla |
| 2023 | Transcribing and Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations. Hiroyoshi Yamasaki, Jérôme Louradour, Julie Hunter, Laurent Prévot |
| 2023 | Transduce and Speak: Neural Transducer for Text-To-Speech with Semantic Token Prediction. Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim |
| 2023 | Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection. Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang |
| 2023 | Transformer Attractors for Robust and Efficient End-To-End Neural Diarization. Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung |
| 2023 | Two-Pass Endpoint Detection for Speech Recognition. Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow |
| 2023 | U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie |
| 2023 | Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection. Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli |
| 2023 | Using Joint Training Speaker Encoder With Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion. Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro |
| 2023 | VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023. Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao |
| 2023 | Variational Gaussian Process Data Uncertainty. Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen |
| 2023 | VITS-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling. Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie |
| 2023 | Voiceextender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model. Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao |
| 2023 | Vsanet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention. Yuewei Zhang, Huanbin Zou, Jie Zhu |
| 2023 | WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT Layer. Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai |
| 2023 | Whisper-SLU: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding. Quentin Meeus, Marie-Francine Moens, Hugo Van hamme |
| 2023 | Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization. Alexandra Antonova |
| 2023 | YODAS: YouTube-Oriented Dataset for Audio and Speech. Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe |
| 2023 | Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning. Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu |
| 2023 | Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis. Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie |
| 2023 | Zero-Shot Singing Voice Synthesis from Musical Score. Jun-You Wang, Hung-yi Lee, Jyh-Shing Roger Jang, Li Su |