ASRU C

191 papers

Year Title / Authors
2023A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023.
Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda
2023A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.
Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa
2023A Token-Wise Beam Search Algorithm for RNN-T.
Gil Keren
2023A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.
Jian Xue, Peidong Wang, Jinyu Li, Eric Sun
2023AWMC: Online Test-Time Adaptation Without Mode Collapse for Continual Adaptation.
Jae-Hong Lee, Do-Hee Kim, Joon-Hyuk Chang
2023Acoustic Model Fusion For End-to-End Speech Recognition.
Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu
2023Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification.
Yanmei Gu, Jing Li, Jiayi Zhou, Zhiming Wang, Huijia Zhu
2023Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment.
Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang
2023Adversarial Augmentation For Adapter Learning.
Jen-Tzung Chien, Wei-Yu Sun
2023After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition.
Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura
2023An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation.
Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie
2023Audio-AdapterFusion: A Task-ID-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition.
Hillary Ngai, Rohan Agrawal, Neeraj Gaur, W. Ronny Huang, Parisa Haghani, Pedro Moreno Mengibar
2023Audio-Visual Neural Syntax Acquisition.
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass
2023Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
2023BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition.
Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
2023Bisinger: Bilingual Singing Voice Synthesis.
Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li
2023Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis.
Jiarui Hai, Yu-Jeh Liu, Mounya Elhilali
2023Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation.
Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, Hervé Bredin
2023Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.
Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong
2023CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition.
Abderrahim Fathan, Jahangir Alam
2023COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari
2023CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition.
Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin
2023Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation?
Pasquale D'Alterio, Christian Hensel, Bashar Awwad Shiekh Hasan
2023Can We Use Speaker Embeddings On Spontaneous Speech Obtained From Medical Conversations To Predict Intelligibility?
Sebastião Quintas, Mathieu Balaguer, Julie Mauclair, Virginie Woisard, Julien Pinquier
2023Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System.
Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak
2023Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech.
Abinay Reddy Naini, Shruthi Subramanium, Seong-Gyun Leem, Carlos Busso
2023Consistency Based Unsupervised Self-Training for ASR Personalisation.
Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung
2023Contextual Spelling Correction with Large Language Models.
Gan Song, Zelin Wu, Golan Pundak, Angad Chandorkar, Kandarp Joshi, Xavier Velez, Diamantino Caseiro, Ben Haynor, Weiran Wang, Nikhil Siddhartha, Pat Rondon, Khe Chai Sim
2023Cross-Modal Alignment With Optimal Transport For CTC-Based ASR.
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
2023Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-BERTScore and Sequence-Level Training.
Mun-Hak Lee, Sang-Eon Lee, Ji-Eun Choi, Joon-Hyuk Chang
2023Crosssinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers.
Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang
2023Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings.
Hao Zhang, Meng Yu, Dong Yu
2023Deriving Translational Acoustic Sub-Word Embeddings.
Amit Meghanani, Thomas Hain
2023Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model.
Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela A. Wiepert, David T. Jones, Hugo Botha
2023Detection of Vowel Errors in Children's Speech using Synthetic Phonetic Transcripts.
Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet
2023Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge.
Tanel Alumäe, Jiaming Kong, Daniil Robnikov
2023Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data.
Yusheng Tian, Wei Liu, Tan Lee
2023Discriminative Speech Recognition Rescoring With Pre-Trained Language Models.
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko
2023Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition.
Yusuke Shinohara, Shinji Watanabe
2023E3 TTS: Easy End-to-End Diffusion-Based Text To Speech.
Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen
2023ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings.
Jenthe Thienpondt, Kris Demuynck
2023ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction.
Jiajun He, Zekun Yang, Tomoki Toda
2023Efficient Cascaded Streaming ASR System Via Frame Rate Reduction.
Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He
2023Efficient Text-Only Domain Adaptation For CTC-Based ASR.
Chang Chen, Xun Gong, Yanmin Qian
2023Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection.
Lakshmi Rajendram Bashyam, Alexander Blatt, Dietrich Klakow
2023End-To-End Training of a Neural HMM with Label and Transition Probabilities.
Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney
2023End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis.
Can Cui, Imran A. Sheikh, Mostafa Sadeghi, Emmanuel Vincent
2023Ending the Blind Flight: Analyzing the Impact of Acoustic and Lexical Factors on WAV2VEC 2.0 in Air-Traffic Control.
Alexander Blatt, Badr M. Abdullah, Dietrich Klakow
2023Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation.
Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet
2023Enhancing Task-Oriented Dialogues With Chitchat: A Comparative Study Based on Lexical Diversity And Divergence.
Armand Stricker, Patrick Paroubek
2023ESPnet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.
Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj
2023Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus.
Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi
2023Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech.
Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg
2023Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition.
Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
2023Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing.
Wangyou Zhang, Lei Yang, Yanmin Qian
2023Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking.
Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee
2023Extending Self-Distilled Self-Supervised Learning For Semi-Supervised Speaker Verification.
Jeong-Hwan Choi, Jehyun Kyung, Ju-Seok Seong, Ye-Rin Jeoung, Joon-Hyuk Chang
2023FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition.
Dongning Yang, Wei Wang, Yanmin Qian
2023Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.
Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
2023Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
2023FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection.
Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li
2023Few-Shot Spoken Language Understanding Via Joint Speech-Text Models.
Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu
2023Findings of the 2023 ML-SUPERB Challenge: Pre-Training And Evaluation Over More Languages And Beyond.
Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe
2023Flap: Fast Language-Audio Pre-Training.
Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-wen Li, Gargi Ghosh
2023GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition.
Daniel Galvez, Tim Kaldewey
2023Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages.
Sathvik Udupa, Jesuraja Bandekar, Deekshitha G, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati
2023Generalized Zero-Shot Audio-to-Intent Classification.
Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki
2023Generative Linguistic Representation for Spoken Language Identification.
Peng Shen, Xugang Lu, Hisashi Kawai
2023Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting.
Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
2023HEVAL: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks.
Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar, Supreeth Rao
2023HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.
Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie
2023Haha-POD: An Attempt for Laughter-Based Non-Verbal Speaker Verification.
Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li
2023Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers.
Sibo Tong, Philip Harding, Simon Wiesler
2023IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023
2023Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia using Speech Analysis.
Bahman Mirheidari, Ronan O'Malley, Daniel Blackburn, Heidi Christensen
2023Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-To-End ASR.
Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan Honza Silovsky
2023Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers.
Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia
2023Improved Multi-Modal Emotion Recognition Using Squeeze-and-Excitation Block in Cross-Modal Attention.
Junchen Liu, Jesin James, Karan Nathwani
2023Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer.
Jason Clarke, Yoshihiko Gotoh, Stefan Goetze
2023Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer.
Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma
2023Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text.
Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu
2023Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens.
Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda
2023Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning.
Shaoxiong Lin, Chao Zhang, Yanmin Qian
2023Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.
Jun-Kun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li
2023Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation.
Zhaofeng Lin, Tanvina Patel, Odette Scharenborg
2023Invert-Classify: Recovering Discrete Prosody Inputs for Text-To-Speech.
Nicholas Sanders, Korin Richmond
2023Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers.
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
2023Joint Audio and Speech Understanding.
Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James R. Glass
2023Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks.
Martin Sustek, Sonal Joshi, Henry Li, Thomas Thebaud, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak
2023Joint Federated Learning and Personalization for On-Device ASR.
Junteng Jia, Ke Li, Mani Malek, Kshitiz Malik, Jay Mahadeokar, Ozlem Kalinli, Frank Seide
2023Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe
2023KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning.
Chenglin Xu, Xiguang Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu
2023Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments.
Ji-Hwan Mo, Jae-Jin Jeon, Mun-Hak Lee, Joon-Hyuk Chang
2023LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR.
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu
2023LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models.
Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao
2023LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement.
Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu
2023LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models.
Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku
2023Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition.
Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur
2023Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding.
Pavel Denisov, Ngoc Thang Vu
2023Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task.
Sakriani Sakti, Benita Angela Titalim
2023LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models.
Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
2023Locality Enhanced Dynamic Biasing and Sampling Strategies For Contextual ASR.
Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung
2023Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition.
Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko
2023MASR: Multi-Label Aware Speech Representation.
Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth
2023MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement.
Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie
2023MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition.
Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain
2023Magnitude-and-Phase-Aware Speech Enhancement With Parallel Sequence Modeling.
Yuewei Zhang, Huanbin Zou, Jie Zhu
2023Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder.
Yosuke Higuchi, Andrew Rosenberg, Yuan Wang, Murali Karthick Baskar, Bhuvana Ramabhadran
2023Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization.
Wei-Ping Huang, Sung-Feng Huang, Hung-yi Lee
2023MelHuBERT: A Simplified HuBERT on Mel Spectrograms.
Tzu-Quan Lin, Hung-yi Lee, Hao Tang
2023Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition.
Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose
2023MiniSUPERB: Lightweight Benchmark for Self-Supervised Speech Models.
Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston H. Hsu, Hung-yi Lee
2023Model-Based Fairness Metric for Speaker Verification.
Maliha Jahan, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak, Jesús Villalba
2023Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model.
Yan Huang, Piyush Behre, Guoli Ye, Shawn Chang, Yifan Gong
2023Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring.
Seongjin Park, Rutuja Ubale
2023Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
2023Neuralkalman: A Learnable Kalman Filter for Acoustic Echo Cancellation.
Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang
2023No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation.
Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli
2023Not All Errors Are Created Equal: Evaluating The Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations.
Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland Barnard, Keith A. Josephs, Jennifer L. Whitwell, David T. Jones, Hugo Botha
2023On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
2023On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments.
William Ravenscroft, Stefan Goetze, Thomas Hain
2023On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition.
Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter
2023Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition And Phoneme To Grapheme Translation.
Wonjun Lee, Gary Geunbae Lee, Yunsu Kim
2023PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System.
Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu
2023Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation.
Rajeev Rajan, Noumida Abdul Kareem, Sreelakshmi S
2023Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition.
Zhengyang Li, Thomas Graave, Jing Liu, Timo Lohrenz, Siegfried Kunzmann, Tim Fingscheidt
2023Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition.
Geoffroy Vanderreydt, Amrutha Prasad, Driss Khalil, Srikanth R. Madikeri, Kris Demuynck, Petr Motlícek
2023Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation.
Jerome R. Bellegarda
2023Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting.
Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
2023Permod: Perceptually Grounded Voice Modification With Latent Diffusion Models.
Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli
2023Preserving Phonemic Distinctions For Ordinal Regression: A Novel Loss Function For Automatic Pronunciation Assessment.
Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen
2023Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking.
Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng
2023Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
Yuang Li, Yu Wu, Jinyu Li, Shujie Liu
2023Prompting and Adapter Tuning For Self-Supervised Encoder-Decoder Speech Model.
Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-wen Li, Hung-yi Lee
2023Promptspeaker: Speaker Generation Based on Text Descriptions.
Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li
2023Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations.
Varun Krishna, Sriram Ganapathy
2023QUICKVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model using ISTFT for Faster Conversion.
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
2023Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models.
Mark Lindsey, Nathaniel R. Robinson, Francis Kubala, Richard M. Stern
2023Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
2023Rescuespeech: A German Corpus for Speech Recognition in Search and Rescue Domain.
Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff-Korbayová, Josef van Genabith
2023Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning.
Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton
2023Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing Aids.
Vanitha Devi R, Vasundhara
2023Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances.
Daichi Hayakawa, Takehiko Kagoshima, Kenji Iwata, Norbert Braunschweiler, Rama Doddipatla
2023SLM: Bridge the Thin Gap Between Speech and Text Foundation Models.
Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu
2023SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction.
Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu
2023SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.
Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie
2023Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation.
Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie
2023Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction.
Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux
2023Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference.
Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe
2023Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention.
Shilong Wu, Jun Du, Mao-Kui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee
2023Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning.
Elaf Islam, Thomas Hain, Protima Nomo Sudro
2023Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments.
Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet
2023Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise.
Yoshiki Sato, Julián Villegas
2023Speech Emotion Diarization: Which Emotion Appears When?
Yingzhi Wang, Mirco Ravanelli, Alya Yacoubi
2023Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition.
Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie
2023Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility.
Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao
2023Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation.
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe
2023Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora.
Artit Suwanbandit, Jaturong Chitiyaphol, Sutthinan Chuenchom, Kanyarat Kwiecien, Husen Sawal, Ruslan Uthai, Orathai Sangpetch, Ekapol Chuangsuwanich
2023The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning.
Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews
2023The Role of Feature Correlation on Quantized Neural Networks.
David Qiu, Shaojin Ding, Yanzhang He
2023The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu
2023The Singing Voice Conversion Challenge 2023.
Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda
2023The Voicemos Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
2023Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
Sara Papi, Peidong Wang, Jun-Kun Chen, Jian Xue, Jinyu Li, Yashesh Gaur
2023TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch.
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao
2023Toward Universal Speech Enhancement For Diverse Input Conditions.
Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian
2023Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments.
Anusha Prakash, Srinivasan Umesh, Hema A. Murthy
2023Towards General-Purpose Text-Instruction-Guided Voice Conversion.
Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-yi Lee
2023Towards Matching Phones and Speech Representations.
Gene-Ping Yang, Hao Tang
2023Towards Robust Packet Loss Concealment System With ASR-Guided Representations.
Da-Hee Yang, Joon-Hyuk Chang
2023Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs.
Mohan Li, Catalin Zorila, Cong-Thanh Do, Rama Doddipatla
2023Transcribing and Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations.
Hiroyoshi Yamasaki, Jérôme Louradour, Julie Hunter, Laurent Prévot
2023Transduce and Speak: Neural Transducer for Text-To-Speech with Semantic Token Prediction.
Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Dongjune Lee, Nam Soo Kim
2023Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection.
Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang
2023Transformer Attractors for Robust and Efficient End-To-End Neural Diarization.
Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung
2023Two-Pass Endpoint Detection for Speech Recognition.
Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
2023U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias.
Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie
2023Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection.
Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli
2023Using Joint Training Speaker Encoder With Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion.
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
2023VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023.
Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao
2023Variational Gaussian Process Data Uncertainty.
Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen
2023VITS-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling.
Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie
2023Voiceextender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model.
Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
2023Vsanet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention.
Yuewei Zhang, Huanbin Zou, Jie Zhu
2023WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT Layer.
Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai
2023Whisper-SLU: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding.
Quentin Meeus, Marie-Francine Moens, Hugo Van hamme
2023Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization.
Alexandra Antonova
2023YODAS: YouTube-Oriented Dataset for Audio and Speech.
Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe
2023Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning.
Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu
2023Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis.
Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie
2023Zero-Shot Singing Voice Synthesis from Musical Score.
Jun-You Wang, Hung-yi Lee, Jyh-Shing Roger Jang, Li Su