| 2021 | "How Robust R U?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations. Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tür |
| 2021 | 3D Spatial Features for Multi-Channel Target Speech Separation. Rongzhi Gu, Shi-Xiong Zhang, Meng Yu, Dong Yu |
| 2021 | A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka |
| 2021 | A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe |
| 2021 | A Comparison of Streaming Models and Data Augmentation Methods for Robust Speech Recognition. Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim |
| 2021 | A Conformer-Based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation. Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard |
| 2021 | A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe |
| 2021 | A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation. Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, Svetlana Stoyanchev |
| 2021 | AC-VC: Non-Parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion. Damien Ronssin, Milos Cernak |
| 2021 | ASR Rescoring and Confidence Estimation with ELECTRA. Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara |
| 2021 | Action Item Detection in Meetings Using Pretrained Transformers. Kishan Sachdeva, Joshua Maynez, Olivier Siohan |
| 2021 | Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition. Xianrui Zheng, Chao Zhang, Philip C. Woodland |
| 2021 | An ASR N-Best Transcript Neural Ranking Model for Spoken Content Retrieval. Yasufumi Moriya, Gareth J. F. Jones |
| 2021 | An End-to-End Far-Field Keyword Spotting System with Neural Beamforming. Xuan Ji, Lu Lu, Fuming Fang, Jianbo Ma, Lei Zhu, Jinke Li, Dongdi Zhao, Ming Liu, Feijun Jiang |
| 2021 | An Evaluation Benchmark for Automatic Speech Recognition of German-English Code-Switching. Abbas Khosravani, Philip N. Garner, Alexandros Lazaridis |
| 2021 | An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition. Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-Wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe |
| 2021 | Analysis of Conversational Speech with Application to Voice Adaptation. Bhagyashree Mukherjee, Anusha Prakash, Hema A. Murthy |
| 2021 | Applying X-Vectors on Pathological Speech After Larynx Removal. Ralph Scheuerer, Tino Haderlein, Elmar Nöth, Tobias Bocklet |
| 2021 | Are You Dictating to Me? Detecting Embedded Dictations in Doctor-Patient Conversations. Thomas Schaaf, Longxiang Zhang, Alireza Bayestehtashk, Mark C. Fuhs, Shahid Durrani, Susanne Burger, Monika Woszczyna, Thomas Polzin |
| 2021 | Assessing Evaluation Metrics for Speech-to-Speech Translation. Elizabeth Salesky, Julian Mäder, Severin Klinger |
| 2021 | Attention Based Model for Segmental Pronunciation Error Detection. Jose Antonio Lopez Saenz, Md Asif Jalal, Rosanna Milner, Thomas Hain |
| 2021 | Attention-Based Multi-Hypothesis Fusion for Speech Summarization. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe |
| 2021 | Attention-Based Scaling Adaptation for Target Speech Extraction. Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang |
| 2021 | Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding. Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel |
| 2021 | Audio Embeddings Help to Learn Better Dialogue Policies. Asier López-Zorrilla, M. Inés Torres, Heriberto Cuayáhuitl |
| 2021 | Audio-Visual Speech Recognition is Worth $32\times 32\times 8$ Voxels. Dmitriy Serdyuk, Otavio Braga, Olivier Siohan |
| 2021 | Automatic Generation of Diagnostic Content Feedback in Spoken Language Learning and Assessment. Xinhao Wang, Christopher Hamill |
| 2021 | Automatic Speech Recognition for Low-Resource Languages: The THUEE Systems for the IARPA OpenASR20 Evaluation. Jing Zhao, Gui-Xin Shi, Guan-Bo Wang, Wei-Qiang Zhang |
| 2021 | Beyond Isolated Utterances: Conversational Emotion Recognition. Raghavendra Pappagari, Piotr Zelasko, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak |
| 2021 | Boundary and Context Aware Training for CIF-Based Non-Autoregressive End-to-End ASR. Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang |
| 2021 | ChannelAugment: Improving Generalization of Multi-Channel ASR by Training with Input Channel Randomization. Marco Gaudesi, Felix Weninger, Dushyant Sharma, Puming Zhan |
| 2021 | Colombian Dialect Recognition Based on Information Extracted from Speech and Text Signals. Daniel Escobar-Grisales, Cristian D. Ríos-Urrego, Diego Alexander Lopez-Santander, Jeferson David Gallo-Aristizábal, Juan Camilo Vásquez-Correa, Elmar Nöth, Juan Rafael Orozco-Arroyave |
| 2021 | Comparative Study of Different Tokenization Strategies for Streaming End-to-End ASR. Sachin Singh, Ashutosh Gupta, Aman Maghan, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim |
| 2021 | Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures. Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney |
| 2021 | Comparison of Self-Supervised Speech Pre-Training Methods on Flemish Dutch. Jakob Poncelet, Hugo Van hamme |
| 2021 | ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing. Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang |
| 2021 | Context-Aware Transformer Transducer for Speech Recognition. Feng-Ju Chang, Jing Liu, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, Ariya Rastrow, Siegfried Kunzmann |
| 2021 | Cross-Attention Conformer for Context Modeling in Speech Enhancement for ASR. Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He |
| 2021 | Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity. Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W. Black |
| 2021 | CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion. Xulong Zhang, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao |
| 2021 | DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding. Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li |
| 2021 | Data Augmentation for ASR Using TTS Via a Discrete Representation. Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara |
| 2021 | Deciding Whether to Ask Clarifying Questions in Large-Scale Spoken Language Understanding. Joo-Kyung Kim, Guoyin Wang, Sungjin Lee, Young-Bum Kim |
| 2021 | Decoupling Recognition and Transcription in Mandarin ASR. Jiahong Yuan, Xingyu Cai, Dongji Gao, Renjie Zheng, Liang Huang, Kenneth Church |
| 2021 | DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics. Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang |
| 2021 | Detecting Emotion Carriers by Combining Acoustic and Lexical Representations. Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi |
| 2021 | Dialogue Strategy Adaptation to New Action Sets Using Multi-Dimensional Modelling. Simon Keizer, Norbert Braunschweiler, Svetlana Stoyanchev, Rama Doddipatla |
| 2021 | DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion. Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng |
| 2021 | Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. Yan Gao, Titouan Parcollet, Nicholas D. Lane |
| 2021 | DIVE: End-to-End Speech Diarization Via Iterative Speaker Embedding. Neil Zeghidour, Olivier Teboul, David Grangier |
| 2021 | Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition. Felix Weninger, Marco Gaudesi, Ralf Leibold, Roberto Gemello, Puming Zhan |
| 2021 | Duality Temporal-Channel-Frequency Attention Enhanced Speaker Representation Learning. Li Zhang, Qing Wang, Lei Xie |
| 2021 | EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion. Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee |
| 2021 | Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition. Maxime Burchi, Valentin Vielzeuf |
| 2021 | Efficient Keyword Spotting by Capturing Long-Range Interactions with Temporal Lambda Networks. Biel Tura, Santiago Escuder, Ferran Diego, Carlos Segura, Jordi Luque |
| 2021 | Enabling Zero-Shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders. Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Carlos Segura |
| 2021 | Ensemble of Domain Adversarial Neural Networks for Speech Emotion Recognition. Shi-wook Lee |
| 2021 | Estimating the Generation Timing of Responsive Utterances by Active Listeners of Spoken Narratives. Koichiro Ito, Masaki Murata, Tomohiro Ohno, Shigeki Matsubara |
| 2021 | Exploring Teacher-Student Learning Approach for Multi-Lingual Speech-to-Intent Classification. Bidisha Sharma, Maulik C. Madhavi, Xuehao Zhou, Haizhou Li |
| 2021 | Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer. Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li |
| 2021 | Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method. Yifan Guo, Yifan Chen, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan |
| 2021 | Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe |
| 2021 | GLMSnet: Single Channel Speech Separation Framework in Noisy and Reverberant Environments. Huiyu Shi, Xi Chen, Tianlong Kong, Shouyi Yin, Peng Ouyang |
| 2021 | HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network. Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao |
| 2021 | Hearing Faces: Target Speaker Text-to-Speech Synthesis from a Face. Björn Plüster, Cornelius Weber, Leyuan Qu, Stefan Wermter |
| 2021 | HiTNet: Byte-to-BPE Hierarchical Transcription Network for End-to-End Speech Recognition. Dhananjaya Gowda, Abhinav Garg, Jiyeon Kim, Mehul Kumar, Sachin Singh, Ashutosh Gupta, Ankur Kumar, Nauman Dawalatabad, Aman Maghan, Shatrughan Singh, Chanwoo Kim |
| 2021 | Hierarchical Knowledge Distillation for Dialogue Sequence Labeling. Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura |
| 2021 | Human-Agent Collaboration Strategies for Vision-Grounded Instruction Following. Guan-Lin Chao, Ian R. Lane |
| 2021 | Hybrid Network with Multi-Level Global-Local Statistics Pooling for Robust Text-Independent Speaker Recognition. Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan |
| 2021 | IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, Cartagena, Colombia, December 13-17, 2021 |
| 2021 | Improving ASR Error Correction Using N-Best Hypotheses. Linchen Zhu, Wenjie Liu, Linquan Liu, Edward Lin |
| 2021 | Improving HS-DACS Based Streaming Transformer ASR with Deep Reinforcement Learning. Mohan Li, Rama Doddipatla |
| 2021 | Improving Hybrid CTC/Attention End-to-End Speech Recognition with Pretrained Acoustic and Language Models. Keqi Deng, Songjun Cao, Yike Zhang, Long Ma |
| 2021 | Improving Reverberant Speech Separation with Synthetic Room Impulse Responses. Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha |
| 2021 | Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets. Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke |
| 2021 | Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN. Chia-Yu Li, Ngoc Thang Vu |
| 2021 | Improving Text-Independent Speaker Verification with Auxiliary Speakers Using Graph. Jingyu Li, Si Ioi Ng, Tan Lee |
| 2021 | In Pursuit of Babel - Multilingual End-to-End Spoken Language Understanding. Markus Müller, Samridhi Choudhary, Clement Chung, Athanasios Mouchtaris, Siegfried Kunzmann |
| 2021 | Incorporating Real-World Noisy Speech in Neural-Network-Based Speech Enhancement Systems. Yangyang Xia, Buye Xu, Anurag Kumar |
| 2021 | Incremental Learning for End-to-End Automatic Speech Recognition. Li Fu, Xiaoxiao Li, Libo Zi, Zhengchen Zhang, Youzheng Wu, Xiaodong He, Bowen Zhou |
| 2021 | Injecting Text in Self-Supervised Speech Pretraining. Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro J. Moreno |
| 2021 | Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition. Christian Huber, Juan Hussain, Sebastian Stüker, Alexander Waibel |
| 2021 | Intent Recognition and Unsupervised Slot Identification for Low-Resourced Spoken Dialog Systems. Akshat Gupta, Olivia Deng, Akruti Kushwaha, Saloni Mittal, William Zeng, Sai Krishna Rallabandi, Alan W. Black |
| 2021 | Joint Prediction of Truecasing and Punctuation for Conversational Speech in Low-Resource Scenarios. Raghavendra Pappagari, Piotr Zelasko, Agnieszka Mikolajczyk, Piotr Pezik, Najim Dehak |
| 2021 | Kaizen: Continuously Improving Teacher Using Exponential Moving Average for Semi-Supervised Speech Recognition. Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed |
| 2021 | Latency-Controlled Neural Architecture Search for Streaming Speech Recognition. Liqiang He, Shulin Feng, Dan Su, Dong Yu |
| 2021 | Layer-Wise Analysis of a Self-Supervised Speech Representation Model. Ankita Pasad, Ju-Chieh Chou, Karen Livescu |
| 2021 | Learning How Long to Wait: Adaptively-Constrained Monotonic Multihead Attention for Streaming ASR. Jaeyun Song, Hajin Shim, Eunho Yang |
| 2021 | Learning Language and Speaker Information for Code-Switch Speech Synthesis with Limited Data. Mengxin Chai, Shaotong Guo, Cheng Gong, Longbiao Wang, Jianwu Dang, Ju Zhang |
| 2021 | Learning to Translate Low-Resourced Swiss German Dialectal Speech into Standard German Text. Abbas Khosravani, Philip N. Garner, Alexandros Lazaridis |
| 2021 | Leveraging Linguistic Knowledge for Accent Robustness of End-to-End Models. Andrea Carmantini, Steve Renals, Peter Bell |
| 2021 | Leveraging Pre-Trained Representations to Improve Access to Untranscribed Speech from Endangered Languages. Nay San, Martijn Bartelds, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, Dan Jurafsky |
| 2021 | Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network. Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari |
| 2021 | MACCIF-TDNN: Multi Aspect Aggregation of Channel and Context Interdependence Features in TDNN-Based Speaker Verification. Fangyuan Wang, Zhigang Song, Hongchen Jiang, Bo Xu |
| 2021 | Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling. Ming-Chi Yen, Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Shu-Wei Tsai, Yu Tsao, Tomoki Toda, Jyh-Shing Roger Jang, Hsin-Min Wang |
| 2021 | Multi-Granularity Annotation of Instantaneous Intelligibility of Learners' Utterances Based on Shadowing Techniques. Chuanbo Zhu, Ryo Hakoda, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi, Tazuko Nishimura |
| 2021 | Multi-Stream HiFi-GAN with Data-Driven Waveform Decomposition. Takuma Okamoto, Tomoki Toda, Hisashi Kawai |
| 2021 | Multi-Task Audio Source Separation. Lu Zhang, Chenxing Li, Feng Deng, Xiaorui Wang |
| 2021 | Multi-Task Language Modeling for Improving Speech Recognition of Rare Words. Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko |
| 2021 | Multi-Task Learning with Cross Attention for Keyword Spotting. Takuya Higuchi, Anmol Gupta, Chandra Dhir |
| 2021 | Multi-User VoiceFilter-Lite via Attentive Speaker Embedding. Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw |
| 2021 | Multilingual and Crosslingual Speech Recognition Using Phonological-Vector Based Phone Embeddings. Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou |
| 2021 | Multimodal Emotion Recognition with High-Level Speech and Text Features. Mariana Rodrigues Makiuchi, Kuniaki Uto, Koichi Shinoda |
| 2021 | Multitask Generative Adversarial Imitation Learning for Multi-Domain Dialogue System. Chuan-En Hsu, Mahdin Rohmatillah, Jen-Tzung Chien |
| 2021 | Non-Autoregressive Mandarin-English Code-Switching Speech Recognition. Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-yi Lee |
| 2021 | On Addressing Practical Challenges for RNN-Transducer. Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong |
| 2021 | On Architectures and Training for Raw Waveform Feature Extraction in ASR. Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney |
| 2021 | On Lattice-Free Boosted MMI Training of HMM and CTC-Based Full-Context ASR Models. Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer |
| 2021 | On Prosody Modeling for ASR+TTS Based Voice Conversion. Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda |
| 2021 | On the Invertibility of a Voice Privacy System Using Embedding Alignment. Pierre Champion, Thomas Thebaud, Gaël Le Lan, Anthony Larcher, Denis Jouvet |
| 2021 | On-Device Neural Speech Synthesis. Sivanand Achanta, Albert Antony, Ladan Golipour, Jiangchuan Li, Tuomo Raitio, Ramya Rasipuram, Francesco Rossi, Jennifer Shi, Jaimin Upadhyay, David Winarsky, Hepeng Zhang |
| 2021 | On-The-Fly Data Augmentation for Text-to-Speech Style Transfer. Raymond Chung, Brian Mak |
| 2021 | Optimized Power Normalized Cepstral Coefficients Towards Robust Deep Speaker Verification. Xuechen Liu, Md. Sahidullah, Tomi Kinnunen |
| 2021 | Overlap-Aware Low-Latency Online Speaker Diarization Based on End-to-End Local Segmentation. Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset |
| 2021 | PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction. Yi Ma, Kong Aik Lee, Ville Hautamäki, Haizhou Li |
| 2021 | PSVD: Post-Training Compression of LSTM-Based RNN-T Models. Suwa Xu, Jinwon Lee, Jim Steele |
| 2021 | Parameterized Channel Normalization for Far-Field Deep Speaker Verification. Xuechen Liu, Md. Sahidullah, Tomi Kinnunen |
| 2021 | Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples. Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao |
| 2021 | Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition. Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt |
| 2021 | Remember the Context! ASR Slot Error Correction Through Memorization. Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff |
| 2021 | Robust Speech-Age Estimation Using Local Maximum Mean Discrepancy Under Mismatched Recording Conditions. Naohiro Tawara, Atsunori Ogawa, Yuki Kitagishi, Hosana Kamiyama, Yusuke Ijima |
| 2021 | SI-Net: Multi-Scale Context-Aware Convolutional Block for Speaker Verification. Zhuo Li, Ce Fang, Runqiu Xiao, Wenchao Wang, Yonghong Yan |
| 2021 | Scaling End-to-End Models for Large-Scale Multilingual ASR. Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai |
| 2021 | Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora. Szu-Jui Chen, Wei Xia, John H. L. Hansen |
| 2021 | Self-Supervised Metric Learning with Graph Clustering for Speaker Diarization. Prachi Singh, Sriram Ganapathy |
| 2021 | Semi-Supervised Transfer Learning for Language Expansion of End-to-End Speech Recognition Models to Low-Resource Languages. Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim |
| 2021 | Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation. Qinglin Zhang, Qian Chen, Yali Li, Jiaqing Liu, Wen Wang |
| 2021 | Short-Utterance Embedding Enhancement Method Based on Time Series Forecasting Technique for Text-Independent Speaker Verification. Jeong-Hwan Choi, Joon-Young Yang, Joon-Hyuk Chang |
| 2021 | Speaker Conditioning of Acoustic Models Using Affine Transformation for Multi-Speaker Speech Recognition. Midia Yousefi, John H. L. Hansen |
| 2021 | Speech Emotion Recognition Using Semi-Supervised Learning with Efficient Labeling Strategies. Zhi Zhu, Yoshinao Sato |
| 2021 | SpeechNAS: Towards Better Trade-Off Between Latency and Accuracy for Large-Scale Speaker Verification. Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen Yang, Ji Liu |
| 2021 | Studying Squeeze-and-Excitation Used in CNN for Speaker Verification. Mickael Rouvier, Pierre-Michel Bousquet |
| 2021 | TENET: A Time-Reversal Enhancement Network for Noise-Robust ASR. Fu-An Chao, Shao-Wei Fan-Jiang, Bi-Cheng Yan, Jeih-weih Hung, Berlin Chen |
| 2021 | TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training. Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao |
| 2021 | TS-RIR: Translated Synthetic Room Impulse Responses for Speech Augmentation. Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha |
| 2021 | Target Language Extraction at Multilingual Cocktail Parties. Marvin Borsdorf, Haizhou Li, Tanja Schultz |
| 2021 | Textual Echo Cancellation. Shaojin Ding, Ye Jia, Ke Hu, Quan Wang |
| 2021 | Tiny-CRNN: Streaming Wakeword Detection in a Low Footprint Setting. Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu, Brian Kulis, Santosh Kumar Cheekatmalla |
| 2021 | Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features. Tan Liu, Wu Guo |
| 2021 | Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors. Shota Horiguchi, Shinji Watanabe, Paola García, Yawen Xue, Yuki Takashima, Yohei Kawaguchi |
| 2021 | Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods. Shao-Wei Fan-Jiang, Bi-Cheng Yan, Tien-Hong Lo, Fu-An Chao, Berlin Chen |
| 2021 | Towards Using Heterogeneous Relation Graphs for End-to-End TTS. Amrith Setlur, Aman Madaan, Tanmay Parekh, Yiming Yang, Alan W. Black |
| 2021 | Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition. Guangzhi Sun, Chao Zhang, Philip C. Woodland |
| 2021 | Two-Pass End-to-End ASR Model Compression. Nauman Dawalatabad, Tushar Vatsal, Ashutosh Gupta, Sungsoo Kim, Shatrughan Singh, Dhananjaya Gowda, Chanwoo Kim |
| 2021 | Uncertainty-Aware Pseudo-Labeling for Spoken Language Assessment. Binghuai Lin, Liyuan Wang |
| 2021 | Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel. Jin Li, Nan Yan, Lan Wang |
| 2021 | Unsupervised Domain Adaptation Schemes for Building ASR in Low-Resource Languages. Chandran Savithri Anoop, Prathosh A. P., A. G. Ramakrishnan |
| 2021 | Using Self Attention DNNs to Discover Phonemic Features for Audio Deep Fake Detection. Hira Dhamyal, Ayesha Ali, Ihsan Ayyub Qazi, Agha Ali Raza |
| 2021 | Utterance-Level Neural Confidence Measure for End-to-End Children Speech Recognition. Wei Liu, Tan Lee |
| 2021 | Variational Sequential Modeling, Learning and Understanding. Jen-Tzung Chien, Chih-Jung Tsai |
| 2021 | Vibrato Learning in Multi-Singer Singing Voice Synthesis. Ruolan Liu, Xue Wen, Chunhui Lu, Liming Song, June Sig Sung |
| 2021 | Voice to Action: Spoken Language Understanding for Memory-Constrained Systems. Ashutosh Gupta, Aditya Jayasimha, Aman Maghan, Shatrughan Singh, Dhananjaya Gowda, Chanwoo Kim |
| 2021 | VoxCeleb Enrichment for Age and Gender Recognition. Khaled Hechmi, Trung Ngo Trong, Ville Hautamäki, Tomi Kinnunen |
| 2021 | Warped Ensembles: A Novel Technique for Improving CTC Based End-to-End Speech Recognition. Kiran Praveen, Hardik B. Sailor, Abhishek Pandey |
| 2021 | What Does the User Want? Information Gain for Hierarchical Dialogue Policy Optimisation. Christian Geishauser, Songbo Hu, Hsien-Chin Lin, Nurul Lubis, Michael Heck, Shutong Feng, Carel van Niekerk, Milica Gasic |
| 2021 | Word-Level Confidence Estimation for RNN Transducers. Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran |
| 2021 | X-SHOT: Learning to Rank Voice Applications Via Cross-Locale Shard-Based Co-Training. Zheng Gao, Mohamed Abdelhady, Radhika Arava, Xibin Gao, Qian Hu, Wei Xiao, Thahir Mohamed |
| 2021 | w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training. Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu |