| 2020 | "This is Houston. Say again, please". The Behavox System for the Apollo-11 Fearless Steps Challenge (Phase II). Arseniy Gorin, Daniil Kulko, Steven Grima, Alex Glasman |
| 2020 | 1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM. Kshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu |
| 2020 | 21st Annual Conference of the International Speech Communication Association, Interspeech 2020, Virtual Event, Shanghai, China, October 25-29, 2020. Helen Meng, Bo Xu, Thomas Fang Zheng |
| 2020 | A 43 Language Multilingual Punctuation Prediction Neural Network Model. Xinxing Li, Edward Lin |
| 2020 | A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings. Xuechen Liu, Md. Sahidullah, Tomi Kinnunen |
| 2020 | A Comparative Study of Speech Anonymization Metrics. Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent |
| 2020 | A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition. Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä |
| 2020 | A Comparison of English Rhythm Produced by Native American Speakers and Mandarin ESL Primary School Learners. Hongwei Ding, Binghuai Lin, Liyuan Wang, Hui Wang, Ruomei Fang |
| 2020 | A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass |
| 2020 | A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement. Minh Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang |
| 2020 | A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems. Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda |
| 2020 | A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions. Liming Wang, Mark Hasegawa-Johnson |
| 2020 | A Deep 2D Convolutional Network for Waveform-Based Speech Recognition. Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals |
| 2020 | A Deep Learning Approach to Active Noise Control. Hao Zhang, DeLiang Wang |
| 2020 | A Deep Learning-Based Kalman Filter for Speech Enhancement. Sujan Kumar Roy, Aaron Nicolson, Kuldip K. Paliwal |
| 2020 | A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences. Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin |
| 2020 | A Dynamic 3D Pronunciation Teaching Model Based on Pronunciation Attributes and Anatomy. Xiaoli Feng, Yanlu Xie, Yayue Deng, Boxue Li |
| 2020 | A Federated Approach in Training Acoustic Models. Dimitrios Dimitriadis, Ken'ichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez |
| 2020 | A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages. Mano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash, Hema A. Murthy |
| 2020 | A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling. Chieh-Chi Kao, Bowen Shi, Ming Sun, Chao Wang |
| 2020 | A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition. Ying Zhong, Ying Hu, Hao Huang, Wushour Silamu |
| 2020 | A Low Latency ASR-Free End to End Spoken Language Understanding System. Mohamed Mhiri, Samuel Myer, Vikrant Singh Tomar |
| 2020 | A Machine of Few Words: Interactive Speaker Recognition with Reinforcement Learning. Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin |
| 2020 | A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback. Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang, Yujia Jin |
| 2020 | A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation. Haiteng Zhang, Huashan Pan, Xiulin Li |
| 2020 | A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. Ming Chen, Xudong Zhao |
| 2020 | A New Training Pipeline for an Improved Neural Transducer. Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney |
| 2020 | A Noise Robust Technique for Detecting Vowels in Speech Signals. Avinash Kumar, S. Shahnawazuddin, Waquar Ahmad |
| 2020 | A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement. Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan |
| 2020 | A Perceptual Study of the Five Level Tones in Hmu (Xinzhai Variety). Wen Liu |
| 2020 | A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech. Jean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy |
| 2020 | A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals. Xuan Dong, Donald S. Williamson |
| 2020 | A Real-Time Robot-Based Auxiliary System for Risk Evaluation of COVID-19 Infection. Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng, Jing Xiao |
| 2020 | A Recursive Network with Dynamic Attention for Monaural Speech Enhancement. Andong Li, Chengshi Zheng, Cunhang Fan, Renhua Peng, Xiaodong Li |
| 2020 | A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning. Chenggang Zhang, Xueliang Zhang |
| 2020 | A Semi-Blind Source Separation Approach for Speech Dereverberation. Ziteng Wang, Yueyue Na, Zhang Liu, Yun Li, Biao Tian, Qiang Fu |
| 2020 | A Sound Engineering Approach to Near End Listening Enhancement. Carol Chermaz, Simon King |
| 2020 | A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge. Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee |
| 2020 | A Transformer-Based Audio Captioning Model with Keyword Estimation. Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito |
| 2020 | A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments. Yunzhe Hao, Jiaming Xu, Jing Shi, Peng Zhang, Lei Qin, Bo Xu |
| 2020 | ARET: Aggregated Residual Extended Time-Delay Neural Networks for Speaker Verification. Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin, Junhai Xu |
| 2020 | ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data. Zheng Lian, Zhengqi Wen, Xinyong Zhou, Songbai Pu, Shengkai Zhang, Jianhua Tao |
| 2020 | ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition. Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu Jeong Han, Tao Lei, Tao Ma |
| 2020 | ASR Error Correction with Augmented Transformer for Entity Retrieval. Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu |
| 2020 | ASR-Based Evaluation and Feedback for Individualized Reading Practice. Yu Bai, Ferdy Hubers, Catia Cucchiarini, Helmer Strik |
| 2020 | ASR-Free Pronunciation Assessment. Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng |
| 2020 | ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment. Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin |
| 2020 | ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification. Liwen Zhang, Jiqing Han, Ziqiang Shi |
| 2020 | Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization. Potsawee Manakul, Mark J. F. Gales, Linlin Wang |
| 2020 | Accurate Detection of Wake Word Start and End Using a CNN. Christin Jose, Yuriy Mishchenko, Thibaud Sénéchal, Anish Shah, Alex Escott, Shiv Naga Prasad Vitaladevuni |
| 2020 | Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation. M. A. Tugtekin Turan, Emmanuel Vincent, Denis Jouvet |
| 2020 | Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification. Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen |
| 2020 | Acoustic Properties of Strident Fricatives at the Edges: Implications for Consonant Discrimination. Louis-Marie Lorin, Lorenzo Maselli, Léo Varnet, Maria Giavazzi |
| 2020 | Acoustic Scene Analysis with Multi-Head Attention Networks. Weimin Wang, Weiran Wang, Ming Sun, Chao Wang |
| 2020 | Acoustic Scene Classification Using Audio Tagging. Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-Bin Kim, Ha-Jin Yu |
| 2020 | Acoustic Signal Enhancement Using Relative Harmonic Coefficients: Spherical Harmonics Domain Approach. Yonggang Hu, Prasanga N. Samarasinghe, Thushara D. Abhayapala |
| 2020 | Acoustic-Based Articulatory Phenotypes of Amyotrophic Lateral Sclerosis and Parkinson's Disease: Towards an Interpretable, Hypothesis-Driven Framework of Motor Control. Hannah P. Rowe, Sarah E. Gutz, Marc F. Maffei, Jordan R. Green |
| 2020 | Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet. Narjes Bozorg, Michael T. Johnson |
| 2020 | Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation. Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies |
| 2020 | Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition. Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Dongyan Huang |
| 2020 | Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder. Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii |
| 2020 | Adaptive Speaker Normalization for CTC-Based Speech Recognition. Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du |
| 2020 | Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition. Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee |
| 2020 | Adventitious Respiratory Classification Using Attentive Residual Neural Networks. Zijiang Yang, Shuo Liu, Meishu Song, Emilia Parada-Cabaleiro, Björn W. Schuller |
| 2020 | Adversarial Audio: A New Information Hiding Method. Yehao Kong, Jiliang Zhang |
| 2020 | Adversarial Dictionary Learning for Monaural Speech Enhancement. Yunyun Ji, Longting Xu, Wei-Ping Zhu |
| 2020 | Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network. Zhengyang Chen, Shuai Wang, Yanmin Qian |
| 2020 | Adversarial Latent Representation Learning for Speech Enhancement. Yuanhang Qiu, Ruili Wang |
| 2020 | Adversarial Separation Network for Speaker Recognition. Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei |
| 2020 | Adversarial Separation and Adaptation Network for Far-Field Speaker Verification. Lu Yi, Man-Wai Mak |
| 2020 | Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer. Jie Wu, Jian Luan |
| 2020 | Affective Conditioning on Hierarchical Attention Networks Applied to Depression Detection from Transcribed Clinical Interviews. Danai Xezonaki, Georgios Paraskevopoulos, Alexandros Potamianos, Shrikanth Narayanan |
| 2020 | Age-Related Differences of Tone Perception in Mandarin-Speaking Seniors. Yan Feng, Gang Peng, William Shi-Yuan Wang |
| 2020 | Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network. Renuka Mannem, Navaneetha Gaddam, Prasanta Kumar Ghosh |
| 2020 | All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection. Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux |
| 2020 | Alzheimer's Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge. Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney |
| 2020 | An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances. Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee |
| 2020 | An Adaptive X-Vector Model for Text-Independent Speaker Verification. Bin Gu, Wu Guo, Fenglin Ding, Zhen-Hua Ling, Jun Du |
| 2020 | An Alternative to MFCCs for ASR. Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur |
| 2020 | An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic. Dina El Zarka, Anneliese Kelterer, Barbara Schuppler |
| 2020 | An Audio-Based Wakeword-Independent Verification System. Joe Wang, Rajath Kumar, Mike Rodehorst, Brian Kulis, Shiv Naga Prasad Vitaladevuni |
| 2020 | An Audio-Enriched BERT-Based Framework for Spoken Multiple-Choice Question Answering. Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen |
| 2020 | An Early Study on Intelligent Analysis of Speech Under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety. Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller |
| 2020 | An Effective Domain Adaptive Post-Training Method for BERT in Response Selection. Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, Heuiseok Lim |
| 2020 | An Effective End-to-End Modeling Approach for Mispronunciation Detection. Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen |
| 2020 | An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu |
| 2020 | An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions. Ying Liu, Yan Song, Yiheng Jiang, Ian McLoughlin, Lin Liu, Li-Rong Dai |
| 2020 | An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis. Yang Cui, Xi Wang, Lei He, Frank K. Soong |
| 2020 | An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks. Wei-Cheng Lin, Carlos Busso |
| 2020 | An End-to-End Architecture of Online Multi-Channel Speech Separation. Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie |
| 2020 | An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling. Bi-Cheng Yan, Meng-Che Wu, Hsiao-Tsung Hung, Berlin Chen |
| 2020 | An Evaluation of Manual and Semi-Automatic Laughter Annotation. Bogdan Ludusan, Petra Wagner |
| 2020 | An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels. Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller |
| 2020 | An Interactive Adversarial Reward Learning-Based Spoken Language Understanding System. Yu Wang, Yilin Shen, Hongxia Jin |
| 2020 | An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition. Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller |
| 2020 | An Investigation of Few-Shot Learning in Spoken Term Classification. Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li |
| 2020 | An Investigation of Phone-Based Subword Units for End-to-End Speech Recognition. Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher |
| 2020 | An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech. Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang, Peter Birkholz |
| 2020 | An Investigation of the Virtual Lip Trajectories During the Production of Bilabial Stops and Nasal at Different Speaking Rates. Tilak Purohit, Prasanta Kumar Ghosh |
| 2020 | An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence. Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen |
| 2020 | An Objective Voice Gender Scoring System and Identification of the Salient Acoustic Measures. Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan |
| 2020 | An Open Source Implementation of ITU-T Recommendation P.808 with Validation. Babak Naderi, Ross Cutler |
| 2020 | An Open-Source Voice Type Classifier for Child-Centered Daylong Recordings. Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià |
| 2020 | An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets. Pilar Oplustil Gallegos, Jennifer Williams, Joanna Rownicka, Simon King |
| 2020 | An Utterance Verification System for Word Naming Therapy in Aphasia. David S. Barbera, Mark A. Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Ian Shaw, William H. Latham, Alexander P. Leff, Jenny Crinion |
| 2020 | Analysis of Disfluency in Children's Speech. Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf |
| 2020 | Analyzing Breath Signals for the Interspeech 2020 ComParE Challenge. John Mendonça, Francisco Teixeira, Isabel Trancoso, Alberto Abad |
| 2020 | Analyzing Read Aloud Speech by Primary School Pupils: Insights for Research and Development. S. Limonard, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik |
| 2020 | Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer. Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Françoise Beaufays |
| 2020 | Angular Margin Centroid Loss for Text-Independent Speaker Recognition. Yuheng Wei, Junzhao Du, Hui Liu |
| 2020 | Anti-Aliasing Regularization in Stacking Layers. Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar |
| 2020 | Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts. Matthew Perez, Zakaria Aldeneh, Emily Mower Provost |
| 2020 | Are Germans Better Haters Than Danes? Language-Specific Implicit Prosodies of Types of Hate Speech and How They Relate to Perceived Severity and Societal Rules. Jana Neitsch, Oliver Niebuhr |
| 2020 | Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study. Karthik Gopalakrishnan, Behnam Hedayatnia, Longshaokan Wang, Yang Liu, Dilek Hakkani-Tür |
| 2020 | Are you Wearing a Mask? Improving Mask Detection from Speech Using Augmentation by Cycle-Consistent GANs. Nicolae-Catalin Ristea, Radu Tudor Ionescu |
| 2020 | Assessment of Parkinson's Disease Medication State Through Automatic Speech Analysis. Anna Pompili, Rubén Solera-Ureña, Alberto Abad, Rita Cardoso, Isabel Guimarães, Margherita Fabbri, Isabel P. Martins, Joaquim J. Ferreira |
| 2020 | Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers. Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent |
| 2020 | Atss-Net: Target Speaker Separation via Attention-Based Neural Network. Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li |
| 2020 | Attention Forcing for Speech Synthesis. Qingyun Dou, Joshua Efiong, Mark J. F. Gales |
| 2020 | Attention Wave-U-Net for Acoustic Echo Cancellation. Jung-Hee Kim, Joon-Hyuk Chang |
| 2020 | Attention and Encoder-Decoder Based Models for Transforming Articulatory Movements at Different Speaking Rates. Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh |
| 2020 | Attention to Indexical Information Improves Voice Recall. Grant L. McGuire, Molly Babel |
| 2020 | Attention-Based Speaker Embeddings for One-Shot Voice Conversion. Tatsuma Ishihara, Daisuke Saito |
| 2020 | Attention-Driven Projections for Soundscape Classification. Dhanunjaya Varma Devalraju, Muralikrishna H, Padmanabhan Rajan, Dileep Aroor Dinesh |
| 2020 | Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection. Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee |
| 2020 | Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding. Seungwoo Choi, Seungju Han, Dongyoung Kim, Sungjoo Ha |
| 2020 | Audio Dequantization for High Fidelity Audio Generation in Flow-Based Neural Vocoder. Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee |
| 2020 | Audio-Visual Multi-Channel Recognition of Overlapped Speech. Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng |
| 2020 | Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework. Shoufeng Lin, Xinyuan Qian |
| 2020 | Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network. Ruijie Tao, Rohan Kumar Das, Haizhou Li |
| 2020 | Audiovisual Correspondence Learning in Humans and Machines. Venkat Krishnamohan, Akshara Soman, Anshul Gupta, Sriram Ganapathy |
| 2020 | Augmenting Generative Adversarial Networks for Speech Emotion Recognition. Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller |
| 2020 | Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework. Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura |
| 2020 | Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation. Hang Li, Siyuan Chen, Julien Epps |
| 2020 | AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification. Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie |
| 2020 | AutoSpeech: Neural Architecture Search for Speaker Recognition. Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang |
| 2020 | Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition. Zhengjun Yue, Heidi Christensen, Jon Barker |
| 2020 | Automated Screening for Alzheimer's Dementia Through Spontaneous Speech. Muhammad Shehram Shah Syed, Zafi Sherhan Syed, Margaret Lech, Elena Pirogova |
| 2020 | Automatic Analysis of Speech Prosody in Dutch. Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven, Aoju Chen |
| 2020 | Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning. Han Tong, Hamid R. Sharifzadeh, Ian McLoughlin |
| 2020 | Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech. Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales |
| 2020 | Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder. Si Ioi Ng, Tan Lee |
| 2020 | Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features. Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard |
| 2020 | Automatic Estimation of Intelligibility Measure for Consonants in Speech. Ali Abavisani, Mark Hasegawa-Johnson |
| 2020 | Automatic Estimation of Pathological Voice Quality Based on Recurrent Neural Network Using Amplitude and Phase Spectrogram. Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi |
| 2020 | Automatic Glottis Detection and Segmentation in Stroboscopic Videos Using Convolutional Networks. Divya Degala, M. V. Achuth Rao, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini, Prakash T. K., Prasanta Kumar Ghosh |
| 2020 | Automatic Prediction of Confidence Level from Children's Oral Reading Recordings. Kamini Sabu, Preeti Rao |
| 2020 | Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer. Sebastião Quintas, Julie Mauclair, Virginie Woisard, Julien Pinquier |
| 2020 | Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe Submission to NIST SRE Challenge 2019. Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan |
| 2020 | Automatic Scoring at Multi-Granularity for L2 Pronunciation. Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang |
| 2020 | Automatic Speech Recognition Benchmark for Air-Traffic Communications. Juan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf A. Braun |
| 2020 | Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline. Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz |
| 2020 | Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? Jialu Li, Mark Hasegawa-Johnson |
| 2020 | BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example. Timo Lohrenz, Tim Fingscheidt |
| 2020 | BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka |
| 2020 | Bandpass Noise Generation and Augmentation for Unified ASR. Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu |
| 2020 | Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts. Yizhou Lu, Mingkun Huang, Hao Li, Jiaqi Guo, Yanmin Qian |
| 2020 | Bi-Level Speaker Supervision for One-Shot Speech Synthesis. Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Chunyu Qiang |
| 2020 | Bidirectional LSTM Network with Ordered Neurons for Speech Enhancement. Xiaoqi Li, Yaxing Li, Yuanjie Dong, Shan Xu, Zhihui Zhang, Dan Wang, Shengwu Xiong |
| 2020 | Bilingual Acoustic Voice Variation is Similarly Structured Across Languages. Khia A. Johnson, Molly Babel, Robert A. Fuhrman |
| 2020 | BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages. Abhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed |
| 2020 | Black-Box Adaptation of ASR for Accented Speech. Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi |
| 2020 | Black-Box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples. Yuekai Zhang, Ziyan Jiang, Jesús Villalba, Najim Dehak |
| 2020 | Blind Speech Signal Quality Estimation for Speaker Verification Systems. Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov |
| 2020 | Brain Networks Enabling Speech Perception in Everyday Settings. Barbara G. Shinn-Cunningham |
| 2020 | Building a Robust Word-Level Wakeword Verification Network. Rajath Kumar, Mike Rodehorst, Joe Wang, Jiacheng Gu, Brian Kulis |
| 2020 | Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems. Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nicholas D. Lane |
| 2020 | CAM: Uninteresting Speech Detector. Weiyi Lu, Yi Xu, Peng Yang, Belinda Zeng |
| 2020 | CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency. Keyu An, Hongyu Xiang, Zhijian Ou |
| 2020 | CATOTRON - A Neural Text-to-Speech System in Catalan. Baybars Külebi, Alp Öktem, Alex Peiró Lilja, Santiago Pascual, Mireia Farrús |
| 2020 | CSL-EMG_Array: An Open Access Corpus for EMG-to-Speech Conversion. Lorenz Diener, Mehrdad Roustay Vishkasougheh, Tanja Schultz |
| 2020 | CTC-Synchronous Training for Monotonic Attention Model. Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara |
| 2020 | CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment. Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong |
| 2020 | Can Auditory Nerve Models Tell us What's Different About WaveNet Vocoded Speech? Sébastien Le Maguer, Naomi Harte |
| 2020 | Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS? Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi |
| 2020 | Caption Alignment for Low Resource Audio-Visual Data. Vighnesh Reddy Konda, Mayur Warialani, Rakesh Prasanth Achari, Varad Bhatnagar, Jayaprakash Akula, Preethi Jyothi, Ganesh Ramakrishnan, Gholamreza Haffari, Pankaj Singh |
| 2020 | Categorization of Whistled Consonants by French Speakers. Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier |
| 2020 | Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music. Haohe Liu, Lei Xie, Jian Wu, Geng Yang |
| 2020 | Characterization of Singaporean Children's English: Comparisons to American and British Counterparts Using Archetypal Analysis. Yuling Gu, Nancy F. Chen |
| 2020 | Class LM and Word Mapping for Contextual Biasing in End-to-End ASR. Rongqing Huang, Ossama Abdel-Hamid, Xinwei Li, Gunnar Evermann |
| 2020 | Classification of Manifest Huntington Disease Using Vowel Distortion Measures. Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost |
| 2020 | Classify Imaginary Mandarin Tones with Cortical EEG Signals. Hua Li, Fei Chen |
| 2020 | ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim |
| 2020 | Coarticulation as Synchronised Sequential Target Approximation: An EMA Study. Zirui Liu, Yi Xu, Feng-fan Hsieh |
| 2020 | Combination of End-to-End and Hybrid Models for Speech Recognition. Jeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong |
| 2020 | Combining Audio and Brain Activity for Predicting Speech Quality. Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura |
| 2020 | Compact Speaker Embedding: lrx-Vector. Munir Georges, Jonathan Huang, Tobias Bocklet |
| 2020 | Comparing EEG Analyses with Different Epoch Alignments in an Auditory Lexical Decision Experiment. Louis ten Bosch, Kimberley Mulder, Lou Boves |
| 2020 | Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech. Thomas Searle, Zina M. Ibrahim, Richard J. B. Dobson |
| 2020 | Comparison of Glottal Source Parameter Values in Emotional Vowels. Yongwei Li, Jianhua Tao, Bin Liu, Donna Erickson, Masato Akagi |
| 2020 | Competency Evaluation in Voice Mimicking Using Acoustic Cues. Abhijith Girish, Adharsh Sabu, Akshay Prasannan Latha, Rajeev Rajan |
| 2020 | Competing Speaker Count Estimation on the Fusion of the Spectral and Spatial Embedding Space. Chao Peng, Xihong Wu, Tianshu Qu |
| 2020 | Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation. Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik |
| 2020 | Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra. Toru Nakashika |
| 2020 | Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity. Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo |
| 2020 | Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki |
| 2020 | Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection. Panagiotis Tzirakis, Alexander Shiarella, Robert M. Ewers, Björn W. Schuller |
| 2020 | Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil. Ke Shi, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen |
| 2020 | Conditional Response Augmentation for Dialogue Using Knowledge Distillation. Myeongho Jeong, Seungtaek Choi, Hojae Han, Kyungho Kim, Seung-won Hwang |
| 2020 | Conditional Spoken Digit Generation with StyleGAN. Kasperi Palkama, Lauri Juvela, Alexander Ilin |
| 2020 | Confidence Measure for Speech-to-Concept End-to-End Spoken Language Understanding. Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin |
| 2020 | Confidence Measures in Encoder-Decoder Models for Speech Recognition. Alejandro Woodward, Clara Bonnín, Issey Masuda, David Varas, Elisenda Bou-Balust, Juan Carlos Riveiro |
| 2020 | Conformer: Convolution-augmented Transformer for Speech Recognition. Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang |
| 2020 | Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention. Zhen Fu, Jing Chen |
| 2020 | Constrained Ratio Mask for Speech Enhancement Using DNN. Hongjiang Yu, Wei-Ping Zhu, Yuhong Yang |
| 2020 | Contemporary Polish Language Model (Version 2) Using Big Data and Sub-Word Approach. Krzysztof Wolk |
| 2020 | Context Dependent RNNLM for Automatic Transcription of Conversations. Srikanth Raj Chetupalli, Sriram Ganapathy |
| 2020 | Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training. Jiatong Shi, Nan Huo, Qin Jin |
| 2020 | Context-Dependent Acoustic Modeling Without Explicit Phone Clustering. Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney |
| 2020 | Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li |
| 2020 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context. Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu |
| 2020 | Contextual RNN-T for Open Domain ASR. Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf |
| 2020 | Contextualized Translation of Automatically Segmented Speech. Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi |
| 2020 | Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model. Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig |
| 2020 | Continual Learning for Multi-Dialect Acoustic Models. Brady Houston, Katrin Kirchhoff |
| 2020 | Continual Learning in Automatic Speech Recognition. Samik Sadhu, Hynek Hermansky |
| 2020 | Contrastive Predictive Coding of Audio with an Adversary. Luyu Wang, Kazuya Kawakami, Aäron van den Oord |
| 2020 | Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions. Lei Wang, Ed X. Wu, Fei Chen |
| 2020 | Controllable Neural Prosody Synthesis. Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore |
| 2020 | Controllable Neural Text-to-Speech Synthesis Using Intuitive Prosodic Features. Tuomo Raitio, Ramya Rasipuram, Dan Castellani |
| 2020 | Controlling the Strength of Emotions in Speech-Like Emotional Sound Generated by WaveNet. Kento Matsumoto, Sunao Hara, Masanobu Abe |
| 2020 | Conv-TasSAN: Separative Adversarial Network Based on Conv-TasNet. Chengyun Deng, Yi Zhang, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li |
| 2020 | Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition. Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen |
| 2020 | Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks. Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li |
| 2020 | Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion. Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li |
| 2020 | CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech. Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman |
| 2020 | Correlating Cepstra with Formant Frequencies: Implications for Phonetically-Informed Forensic Voice Comparison. Vincent Hughes, Frantz Clermont, Philip Harrison |
| 2020 | Correlation Between Prosody and Pragmatics: Case Study of Discourse Markers in French and English. Lou Lee, Denis Jouvet, Katarina Bartkova, Yvon Keromnes, Mathilde Dargnat |
| 2020 | Cortical Oscillatory Hierarchy for Natural Sentence Processing. Bin Zhao, Jianwu Dang, Gaoyan Zhang, Masashi Unoki |
| 2020 | Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings. Florian L. Kreyssig, Philip C. Woodland |
| 2020 | Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis. Neeraj Kumar Sharma, Prashant Krishnan V., Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy |
| 2020 | Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion Without Parallel Data. Seung Won Park, Doo-young Kim, Myun-chul Joe |
| 2020 | Cross Attention with Monotonic Alignment for Speech Transformer. Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma |
| 2020 | Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages. Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow |
| 2020 | Cross-Domain Adaptation with Discrepancy Minimization for Text-Independent Forensic Speaker Verification. Zhenyu Wang, Wei Xia, John H. L. Hansen |
| 2020 | Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization. Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck |
| 2020 | Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari |
| 2020 | Cross-Linguistic Interaction Between Phonological Categorization and Orthography Predicts Prosodic Effects in the Acquisition of Portuguese Liquids by L1-Mandarin Learners. Chao Zhou, Silke Hamann |
| 2020 | Cross-Linguistic Perception of Utterances with Willingness and Reluctance in Mandarin by Korean L2 Learners. Wenqian Li, Jung-Yueh Tu |
| 2020 | Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels. Masahiro Yasuda, Yasunori Ohishi, Yuma Koizumi, Noboru Harada |
| 2020 | Cues for Perception of Gender in Synthetic Voices and the Role of Identity. Maxwell Hope, Jason Lilley |
| 2020 | CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo |
| 2020 | Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling. Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda |
| 2020 | DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation. Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-yi Lee |
| 2020 | DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie |
| 2020 | DNN No-Reference PSTN Speech Quality Prediction. Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner |
| 2020 | Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech. Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo |
| 2020 | Data Augmentation for Code-Switch Language Modeling by Fusing Multiple Text Generation Methods. Xinhui Hu, Qi Zhang, Lei Yang, Binbin Gu, Xinkang Xu |
| 2020 | Data Balancing for Boosting Performance of Low-Frequency Classes in Spoken Language Understanding. Judith Gaspers, Quynh Ngoc Thi Do, Fabian Triefenbach |
| 2020 | Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan |
| 2020 | Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task. Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee |
| 2020 | Decoding Imagined, Heard, and Spoken Speech: Classification and Regression of EEG Using a 14-Channel Dry-Contact Mobile Headset. Jonathan Clayton, Scott Wellington, Cassia Valentini-Botinhao, Oliver Watts |
| 2020 | Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-Corpus Setting for Speech Emotion Recognition. Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller |
| 2020 | Deep Attentive End-to-End Continuous Breath Sensing from Speech. Alexis Deighton MacIntyre, Georgios Rizos, Anton Batliner, Alice Baird, Shahin Amiriparian, Antonia F. de C. Hamilton, Björn W. Schuller |
| 2020 | Deep Convolutional Spiking Neural Networks for Keyword Spotting. Emre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li |
| 2020 | Deep Embedding Learning for Text-Dependent Speaker Verification. Peng Zhang, Peng Hu, Xueliang Zhang |
| 2020 | Deep F-Measure Maximization for End-to-End Speech Understanding. Leda Sari, Mark Hasegawa-Johnson |
| 2020 | Deep Learning Based Assessment of Synthetic Speech Naturalness. Gabriel Mittag, Sebastian Möller |
| 2020 | Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition. Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy |
| 2020 | Deep Learning Based Open Set Acoustic Scene Classification. Zuzanna Kwiatkowska, Beniamin Kalinowski, Michal Kosmider, Krzysztof Rykaczewski |
| 2020 | Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling. Yeunju Choi, Youngmoon Jung, Hoirin Kim |
| 2020 | Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao |
| 2020 | Deep Scattering Power Spectrum Features for Robust Speech Recognition. Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals |
| 2020 | Deep Self-Supervised Hierarchical Clustering for Speaker Diarization. Prachi Singh, Sriram Ganapathy |
| 2020 | Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification. Junyi Peng, Rongzhi Gu, Yuexian Zou |
| 2020 | Deep Speech Inpainting of Time-Frequency Masks. Mikolaj Kegler, Pierre Beckmann, Milos Cernak |
| 2020 | Deep Template Matching for Small-Footprint and Configurable Keyword Spotting. Peng Zhang, Xueliang Zhang |
| 2020 | Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning. Haibin Wu, Andy T. Liu, Hung-yi Lee |
| 2020 | Densely Connected Time Delay Neural Network for Speaker Verification. Ya-Qi Yu, Wu-Jun Li |
| 2020 | Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword Spotting. Menglong Xu, Xiao-Lei Zhang |
| 2020 | Design Choices for X-Vector Based Speaker Anonymization. Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi |
| 2020 | Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency. Vikram Ramanarayanan |
| 2020 | Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification. Sina Däubener, Lea Schönherr, Asja Fischer, Dorothea Kolossa |
| 2020 | Detecting Audio Attacks on ASR Systems with Dropout Uncertainty. Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin |
| 2020 | Detecting Domain-Specific Credibility and Expertise in Text and Speech. Shengli Hu |
| 2020 | Detecting and Analysing Spontaneous Oral Cancer Speech in the Wild. Bence Mark Halpern, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg |
| 2020 | Detecting and Counting Overlapping Speakers in Distant Speech Scenarios. Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent |
| 2020 | Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait. Tanya Talkar, Sophia Yuditskaya, James R. Williamson, Adam C. Lammert, Hrishikesh Rao, Daniel J. Hannon, Anne T. O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas E. Sturim, Gregory A. Ciccarelli, Ross Zafonte, Jeff Palmer, Paolo Bonato, Thomas F. Quatieri |
| 2020 | Detection of Voicing and Place of Articulation of Fricatives with Deep Learning in a Virtual Speech and Language Therapy Tutor. Ivo Anjos, Maxine Eskénazi, Nuno Marques, Margarida Grilo, Isabel Guimarães, João Magalhães, Sofia Cavaco |
| 2020 | Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability. Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong |
| 2020 | Developing an Open-Source Corpus of Yoruba Speech. Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera, Kólá Túbosún |
| 2020 | Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages. Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz |
| 2020 | Development of a Speech Quality Database Under Uncontrolled Conditions. Alessandro Ragano, Emmanouil Benetos, Andrew Hines |
| 2020 | DiPCo - Dinner Party Corpus. Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas |
| 2020 | Differences in Gradient Emotion Perception: Human vs. Alexa Voices. Michelle Cohn, Eran Raveh, Kristin Predeck, Iona Gessinger, Bernd Möbius, Georgia Zellou |
| 2020 | Differential Beamforming for Uniform Circular Array with Directional Microphones. Weilong Huang, Jinwei Feng |
| 2020 | Dimensional Emotion Prediction Based on Interactive Context in Conversation. Xiaohan Shi, Sixia Li, Jianwu Dang |
| 2020 | Discovering Articulatory Speech Targets from Synthesized Random Babble. Heikki Rasilo, Yannick Jadoul |
| 2020 | Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation. Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu |
| 2020 | Discriminative Singular Spectrum Analysis for Bioacoustic Classification. Bernardo B. Gatto, Eulanda Miranda dos Santos, Juan Gabriel Colonna, Naoya Sogi, Lincon Sales de Souza, Kazuhiro Fukui |
| 2020 | Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-Oriented Spoken Dialog. Yao Qian, Yu Shi, Michael Zeng |
| 2020 | Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer's Disease. Jiahong Yuan, Yuchen Bian, Xingyu Cai, Jiaji Huang, Zheng Ye, Kenneth Church |
| 2020 | Distant Supervision for Polyphone Disambiguation in Mandarin Chinese. Jiawen Zhang, Yuanyuan Zhao, Jiaqi Zhu, Jinba Xiao |
| 2020 | Distilling the Knowledge of BERT for Sequence-to-Sequence ASR. Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara |
| 2020 | Distributed Summation Privacy for Speech Enhancement. Matthew O'Connor, W. Bastiaan Kleijn |
| 2020 | Do End-to-End Speech Recognition Models Care About Context? Lasse Borgholt, Jakob D. Havtorn, Zeljko Agic, Anders Søgaard, Lars Maaløe, Christian Igel |
| 2020 | Do Face Masks Introduce Bias in Speech Technologies? The Case of Automated Scoring of Speaking Proficiency. Anastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner |
| 2020 | Does French Listeners' Ability to Use Accentual Information at the Word Level Depend on the Ear of Presentation? Amandine Michelas, Sophie Dufour |
| 2020 | Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell. Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa L. Ng, Feiqi Zhu, Lan Wang, Nan Yan |
| 2020 | Doing Something we Never could with Spoken Language Technologies - from early days to the era of deep learning. Lin-Shan Lee |
| 2020 | Domain Adaptation Using Class Similarity for Robust Speech Recognition. Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang |
| 2020 | Domain Adaptation for Enhancing Speech-Based Depression Detection in Natural Environmental Conditions Using Dilated CNNs. Zhaocheng Huang, Julien Epps, Dale Joachim, Brian Stasak, James R. Williamson, Thomas F. Quatieri |
| 2020 | Domain Adversarial Neural Networks for Dysarthric Speech Recognition. Dominika Woszczyk, Stavros Petridis, David E. Millard |
| 2020 | Domain Aware Training for Far-Field Small-Footprint Keyword Spotting. Haiwei Wu, Yan Jia, Yuanfei Nie, Ming Li |
| 2020 | Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning. Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng |
| 2020 | Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. Zhihao Du, Jiqing Han, Xueliang Zhang |
| 2020 | Dual Attention in Time and Frequency Domain for Voice Activity Detection. Joohyung Lee, Youngmoon Jung, Hoirin Kim |
| 2020 | Dual Stage Learning Based Dynamic Time-Frequency Mask Generation for Audio Event Classification. Donghyeon Kim, Jaihyun Park, David K. Han, Hanseok Ko |
| 2020 | Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection. Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu |
| 2020 | Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation. Jingjing Chen, Qirong Mao, Dong Liu |
| 2020 | Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression. Nils L. Westhausen, Bernd T. Meyer |
| 2020 | DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System. Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu |
| 2020 | DurIAN: Duration Informed Attention Network for Speech Synthesis. Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu |
| 2020 | Dynamic Margin Softmax Loss for Speaker Verification. Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang, Jianguo Wei |
| 2020 | Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection. Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba |
| 2020 | Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis. Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang |
| 2020 | Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis. Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang, Chunyu Qiang |
| 2020 | Dysarthria Detection and Severity Assessment Using Rhythm-Based Metrics. Abner Hernandez, Eun Jung Yeo, Sunhee Kim, Minhwa Chung |
| 2020 | Dysarthric Speech Recognition Based on Deep Metric Learning. Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
| 2020 | ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification. Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck |
| 2020 | EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep Learning. Zhuo Zhang, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Di Zhou, Longbiao Wang |
| 2020 | Early Stage LM Integration Using Local and Global Log-Linear Combination. Wilfried Michel, Ralf Schlüter, Hermann Ney |
| 2020 | Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition. Jinhwan Park, Wonyong Sung |
| 2020 | Effect of Microphone Position Measurement Error on RIR and its Impact on Speech Intelligibility and Quality. Aditya Raikar, Karan Nathwani, Ashish Panda, Sunil Kumar Kopparapu |
| 2020 | Effect of Spectral Complexity Reduction and Number of Instruments on Musical Enjoyment with Cochlear Implants. Avamarie Brueggeman, John H. L. Hansen |
| 2020 | Effects of Communication Channels and Actor's Gender on Emotion Identification by Native Mandarin Speakers. Yi Lin, Hongwei Ding |
| 2020 | Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech. Shammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali |
| 2020 | Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks. Michal Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski |
| 2020 | Efficient MDI Adaptation for n-Gram Language Models. Ruizhe Huang, Ke Li, Ashish Arora, Daniel Povey, Sanjeev Khudanpur |
| 2020 | Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition. Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas |
| 2020 | Efficient Neural Speech Synthesis for Low-Resource Languages Through Multilingual Modeling. Marcel de Korte, Jaebok Kim, Esther Klabbers |
| 2020 | Efficient Wait-k Models for Simultaneous Machine Translation. Maha Elbayad, Laurent Besacier, Jakob Verbeek |
| 2020 | Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed. Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou |
| 2020 | EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification. Shuiyang Mao, P. C. Ching, Tan Lee |
| 2020 | Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean. Yinghao Li, Jinghua Zhang |
| 2020 | Emitting Word Timings with End-to-End Models. Tara N. Sainath, Ruoming Pang, David Rybach, Basi García, Trevor Strohman |
| 2020 | Emotion Profile Refinery for Speech Emotion Classification. Shuiyang Mao, Pak-Chung Ching, Tan Lee |
| 2020 | Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition. Md. Asif Jalal, Rosanna Milner, Thomas Hain |
| 2020 | End-to-End ASR with Adaptive Span Self-Attention. Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, Motoi Omachi |
| 2020 | End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge. Naoki Kimura, Zixiong Su, Takaaki Saeki |
| 2020 | End-to-End Domain-Adversarial Voice Activity Detection. Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola García-Perera |
| 2020 | End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian |
| 2020 | End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource Languages. Zeyu Zhao, Weiqiang Zhang |
| 2020 | End-to-End Multi-Look Keyword Spotting. Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu |
| 2020 | End-to-End Named Entity Recognition from English Speech. Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah |
| 2020 | End-to-End Neural Transformer Based Spoken Language Understanding. Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann |
| 2020 | End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu |
| 2020 | End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. Han Feng, Sei Ueno, Tatsuya Kawahara |
| 2020 | End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen |
| 2020 | End-to-End Speech-to-Dialog-Act Recognition. Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara |
| 2020 | End-to-End Spoken Language Understanding Without Full Transcripts. Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras |
| 2020 | End-to-End Task-Oriented Dialog System Through Template Slot Value Generation. Teakgyu Hong, Oh-Woog Kwon, Young-Kil Kim |
| 2020 | End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention. Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari |
| 2020 | Enhancing Formant Information in Spectrographic Display of Speech. B. Yegnanarayana, Joseph M. Anand, Vishala Pannala |
| 2020 | Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System. Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, Ying-Hui Lai |
| 2020 | Enhancing Monotonic Multihead Attention for Streaming ASR. Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara |
| 2020 | Enhancing Monotonicity for Robust Autoregressive Transformer TTS. Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng |
| 2020 | Enhancing Sequence-to-Sequence Text-to-Speech with Morphology. Jason Taylor, Korin Richmond |
| 2020 | Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion. Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou |
| 2020 | Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models. Zhao Ren, Jing Han, Nicholas Cummins, Björn W. Schuller |
| 2020 | Enhancing the Interaural Time Difference of Bilateral Cochlear Implants with the Temporal Limits Encoder. Yangyang Wan, Huali Zhou, Qinglin Meng, Nengheng Zheng |
| 2020 | Ensemble Approaches for Uncertainty in Spoken Language Assessment. Xixin Wu, Kate M. Knill, Mark J. F. Gales, Andrey Malinin |
| 2020 | Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition. Kusha Sridhar, Carlos Busso |
| 2020 | Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges. Maxim Markitantov, Denis Dresvyanskiy, Danila Mamontov, Heysem Kaya, Wolfgang Minker, Alexey Karpov |
| 2020 | Entity Linking for Short Text Using Structured Knowledge Graph via Multi-Grained Text Matching. Binxuan Huang, Han Wang, Tong Wang, Yue Liu, Yang Liu |
| 2020 | Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network. Jivitesh Sharma, Ole-Christoffer Granmo, Morten Goodwin |
| 2020 | Environmental Sound Classification with Parallel Temporal-Spectral Attention. Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang |
| 2020 | Er-Suffixation in Southwestern Mandarin: An EMA and Ultrasound Study. Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang |
| 2020 | Evaluating Automatically Generated Phoneme Captions for Images. Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg |
| 2020 | Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing. Marcello Federico, Yogesh Virkar, Robert Enyedi, Roberto Barra-Chicote |
| 2020 | Evaluating the Reliability of Acoustic Speech Embeddings. Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux |
| 2020 | Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification. Xiaoyang Qu, Jianzong Wang, Jing Xiao |
| 2020 | Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition. Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee |
| 2020 | Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions. Pavlos Papadopoulos, Shrikanth Narayanan |
| 2020 | Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition. Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng |
| 2020 | Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie |
| 2020 | Exploiting Multi-Modal Features from Pre-Trained Networks for Alzheimer's Dementia Recognition. Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee |
| 2020 | Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge. Ziqing Yang, Zifan An, Zehao Fan, Chengye Jing, Houwei Cao |
| 2020 | Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models. Qiang Huang, Thomas Hain |
| 2020 | Exploration of End-to-End Synthesisers for Zero Resource Speech Challenge 2020. Karthik Pandia D. S., Anusha Prakash, Mano Ranjith Kumar M., Hema A. Murthy |
| 2020 | Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement. Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee |
| 2020 | Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition. Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee |
| 2020 | Exploring Listeners' Speech Rate Preferences. Olympia Simantiraki, Martin Cooke |
| 2020 | Exploring MMSE Score Prediction Using Verbal and Non-Verbal Cues. Shahla Farzana, Natalie Parde |
| 2020 | Exploring TTS Without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020). Takashi Morita, Hiroki Koda |
| 2020 | Exploring Text and Audio Embeddings for Multi-Dimension Elderly Emotion Recognition. Mariana Julião, Alberto Abad, Helena Moniz |
| 2020 | Exploring Transformers for Large-Scale Speech Recognition. Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong |
| 2020 | Exploring the Use of an Artificial Accent of English to Assess Phonetic Learning in Monolingual and Bilingual Speakers. Laura Spinu, Jiwon Hwang, Nadya Pincus, Mariana Vasilita |
| 2020 | Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification. Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan |
| 2020 | Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression. Nadee Seneviratne, James R. Williamson, Adam C. Lammert, Thomas F. Quatieri, Carol Y. Espy-Wilson |
| 2020 | Extrapolating False Alarm Rates in Automatic Speaker Verification. Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee |
| 2020 | F0 Patterns in Mandarin Statements of Mandarin and Cantonese Speakers. Yike Yang, Si Chen, Xi Chen |
| 2020 | F0 Slope and Mean: Cues to Speech Segmentation in French. Maria del Mar Cordero, Fanny Meunier, Nicolas Grimault, Stéphane Pota, Elsa Spinelli |
| 2020 | FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data. Aditya Joglekar, John H. L. Hansen, Meena Chandra Shekhar, Abhijeet Sangwan |
| 2020 | FT Speech: Danish Parliament Speech Corpus. Andreas Kirkedal, Marija Stepanovic, Barbara Plank |
| 2020 | Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image. Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana, Koichiro Mori |
| 2020 | FaceFilter: Audio-Visual Speech Separation Using Still Images. Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang |
| 2020 | Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet. Vadim Popov, Stanislav Kamenev, Mikhail A. Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko |
| 2020 | Fast and Slow Acoustic Model. Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu |
| 2020 | Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces. Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig |
| 2020 | FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction. Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu |
| 2020 | FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo |
| 2020 | Finding Intelligible Consonant-Vowel Sounds Using High-Quality Articulatory Synthesis. Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu |
| 2020 | Finnish ASR with Deep Transformer Models. Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo |
| 2020 | Focal Loss for Punctuation Prediction. Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ye Bai, Cunhang Fan |
| 2020 | Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism. Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie |
| 2020 | Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning. Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao |
| 2020 | Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering. Ryu Takeda, Kazunori Komatani |
| 2020 | From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint. Zexin Cai, Chuxiong Zhang, Ming Li |
| 2020 | Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec. Sneha Das, Tom Bäckström, Guillaume Fuchs |
| 2020 | Fusion Architectures for Word-Based Audiovisual Speech Recognition. Michael Wand, Jürgen Schmidhuber |
| 2020 | FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition. Titouan Parcollet, Xinchi Qiu, Nicholas D. Lane |
| 2020 | GAN-Based Data Generation for Speech Emotion Recognition. Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumatani |
| 2020 | GAZEV: GAN-Based Zero-Shot Voice Conversion Over Non-Parallel Speech Corpus. Zining Zhang, Bingsheng He, Zhenjie Zhang |
| 2020 | GEV Beamforming Supported by DOA-Based Masks Generated on Pairs of Microphones. François Grondin, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud |
| 2020 | Gaming Corpus for Studying Social Screams. Hiroki Mori, Yuki Kikuchi |
| 2020 | Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging. Sixin Hong, Yuexian Zou, Wenwu Wang |
| 2020 | Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations. Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen |
| 2020 | Generalized Minimal Distortion Principle for Blind Source Separation. Robin Scheibler |
| 2020 | Generative Adversarial Network Based Acoustic Echo Cancellation. Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li |
| 2020 | Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition. Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara |
| 2020 | Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework. Anusha Prakash, Hema A. Murthy |
| 2020 | Glottal Closure Instants Detection from EGG Signal by Classification Approach. Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das |
| 2020 | Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition. Pengfei Liu, Kun Li, Helen Meng |
| 2020 | HRI-RNN: A User-Robot Dynamics-Oriented RNN for Engagement Decrease Detection. Asma Atamna, Chloé Clavel |
| 2020 | Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals. Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari |
| 2020 | Hearing-Impaired Bio-Inspired Cochlear Models for Real-Time Auditory Applications. Arthur Van Den Broucke, Deepak Baby, Sarah Verhulst |
| 2020 | HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks. Jiaqi Su, Zeyu Jin, Adam Finkelstein |
| 2020 | Hide and Speak: Towards Deep Neural Networks for Speech Steganography. Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet |
| 2020 | Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification. Jacob J. Webber, Olivier Perrotin, Simon King |
| 2020 | Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis. Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda |
| 2020 | Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition. Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim |
| 2020 | High Performance Sequence-to-Sequence Model for Streaming Speech Recognition. Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel |
| 2020 | High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency. Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis |
| 2020 | How Does Label Noise Affect the Quality of Speaker Embeddings? Minh Pham, Zeqian Li, Jacob Whitehill |
| 2020 | How Ordinal Are Your Data? Sadari Jayawardena, Julien Epps, Zhaocheng Huang |
| 2020 | How Rhythm and Timbre Encode Mooré Language in Bendré Drummed Speech. Laure Dentel, Julien Meyer |
| 2020 | Hybrid Network Feature Extraction for Depression Assessment from Speech. Ziping Zhao, Qifei Li, Nicholas Cummins, Bin Liu, Haishuai Wang, Jianhua Tao, Björn W. Schuller |
| 2020 | Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering. Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir |
| 2020 | ICE-Talk: An Interface for a Controllable Expressive Talking Machine. Noé Tits, Kevin El Haddad, Thierry Dutoit |
| 2020 | INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising. Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt |
| 2020 | Identify Speakers in Cocktail Parties with End-to-End Attention. Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari |
| 2020 | Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation. Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade |
| 2020 | Identifying Important Time-Frequency Locations in Continuous Speech Utterances. Hassan Salami Kavaki, Michael I. Mandel |
| 2020 | Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. Takashi Fukuda, Samuel Thomas |
| 2020 | Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario. Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu |
| 2020 | Improved Hybrid Streaming ASR with Transformer Language Models. Pau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchís, Jorge Civera, Alfons Juan |
| 2020 | Improved Learning of Word Embeddings with Word Definitions and Semantic Injection. Yichi Zhang, Yinpei Dai, Zhijian Ou, Huixin Wang, Junlan Feng |
| 2020 | Improved Model for Vocal Folds with a Polyp with Potential Application. Jônatas Santos, Jugurta Montalvão, Israel Santos |
| 2020 | Improved Noisy Student Training for Automatic Speech Recognition. Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le |
| 2020 | Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction. Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi |
| 2020 | Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms. Jee-weon Jung, Seung-Bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu |
| 2020 | Improved Speech Enhancement Using TCN with Multiple Encoder-Decoder Layers. Vinith Kishore, Nitya Tiwari, Periyasamy Paramasivam |
| 2020 | Improved Speech Enhancement Using a Time-Domain GAN with Mask Learning. Ju Lin, Sufeng Niu, Adriaan J. de Lind van Wijngaarden, Jerome L. McClendon, Melissa C. Smith, Kuang-Ching Wang |
| 2020 | Improved Training Strategies for End-to-End Speech Recognition in Digital Voice Assistants. Hitesh Tulsiani, Ashtosh Sapru, Harish Arsikere, Surabhi Punjabi, Sri Garimella |
| 2020 | Improved Zero-Shot Voice Conversion Using Explicit Conditioning Signals. Shahan Nercessian |
| 2020 | Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks. Chia-Yu Li, Ngoc Thang Vu |
| 2020 | Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation. Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen |
| 2020 | Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation. Changhan Wang, Juan Miguel Pino, Jiatao Gu |
| 2020 | Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction. Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen |
| 2020 | Improving End-to-End Speech-to-Intent Classification with Reptile. Yusheng Tian, Philip John Gorinski |
| 2020 | Improving Low Resource Code-Switched ASR Using Augmented Code-Switched TTS. Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi |
| 2020 | Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances. Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim |
| 2020 | Improving On-Device Speaker Verification Using Federated Learning with Privacy. Filip Granqvist, Matt Seigel, Rogier C. van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik |
| 2020 | Improving Opus Low Bit Rate Quality with Neural Speech Synthesis. Jan Skoglund, Jean-Marc Valin |
| 2020 | Improving Partition-Block-Based Acoustic Echo Canceler in Under-Modeling Scenarios. Wenzhi Fan, Jing Lu |
| 2020 | Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge. Chao Zhang, Junjie Cheng, Yanmei Gu, Huacan Wang, Jun Ma, Shaojun Wang, Jing Xiao |
| 2020 | Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee |
| 2020 | Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion. Tuan Dinh, Alexander Kain, Kris Tjaden |
| 2020 | Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection. Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno |
| 2020 | Improving Speech Recognition of Compound-Rich Languages. Prabhat Pandey, Volker Leutnant, Simon Wiesler, Jahn Heymann, Daniel Willett |
| 2020 | Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus. Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar |
| 2020 | Improving Transformer-Based Speech Recognition with Unsupervised Pre-Training and Multi-Task Semantic Knowledge Learning. Song Li, Lin Li, Qingyang Hong, Lingling Liu |
| 2020 | Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization. Benjamin Milde, Chris Biemann |
| 2020 | Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models. Thai Binh Nguyen, Quang Minh Nguyen, Thi Thu Hien Nguyen, Quoc Truong Do, Luong Chi Mai |
| 2020 | Improving X-Vector and PLDA for Text-Dependent Speaker Verification. Zhuxin Chen, Yue Lin |
| 2020 | Improving the Performance of Acoustic-to-Articulatory Inversion by Removing the Training Loss of Noncritical Portions of Articulatory Channels Dynamically. Qiang Fang |
| 2020 | Improving the Prosody of RNN-Based English Text-To-Speech Synthesis by Incorporating a BERT Model. Tom Kenter, Manish Sharma, Rob Clark |
| 2020 | Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition. Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna |
| 2020 | In Defence of Metric Learning for Speaker Recognition. Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee-Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han |
| 2020 | Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition. Qing Wang, Pengcheng Guo, Lei Xie |
| 2020 | Incorporating Broad Phonetic Information for Speech Enhancement. Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao |
| 2020 | Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency. Tuan Dinh, Alexander Kain, Robin Samlan, Beiming Cao, Jun Wang |
| 2020 | Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time. Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura |
| 2020 | Incremental Text to Speech for Neural Sequence-to-Sequence Models Using Reinforcement Learning. Devang S. Ram Mohan, Raphael Lenain, Lorenzo Foglianti, Tian Huey Teh, Marlene Staib, Alexandra Torresquintero, Jiameng Gao |
| 2020 | Independent Echo Path Modeling for Stereophonic Acoustic Echo Cancellation. Yi Gao, Ian Liu, J. Zheng, Cheng Luo, Bin Li |
| 2020 | Independent and Automatic Evaluation of Speaker-Independent Acoustic-to-Articulatory Reconstruction. Maud Parrot, Juliette Millet, Ewan Dunbar |
| 2020 | Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners' Autistic-Like Traits. Michelle Cohn, Melina Sarian, Kristin Predeck, Georgia Zellou |
| 2020 | Insertion-Based Modeling for End-to-End Automatic Speech Recognition. Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang |
| 2020 | Instantaneous Time Delay Estimation of Broadband Signals. B. H. V. S. Narayana Murthy, J. V. Satyanarayana, Nivedita Chennupati, B. Yegnanarayana |
| 2020 | Integrating the Application and Realization of Mandarin 3rd Tone Sandhi in the Resolution of Sentence Ambiguity. Wei Lai, Aini Li |
| 2020 | Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment. Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda |
| 2020 | Intelligibility-Enhancing Speech Modifications - The Hurricane Challenge 2.0. Jan Rennies, Henning F. Schepker, Cassia Valentini-Botinhao, Martin Cooke |
| 2020 | Interaction of Tone and Voicing in Mizo. Wendy Lalhminghlui, Priyankoo Sarmah |
| 2020 | Interactive Text-to-Speech System via Joint Style Analysis. Yang Gao, Weiyi Zheng, Zhaojun Yang, Thilo Köhler, Christian Fuegen, Qing He |
| 2020 | Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework. Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang |
| 2020 | Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging. Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang |
| 2020 | Introducing the VoicePrivacy Initiative. Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco |
| 2020 | Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari |
| 2020 | Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions. Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar |
| 2020 | Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification. Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng |
| 2020 | Investigating Self-Supervised Pre-Training for End-to-End Speech Translation. Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier |
| 2020 | Investigating the Visual Lombard Effect with Gabor Based Features. Waito Chiu, Yan Xu, Andrew Abel, Chun Lin, Zhengzheng Tu |
| 2020 | Investigation of Data Augmentation Techniques for Disordered Speech Recognition. Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng |
| 2020 | Investigation of Large-Margin Softmax in Neural Language Modeling. Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney |
| 2020 | Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020. Peng Shen, Xugang Lu, Hisashi Kawai |
| 2020 | Investigation of Phase Distortion on Perceived Speech Quality for Hearing-Impaired Listeners. Zhuohuang Zhang, Donald S. Williamson, Yi Shen |
| 2020 | Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition. Gizem Sogancioglu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov |
| 2020 | Iterative Compression of End-to-End ASR Model Using AutoML. Abhinav Mehrotra, Lukasz Dudziak, Jinsu Yeo, Young-Yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane |
| 2020 | Iterative Pseudo-Labeling for Speech Recognition. Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert |
| 2020 | JDI-T: Jointly Trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment. Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bongwan Kim, Jaesam Yoon |
| 2020 | Joint Detection of Sentence Stress and Phrase Boundary for Prosody. Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang |
| 2020 | Joint Prediction of Punctuation and Disfluency in Speech Transcripts. Binghuai Lin, Liyuan Wang |
| 2020 | Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka |
| 2020 | Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations. Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen |
| 2020 | Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding. Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu |
| 2020 | Jointly Fine-Tuning "BERT-Like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition. Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara |
| 2020 | JukeBox: A Multilingual Singer Recognition Dataset. Anurag Chowdhury, Austin Cozzo, Arun Ross |
| 2020 | Kaldi-Web: An Installation-Free, On-Device Speech Recognition System. Mathieu Hu, Laurent Pierron, Emmanuel Vincent, Denis Jouvet |
| 2020 | Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition. Gakuto Kurata, George Saon |
| 2020 | Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders. Yang Ai, Zhen-Hua Ling |
| 2020 | LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR. Yanhong Wang, Huan Luan, Jiahong Yuan, Bin Wang, Hui Lin |
| 2020 | LVCSR with Transformer Language Models. Eugen Beck, Ralf Schlüter, Hermann Ney |
| 2020 | Language Model Data Augmentation Based on Text Domain Transfer. Atsunori Ogawa, Naohiro Tawara, Marc Delcroix |
| 2020 | Language Modeling for Speech Analytics in Under-Resourced Languages. Simone Wills, Pieter Uys, Charl Johannes van Heerden, Etienne Barnard |
| 2020 | Large Scale Evaluation of Importance Maps in Automatic Speech Recognition. Viet Anh Trinh, Michael I. Mandel |
| 2020 | Large Scale Weakly and Semi-Supervised Learning for Low-Resource Video ASR. Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross B. Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed |
| 2020 | Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning. Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, Takahiro Shinozaki |
| 2020 | Large-Scale Transfer Learning for Low-Resource Spoken Language Understanding. Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao |
| 2020 | Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems. Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey |
| 2020 | Laughter Synthesis: Combining Seq2seq Modeling with Transfer Learning. Noé Tits, Kevin El Haddad, Thierry Dutoit |
| 2020 | Learnable Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination. Tyler Vuong, Yangyang Xia, Richard M. Stern |
| 2020 | Learning Better Speech Representations by Worsening Interference. Jun Wang |
| 2020 | Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization. Ashutosh Pandey, DeLiang Wang |
| 2020 | Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition. Wangyou Zhang, Yanmin Qian |
| 2020 | Learning Fast Adaptation on Cross-Accented Speech Recognition. Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung |
| 2020 | Learning Higher Representations from Pre-Trained Deep Models with Data Augmentation for the COMPARE 2020 Challenge Mask Task. Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto |
| 2020 | Learning Intonation Pattern Embeddings for Arabic Dialect Identification. Aitor Arronte Alvarez, Elsayed Sabry Abdelaal Issa |
| 2020 | Learning Joint Articulatory-Acoustic Representations with Normalizing Flows. Pramit Saha, Sidney S. Fels |
| 2020 | Learning Speaker Embedding from Text-to-Speech. Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe, Najim Dehak |
| 2020 | Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation. Guangyan Zhang, Ying Qin, Tan Lee |
| 2020 | Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition. Jian Huang, Jianhua Tao, Bin Liu, Zheng Lian |
| 2020 | Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting. Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre |
| 2020 | Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews. Bo Wang, Yue Wu, Niall Taylor, Terry J. Lyons, Maria Liakata, Alejo J. Nevado-Holgado, Kate E. A. Saunders |
| 2020 | Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels. Huang-Cheng Chou, Chi-Chun Lee |
| 2020 | Length- and Noise-Aware Training Techniques for Short-Utterance Speaker Recognition. Wenda Chen, Jonathan Huang, Tobias Bocklet |
| 2020 | Leveraging Unlabeled Speech for Sequence Discriminative Training of Acoustic Models. Ashtosh Sapru, Sri Garimella |
| 2020 | Lexical Stress in Urdu. Benazir Mumtaz, Tina Bögel, Miriam Butt |
| 2020 | Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks. Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li |
| 2020 | Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions. Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll |
| 2020 | Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition. Hiroki Kanagawa, Yusuke Ijima |
| 2020 | Lightweight Online Noise Reduction on Embedded Devices Using Hierarchical Recurrent Neural Networks. Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Pascal Zobel, Andreas Maier |
| 2020 | Links Between Production and Perception of Glottalisation in Individual Australian English Speaker/Listeners. Joshua Penney, Felicity Cox, Anita Szakay |
| 2020 | Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion. Hong Liu, Zhan Chen, Bing Yang |
| 2020 | Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition. Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang |
| 2020 | Listen to What You Want: Neural Network-Based Universal Sound Selector. Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki |
| 2020 | Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation. Chenda Li, Yanmin Qian |
| 2020 | Lite Audio-Visual Speech Enhancement. Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang |
| 2020 | Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals. Siqi Cai, Enze Su, Yonghao Song, Longhan Xie, Haizhou Li |
| 2020 | Low Latency End-to-End Streaming Speech Recognition with a Scout Network. Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou |
| 2020 | Low Latency Speech Recognition Using End-to-End Prefetching. Shuo-Yiin Chang, Bo Li, David Rybach, Yanzhang He, Wei Li, Tara N. Sainath, Trevor Strohman |
| 2020 | Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection. Danni Liu, Gerasimos Spanakis, Jan Niehues |
| 2020 | Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks. Ahmet Emin Bulut, Kazuhito Koishida |
| 2020 | LungRN+NL: An Improved Adventitious Lung Sound Classification Using Non-Local Block ResNet Neural Network with Mixup Data Augmentation. Yi Ma, Xinzi Xu, Yongfu Li |
| 2020 | MIRNet: Learning Multiple Identities Representations in Overlapped Speech. Hyewon Han, Soo-Whan Chung, Hong-Goo Kang |
| 2020 | MLNET: An Adaptive Multiple Receptive-Field Attention Neural Network for Voice Activity Detection. Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao |
| 2020 | MLS: A Large-Scale Multilingual Dataset for Speech Research. Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert |
| 2020 | Making a Distinction Between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech. Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann |
| 2020 | Malayalam-English Code-Switched: Grapheme to Phoneme System. Sreeja Manghat, Sreeram Manghat, Tanja Schultz |
| 2020 | Mandarin Lexical Tones: A Corpus-Based Study of Word Length, Syllable Position and Prosodic Position on Duration. Yaru Wu, Martine Adda-Decker, Lori Lamel |
| 2020 | Mandarin and English Adults' Cue-Weighting of Lexical Stress. Zhen Zeng, Karen Mattock, Liquan Liu, Varghese Peter, Alba Tuninetti, Feng-Ming Tsao |
| 2020 | Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict. Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi |
| 2020 | Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters. Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert |
| 2020 | MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. Somshubra Majumdar, Boris Ginsburg |
| 2020 | Memory Controlled Sequential Self Attention for Sound Recognition. Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos |
| 2020 | Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation. Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi |
| 2020 | Meta Multi-Task Learning for Speech Emotion Recognition. Ruichu Cai, Kaibin Guo, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang |
| 2020 | Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs. Seong Min Kye, Youngmoon Jung, Haebeom Lee, Sung Ju Hwang, Hoirin Kim |
| 2020 | Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion Labels. Takuya Fujioka, Takeshi Homma, Kenji Nagamatsu |
| 2020 | Metadata-Aware End-to-End Keyword Spotting. Hongyi Liu, Apurva Abhyankar, Yuriy Mishchenko, Thibaud Sénéchal, Gengshen Fu, Brian Kulis, Noah D. Stein, Anish Shah, Shiv Naga Prasad Vitaladevuni |
| 2020 | Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition. Raphaël Duroselle, Denis Jouvet, Irina Illina |
| 2020 | Microphone Array Post-Filter for Target Speech Enhancement Without a Prior Information of Point Interferers. Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao |
| 2020 | Microprosodic Variability in Plosives in German and Austrian German. Margaret Zellers, Barbara Schuppler |
| 2020 | Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu |
| 2020 | Mixed Case Contextual ASR Using Capitalization Masks. Diamantino Caseiro, Pat Rondon, Quoc-Nam Le The, Petar S. Aleksic |
| 2020 | Mixtures of Deep Neural Experts for Automated Speech Scoring. Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna |
| 2020 | MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search. Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou |
| 2020 | Mobile-Assisted Prosody Training for Limited English Proficiency: Learner Background and Speech Learning Pattern. Kevin Hirschi, Okim Kang, Catia Cucchiarini, John H. L. Hansen, Keelan Evanini, Helmer Strik |
| 2020 | Modeling ASR Ambiguity for Neural Dialogue State Tracking. Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders, Laurent Besacier |
| 2020 | Modeling Global Body Configurations in American Sign Language. Nicholas Wilkins, Max Cordes Galbraith, Ifeoma Nwogu |
| 2020 | Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition. Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng |
| 2020 | Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition. Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li |
| 2020 | Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment. Zhaoyu Liu, Brian Mak |
| 2020 | Multi-Modal Attention for Speech Emotion Recognition. Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li |
| 2020 | Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition. Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram |
| 2020 | Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer's Dementia Recognition from Spontaneous Speech. Morteza Rohanian, Julian Hough, Matthew Purver |
| 2020 | Multi-Modality Matters: A Performance Leap on VoxCeleb. Zhengyang Chen, Shuai Wang, Yanmin Qian |
| 2020 | Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation. Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach |
| 2020 | Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency. Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song |
| 2020 | Multi-Scale Convolution for Robust Keyword Spotting. Chen Yang, Xue Wen, Liming Song |
| 2020 | Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement. Lu Zhang, Mingjiang Wang |
| 2020 | Multi-Speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network. Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman |
| 2020 | Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes. Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari |
| 2020 | Multi-Stream Attention-Based BLSTM with Feature Segmentation for Speech Emotion Recognition. Yuya Chiba, Takashi Nose, Akinori Ito |
| 2020 | Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR. Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach |
| 2020 | Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension. Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li |
| 2020 | Multi-Task Learning for Voice Related Recognition Tasks. Ana Montalvo, José R. Calvo, Jean-François Bonastre |
| 2020 | Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention. Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim |
| 2020 | Multi-Task Siamese Neural Network for Improving Replay Attack Detection. Patrick von Platen, Fei Tao, Gökhan Tür |
| 2020 | MultiSpeech: Multi-Speaker Text to Speech with Transformer. Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin |
| 2020 | Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages. Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz |
| 2020 | Multilingual Jointly Trained Acoustic and Written Word Embeddings. Yushi Hu, Shane Settle, Karen Livescu |
| 2020 | Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. Hardik B. Sailor, Thomas Hain |
| 2020 | Multilingual Speech Recognition with Self-Attention Structured Parameterization. Yun Zhu, Parisa Haghani, Anshuman Tripathi, Bhuvana Ramabhadran, Brian Farris, Hainan Xu, Han Lu, Hasim Sak, Isabel Leal, Neeraj Gaur, Pedro J. Moreno, Qian Zhang |
| 2020 | Multimodal Association for Speaker Verification. Suwon Shon, James R. Glass |
| 2020 | Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features. Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg |
| 2020 | Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks. Krishna D. N, Ankita Patil |
| 2020 | Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity. Utkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman, Pattie Maes |
| 2020 | Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech. Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff |
| 2020 | Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning. Katerina Papadimitriou, Gerasimos Potamianos |
| 2020 | Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text. Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung |
| 2020 | Multimodal Target Speech Separation with Voice and Face References. Leyuan Qu, Cornelius Weber, Stefan Wermter |
| 2020 | Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous Speech. Erik Edwards, Charles Dognin, Bajibabu Bollepalli, Maneesh Kumar Singh |
| 2020 | NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement. Feng Deng, Tao Jiang, Xiaorui Wang, Chen Zhang, Yan Li |
| 2020 | NEC-TT Speaker Verification System for SRE'19 CTS Challenge. Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda |
| 2020 | NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge. Li Zhang, Jian Wu, Lei Xie |
| 2020 | Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding. Alex Peiró Lilja, Mireia Farrús |
| 2020 | Neural Architecture Search for Keyword Spotting. Tong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui |
| 2020 | Neural Architecture Search on Acoustic Scene Classification. Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang, Xiangxiang Chu |
| 2020 | Neural Discriminant Analysis for Deep Speaker Embedding. Lantian Li, Dong Wang, Thomas Fang Zheng |
| 2020 | Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals. Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Zhuo Zhang |
| 2020 | Neural Homomorphic Vocoder. Zhijun Liu, Kuan Chen, Kai Yu |
| 2020 | Neural Language Modeling with Implicit Cache Pointers. Ke Li, Daniel Povey, Sanjeev Khudanpur |
| 2020 | Neural PLDA Modeling for End-to-End Speaker Verification. Shreyas Ramoji, Prashant Krishnan V, Sriram Ganapathy |
| 2020 | Neural Representations of Dialogical History for Improving Upcoming Turn Acoustic Parameters Prediction. Simone Fuscone, Benoît Favre, Laurent Prévot |
| 2020 | Neural Spatio-Temporal Beamformer for Target Speech Separation. Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu |
| 2020 | Neural Speech Completion. Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura |
| 2020 | Neural Speech Decoding for Amyotrophic Lateral Sclerosis. Debadatta Dash, Paul Ferrari, Angel W. Hernandez-Mulero, Daragh Heitzman, Sara G. Austin, Jun Wang |
| 2020 | Neural Speech Separation Using Spatially Distributed Microphones. Dongmei Wang, Zhuo Chen, Takuya Yoshioka |
| 2020 | Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder. Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim |
| 2020 | Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System. Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan |
| 2020 | Neutral Tone in Changde Mandarin. Zhenrui Zhang, Fang Hu |
| 2020 | Neutralization of Voicing Distinction of Stops in Tohoku Dialects of Japanese: Field Work and Acoustic Measurements. Ai Mizoguchi, Ayako Hashimoto, Sanae Matsui, Setsuko Imatomi, Ryunosuke Kobayashi, Mafuyu Kitahara |
| 2020 | New Advances in Speaker Diarization. Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory |
| 2020 | Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement. Haoyu Li, Junichi Yamagishi |
| 2020 | Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention. Yan Zhao, DeLiang Wang |
| 2020 | Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding. Tao Wang, Xuefei Liu, Jianhua Tao, Jiangyan Yi, Ruibo Fu, Zhengqi Wen |
| 2020 | Non-Intrusive Diagnostic Monitoring of Fullband Speech Quality. Sebastian Möller, Tobias Hübschen, Thilo Michael, Gabriel Mittag, Gerhard Schmidt |
| 2020 | Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems. Kate M. Knill, Linlin Wang, Yu Wang, Xixin Wu, Mark J. F. Gales |
| 2020 | Non-Parallel Emotion Conversion Using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator. Ravi Shankar, Jacob Sager, Archana Venkataraman |
| 2020 | Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN. Yanping Li, Dongxiang Xu, Yan Zhang, Yang Wang, Binbin Chen |
| 2020 | Non-Parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks. Minchuan Chen, Weijian Hou, Jun Ma, Shaojun Wang, Jing Xiao |
| 2020 | Nonlinear ISA with Auxiliary Variables for Learning Speech Representations. Amrith Setlur, Barnabás Póczos, Alan W. Black |
| 2020 | Nonlinear Residual Echo Suppression Based on Multi-Stream Conv-TasNet. Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu |
| 2020 | Nonlinear Residual Echo Suppression Using a Recurrent Neural Network. Lukas Pfeifenberger, Franz Pernkopf |
| 2020 | Nonparallel Emotional Speech Conversion Using VAE-GAN. Yuexin Cao, Zhengchen Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao |
| 2020 | Nonparallel Training of Exemplar-Based Voice Conversion System Using INCA-Based Alignment Technique. Hitoshi Suda, Gaku Kotani, Daisuke Saito |
| 2020 | Now You're Speaking My Language: Visual Language Identification. Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman |
| 2020 | ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication. Christian Bergler, Manuel Schmitt, Andreas Maier, Simeon Smeele, Volker Barth, Elmar Nöth |
| 2020 | On Front-End Gain Invariant Modeling for Wake Word Spotting. Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Naga Prasad Vitaladevuni |
| 2020 | On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model. Shubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Kumar Mehta |
| 2020 | On Loss Functions and Recurrency Training for GAN-Based Speech Enhancement Systems. Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li |
| 2020 | On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition. Magdalena Rybicka, Konrad Kowalczyk |
| 2020 | On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data. Imran A. Sheikh, Emmanuel Vincent, Irina Illina |
| 2020 | On Synthesis for Supervised Monaural Speech Separation in Time Domain. Jingjing Chen, Qirong Mao, Dong Liu |
| 2020 | On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition. Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu |
| 2020 | On the Robustness and Training Dynamics of Raw Waveform Models. Erfan Loweimi, Peter Bell, Steve Renals |
| 2020 | On the Usage of Multi-Feature Integration for Speaker Verification and Language Identification. Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong |
| 2020 | One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech. Tomás Nekvinda, Ondrej Dusek |
| 2020 | Ongoing Phonologization of Word-Final Voicing Alternations in Two Romance Languages: Romanian and French. Mathilde Hutin, Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker |
| 2020 | Online Blind Reverberation Time Estimation Using CRNNs. Shuwen Deng, Wolfgang Mack, Emanuël A. P. Habets |
| 2020 | Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. Li Li, Kazuhito Koishida, Shoji Makino |
| 2020 | Online Monaural Speech Enhancement Using Delayed Subband LSTM. Xiaofei Li, Radu Horaud |
| 2020 | Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias. Mufan Sang, Wei Xia, John H. L. Hansen |
| 2020 | Optimization and Evaluation of an Intelligibility-Improving Signal Processing Approach (IISPA) for the Hurricane Challenge 2.0 with FADE. Marc René Schädler |
| 2020 | Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech. Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong |
| 2020 | POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise. Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong Aik Lee |
| 2020 | Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets. Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass |
| 2020 | Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion. Jeno Szep, Salim Hariri |
| 2020 | Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition. Wei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He |
| 2020 | Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments. Haley Lepp, Gina-Anne Levow |
| 2020 | Parkinson's Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients. Sudarsana Reddy Kadiri, Rashmi Kethireddy, Paavo Alku |
| 2020 | Partial AUC Optimisation Using Recurrent Neural Networks for Music Detection with Limited Training Data. Pablo Gimeno, Victoria Mingote, Alfonso Ortega Giménez, Antonio Miguel, Eduardo Lleida |
| 2020 | Peking Opera Synthesis via Duration Informed Attention Network. Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu |
| 2020 | Perceptimatic: A Human Speech Perception Benchmark for Unsupervised Subword Modelling. Juliette Millet, Ewan Dunbar |
| 2020 | Perception and Production of Mandarin Initial Stops by Native Urdu Speakers. Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang |
| 2020 | Perception of Concatenative vs. Neural Text-To-Speech (TTS): Differences in Intelligibility in Noise and Language Attitudes. Michelle Cohn, Georgia Zellou |
| 2020 | Perception of English Fricatives and Affricates by Advanced Chinese Learners of English. Yizhou Lan |
| 2020 | Perception of Japanese Consonant Length by Native Speakers of Korean Differing in Japanese Learning Experience. Kimiko Tsukada, Joo-Yeon Kim, Jeong-Im Han |
| 2020 | Perception of Privacy Measured in the Crowd - Paired Comparison on the Effect of Background Noises. Anna Leschanowsky, Sneha Das, Tom Bäckström, Pablo Pérez Zarazaga |
| 2020 | Phase Based Spectro-Temporal Features for Building a Robust ASR System. Anirban Dutta, Ashishkumar Prabhakar Gudmalwar, Ch. V. Rama Rao |
| 2020 | Phase-Aware Music Super-Resolution Using Generative Adversarial Networks. Shichao Hu, Bin Zhang, Beici Liang, Ethan Zhao, Simon Lui |
| 2020 | Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition. Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi |
| 2020 | Phonetic Accommodation of L2 German Speakers to the Virtual Language Learning Tutor Mirabella. Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner |
| 2020 | Phonetic Entrainment in Cooperative Dialogues: A Case of Russian. Alla Menshikova, Daniil Kocharov, Tatiana Kachkovskaia |
| 2020 | Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge. Claude Montacié, Marie-José Caraty |
| 2020 | Phonetically-Aware Coupled Network for Short Duration Text-Independent Speaker Verification. Siqi Zheng, Yun Lei, Hongbin Suo |
| 2020 | Phonological Features for 0-Shot Multilingual Speech Synthesis. Marlene Staib, Tian Huey Teh, Alexandra Torresquintero, Devang S. Ram Mohan, Lorenzo Foglianti, Raphael Lenain, Jiameng Gao |
| 2020 | Pitch Declination and Final Lowering in Northeastern Mandarin. Ping Cui, Jianjing Kuang |
| 2020 | PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss. Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy |
| 2020 | Poetic Meter Classification Using i-Vector-MTF Fusion. Rajeev Rajan, Aiswarya Vinod Kumar, Ben P. Babu |
| 2020 | Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection. Tianjiao Xu, Hui Zhang, Xueliang Zhang |
| 2020 | Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction. Shun-Chang Zhong, Bo-Hao Su, Wei Huang, Yi-Ching Liu, Chi-Chun Lee |
| 2020 | Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting. Théodore Bluche, Thibault Gisselbrecht |
| 2020 | Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System. Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino |
| 2020 | Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder. JinHong Lu, Hiroshi Shimodaira |
| 2020 | Prediction of Sleepiness Ratings from Voice by Man and Machine. Mark A. Huckvale, András Beke, Mirei Ikushima |
| 2020 | Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning. Pavel Denisov, Ngoc Thang Vu |
| 2020 | Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS. Alexander Sorin, Slava Shechtman, Ron Hoory |
| 2020 | Privacy Guarantees for De-Identifying Text Transformations. David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow |
| 2020 | Processes and Consequences of Co-Articulation in Mandarin V1N.(C2)V2 Context: Phonology and Phonetics. Mingqiong Luo |
| 2020 | Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning. Longfei Yang, Kaiqi Fu, Jinsong Zhang, Takahiro Shinozaki |
| 2020 | Prosodic Characteristics of Genuine and Mock (Im)polite Mandarin Utterances. Chengwei Xu, Wentao Gu |
| 2020 | Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit. Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao |
| 2020 | Prosody and Breathing: A Comparison Between Rhetorical and Information-Seeking Questions in German and Brazilian Portuguese. Jana Neitsch, Plínio A. Barbosa, Oliver Niebuhr |
| 2020 | Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption. Hongyin Luo, Shang-wen Li, James R. Glass |
| 2020 | Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings? Lukasz Augustyniak, Piotr Szymanski, Mikolaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak |
| 2020 | PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR. Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur |
| 2020 | Quantification of Transducer Misalignment in Ultrasound Tongue Imaging. Tamás Gábor Csapó, Kele Xu |
| 2020 | Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition. Hieu Duy Nguyen, Anastasios Alexandridis, Athanasios Mouchtaris |
| 2020 | Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation. Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda |
| 2020 | Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas D. Lane, Mohamed Morchid |
| 2020 | RECOApy: Data Recording, Pre-Processing and Phonetic Transcription for End-to-End Speech-Based Applications. Adriana Stan |
| 2020 | Rapid Enhancement of NLP Systems by Acquisition of Data in Correlated Domains. Tejas Udayakumar, Kinnera Saranu, Mayuresh Sanjay Oak, Ajit Ashok Saunshikhar, Sandip Shriram Bapat |
| 2020 | Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator. Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong |
| 2020 | Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. Erfan Loweimi, Peter Bell, Steve Renals |
| 2020 | Raw Speech Waveform Based Classification of Patients with ALS, Parkinson's Disease and Healthy Controls Using CNN-BLSTM. Jhansi Mallela, Aravind Illa, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh |
| 2020 | Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting. Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song |
| 2020 | Real Time Speech Enhancement in the Waveform Domain. Alexandre Défossez, Gabriel Synnaeve, Yossi Adi |
| 2020 | Real-Time Single-Channel Deep Neural Network-Based Speech Enhancement on Edge Devices. Nikhil Shankar, Gautam Shreedhar Bhat, Issa M. S. Panahi |
| 2020 | Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari |
| 2020 | Recognising Emotions in Dysarthric Speech Using Typical Speech Data. Lubna Alhinti, Stuart P. Cunningham, Heidi Christensen |
| 2020 | Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning. Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai |
| 2020 | Recognize Mispronunciations to Improve Non-Native Acoustic Modeling Through a Phone Decoder Built from One Edit Distance Finite State Automaton. Wei Chu, Yang Liu, Jianwei Zhou |
| 2020 | Reconciliation of Multiple Corpora for Speech Emotion Recognition by Multiple Classifiers with an Adversarial Corpus Discriminator. Zhi Zhu, Yoshinao Sato |
| 2020 | Reformer-TTS: Neural Speech Synthesis with Reformer Network. Hyeong Rae Ihm, Joun Yeop Lee, Byoung Jin Choi, Sung Jun Cheon, Nam Soo Kim |
| 2020 | Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics. Lin Zhang, Kiyoshi Honda, Jianguo Wei, Seiji Adachi |
| 2020 | Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification. Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee |
| 2020 | Relative Positional Encoding for Speech Recognition and Direct Translation. Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel |
| 2020 | Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets. Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo |
| 2020 | Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. Md Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore |
| 2020 | Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition. Ashish R. Mittal, Samarth Bharadwaj, Shreya Khare, Saneem A. Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury |
| 2020 | Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models. Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb |
| 2020 | Resource-Adaptive Deep Learning for Visual Speech Recognition. Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson Da Silva Morais |
| 2020 | Reverberation Modeling for Source-Filter-Based Neural Vocoder. Yang Ai, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling |
| 2020 | Rhythmic Convergence in Canadian French Varieties? Svetlana Kaminskaïa |
| 2020 | Risk Forecasting from Earnings Calls Acoustics and Network Correlations. Ramit Sawhney, Arshiya Aggarwal, Piyush Khanna, Puneet Mathur, Taru Jain, Rajiv Ratn Shah |
| 2020 | Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition Without Length Bias. Wei Zhou, Ralf Schlüter, Hermann Ney |
| 2020 | Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. Dung N. Tran, Uros Batricevic, Kazuhito Koishida |
| 2020 | Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations. Purvi Agrawal, Sriram Ganapathy |
| 2020 | Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020. Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim |
| 2020 | S2IGAN: Speech-to-Image Generation via Adversarial Learning. Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg |
| 2020 | SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition. Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin |
| 2020 | SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR. Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno |
| 2020 | SEANet: A Multi-Modal Speech Enhancement Network. Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek |
| 2020 | SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning. Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao |
| 2020 | STC-Innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020. Aleksei Gusev, Vladimir Volokhov, Alisa Vinogradova, Tseren Andzhukaev, Andrey Shulipa, Sergey Novoselov, Timur Pekhovsky, Alexander Kozlov |
| 2020 | Scaling Processes of Clause Chains in Pitjantjatjara. Rebecca Defina, Catalina Torres, Hywel Stoakes |
| 2020 | Scaling Up Online Speech Recognition Using ConvNets. Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert |
| 2020 | SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification. Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukás Burget |
| 2020 | Secondary Phonetic Cues in the Production of the Nasal Short-a System in California English. Georgia Zellou, Rebecca Scarborough, Renee Kemp |
| 2020 | Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision. Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung |
| 2020 | Segment Aggregation for Short Utterances Speaker Verification Using Raw Waveforms. Seung-Bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu |
| 2020 | Segment-Level Effects of Gender, Nationality and Emotion Information on Text-Independent Speaker Verification. Kai Li, Masato Akagi, Yibo Wu, Jianwu Dang |
| 2020 | Self-Attention Encoding and Pooling for Speaker Recognition. Pooyan Safari, Miquel India, Javier Hernando |
| 2020 | Self-Attentive Similarity Measurement Strategies in Speaker Diarization. Qingjian Lin, Yu Hou, Ming Li |
| 2020 | Self-Distillation for Improving CTC-Transformer-Based ASR Systems. Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix |
| 2020 | Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery. Saurabhchand Bhati, Jesús Villalba, Piotr Zelasko, Najim Dehak |
| 2020 | Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement. Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang |
| 2020 | Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation. Felix Kreuk, Joseph Keshet, Yossi Adi |
| 2020 | Self-Supervised Pre-Training with Acoustic Configurations for Replay Spoofing Detection. Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu |
| 2020 | Self-Supervised Representations Improve End-to-End Speech Translation. Anne Wu, Changhan Wang, Juan Miguel Pino, Jiatao Gu |
| 2020 | Self-Supervised Spoofing Audio Detection Scheme. Ziyue Jiang, Hongcheng Zhu, Li Peng, Wenbing Ding, Yanzhen Ren |
| 2020 | Self-Training for End-to-End Speech Translation. Juan Miguel Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang |
| 2020 | Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR. Xinyuan Zhou, Grandee Lee, Emre Yilmaz, Yanhua Long, Jiaen Liang, Haizhou Li |
| 2020 | Semantic Complexity in End-to-End Spoken Language Understanding. Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris |
| 2020 | Semantic Mask for Transformer Based End-to-End Speech Recognition. Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou |
| 2020 | Semi-Supervised ASR by End-to-End Self-Training. Yang Chen, Weiran Wang, Chao Wang |
| 2020 | Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution. Zi-qiang Zhang, Yan Song, Jianshu Zhang, Ian McLoughlin, Li-Rong Dai |
| 2020 | Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems. Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara |
| 2020 | Semi-Supervised Learning for Multi-Speaker Text-to-Speech Synthesis Using Discrete Speech Representation. Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-yi Lee |
| 2020 | Semi-Supervised Learning with Data Augmentation for End-to-End ASR. Felix Weninger, Franco Mana, Roberto Gemello, Jesús Andrés-Ferrer, Puming Zhan |
| 2020 | Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder. Shogo Seki, Moe Takada, Tomoki Toda |
| 2020 | Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations. Anil Ramakrishna, Shrikanth Narayanan |
| 2020 | Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss. Yi Luo, Nima Mesgarani |
| 2020 | Sequence-Level Self-Learning with Multiple Hypotheses. Ken'ichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng |
| 2020 | Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals. Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen |
| 2020 | Serialized Output Training for End-to-End Overlapped Speech Recognition. Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka |
| 2020 | Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing. Zhenchao Lin, Ryo Takashima, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi |
| 2020 | Should we Hard-Code the Recurrence Concept or Learn it Instead? Exploring the Transformer Architecture for Audio-Visual Speech Recognition. George Sterpu, Christian Saam, Naomi Harte |
| 2020 | Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions. Santi Prieto, Alfonso Ortega Giménez, Iván López-Espejo, Eduardo Lleida |
| 2020 | Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection. Zhenchun Lei, Yingen Yang, Changhong Liu, Jihua Ye |
| 2020 | Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition. Shai Rozenberg, Hagai Aronowitz, Ron Hoory |
| 2020 | Similarity-and-Independence-Aware Beamformer: Method for Target Source Extraction Using Magnitude Spectrogram as Reference. Atsuo Hiroe |
| 2020 | Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset. Jack Deadman, Jon Barker |
| 2020 | Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM. Takuya Kishida, Shin Tsukamoto, Toru Nakashika |
| 2020 | Singing Synthesis: With a Little Help from my Attention. Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman |
| 2020 | Singing Voice Extraction with Attention-Based Spectrograms Fusion. Hao Shi, Longbiao Wang, Sheng Li, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, Hiroshi Seki |
| 2020 | Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard. Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury |
| 2020 | Single-Channel Blind Direct-to-Reverberation Ratio Estimation Using Masking. Wolfgang Mack, Shuwen Deng, Emanuël A. P. Habets |
| 2020 | Single-Channel Speech Enhancement by Subspace Affinity Minimization. Dung N. Tran, Kazuhito Koishida |
| 2020 | SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation Using Optimally Smoothed Spectral Mapping. Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang |
| 2020 | Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution. Ximin Li, Xiaodong Wei, Xiaowei Qin |
| 2020 | Smart Tube: A Biofeedback System for Vocal Training and Therapy Through Tube Phonation. Naoko Kawamura, Tatsuya Kitamura, Kenta Hamada |
| 2020 | SoapBox Labs Fluency Assessment Platform for Child Speech. Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Gloria Montoya Gomez, Agape Deng, Arnaud Letondor, Niall Mullally, Adrian Hempel, Robert O'Regan, Qiru Zhou |
| 2020 | SoapBox Labs Verification Platform for Child Speech. Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Agape Deng, Arnaud Letondor, Robert O'Regan, Qiru Zhou |
| 2020 | Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors. Georgia Zellou, Michelle Cohn |
| 2020 | Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning. Wei Xue, Ying Tong, Chao Zhang, Guohong Ding, Xiaodong He, Bowen Zhou |
| 2020 | Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition. Mingxin Zhang, Tomohiro Tanaka, Wenxin Hou, Shengzhou Gao, Takahiro Shinozaki |
| 2020 | SpEx+: A Complete Time Domain Speaker Extraction Network. Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li |
| 2020 | Sparse Mixture of Local Experts for Efficient Speech Enhancement. Aswin Sivaraman, Minje Kim |
| 2020 | Sparseness-Aware DOA Estimation with Majorization Minimization. Masahito Togami, Robin Scheibler |
| 2020 | Spatial Covariance Matrix Estimation for Reverberant Speech with Application to Speech Enhancement. Ran Weisman, Vladimir Tourbabin, Paul Calamia, Boaz Rafaely |
| 2020 | Spatial Resolution of Early Reflection for Speech and White Noise. Xiaoli Zhong, Hao Song, Xuejie Liu |
| 2020 | Speaker Adaptive Training for Speech Recognition Based on Attention-Over-Attention Mechanism. Genshun Wan, Jia Pan, Qingran Wang, Jianqing Gao, Zhongfu Ye |
| 2020 | Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning. Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno |
| 2020 | Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning. Huaxin Wu, Genshun Wan, Jia Pan |
| 2020 | Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions. Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou |
| 2020 | Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors. Aravind Illa, Prasanta Kumar Ghosh |
| 2020 | Speaker Dependent Acoustic-to-Articulatory Inversion Using Real-Time MRI of the Vocal Tract. Tamás Gábor Csapó |
| 2020 | Speaker Dependent Articulatory-to-Acoustic Mapping Using Real-Time MRI of the Vocal Tract. Tamás Gábor Csapó |
| 2020 | Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2. Xueshuai Zhang, Wenchao Wang, Pengyuan Zhang |
| 2020 | Speaker Discrimination in Humans and Machines: Effects of Speaking Style Variability. Amber Afshan, Jody Kreiman, Abeer Alwan |
| 2020 | Speaker Identification for Household Scenarios with Self-Attention and Adversarial Training. Ruirui Li, Jyun-Yu Jiang, Xian Wu, Chu-Cheng Hsieh, Andreas Stolcke |
| 2020 | Speaker Re-Identification with Speaker Dependent Speech Enhancement. Yanpei Shi, Qiang Huang, Thomas Hain |
| 2020 | Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations. Wei Xia, John H. L. Hansen |
| 2020 | Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network. Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li |
| 2020 | Speaker-Aware Linear Discriminant Analysis in Speaker Verification. Naijun Zheng, Xixin Wu, Jinghua Zhong, Xunying Liu, Helen Meng |
| 2020 | Speaker-Aware Monaural Speech Separation. Jiahao Xu, Kun Hu, Chang Xu, Tran Duc Chung, Zhiyong Wang |
| 2020 | Speaker-Conditional Chain Model for Speech Separation and Extraction. Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu |
| 2020 | Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input. Kouichi Katsurada, Korin Richmond |
| 2020 | Speaker-Utterance Dual Attention for Speaker and Utterance Verification. Tianchi Liu, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li |
| 2020 | Speaking Speed Control of End-to-End Speech Synthesis Using Sentence-Level Conditioning. Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho |
| 2020 | SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems. Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar |
| 2020 | SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition. Xingchen Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng |
| 2020 | Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study. Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna |
| 2020 | Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices. Michal Kosmider |
| 2020 | Speech Clarity Improvement by Vocal Self-Training Using a Hearing Impairment Simulator and its Correlation with an Auditory Modulation Index. Toshio Irino, Soichi Higashiyama, Hanako Yoshigi |
| 2020 | Speech Driven Talking Head Generation via Attentional Landmarks Based Representation. Wentao Wang, Yan Wang, Jianqing Sun, Qingsong Liu, Jiaen Liang, Teng Li |
| 2020 | Speech Emotion Recognition 'in the Wild' Using an Autoencoder. Vipula Dissanayake, Haimo Zhang, Mark Billinghurst, Suranga Nanayakkara |
| 2020 | Speech Emotion Recognition with Discriminative Feature Learning. Huan Zhou, Kai Liu |
| 2020 | Speech Enhancement Based on Beamforming and Post-Filtering by Combining Phase Information. Rui Cheng, Changchun Bao |
| 2020 | Speech Enhancement with Stochastic Temporal Convolutional Networks. Julius Richter, Guillaume Carbajal, Timo Gerkmann |
| 2020 | Speech Pseudonymisation Assessment Using Voice Similarity Matrices. Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans |
| 2020 | Speech Rate Task-Specific Representation Learning from Acoustic-Articulatory Data. Renuka Mannem, Hima Jyothi R., Aravind Illa, Prasanta Kumar Ghosh |
| 2020 | Speech Recognition and Multi-Speaker Diarization of Long Conversations. Huanru Henry Mao, Shuyang Li, Julian J. McAuley, Garrison W. Cottrell |
| 2020 | Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation. Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee |
| 2020 | Speech Sentiment and Customer Satisfaction Estimation in Socialbot Conversations. Yelin Kim, Joshua Levy, Yang Liu |
| 2020 | Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss. Ziqiang Shi, Rujie Liu, Jiqing Han |
| 2020 | Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach. Miguel Angrick, Christian Herff, Garett D. Johnson, Jerry J. Shih, Dean J. Krusienski, Tanja Schultz |
| 2020 | Speech Transformer with Speaker Aware Persistent Memory. Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma |
| 2020 | Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces. Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow |
| 2020 | Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. Won-Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim |
| 2020 | Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks. Masood S. Mortazavi |
| 2020 | Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks. Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng |
| 2020 | Speech-to-Singing Conversion Based on Boundary Equilibrium GAN. Da-Yi Wu, Yi-Hsuan Yang |
| 2020 | SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering. Yung-Sung Chuang, Chi-Liang Liu, Hung-yi Lee, Lin-Shan Lee |
| 2020 | SpeechMix - Augmenting Deep Sound Recognition Using Hidden Space Interpolations. Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney, Rajiv Ratn Shah |
| 2020 | SpeedySpeech: Efficient Neural Speech Synthesis. Jan Vainer, Ondrej Dusek |
| 2020 | Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition. Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen |
| 2020 | Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation. Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Rongxiu Zhong |
| 2020 | Spoken Language 'Grammatical Error Correction'. Yiting Lu, Mark J. F. Gales, Yu Wang |
| 2020 | Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers. Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco |
| 2020 | Spot the Conversation: Speaker Diarisation in the Wild. Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman |
| 2020 | Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing. Fuxiang Tao, Anna Esposito, Alessandro Vinciarelli |
| 2020 | Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition. Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins |
| 2020 | Stacked 1D Convolutional Networks for End-to-End Small Footprint Voice Trigger Detection. Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir |
| 2020 | Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription. Yuqin Lin, Longbiao Wang, Sheng Li, Jianwu Dang, Chenchen Ding |
| 2020 | State Sequence Pooling Training of Acoustic Models for Keyword Spotting. Kuba Lopatka, Tobias Bocklet |
| 2020 | Statistical Testing on ASR Performance via Blockwise Bootstrap. Zhe Liu, Fuchun Peng |
| 2020 | Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments. Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach |
| 2020 | StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation. Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michal Romaniuk |
| 2020 | Stochastic Convolutional Recurrent Networks for Language Modeling. Jen-Tzung Chien, Yu-Min Huang |
| 2020 | Stochastic Curiosity Exploration for Dialogue Systems. Jen-Tzung Chien, Po-Chien Hsu |
| 2020 | Stochastic Talking Face Generation Using Latent Distribution Matching. Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde |
| 2020 | Strategies for End-to-End Text-Independent Speaker Verification. Weiwei Lin, Man-Wai Mak, Jen-Tzung Chien |
| 2020 | StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes. Manish Sharma, Tom Kenter, Rob Clark |
| 2020 | Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition. Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie |
| 2020 | Streaming Keyword Spotting on Mobile Devices. Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirkó Visontai, Stella Laurenzo |
| 2020 | Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing. Abhinav Garg, Gowtham P. Vadisetti, Dhananjaya Gowda, Sichen Jin, Aditya Jayasimha, Youngho Han, Jiyeon Kim, Junmo Park, Kwangyoun Kim, SooYeon Kim, Young-Yoon Lee, Kyungbo Min, Chanwoo Kim |
| 2020 | Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory. Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang |
| 2020 | Style Attuned Pre-Training and Parameter Efficient Fine-Tuning for Spoken Language Understanding. Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-wen Li |
| 2020 | Style Variation as a Vantage Point for Code-Switching. Khyathi Raghavi Chandu, Alan W. Black |
| 2020 | Sub-Band Knowledge Distillation Framework for Speech Enhancement. Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li |
| 2020 | Subband Kalman Filtering with DNN Estimated Parameters for Speech Enhancement. Hongjiang Yu, Wei-Ping Zhu, Benoît Champagne |
| 2020 | Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System. Przemyslaw Falkowski-Gilski, Grzegorz Debita, Marcin Habrych, Bogdan Miedzinski, Przemyslaw Jedlikowski, Bartosz Polnik, Jan Wandzio, Xin Wang |
| 2020 | Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition. Egor Lakomkin, Jahn Heymann, Ilya Sklyar, Simon Wiesler |
| 2020 | Successes, Challenges and Opportunities for Speech Technology in Conversational Agents. Shehzad Mevawalla |
| 2020 | Sum-Product Networks for Robust Automatic Speaker Identification. Aaron Nicolson, Kuldip K. Paliwal |
| 2020 | Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data. Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, Sébastien Marcel |
| 2020 | Surfboard: Audio Feature Extraction for Modern Machine Learning. Raphael Lenain, Jack Weston, Abhishek Shivkumar, Emil Fristed |
| 2020 | Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms. Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien |
| 2020 | Surgical Mask Detection with Deep Recurrent Phonetic Models. Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave |
| 2020 | THUEE System for NIST SRE19 CTS Challenge. Ruyun Li, Tianyu Liang, Dandan Song, Yi Liu, Yangcheng Wu, Can Xu, Peng Ouyang, Xianwei Zhang, Xianhong Chen, Weiqiang Zhang, Shouyi Yin, Liang He |
| 2020 | TMT: A Transformer-Based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-Aware Dialog. Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li |
| 2020 | TTS Skins: Speaker Conversion via ASR. Adam Polyak, Lior Wolf, Yaniv Taigman |
| 2020 | Tackling the ADReSS Challenge: A Multimodal Approach to the Automated Recognition of Alzheimer's Dementia. Matej Martinc, Senja Pollak |
| 2020 | Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario. Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko |
| 2020 | Targeted Content Feedback in Spoken Language Learning and Assessment. Xinhao Wang, Klaus Zechner, Christopher Hamill |
| 2020 | Task-Oriented Dialog Generation with Enhanced Entity Representation. Zhenhao He, Jiachun Wang, Jian Chen |
| 2020 | Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation. Jiaxing Liu, Zhilei Liu, Longbiao Wang, Yuan Gao, Lili Guo, Jianwu Dang |
| 2020 | Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis. Jason Fong, Jason Taylor, Simon King |
| 2020 | Text-Independent Speaker Verification with Dual Attention Network. Jingyu Li, Tan Lee |
| 2020 | That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages. Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak |
| 2020 | The "Sound of Silence" in EEG - Cognitive Voice Activity Detection. Rini A. Sharon, Hema A. Murthy |
| 2020 | The Acoustic Realization of Mandarin Tones in Fast Speech. Ping Tang, Shanpeng Li |
| 2020 | The Attacker's Perspective on Automatic Speaker Verification: An Overview. Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li |
| 2020 | The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02. Qingjian Lin, Tingle Li, Ming Li |
| 2020 | The Different Enhancement Roles of Covarying Cues in Thai and Mandarin Tones. Nari Rhee, Jianjing Kuang |
| 2020 | The Effect of Input on the Production of English Tense and Lax Vowels by Chinese Learners: Evidence from an Elementary School in China. Mengrou Li, Ying Chen, Jie Cui |
| 2020 | The Effect of Language Dominance on the Selective Attention of Segments and Tones in Urdu-Cantonese Speakers. Yi Liu, Jinghong Ning |
| 2020 | The Effect of Language Proficiency on the Perception of Segmental Foreign Accent. Rubén Pérez Ramón, María Luisa García Lecumberri, Martin Cooke |
| 2020 | The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge. Anna Pompili, Thomas Rolland, Alberto Abad |
| 2020 | The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks. Björn W. Schuller, Anton Batliner, Christian Bergler, Eva-Maria Messner, Antonia F. de C. Hamilton, Shahin Amiriparian, Alice Baird, Georgios Rizos, Maximilian Schmitt, Lukas Stappen, Harald Baumeister, Alexis Deighton MacIntyre, Simone Hantke |
| 2020 | The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results. Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke |
| 2020 | The INTERSPEECH 2020 Far-Field Speaker Verification Challenge. Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li |
| 2020 | The Implication of Sound Level on Spatial Selective Auditory Attention for Cochlear Implant Users: Behavioral and Electrophysiological Measurement. Sara Akbarzadeh, Sungmin Lee, Chin-Tuan Tan |
| 2020 | The Importance of Time-Frequency Averaging for Binaural Speaker Localization in Reverberant Environments. Hanan Beit-On, Vladimir Tourbabin, Boaz Rafaely |
| 2020 | The JD AI Speaker Verification System for the FFSVC 2020 Challenge. Ying Tong, Wei Xue, Shanluo Huang, Fan Lu, Chao Zhang, Guohong Ding, Xiaodong He |
| 2020 | The MSP-Conversation Corpus. Luz Martinez-Lucas, Mohammed Abdelwahab, Carlos Busso |
| 2020 | The Method of Random Directions Optimization for Stereo Audio Source Separation. Oleg Golokolenko, Gerald Schuller |
| 2020 | The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge. Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen |
| 2020 | The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted. Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh |
| 2020 | The Phonology and Phonetics of Kaifeng Mandarin Vowels. Lei Wang |
| 2020 | The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment. Andreas Nautsch, Jose Patino, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans |
| 2020 | The TalTech Systems for the Short-Duration Speaker Verification Challenge 2020. Tanel Alumäe, Jörgen Valk |
| 2020 | The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020. Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong |
| 2020 | The XMUSPEECH System for the AP19-OLR Challenge. Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong |
| 2020 | The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units. Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux |
| 2020 | The Cognitive Status of Simple and Complex Models. Janet B. Pierrehumbert |
| 2020 | Time-Domain Target-Speaker Speech Separation with Waveform-Based Speaker Embedding. Jianshu Zhao, Shengzhou Gao, Takahiro Shinozaki |
| 2020 | TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids. Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough |
| 2020 | To BERT or not to BERT: Comparing Speech and Language-Based Approaches for Alzheimer's Disease Detection. Aparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova |
| 2020 | Tone Learning in Low-Resource Bilingual TTS. Ruolan Liu, Xue Wen, Chunhui Lu, Xiao Chen |
| 2020 | Tone Variations in Regionally Accented Mandarin. Yanping Li, Catherine T. Best, Michael D. Tyler, Denis Burnham |
| 2020 | Tongue and Lip Motion Patterns in Alaryngeal Speech. Kristin J. Teplansky, Alan Wisler, Beiming Cao, Wendy Liang, Chad W. Whited, Ted Mau, Jun Wang |
| 2020 | Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology. Vikram Ramanarayanan, Oliver Roesler, Michael Neumann, David Pautler, Doug Habberstad, Andrew Cornish, Hardik Kothare, Vignesh Murali, Jackson Liscombe, Dirk Schnelle-Walka, Patrick L. Lange, David Suendermann-Oeft |
| 2020 | Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech. Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso |
| 2020 | Towards Automatic Assessment of Voice Disorders: A Clinical Approach. Purva Barche, Krishna Gurugubelli, Anil Kumar Vuppala |
| 2020 | Towards Context-Aware End-to-End Code-Switching Speech Recognition. Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell |
| 2020 | Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders - Step 1: CNN Model-Based Phone Classification. Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard |
| 2020 | Towards Learning a Universal Non-Semantic Representation of Speech. Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv |
| 2020 | Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion. Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma |
| 2020 | Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals. Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz |
| 2020 | Towards Speech Robustness for Acoustic Scene Classification. Shuo Liu, Andreas Triantafyllopoulos, Zhao Ren, Björn W. Schuller |
| 2020 | Towards Universal Text-to-Speech. Jingzhou Yang, Lei He |
| 2020 | Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription. Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov |
| 2020 | Towards a Comprehensive Assessment of Speech Intelligibility for Pathological Speech. Wei Xue, Viviana Mendoza Ramos, Wieke Harmsen, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik |
| 2020 | Towards an ASR Error Robust Spoken Language Understanding System. Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss |
| 2020 | Training Keyword Spotting Models on Non-IID Data with Federated Learning. Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio López-Moreno, Rajiv Mathews |
| 2020 | Training Speaker Enrollment Models by Network Optimization. Victoria Mingote, Antonio Miguel, Alfonso Ortega Giménez, Eduardo Lleida |
| 2020 | Transfer Learning Approaches for Streaming End-to-End Speech Recognition System. Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li |
| 2020 | Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music. Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li |
| 2020 | Transfer Learning of Articulatory Information Through Phone Information. Abdolreza Sabzi Shahrebabaki, Negar Olfati, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen |
| 2020 | Transfer Learning of the Expressivity Using FLOW Metric Learning in Multispeaker Text-to-Speech Synthesis. Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet |
| 2020 | Transferring Source Style in Non-Parallel Voice Conversion. Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng |
| 2020 | Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura |
| 2020 | Transformer with Bidirectional Decoder for Speech Recognition. Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin |
| 2020 | Transformer-Based Long-Context End-to-End Speech Recognition. Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux |
| 2020 | Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings. Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury |
| 2020 | Two Different Mechanisms of Movable Mandible for Vocal-Tract Model with Flexible Tongue. Takayuki Arai |
| 2020 | Two-Stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-Token Connectionist Temporal Classification. In Young Park, Hong Kook Kim |
| 2020 | U-Net Based Direct-Path Dominance Test for Robust Direction-of-Arrival Estimation. Hao Wang, Kai Chen, Jing Lu |
| 2020 | UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech. Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed |
| 2020 | Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis. Tamás Gábor Csapó, Csaba Zainkó, László Tóth, Gábor Gosztolya, Alexandra Markó |
| 2020 | Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus. Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller |
| 2020 | UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech. Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan |
| 2020 | Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization. Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang |
| 2020 | Understanding Racial Disparities in Automatic Speech Recognition: The Case of Habitual "be". Joshua L. Martin, Kevin Tang |
| 2020 | Understanding Self-Attention of Self-Supervised Audio Transformers. Shu-Wen Yang, Andy T. Liu, Hung-yi Lee |
| 2020 | Understanding the Effect of Voice Quality and Accent on Talker Similarity. Anurag Das, Guanlong Zhao, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna |
| 2020 | Universal Adversarial Attacks on Spoken Language Assessment Systems. Vyas Raina, Mark J. F. Gales, Kate M. Knill |
| 2020 | Universal Speech Transformer. Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma |
| 2020 | Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders. Mingjie Chen, Thomas Hain |
| 2020 | Unsupervised Audio Source Separation Using Generative Priors. Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias |
| 2020 | Unsupervised Cross-Domain Singing Voice Conversion. Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman |
| 2020 | Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics. Okko Räsänen, María Andrea Cruz Blandón |
| 2020 | Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification. Akhil Mathur, Nadia Berthouze, Nicholas D. Lane |
| 2020 | Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training. Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura |
| 2020 | Unsupervised Feature Adaptation Using Adversarial Multi-Task Training for Automatic Evaluation of Children's Speech. Richeng Duan, Nancy F. Chen |
| 2020 | Unsupervised Learning for Sequence-to-Sequence Text-to-Speech for Low-Resource Languages. Haitong Zhang, Yue Lin |
| 2020 | Unsupervised Methods for Evaluating Speech Representations. Michael Gump, Wei-Ning Hsu, James R. Glass |
| 2020 | Unsupervised Regularization-Based Adaptive Training for Speech Recognition. Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du |
| 2020 | Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization. Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Kazuyoshi Yoshii |
| 2020 | Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling. Siyuan Feng, Odette Scharenborg |
| 2020 | Unsupervised Training of Siamese Networks for Speaker Verification. Umair Khan, Javier Hernando |
| 2020 | Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images. Leanne Nortje, Herman Kamper |
| 2020 | Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. Xin Wang, Junichi Yamagishi |
| 2020 | Using Silence MR Image to Synthesise Dynamic MRI Vocal Tract Data of CV. Ioannis K. Douros, Ajinkya Kulkarni, Chrysanthi Dourou, Yu Xie, Jacques Felblinger, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie |
| 2020 | Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network. Jeng-Lin Li, Chi-Chun Lee |
| 2020 | Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions. Hengshun Zhou, Jun Du, Yanhui Tu, Chin-Hui Lee |
| 2020 | Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its Severity. Raghavendra Pappagari, Jaejin Cho, Laureano Moro-Velázquez, Najim Dehak |
| 2020 | Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios. Ankur Kumar, Sachin Singh, Dhananjaya Gowda, Abhinav Garg, Shatrughan Singh, Chanwoo Kim |
| 2020 | Utterance Invariant Training for Hybrid Two-Pass End-to-End Speech Recognition. Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Hejung Yang, Abhinav Garg, Sachin Singh, Jiyeon Kim, Mehul Kumar, Sichen Jin, Shatrughan Singh, Chanwoo Kim |
| 2020 | Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones. Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu |
| 2020 | VCTUBE: A Library for Automatic Speech Data Annotation. Seong Choi, Seunghoon Jeong, Jeewoo Yoon, Migyeong Yang, Minsam Ko, Eunil Park, Jinyoung Han, Munyoung Lee, Seonghee Lee |
| 2020 | VOP Detection in Variable Speech Rate Condition. Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna |
| 2020 | VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture. Da-Yi Wu, Yen-Hao Chen, Hung-yi Lee |
| 2020 | Variable Frame Rate-Based Data Augmentation to Handle Speaking-Style Variability for Automatic Speaker Verification. Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan |
| 2020 | Variation in Spectral Slope and Interharmonic Noise in Cantonese Tones. Phil Rose |
| 2020 | Vector Quantized Temporally-Aware Correspondence Sparse Autoencoders for Zero-Resource Acoustic Unit Discovery. Batuhan Gündogdu, Bolaji Yusuf, Mansur Yesilbursa, Murat Saraclar |
| 2020 | Vector-Based Attentive Pooling for Text-Independent Speaker Verification. Yanfeng Wu, Chenkai Guo, Hongcan Gao, Xiaolei Hou, Jing Xu |
| 2020 | Vector-Quantized Autoregressive Predictive Coding. Yu-An Chung, Hao Tang, James R. Glass |
| 2020 | Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge. Benjamin van Niekerk, Leanne Nortje, Herman Kamper |
| 2020 | Very Short-Term Conflict Intensity Estimation Using Fisher Vectors. Gábor Gosztolya |
| 2020 | Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation. Joon-Young Yang, Joon-Hyuk Chang |
| 2020 | Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. Mandar Gogate, Kia Dashtipour, Amir Hussain |
| 2020 | VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network. Jinhyeok Yang, Junmo Lee, Young-Ik Kim, Hoon-Young Cho, Injung Kim |
| 2020 | Vocal Markers from Sustained Phonation in Huntington's Disease. Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi |
| 2020 | Vocoder-Based Speech Synthesis from Silent Videos. Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen |
| 2020 | Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection. Yefei Chen, Heinrich Dinkel, Mengyue Wu, Kai Yu |
| 2020 | Voice Conversion Based Data Augmentation to Improve Children's Speech Recognition in Limited Data Scenario. S. Shahnawazuddin, Nagaraj Adiga, Kunal Kumar, Aayushi Poddar, Waquar Ahmad |
| 2020 | Voice Conversion Using Speech-to-Speech Neuro-Style Transfer. Ehab A. AlBadawy, Siwei Lyu |
| 2020 | Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda |
| 2020 | VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition. Quan Wang, Ignacio López-Moreno, Mert Saglam, Kevin W. Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein |
| 2020 | VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch. Baihan Lin, Xinxin Zhang |
| 2020 | Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect. Yang Yue, Fang Hu |
| 2020 | WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU. Po-Chun Hsu, Hung-yi Lee |
| 2020 | WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition. Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han, Hongtao Song |
| 2020 | Wake Word Detection with Alignment-Free Lattice-Free MMI. Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur |
| 2020 | Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms. Wei-Wei Lin, Man-Wai Mak |
| 2020 | Weak-Attention Suppression for Transformer Based Speech Recognition. Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer |
| 2020 | Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification. Yanpei Shi, Qiang Huang, Thomas Hain |
| 2020 | What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information? Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James R. Glass |
| 2020 | What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS. Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber |
| 2020 | Whisper Activity Detection Using CNN-LSTM Based Attention Pooling Network Trained for a Speaker Identification Task. Abinay Reddy Naini, Malla Satyapriya, Prasanta Kumar Ghosh |
| 2020 | Whisper Augmented End-to-End/Hybrid Speech Recognition System - CycleGAN Approach. Prithvi R. R. Gudepu, Gowtham P. Vadisetti, Abhishek Niranjan, Kinnera Saranu, Raghava Sarma, M. Ali Basha Shaik, Periyasamy Paramasivam |
| 2020 | Whistled Vowel Identification by French Listeners. Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier |
| 2020 | Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data. Rosa González Hautamäki, Tomi Kinnunen |
| 2020 | Word Error Rate Estimation Without ASR Output: e-WER2. Ahmed Ali, Steve Renals |
| 2020 | X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network. Zining Zhang, Bingsheng He, Zhenjie Zhang |
| 2020 | X-Vector Singular Value Modification and Statistical-Based Decomposition with Ensemble Regression Modeling for Speaker Anonymization System. Candy Olivia Mawalim, Kasorn Galajit, Jessada Karnjana, Masashi Unoki |
| 2020 | XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System. Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou |
| 2020 | g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset. Kyubyong Park, Seanie Lee |
| 2020 | iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning. Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi |
| 2020 | x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification. Jesús Villalba, Yuekai Zhang, Najim Dehak |