INTERSPEECH A

1036 papers

Year  Title / Authors
2020 "This is Houston. Say again, please". The Behavox System for the Apollo-11 Fearless Steps Challenge (Phase II).
Arseniy Gorin, Daniil Kulko, Steven Grima, Alex Glasman
2020 1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM.
Kshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu
2020 21st Annual Conference of the International Speech Communication Association, Interspeech 2020, Virtual Event, Shanghai, China, October 25-29, 2020.
Helen Meng, Bo Xu, Thomas Fang Zheng
2020A 43 Language Multilingual Punctuation Prediction Neural Network Model.
Xinxing Li, Edward Lin
2020A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen
2020A Comparative Study of Speech Anonymization Metrics.
Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
2020A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä
2020A Comparison of English Rhythm Produced by Native American Speakers and Mandarin ESL Primary School Learners.
Hongwei Ding, Binghuai Lin, Liyuan Wang, Hui Wang, Ruomei Fang
2020A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning.
Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass
2020A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement.
Minh Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang
2020A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems.
Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda
2020A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions.
Liming Wang, Mark Hasegawa-Johnson
2020A Deep 2D Convolutional Network for Waveform-Based Speech Recognition.
Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals
2020A Deep Learning Approach to Active Noise Control.
Hao Zhang, DeLiang Wang
2020A Deep Learning-Based Kalman Filter for Speech Enhancement.
Sujan Kumar Roy, Aaron Nicolson, Kuldip K. Paliwal
2020A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences.
Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin
2020A Dynamic 3D Pronunciation Teaching Model Based on Pronunciation Attributes and Anatomy.
Xiaoli Feng, Yanlu Xie, Yayue Deng, Boxue Li
2020A Federated Approach in Training Acoustic Models.
Dimitrios Dimitriadis, Ken'ichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez
2020A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages.
Mano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash, Hema A. Murthy
2020A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling.
Chieh-Chi Kao, Bowen Shi, Ming Sun, Chao Wang
2020A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition.
Ying Zhong, Ying Hu, Hao Huang, Wushour Silamu
2020A Low Latency ASR-Free End to End Spoken Language Understanding System.
Mohamed Mhiri, Samuel Myer, Vikrant Singh Tomar
2020A Machine of Few Words: Interactive Speaker Recognition with Reinforcement Learning.
Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
2020A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback.
Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang, Yujia Jin
2020A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation.
Haiteng Zhang, Huashan Pan, Xiulin Li
2020A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition.
Ming Chen, Xudong Zhao
2020A New Training Pipeline for an Improved Neural Transducer.
Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
2020A Noise Robust Technique for Detecting Vowels in Speech Signals.
Avinash Kumar, S. Shahnawazuddin, Waquar Ahmad
2020A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan
2020A Perceptual Study of the Five Level Tones in Hmu (Xinzhai Variety).
Wen Liu
2020A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech.
Jean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy
2020A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals.
Xuan Dong, Donald S. Williamson
2020A Real-Time Robot-Based Auxiliary System for Risk Evaluation of COVID-19 Infection.
Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng, Jing Xiao
2020A Recursive Network with Dynamic Attention for Monaural Speech Enhancement.
Andong Li, Chengshi Zheng, Cunhang Fan, Renhua Peng, Xiaodong Li
2020A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning.
Chenggang Zhang, Xueliang Zhang
2020A Semi-Blind Source Separation Approach for Speech Dereverberation.
Ziteng Wang, Yueyue Na, Zhang Liu, Yun Li, Biao Tian, Qiang Fu
2020A Sound Engineering Approach to Near End Listening Enhancement.
Carol Chermaz, Simon King
2020A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee
2020A Transformer-Based Audio Captioning Model with Keyword Estimation.
Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito
2020A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments.
Yunzhe Hao, Jiaming Xu, Jing Shi, Peng Zhang, Lei Qin, Bo Xu
2020ARET: Aggregated Residual Extended Time-Delay Neural Networks for Speaker Verification.
Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin, Junhai Xu
2020ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.
Zheng Lian, Zhengqi Wen, Xinyong Zhou, Songbai Pu, Shengkai Zhang, Jianhua Tao
2020ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition.
Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu Jeong Han, Tao Lei, Tao Ma
2020ASR Error Correction with Augmented Transformer for Entity Retrieval.
Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu
2020ASR-Based Evaluation and Feedback for Individualized Reading Practice.
Yu Bai, Ferdy Hubers, Catia Cucchiarini, Helmer Strik
2020ASR-Free Pronunciation Assessment.
Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
2020ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment.
Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin
2020ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification.
Liwen Zhang, Jiqing Han, Ziqiang Shi
2020Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization.
Potsawee Manakul, Mark J. F. Gales, Linlin Wang
2020Accurate Detection of Wake Word Start and End Using a CNN.
Christin Jose, Yuriy Mishchenko, Thibaud Sénéchal, Anish Shah, Alex Escott, Shiv Naga Prasad Vitaladevuni
2020Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation.
M. A. Tugtekin Turan, Emmanuel Vincent, Denis Jouvet
2020Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification.
Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen
2020Acoustic Properties of Strident Fricatives at the Edges: Implications for Consonant Discrimination.
Louis-Marie Lorin, Lorenzo Maselli, Léo Varnet, Maria Giavazzi
2020Acoustic Scene Analysis with Multi-Head Attention Networks.
Weimin Wang, Weiran Wang, Ming Sun, Chao Wang
2020Acoustic Scene Classification Using Audio Tagging.
Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-Bin Kim, Ha-Jin Yu
2020Acoustic Signal Enhancement Using Relative Harmonic Coefficients: Spherical Harmonics Domain Approach.
Yonggang Hu, Prasanga N. Samarasinghe, Thushara D. Abhayapala
2020Acoustic-Based Articulatory Phenotypes of Amyotrophic Lateral Sclerosis and Parkinson's Disease: Towards an Interpretable, Hypothesis-Driven Framework of Motor Control.
Hannah P. Rowe, Sarah E. Gutz, Marc F. Maffei, Jordan R. Green
2020Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet.
Narjes Bozorg, Michael T. Johnson
2020Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation.
Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies
2020Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition.
Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Dongyan Huang
2020Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder.
Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii
2020Adaptive Speaker Normalization for CTC-Based Speech Recognition.
Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du
2020Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition.
Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee
2020Adventitious Respiratory Classification Using Attentive Residual Neural Networks.
Zijiang Yang, Shuo Liu, Meishu Song, Emilia Parada-Cabaleiro, Björn W. Schuller
2020Adversarial Audio: A New Information Hiding Method.
Yehao Kong, Jiliang Zhang
2020Adversarial Dictionary Learning for Monaural Speech Enhancement.
Yunyun Ji, Longting Xu, Wei-Ping Zhu
2020Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.
Zhengyang Chen, Shuai Wang, Yanmin Qian
2020Adversarial Latent Representation Learning for Speech Enhancement.
Yuanhang Qiu, Ruili Wang
2020Adversarial Separation Network for Speaker Recognition.
Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei
2020Adversarial Separation and Adaptation Network for Far-Field Speaker Verification.
Lu Yi, Man-Wai Mak
2020Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer.
Jie Wu, Jian Luan
2020Affective Conditioning on Hierarchical Attention Networks Applied to Depression Detection from Transcribed Clinical Interviews.
Danai Xezonaki, Georgios Paraskevopoulos, Alexandros Potamianos, Shrikanth Narayanan
2020Age-Related Differences of Tone Perception in Mandarin-Speaking Seniors.
Yan Feng, Gang Peng, William Shi-Yuan Wang
2020Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network.
Renuka Mannem, Navaneetha Gaddam, Prasanta Kumar Ghosh
2020All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux
2020Alzheimer's Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge.
Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney
2020An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee
2020An Adaptive X-Vector Model for Text-Independent Speaker Verification.
Bin Gu, Wu Guo, Fenglin Ding, Zhen-Hua Ling, Jun Du
2020An Alternative to MFCCs for ASR.
Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur
2020An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic.
Dina El Zarka, Anneliese Kelterer, Barbara Schuppler
2020An Audio-Based Wakeword-Independent Verification System.
Joe Wang, Rajath Kumar, Mike Rodehorst, Brian Kulis, Shiv Naga Prasad Vitaladevuni
2020An Audio-Enriched BERT-Based Framework for Spoken Multiple-Choice Question Answering.
Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen
2020An Early Study on Intelligent Analysis of Speech Under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety.
Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller
2020An Effective Domain Adaptive Post-Training Method for BERT in Response Selection.
Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, Heuiseok Lim
2020An Effective End-to-End Modeling Approach for Mispronunciation Detection.
Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen
2020An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu
2020An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Ying Liu, Yan Song, Yiheng Jiang, Ian McLoughlin, Lin Liu, Li-Rong Dai
2020An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Yang Cui, Xi Wang, Lei He, Frank K. Soong
2020An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks.
Wei-Cheng Lin, Carlos Busso
2020An End-to-End Architecture of Online Multi-Channel Speech Separation.
Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie
2020An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling.
Bi-Cheng Yan, Meng-Che Wu, Hsiao-Tsung Hung, Berlin Chen
2020An Evaluation of Manual and Semi-Automatic Laughter Annotation.
Bogdan Ludusan, Petra Wagner
2020An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels.
Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller
2020An Interactive Adversarial Reward Learning-Based Spoken Language Understanding System.
Yu Wang, Yilin Shen, Hongxia Jin
2020An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition.
Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller
2020An Investigation of Few-Shot Learning in Spoken Term Classification.
Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li
2020An Investigation of Phone-Based Subword Units for End-to-End Speech Recognition.
Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher
2020An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang, Peter Birkholz
2020An Investigation of the Virtual Lip Trajectories During the Production of Bilabial Stops and Nasal at Different Speaking Rates.
Tilak Purohit, Prasanta Kumar Ghosh
2020An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence.
Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen
2020An Objective Voice Gender Scoring System and Identification of the Salient Acoustic Measures.
Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan
2020An Open Source Implementation of ITU-T Recommendation P.808 with Validation.
Babak Naderi, Ross Cutler
2020An Open-Source Voice Type Classifier for Child-Centered Daylong Recordings.
Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià
2020An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets.
Pilar Oplustil Gallegos, Jennifer Williams, Joanna Rownicka, Simon King
2020An Utterance Verification System for Word Naming Therapy in Aphasia.
David S. Barbera, Mark A. Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Ian Shaw, William H. Latham, Alexander P. Leff, Jenny Crinion
2020Analysis of Disfluency in Children's Speech.
Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf
2020Analyzing Breath Signals for the Interspeech 2020 ComParE Challenge.
John Mendonça, Francisco Teixeira, Isabel Trancoso, Alberto Abad
2020Analyzing Read Aloud Speech by Primary School Pupils: Insights for Research and Development.
S. Limonard, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik
2020Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer.
Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Françoise Beaufays
2020Angular Margin Centroid Loss for Text-Independent Speaker Recognition.
Yuheng Wei, Junzhao Du, Hui Liu
2020Anti-Aliasing Regularization in Stacking Layers.
Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar
2020Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts.
Matthew Perez, Zakaria Aldeneh, Emily Mower Provost
2020Are Germans Better Haters Than Danes? Language-Specific Implicit Prosodies of Types of Hate Speech and How They Relate to Perceived Severity and Societal Rules.
Jana Neitsch, Oliver Niebuhr
2020Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study.
Karthik Gopalakrishnan, Behnam Hedayatnia, Longshaokan Wang, Yang Liu, Dilek Hakkani-Tür
2020Are you Wearing a Mask? Improving Mask Detection from Speech Using Augmentation by Cycle-Consistent GANs.
Nicolae-Catalin Ristea, Radu Tudor Ionescu
2020Assessment of Parkinson's Disease Medication State Through Automatic Speech Analysis.
Anna Pompili, Rubén Solera-Ureña, Alberto Abad, Rita Cardoso, Isabel Guimarães, Margherita Fabbri, Isabel P. Martins, Joaquim J. Ferreira
2020Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers.
Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent
2020Atss-Net: Target Speaker Separation via Attention-Based Neural Network.
Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li
2020Attention Forcing for Speech Synthesis.
Qingyun Dou, Joshua Efiong, Mark J. F. Gales
2020Attention Wave-U-Net for Acoustic Echo Cancellation.
Jung-Hee Kim, Joon-Hyuk Chang
2020Attention and Encoder-Decoder Based Models for Transforming Articulatory Movements at Different Speaking Rates.
Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
2020Attention to Indexical Information Improves Voice Recall.
Grant L. McGuire, Molly Babel
2020Attention-Based Speaker Embeddings for One-Shot Voice Conversion.
Tatsuma Ishihara, Daisuke Saito
2020Attention-Driven Projections for Soundscape Classification.
Dhanunjaya Varma Devalraju, Muralikrishna H, Padmanabhan Rajan, Dileep Aroor Dinesh
2020Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection.
Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee
2020Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding.
Seungwoo Choi, Seungju Han, Dongyoung Kim, Sungjoo Ha
2020Audio Dequantization for High Fidelity Audio Generation in Flow-Based Neural Vocoder.
Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee
2020Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng
2020Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework.
Shoufeng Lin, Xinyuan Qian
2020Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.
Ruijie Tao, Rohan Kumar Das, Haizhou Li
2020Audiovisual Correspondence Learning in Humans and Machines.
Venkat Krishnamohan, Akshara Soman, Anshul Gupta, Sriram Ganapathy
2020Augmenting Generative Adversarial Networks for Speech Emotion Recognition.
Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller
2020Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
2020Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation.
Hang Li, Siyuan Chen, Julien Epps
2020AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification.
Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie
2020AutoSpeech: Neural Architecture Search for Speaker Recognition.
Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang
2020Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition.
Zhengjun Yue, Heidi Christensen, Jon Barker
2020Automated Screening for Alzheimer's Dementia Through Spontaneous Speech.
Muhammad Shehram Shah Syed, Zafi Sherhan Syed, Margaret Lech, Elena Pirogova
2020Automatic Analysis of Speech Prosody in Dutch.
Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven, Aoju Chen
2020Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning.
Han Tong, Hamid R. Sharifzadeh, Ian McLoughlin
2020Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech.
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales
2020Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder.
Si Ioi Ng, Tan Lee
2020Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features.
Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard
2020Automatic Estimation of Intelligibility Measure for Consonants in Speech.
Ali Abavisani, Mark Hasegawa-Johnson
2020Automatic Estimation of Pathological Voice Quality Based on Recurrent Neural Network Using Amplitude and Phase Spectrogram.
Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi
2020Automatic Glottis Detection and Segmentation in Stroboscopic Videos Using Convolutional Networks.
Divya Degala, M. V. Achuth Rao, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini, Prakash T. K., Prasanta Kumar Ghosh
2020Automatic Prediction of Confidence Level from Children's Oral Reading Recordings.
Kamini Sabu, Preeti Rao
2020Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer.
Sebastião Quintas, Julie Mauclair, Virginie Woisard, Julien Pinquier
2020Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe Submission to NIST SRE Challenge 2019.
Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan
2020Automatic Scoring at Multi-Granularity for L2 Pronunciation.
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang
2020Automatic Speech Recognition Benchmark for Air-Traffic Communications.
Juan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf A. Braun
2020Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline.
Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz
2020Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?
Jialu Li, Mark Hasegawa-Johnson
2020BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example.
Timo Lohrenz, Tim Fingscheidt
2020BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020.
Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka
2020Bandpass Noise Generation and Augmentation for Unified ASR.
Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu
2020Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts.
Yizhou Lu, Mingkun Huang, Hao Li, Jiaqi Guo, Yanmin Qian
2020Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Chunyu Qiang
2020Bidirectional LSTM Network with Ordered Neurons for Speech Enhancement.
Xiaoqi Li, Yaxing Li, Yuanjie Dong, Shan Xu, Zhihui Zhang, Dan Wang, Shengwu Xiong
2020Bilingual Acoustic Voice Variation is Similarly Structured Across Languages.
Khia A. Johnson, Molly Babel, Robert A. Fuhrman
2020BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages.
Abhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed
2020Black-Box Adaptation of ASR for Accented Speech.
Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi
2020Black-Box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples.
Yuekai Zhang, Ziyan Jiang, Jesús Villalba, Najim Dehak
2020Blind Speech Signal Quality Estimation for Speaker Verification Systems.
Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov
2020Brain networks enabling speech perception in everyday settings.
Barbara G. Shinn-Cunningham
2020Building a Robust Word-Level Wakeword Verification Network.
Rajath Kumar, Mike Rodehorst, Joe Wang, Jiacheng Gu, Brian Kulis
2020Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems.
Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nicholas D. Lane
2020CAM: Uninteresting Speech Detector.
Weiyi Lu, Yi Xu, Peng Yang, Belinda Zeng
2020CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency.
Keyu An, Hongyu Xiang, Zhijian Ou
2020CATOTRON - A Neural Text-to-Speech System in Catalan.
Baybars Külebi, Alp Öktem, Alex Peiró Lilja, Santiago Pascual, Mireia Farrús
2020CSL-EMG_Array: An Open Access Corpus for EMG-to-Speech Conversion.
Lorenz Diener, Mehrdad Roustay Vishkasougheh, Tanja Schultz
2020CTC-Synchronous Training for Monotonic Attention Model.
Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
2020CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment.
Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong
2020Can Auditory Nerve Models Tell us What's Different About WaveNet Vocoded Speech?
Sébastien Le Maguer, Naomi Harte
2020Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi
2020Caption Alignment for Low Resource Audio-Visual Data.
Vighnesh Reddy Konda, Mayur Warialani, Rakesh Prasanth Achari, Varad Bhatnagar, Jayaprakash Akula, Preethi Jyothi, Ganesh Ramakrishnan, Gholamreza Haffari, Pankaj Singh
2020Categorization of Whistled Consonants by French Speakers.
Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier
2020Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music.
Haohe Liu, Lei Xie, Jian Wu, Geng Yang
2020Characterization of Singaporean Children's English: Comparisons to American and British Counterparts Using Archetypal Analysis.
Yuling Gu, Nancy F. Chen
2020Class LM and Word Mapping for Contextual Biasing in End-to-End ASR.
Rongqing Huang, Ossama Abdel-Hamid, Xinwei Li, Gunnar Evermann
2020Classification of Manifest Huntington Disease Using Vowel Distortion Measures.
Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost
2020Classify Imaginary Mandarin Tones with Cortical EEG Signals.
Hua Li, Fei Chen
2020ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers.
Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim
2020Coarticulation as Synchronised Sequential Target Approximation: An EMA Study.
Zirui Liu, Yi Xu, Feng-fan Hsieh
2020Combination of End-to-End and Hybrid Models for Speech Recognition.
Jeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong
2020Combining Audio and Brain Activity for Predicting Speech Quality.
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
2020Compact Speaker Embedding: lrx-Vector.
Munir Georges, Jonathan Huang, Tobias Bocklet
2020Comparing EEG Analyses with Different Epoch Alignments in an Auditory Lexical Decision Experiment.
Louis ten Bosch, Kimberley Mulder, Lou Boves
2020Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech.
Thomas Searle, Zina M. Ibrahim, Richard J. B. Dobson
2020Comparison of Glottal Source Parameter Values in Emotional Vowels.
Yongwei Li, Jianhua Tao, Bin Liu, Donna Erickson, Masato Akagi
2020Competency Evaluation in Voice Mimicking Using Acoustic Cues.
Abhijith Girish, Adharsh Sabu, Akshay Prasannan Latha, Rajeev Rajan
2020Competing Speaker Count Estimation on the Fusion of the Spectral and Spatial Embedding Space.
Chao Peng, Xihong Wu, Tianshu Qu
2020Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation.
Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik
2020Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra.
Toru Nakashika
2020Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity.
Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo
2020Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation.
Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki
2020Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection.
Panagiotis Tzirakis, Alexander Shiarella, Robert M. Ewers, Björn W. Schuller
2020Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil.
Ke Shi, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen
2020Conditional Response Augmentation for Dialogue Using Knowledge Distillation.
Myeongho Jeong, Seungtaek Choi, Hojae Han, Kyungho Kim, Seung-won Hwang
2020Conditional Spoken Digit Generation with StyleGAN.
Kasperi Palkama, Lauri Juvela, Alexander Ilin
2020Confidence Measure for Speech-to-Concept End-to-End Spoken Language Understanding.
Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin
2020Confidence Measures in Encoder-Decoder Models for Speech Recognition.
Alejandro Woodward, Clara Bonnín, Issey Masuda, David Varas, Elisenda Bou-Balust, Juan Carlos Riveiro
2020Conformer: Convolution-augmented Transformer for Speech Recognition.
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
2020Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention.
Zhen Fu, Jing Chen
2020Constrained Ratio Mask for Speech Enhancement Using DNN.
Hongjiang Yu, Wei-Ping Zhu, Yuhong Yang
2020Contemporary Polish Language Model (Version 2) Using Big Data and Sub-Word Approach.
Krzysztof Wolk
2020Context Dependent RNNLM for Automatic Transcription of Conversations.
Srikanth Raj Chetupalli, Sriram Ganapathy
2020Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.
Jiatong Shi, Nan Huo, Qin Jin
2020Context-Dependent Acoustic Modeling Without Explicit Phone Clustering.
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
2020Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition.
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li
2020ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu
2020Contextual RNN-T for Open Domain ASR.
Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf
2020Contextualized Translation of Automatically Segmented Speech.
Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi
2020Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model.
Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig
2020Continual Learning for Multi-Dialect Acoustic Models.
Brady Houston, Katrin Kirchhoff
2020Continual Learning in Automatic Speech Recognition.
Samik Sadhu, Hynek Hermansky
2020Contrastive Predictive Coding of Audio with an Adversary.
Luyu Wang, Kazuya Kawakami, Aäron van den Oord
2020Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions.
Lei Wang, Ed X. Wu, Fei Chen
2020Controllable Neural Prosody Synthesis.
Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore
2020Controllable Neural Text-to-Speech Synthesis Using Intuitive Prosodic Features.
Tuomo Raitio, Ramya Rasipuram, Dan Castellani
2020Controlling the Strength of Emotions in Speech-Like Emotional Sound Generated by WaveNet.
Kento Matsumoto, Sunao Hara, Masanobu Abe
2020Conv-TasSAN: Separative Adversarial Network Based on Conv-TasNet.
Chengyun Deng, Yi Zhang, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li
2020Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition.
Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen
2020Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks.
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li
2020Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.
Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li
2020CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech.
Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman
2020Correlating Cepstra with Formant Frequencies: Implications for Phonetically-Informed Forensic Voice Comparison.
Vincent Hughes, Frantz Clermont, Philip Harrison
2020Correlation Between Prosody and Pragmatics: Case Study of Discourse Markers in French and English.
Lou Lee, Denis Jouvet, Katarina Bartkova, Yvon Keromnes, Mathilde Dargnat
2020Cortical Oscillatory Hierarchy for Natural Sentence Processing.
Bin Zhao, Jianwu Dang, Gaoyan Zhang, Masashi Unoki
2020Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings.
Florian L. Kreyssig, Philip C. Woodland
2020Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis.
Neeraj Kumar Sharma, Prashant Krishnan V., Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy
2020Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion Without Parallel Data.
Seung Won Park, Doo-young Kim, Myun-chul Joe
2020Cross Attention with Monotonic Alignment for Speech Transformer.
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma
2020Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages.
Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow
2020Cross-Domain Adaptation with Discrepancy Minimization for Text-Independent Forensic Speaker Verification.
Zhenyu Wang, Wei Xia, John H. L. Hansen
2020Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.
Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck
2020Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
2020Cross-Linguistic Interaction Between Phonological Categorization and Orthography Predicts Prosodic Effects in the Acquisition of Portuguese Liquids by L1-Mandarin Learners.
Chao Zhou, Silke Hamann
2020Cross-Linguistic Perception of Utterances with Willingness and Reluctance in Mandarin by Korean L2 Learners.
Wenqian Li, Jung-Yueh Tu
2020Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels.
Masahiro Yasuda, Yasunori Ohishi, Yuma Koizumi, Noboru Harada
2020Cues for Perception of Gender in Synthetic Voices and the Role of Identity.
Maxwell Hope, Jason Lilley
2020CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
2020Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda
2020DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation.
Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-yi Lee
2020DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement.
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
2020DNN No-Reference PSTN Speech Quality Prediction.
Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner
2020Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech.
Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo
2020Data Augmentation for Code-Switch Language Modeling by Fusing Multiple Text Generation Methods.
Xinhui Hu, Qi Zhang, Lei Yang, Binbin Gu, Xinkang Xu
2020Data Balancing for Boosting Performance of Low-Frequency Classes in Spoken Language Understanding.
Judith Gaspers, Quynh Ngoc Thi Do, Fabian Triefenbach
2020Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training.
Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan
2020Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task.
Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee
2020Decoding Imagined, Heard, and Spoken Speech: Classification and Regression of EEG Using a 14-Channel Dry-Contact Mobile Headset.
Jonathan Clayton, Scott Wellington, Cassia Valentini-Botinhao, Oliver Watts
2020Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-Corpus Setting for Speech Emotion Recognition.
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller
2020Deep Attentive End-to-End Continuous Breath Sensing from Speech.
Alexis Deighton MacIntyre, Georgios Rizos, Anton Batliner, Alice Baird, Shahin Amiriparian, Antonia F. de C. Hamilton, Björn W. Schuller
2020Deep Convolutional Spiking Neural Networks for Keyword Spotting.
Emre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li
2020Deep Embedding Learning for Text-Dependent Speaker Verification.
Peng Zhang, Peng Hu, Xueliang Zhang
2020Deep F-Measure Maximization for End-to-End Speech Understanding.
Leda Sari, Mark Hasegawa-Johnson
2020Deep Learning Based Assessment of Synthetic Speech Naturalness.
Gabriel Mittag, Sebastian Möller
2020Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition.
Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy
2020Deep Learning Based Open Set Acoustic Scene Classification.
Zuzanna Kwiatkowska, Beniamin Kalinowski, Michal Kosmider, Krzysztof Rykaczewski
2020Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling.
Yeunju Choi, Youngmoon Jung, Hoirin Kim
2020Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition.
Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao
2020Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals
2020Deep Self-Supervised Hierarchical Clustering for Speaker Diarization.
Prachi Singh, Sriram Ganapathy
2020Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification.
Junyi Peng, Rongzhi Gu, Yuexian Zou
2020Deep Speech Inpainting of Time-Frequency Masks.
Mikolaj Kegler, Pierre Beckmann, Milos Cernak
2020Deep Template Matching for Small-Footprint and Configurable Keyword Spotting.
Peng Zhang, Xueliang Zhang
2020Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning.
Haibin Wu, Andy T. Liu, Hung-yi Lee
2020Densely Connected Time Delay Neural Network for Speaker Verification.
Ya-Qi Yu, Wu-Jun Li
2020Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword Spotting.
Menglong Xu, Xiao-Lei Zhang
2020Design Choices for X-Vector Based Speaker Anonymization.
Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
2020Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency.
Vikram Ramanarayanan
2020Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification.
Sina Däubener, Lea Schönherr, Asja Fischer, Dorothea Kolossa
2020Detecting Audio Attacks on ASR Systems with Dropout Uncertainty.
Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin
2020Detecting Domain-Specific Credibility and Expertise in Text and Speech.
Shengli Hu
2020Detecting and Analysing Spontaneous Oral Cancer Speech in the Wild.
Bence Mark Halpern, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg
2020Detecting and Counting Overlapping Speakers in Distant Speech Scenarios.
Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent
2020Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait.
Tanya Talkar, Sophia Yuditskaya, James R. Williamson, Adam C. Lammert, Hrishikesh Rao, Daniel J. Hannon, Anne T. O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas E. Sturim, Gregory A. Ciccarelli, Ross Zafonte, Jeff Palmer, Paolo Bonato, Thomas F. Quatieri
2020Detection of Voicing and Place of Articulation of Fricatives with Deep Learning in a Virtual Speech and Language Therapy Tutor.
Ivo Anjos, Maxine Eskénazi, Nuno Marques, Margarida Grilo, Isabel Guimarães, João Magalhães, Sofia Cavaco
2020Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.
Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong
2020Developing an Open-Source Corpus of Yoruba Speech.
Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera, Kólá Túbosún
2020Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages.
Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz
2020Development of a Speech Quality Database Under Uncontrolled Conditions.
Alessandro Ragano, Emmanouil Benetos, Andrew Hines
2020DiPCo - Dinner Party Corpus.
Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas
2020Differences in Gradient Emotion Perception: Human vs. Alexa Voices.
Michelle Cohn, Eran Raveh, Kristin Predeck, Iona Gessinger, Bernd Möbius, Georgia Zellou
2020Differential Beamforming for Uniform Circular Array with Directional Microphones.
Weilong Huang, Jinwei Feng
2020Dimensional Emotion Prediction Based on Interactive Context in Conversation.
Xiaohan Shi, Sixia Li, Jianwu Dang
2020Discovering Articulatory Speech Targets from Synthesized Random Babble.
Heikki Rasilo, Yannick Jadoul
2020Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation.
Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu
2020Discriminative Singular Spectrum Analysis for Bioacoustic Classification.
Bernardo B. Gatto, Eulanda Miranda dos Santos, Juan Gabriel Colonna, Naoya Sogi, Lincon Sales de Souza, Kazuhiro Fukui
2020Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-Oriented Spoken Dialog.
Yao Qian, Yu Shi, Michael Zeng
2020Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer's Disease.
Jiahong Yuan, Yuchen Bian, Xingyu Cai, Jiaji Huang, Zheng Ye, Kenneth Church
2020Distant Supervision for Polyphone Disambiguation in Mandarin Chinese.
Jiawen Zhang, Yuanyuan Zhao, Jiaqi Zhu, Jinba Xiao
2020Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
2020Distributed Summation Privacy for Speech Enhancement.
Matthew O'Connor, W. Bastiaan Kleijn
2020Do End-to-End Speech Recognition Models Care About Context?
Lasse Borgholt, Jakob D. Havtorn, Zeljko Agic, Anders Søgaard, Lars Maaløe, Christian Igel
2020Do Face Masks Introduce Bias in Speech Technologies? The Case of Automated Scoring of Speaking Proficiency.
Anastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner
2020Does French Listeners' Ability to Use Accentual Information at the Word Level Depend on the Ear of Presentation?
Amandine Michelas, Sophie Dufour
2020Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell.
Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa L. Ng, Feiqi Zhu, Lan Wang, Nan Yan
2020Doing Something we Never could with Spoken Language Technologies - from early days to the era of deep learning.
Lin-Shan Lee
2020Domain Adaptation Using Class Similarity for Robust Speech Recognition.
Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang
2020Domain Adaptation for Enhancing Speech-Based Depression Detection in Natural Environmental Conditions Using Dilated CNNs.
Zhaocheng Huang, Julien Epps, Dale Joachim, Brian Stasak, James R. Williamson, Thomas F. Quatieri
2020Domain Adversarial Neural Networks for Dysarthric Speech Recognition.
Dominika Woszczyk, Stavros Petridis, David E. Millard
2020Domain Aware Training for Far-Field Small-Footprint Keyword Spotting.
Haiwei Wu, Yan Jia, Yuanfei Nie, Ming Li
2020Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning.
Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng
2020Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.
Zhihao Du, Jiqing Han, Xueliang Zhang
2020Dual Attention in Time and Frequency Domain for Voice Activity Detection.
Joohyung Lee, Youngmoon Jung, Hoirin Kim
2020Dual Stage Learning Based Dynamic Time-Frequency Mask Generation for Audio Event Classification.
Donghyeon Kim, Jaihyun Park, David K. Han, Hanseok Ko
2020Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.
Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu
2020Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.
Jingjing Chen, Qirong Mao, Dong Liu
2020Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression.
Nils L. Westhausen, Bernd T. Meyer
2020DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System.
Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu
2020DurIAN: Duration Informed Attention Network for Speech Synthesis.
Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu
2020Dynamic Margin Softmax Loss for Speaker Verification.
Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang, Jianguo Wei
2020Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection.
Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba
2020Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang
2020Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang, Chunyu Qiang
2020Dysarthria Detection and Severity Assessment Using Rhythm-Based Metrics.
Abner Hernandez, Eun Jung Yeo, Sunhee Kim, Minhwa Chung
2020Dysarthric Speech Recognition Based on Deep Metric Learning.
Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
2020ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification.
Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck
2020EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep Learning.
Zhuo Zhang, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Di Zhou, Longbiao Wang
2020Early Stage LM Integration Using Local and Global Log-Linear Combination.
Wilfried Michel, Ralf Schlüter, Hermann Ney
2020Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition.
Jinhwan Park, Wonyong Sung
2020Effect of Microphone Position Measurement Error on RIR and its Impact on Speech Intelligibility and Quality.
Aditya Raikar, Karan Nathwani, Ashish Panda, Sunil Kumar Kopparapu
2020Effect of Spectral Complexity Reduction and Number of Instruments on Musical Enjoyment with Cochlear Implants.
Avamarie Brueggeman, John H. L. Hansen
2020Effects of Communication Channels and Actor's Gender on Emotion Identification by Native Mandarin Speakers.
Yi Lin, Hongwei Ding
2020Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech.
Shammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali
2020Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks.
Michal Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski
2020Efficient MDI Adaptation for n-Gram Language Models.
Ruizhe Huang, Ke Li, Ashish Arora, Daniel Povey, Sanjeev Khudanpur
2020Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition.
Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas
2020Efficient Neural Speech Synthesis for Low-Resource Languages Through Multilingual Modeling.
Marcel de Korte, Jaebok Kim, Esther Klabbers
2020Efficient Wait-k Models for Simultaneous Machine Translation.
Maha Elbayad, Laurent Besacier, Jakob Verbeek
2020Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed.
Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou
2020EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification.
Shuiyang Mao, P. C. Ching, Tan Lee
2020Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean.
Yinghao Li, Jinghua Zhang
2020Emitting Word Timings with End-to-End Models.
Tara N. Sainath, Ruoming Pang, David Rybach, Basi García, Trevor Strohman
2020Emotion Profile Refinery for Speech Emotion Classification.
Shuiyang Mao, Pak-Chung Ching, Tan Lee
2020Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition.
Md. Asif Jalal, Rosanna Milner, Thomas Hain
2020End-to-End ASR with Adaptive Span Self-Attention.
Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, Motoi Omachi
2020End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge.
Naoki Kimura, Zixiong Su, Takaaki Saeki
2020End-to-End Domain-Adversarial Voice Activity Detection.
Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola García-Perera
2020End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian
2020End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource Languages.
Zeyu Zhao, Weiqiang Zhang
2020End-to-End Multi-Look Keyword Spotting.
Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu
2020End-to-End Named Entity Recognition from English Speech.
Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah
2020End-to-End Neural Transformer Based Spoken Language Understanding.
Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann
2020End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors.
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu
2020End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model.
Han Feng, Sei Ueno, Tatsuya Kawahara
2020End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks.
Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen
2020End-to-End Speech-to-Dialog-Act Recognition.
Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara
2020End-to-End Spoken Language Understanding Without Full Transcripts.
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras
2020End-to-End Task-Oriented Dialog System Through Template Slot Value Generation.
Teakgyu Hong, Oh-Woog Kwon, Young-Kil Kim
2020End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.
Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari
2020Enhancing Formant Information in Spectrographic Display of Speech.
B. Yegnanarayana, Joseph M. Anand, Vishala Pannala
2020Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System.
Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, Ying-Hui Lai
2020Enhancing Monotonic Multihead Attention for Streaming ASR.
Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
2020Enhancing Monotonicity for Robust Autoregressive Transformer TTS.
Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng
2020Enhancing Sequence-to-Sequence Text-to-Speech with Morphology.
Jason Taylor, Korin Richmond
2020Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion.
Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou
2020Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models.
Zhao Ren, Jing Han, Nicholas Cummins, Björn W. Schuller
2020Enhancing the Interaural Time Difference of Bilateral Cochlear Implants with the Temporal Limits Encoder.
Yangyang Wan, Huali Zhou, Qinglin Meng, Nengheng Zheng
2020Ensemble Approaches for Uncertainty in Spoken Language Assessment.
Xixin Wu, Kate M. Knill, Mark J. F. Gales, Andrey Malinin
2020Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition.
Kusha Sridhar, Carlos Busso
2020Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges.
Maxim Markitantov, Denis Dresvyanskiy, Danila Mamontov, Heysem Kaya, Wolfgang Minker, Alexey Karpov
2020Entity Linking for Short Text Using Structured Knowledge Graph via Multi-Grained Text Matching.
Binxuan Huang, Han Wang, Tong Wang, Yue Liu, Yang Liu
2020Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network.
Jivitesh Sharma, Ole-Christoffer Granmo, Morten Goodwin
2020Environmental Sound Classification with Parallel Temporal-Spectral Attention.
Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang
2020Er-Suffixation in Southwestern Mandarin: An EMA and Ultrasound Study.
Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang
2020Evaluating Automatically Generated Phoneme Captions for Images.
Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg
2020Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing.
Marcello Federico, Yogesh Virkar, Robert Enyedi, Roberto Barra-Chicote
2020Evaluating the Reliability of Acoustic Speech Embeddings.
Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux
2020Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification.
Xiaoyang Qu, Jianzong Wang, Jing Xiao
2020Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition.
Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee
2020Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions.
Pavlos Papadopoulos, Shrikanth Narayanan
2020Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng
2020Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis.
Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie
2020Exploiting Multi-Modal Features from Pre-Trained Networks for Alzheimer's Dementia Recognition.
Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee
2020Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge.
Ziqing Yang, Zifan An, Zehao Fan, Chengye Jing, Houwei Cao
2020Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models.
Qiang Huang, Thomas Hain
2020Exploration of End-to-End Synthesisers for Zero Resource Speech Challenge 2020.
Karthik Pandia D. S., Anusha Prakash, Mano Ranjith Kumar M., Hema A. Murthy
2020Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.
Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
2020Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition.
Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee
2020Exploring Listeners' Speech Rate Preferences.
Olympia Simantiraki, Martin Cooke
2020Exploring MMSE Score Prediction Using Verbal and Non-Verbal Cues.
Shahla Farzana, Natalie Parde
2020Exploring TTS Without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020).
Takashi Morita, Hiroki Koda
2020Exploring Text and Audio Embeddings for Multi-Dimension Elderly Emotion Recognition.
Mariana Julião, Alberto Abad, Helena Moniz
2020Exploring Transformers for Large-Scale Speech Recognition.
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
2020Exploring the Use of an Artificial Accent of English to Assess Phonetic Learning in Monolingual and Bilingual Speakers.
Laura Spinu, Jiwon Hwang, Nadya Pincus, Mariana Vasilita
2020Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification.
Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan
2020Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression.
Nadee Seneviratne, James R. Williamson, Adam C. Lammert, Thomas F. Quatieri, Carol Y. Espy-Wilson
2020Extrapolating False Alarm Rates in Automatic Speaker Verification.
Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee
2020F0 Patterns in Mandarin Statements of Mandarin and Cantonese Speakers.
Yike Yang, Si Chen, Xi Chen
2020F0 Slope and Mean: Cues to Speech Segmentation in French.
Maria del Mar Cordero, Fanny Meunier, Nicolas Grimault, Stéphane Pota, Elsa Spinelli
2020FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data.
Aditya Joglekar, John H. L. Hansen, Meena Chandra Shekhar, Abhijeet Sangwan
2020FT Speech: Danish Parliament Speech Corpus.
Andreas Kirkedal, Marija Stepanovic, Barbara Plank
2020Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image.
Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana, Koichiro Mori
2020FaceFilter: Audio-Visual Speech Separation Using Still Images.
Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang
2020Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet.
Vadim Popov, Stanislav Kamenev, Mikhail A. Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko
2020Fast and Slow Acoustic Model.
Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu
2020Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces.
Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig
2020FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction.
Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu
2020FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics.
Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo
2020Finding Intelligible Consonant-Vowel Sounds Using High-Quality Articulatory Synthesis.
Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu
2020Finnish ASR with Deep Transformer Models.
Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo
2020Focal Loss for Punctuation Prediction.
Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ye Bai, Cunhang Fan
2020Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie
2020Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning.
Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao
2020Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering.
Ryu Takeda, Kazunori Komatani
2020From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.
Zexin Cai, Chuxiong Zhang, Ming Li
2020Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec.
Sneha Das, Tom Bäckström, Guillaume Fuchs
2020Fusion Architectures for Word-Based Audiovisual Speech Recognition.
Michael Wand, Jürgen Schmidhuber
2020FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition.
Titouan Parcollet, Xinchi Qiu, Nicholas D. Lane
2020GAN-Based Data Generation for Speech Emotion Recognition.
Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumatani
2020GAZEV: GAN-Based Zero-Shot Voice Conversion Over Non-Parallel Speech Corpus.
Zining Zhang, Bingsheng He, Zhenjie Zhang
2020GEV Beamforming Supported by DOA-Based Masks Generated on Pairs of Microphones.
François Grondin, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud
2020Gaming Corpus for Studying Social Screams.
Hiroki Mori, Yuki Kikuchi
2020Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging.
Sixin Hong, Yuexian Zou, Wenwu Wang
2020Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen
2020Generalized Minimal Distortion Principle for Blind Source Separation.
Robin Scheibler
2020Generative Adversarial Network Based Acoustic Echo Cancellation.
Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li
2020Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition.
Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
2020Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework.
Anusha Prakash, Hema A. Murthy
2020Glottal Closure Instants Detection from EGG Signal by Classification Approach.
Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das
2020Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.
Pengfei Liu, Kun Li, Helen Meng
2020HRI-RNN: A User-Robot Dynamics-Oriented RNN for Engagement Decrease Detection.
Asma Atamna, Chloé Clavel
2020Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals.
Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari
2020Hearing-Impaired Bio-Inspired Cochlear Models for Real-Time Auditory Applications.
Arthur Van Den Broucke, Deepak Baby, Sarah Verhulst
2020HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks.
Jiaqi Su, Zeyu Jin, Adam Finkelstein
2020Hide and Speak: Towards Deep Neural Networks for Speech Steganography.
Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet
2020Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification.
Jacob J. Webber, Olivier Perrotin, Simon King
2020Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis.
Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
2020Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition.
Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim
2020High Performance Sequence-to-Sequence Model for Streaming Speech Recognition.
Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel
2020High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency.
Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis
2020How Does Label Noise Affect the Quality of Speaker Embeddings?
Minh Pham, Zeqian Li, Jacob Whitehill
2020How Ordinal Are Your Data?
Sadari Jayawardena, Julien Epps, Zhaocheng Huang
2020How Rhythm and Timbre Encode Mooré Language in Bendré Drummed Speech.
Laure Dentel, Julien Meyer
2020Hybrid Network Feature Extraction for Depression Assessment from Speech.
Ziping Zhao, Qifei Li, Nicholas Cummins, Bin Liu, Haishuai Wang, Jianhua Tao, Björn W. Schuller
2020Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering.
Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
2020ICE-Talk: An Interface for a Controllable Expressive Talking Machine.
Noé Tits, Kevin El Haddad, Thierry Dutoit
2020INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising.
Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt
2020Identify Speakers in Cocktail Parties with End-to-End Attention.
Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari
2020Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation.
Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade
2020Identifying Important Time-Frequency Locations in Continuous Speech Utterances.
Hassan Salami Kavaki, Michael I. Mandel
2020Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework.
Takashi Fukuda, Samuel Thomas
2020Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario.
Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu
2020Improved Hybrid Streaming ASR with Transformer Language Models.
Pau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchís, Jorge Civera, Alfons Juan
2020Improved Learning of Word Embeddings with Word Definitions and Semantic Injection.
Yichi Zhang, Yinpei Dai, Zhijian Ou, Huixin Wang, Junlan Feng
2020Improved Model for Vocal Folds with a Polyp with Potential Application.
Jônatas Santos, Jugurta Montalvão, Israel Santos
2020Improved Noisy Student Training for Automatic Speech Recognition.
Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le
2020Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.
Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi
2020Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms.
Jee-weon Jung, Seung-Bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
2020Improved Speech Enhancement Using TCN with Multiple Encoder-Decoder Layers.
Vinith Kishore, Nitya Tiwari, Periyasamy Paramasivam
2020Improved Speech Enhancement Using a Time-Domain GAN with Mask Learning.
Ju Lin, Sufeng Niu, Adriaan J. de Lind van Wijngaarden, Jerome L. McClendon, Melissa C. Smith, Kuang-Ching Wang
2020Improved Training Strategies for End-to-End Speech Recognition in Digital Voice Assistants.
Hitesh Tulsiani, Ashtosh Sapru, Harish Arsikere, Surabhi Punjabi, Sri Garimella
2020Improved Zero-Shot Voice Conversion Using Explicit Conditioning Signals.
Shahan Nercessian
2020Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks.
Chia-Yu Li, Ngoc Thang Vu
2020Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation.
Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen
2020Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation.
Changhan Wang, Juan Miguel Pino, Jiatao Gu
2020Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction.
Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen
2020Improving End-to-End Speech-to-Intent Classification with Reptile.
Yusheng Tian, Philip John Gorinski
2020Improving Low Resource Code-Switched ASR Using Augmented Code-Switched TTS.
Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi
2020Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances.
Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim
2020Improving On-Device Speaker Verification Using Federated Learning with Privacy.
Filip Granqvist, Matt Seigel, Rogier C. van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik
2020Improving Opus Low Bit Rate Quality with Neural Speech Synthesis.
Jan Skoglund, Jean-Marc Valin
2020Improving Partition-Block-Based Acoustic Echo Canceler in Under-Modeling Scenarios.
Wenzhi Fan, Jing Lu
2020Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge.
Chao Zhang, Junjie Cheng, Yanmei Gu, Huacan Wang, Jun Ma, Shaojun Wang, Jing Xiao
2020Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network.
Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee
2020Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion.
Tuan Dinh, Alexander Kain, Kris Tjaden
2020Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno
2020Improving Speech Recognition of Compound-Rich Languages.
Prabhat Pandey, Volker Leutnant, Simon Wiesler, Jahn Heymann, Daniel Willett
2020Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus.
Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar
2020Improving Transformer-Based Speech Recognition with Unsupervised Pre-Training and Multi-Task Semantic Knowledge Learning.
Song Li, Lin Li, Qingyang Hong, Lingling Liu
2020Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization.
Benjamin Milde, Chris Biemann
2020Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models.
Thai Binh Nguyen, Quang Minh Nguyen, Thi Thu Hien Nguyen, Quoc Truong Do, Luong Chi Mai
2020Improving X-Vector and PLDA for Text-Dependent Speaker Verification.
Zhuxin Chen, Yue Lin
2020Improving the Performance of Acoustic-to-Articulatory Inversion by Removing the Training Loss of Noncritical Portions of Articulatory Channels Dynamically.
Qiang Fang
2020Improving the Prosody of RNN-Based English Text-To-Speech Synthesis by Incorporating a BERT Model.
Tom Kenter, Manish Sharma, Rob Clark
2020Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition.
Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna
2020In Defence of Metric Learning for Speaker Recognition.
Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee-Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han
2020Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition.
Qing Wang, Pengcheng Guo, Lei Xie
2020Incorporating Broad Phonetic Information for Speech Enhancement.
Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao
2020Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency.
Tuan Dinh, Alexander Kain, Robin Samlan, Beiming Cao, Jun Wang
2020Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
2020Incremental Text to Speech for Neural Sequence-to-Sequence Models Using Reinforcement Learning.
Devang S. Ram Mohan, Raphael Lenain, Lorenzo Foglianti, Tian Huey Teh, Marlene Staib, Alexandra Torresquintero, Jiameng Gao
2020Independent Echo Path Modeling for Stereophonic Acoustic Echo Cancellation.
Yi Gao, Ian Liu, J. Zheng, Cheng Luo, Bin Li
2020Independent and Automatic Evaluation of Speaker-Independent Acoustic-to-Articulatory Reconstruction.
Maud Parrot, Juliette Millet, Ewan Dunbar
2020Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners' Autistic-Like Traits.
Michelle Cohn, Melina Sarian, Kristin Predeck, Georgia Zellou
2020Insertion-Based Modeling for End-to-End Automatic Speech Recognition.
Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang
2020Instantaneous Time Delay Estimation of Broadband Signals.
B. H. V. S. Narayana Murthy, J. V. Satyanarayana, Nivedita Chennupati, B. Yegnanarayana
2020Integrating the Application and Realization of Mandarin 3rd Tone Sandhi in the Resolution of Sentence Ambiguity.
Wei Lai, Aini Li
2020Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.
Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda
2020Intelligibility-Enhancing Speech Modifications - The Hurricane Challenge 2.0.
Jan Rennies, Henning F. Schepker, Cassia Valentini-Botinhao, Martin Cooke
2020Interaction of Tone and Voicing in Mizo.
Wendy Lalhminghlui, Priyankoo Sarmah
2020Interactive Text-to-Speech System via Joint Style Analysis.
Yang Gao, Weiyi Zheng, Zhaojun Yang, Thilo Köhler, Christian Fuegen, Qing He
2020Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework.
Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang
2020Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging.
Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang
2020Introducing the VoicePrivacy Initiative.
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
2020Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
2020Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions.
Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar
2020Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng
2020Investigating Self-Supervised Pre-Training for End-to-End Speech Translation.
Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier
2020Investigating the Visual Lombard Effect with Gabor Based Features.
Waito Chiu, Yan Xu, Andrew Abel, Chun Lin, Zhengzheng Tu
2020Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
2020Investigation of Large-Margin Softmax in Neural Language Modeling.
Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney
2020Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.
Peng Shen, Xugang Lu, Hisashi Kawai
2020Investigation of Phase Distortion on Perceived Speech Quality for Hearing-Impaired Listeners.
Zhuohuang Zhang, Donald S. Williamson, Yi Shen
2020Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition.
Gizem Sogancioglu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov
2020Iterative Compression of End-to-End ASR Model Using AutoML.
Abhinav Mehrotra, Lukasz Dudziak, Jinsu Yeo, Young-Yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane
2020Iterative Pseudo-Labeling for Speech Recognition.
Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert
2020JDI-T: Jointly Trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment.
Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bongwan Kim, Jaesam Yoon
2020Joint Detection of Sentence Stress and Phrase Boundary for Prosody.
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang
2020Joint Prediction of Punctuation and Disfluency in Speech Transcripts.
Binghuai Lin, Liyuan Wang
2020Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka
2020Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen
2020Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding.
Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu
2020Jointly Fine-Tuning "BERT-Like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition.
Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara
2020JukeBox: A Multilingual Singer Recognition Dataset.
Anurag Chowdhury, Austin Cozzo, Arun Ross
2020Kaldi-Web: An Installation-Free, On-Device Speech Recognition System.
Mathieu Hu, Laurent Pierron, Emmanuel Vincent, Denis Jouvet
2020Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.
Gakuto Kurata, George Saon
2020Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders.
Yang Ai, Zhen-Hua Ling
2020LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR.
Yanhong Wang, Huan Luan, Jiahong Yuan, Bin Wang, Hui Lin
2020LVCSR with Transformer Language Models.
Eugen Beck, Ralf Schlüter, Hermann Ney
2020Language Model Data Augmentation Based on Text Domain Transfer.
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix
2020Language Modeling for Speech Analytics in Under-Resourced Languages.
Simone Wills, Pieter Uys, Charl Johannes van Heerden, Etienne Barnard
2020Large Scale Evaluation of Importance Maps in Automatic Speech Recognition.
Viet Anh Trinh, Michael I. Mandel
2020Large Scale Weakly and Semi-Supervised Learning for Low-Resource Video ASR.
Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross B. Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
2020Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning.
Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, Takahiro Shinozaki
2020Large-Scale Transfer Learning for Low-Resource Spoken Language Understanding.
Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao
2020Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey
2020Laughter Synthesis: Combining Seq2seq Modeling with Transfer Learning.
Noé Tits, Kevin El Haddad, Thierry Dutoit
2020Learnable Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination.
Tyler Vuong, Yangyang Xia, Richard M. Stern
2020Learning Better Speech Representations by Worsening Interference.
Jun Wang
2020Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization.
Ashutosh Pandey, DeLiang Wang
2020Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition.
Wangyou Zhang, Yanmin Qian
2020Learning Fast Adaptation on Cross-Accented Speech Recognition.
Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung
2020Learning Higher Representations from Pre-Trained Deep Models with Data Augmentation for the ComParE 2020 Challenge Mask Task.
Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto
2020Learning Intonation Pattern Embeddings for Arabic Dialect Identification.
Aitor Arronte Alvarez, Elsayed Sabry Abdelaal Issa
2020Learning Joint Articulatory-Acoustic Representations with Normalizing Flows.
Pramit Saha, Sidney S. Fels
2020Learning Speaker Embedding from Text-to-Speech.
Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe, Najim Dehak
2020Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation.
Guangyan Zhang, Ying Qin, Tan Lee
2020Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition.
Jian Huang, Jianhua Tao, Bin Liu, Zheng Lian
2020Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting.
Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre
2020Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews.
Bo Wang, Yue Wu, Niall Taylor, Terry J. Lyons, Maria Liakata, Alejo J. Nevado-Holgado, Kate E. A. Saunders
2020Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels.
Huang-Cheng Chou, Chi-Chun Lee
2020Length- and Noise-Aware Training Techniques for Short-Utterance Speaker Recognition.
Wenda Chen, Jonathan Huang, Tobias Bocklet
2020Leveraging Unlabeled Speech for Sequence Discriminative Training of Acoustic Models.
Ashtosh Sapru, Sri Garimella
2020Lexical Stress in Urdu.
Benazir Mumtaz, Tina Bögel, Miriam Butt
2020Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks.
Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li
2020Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions.
Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll
2020Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition.
Hiroki Kanagawa, Yusuke Ijima
2020Lightweight Online Noise Reduction on Embedded Devices Using Hierarchical Recurrent Neural Networks.
Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Pascal Zobel, Andreas Maier
2020Links Between Production and Perception of Glottalisation in Individual Australian English Speaker/Listeners.
Joshua Penney, Felicity Cox, Anita Szakay
2020Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion.
Hong Liu, Zhan Chen, Bing Yang
2020Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
2020Listen to What You Want: Neural Network-Based Universal Sound Selector.
Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki
2020Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation.
Chenda Li, Yanmin Qian
2020Lite Audio-Visual Speech Enhancement.
Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang
2020Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals.
Siqi Cai, Enze Su, Yonghao Song, Longhan Xie, Haizhou Li
2020Low Latency End-to-End Streaming Speech Recognition with a Scout Network.
Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou
2020Low Latency Speech Recognition Using End-to-End Prefetching.
Shuo-Yiin Chang, Bo Li, David Rybach, Yanzhang He, Wei Li, Tara N. Sainath, Trevor Strohman
2020Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection.
Danni Liu, Gerasimos Spanakis, Jan Niehues
2020Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks.
Ahmet Emin Bulut, Kazuhito Koishida
2020LungRN+NL: An Improved Adventitious Lung Sound Classification Using Non-Local Block ResNet Neural Network with Mixup Data Augmentation.
Yi Ma, Xinzi Xu, Yongfu Li
2020MIRNet: Learning Multiple Identities Representations in Overlapped Speech.
Hyewon Han, Soo-Whan Chung, Hong-Goo Kang
2020MLNET: An Adaptive Multiple Receptive-Field Attention Neural Network for Voice Activity Detection.
Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
2020MLS: A Large-Scale Multilingual Dataset for Speech Research.
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
2020Making a Distinction Between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech.
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann
2020Malayalam-English Code-Switched: Grapheme to Phoneme System.
Sreeja Manghat, Sreeram Manghat, Tanja Schultz
2020Mandarin Lexical Tones: A Corpus-Based Study of Word Length, Syllable Position and Prosodic Position on Duration.
Yaru Wu, Martine Adda-Decker, Lori Lamel
2020Mandarin and English Adults' Cue-Weighting of Lexical Stress.
Zhen Zeng, Karen Mattock, Liquan Liu, Varghese Peter, Alba Tuninetti, Feng-Ming Tsao
2020Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict.
Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi
2020Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters.
Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
2020MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition.
Somshubra Majumdar, Boris Ginsburg
2020Memory Controlled Sequential Self Attention for Sound Recognition.
Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos
2020Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation.
Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi
2020Meta Multi-Task Learning for Speech Emotion Recognition.
Ruichu Cai, Kaibin Guo, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang
2020Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs.
Seong Min Kye, Youngmoon Jung, Haebeom Lee, Sung Ju Hwang, Hoirin Kim
2020Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion Labels.
Takuya Fujioka, Takeshi Homma, Kenji Nagamatsu
2020Metadata-Aware End-to-End Keyword Spotting.
Hongyi Liu, Apurva Abhyankar, Yuriy Mishchenko, Thibaud Sénéchal, Gengshen Fu, Brian Kulis, Noah D. Stein, Anish Shah, Shiv Naga Prasad Vitaladevuni
2020Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition.
Raphaël Duroselle, Denis Jouvet, Irina Illina
2020Microphone Array Post-Filter for Target Speech Enhancement Without a Prior Information of Point Interferers.
Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao
2020Microprosodic Variability in Plosives in German and Austrian German.
Margaret Zellers, Barbara Schuppler
2020Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition.
Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
2020Mixed Case Contextual ASR Using Capitalization Masks.
Diamantino Caseiro, Pat Rondon, Quoc-Nam Le The, Petar S. Aleksic
2020Mixtures of Deep Neural Experts for Automated Speech Scoring.
Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna
2020MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search.
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou
2020Mobile-Assisted Prosody Training for Limited English Proficiency: Learner Background and Speech Learning Pattern.
Kevin Hirschi, Okim Kang, Catia Cucchiarini, John H. L. Hansen, Keelan Evanini, Helmer Strik
2020Modeling ASR Ambiguity for Neural Dialogue State Tracking.
Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders, Laurent Besacier
2020Modeling Global Body Configurations in American Sign Language.
Nicholas Wilkins, Max Cordes Galbraith, Ifeoma Nwogu
2020Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition.
Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng
2020Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition.
Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li
2020Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment.
Zhaoyu Liu, Brian Mak
2020Multi-Modal Attention for Speech Emotion Recognition.
Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li
2020Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition.
Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram
2020Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer's Dementia Recognition from Spontaneous Speech.
Morteza Rohanian, Julian Hough, Matthew Purver
2020Multi-Modality Matters: A Performance Leap on VoxCeleb.
Zhengyang Chen, Shuai Wang, Yanmin Qian
2020Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
2020Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency.
Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song
2020Multi-Scale Convolution for Robust Keyword Spotting.
Chen Yang, Xue Wen, Liming Song
2020Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement.
Lu Zhang, Mingjiang Wang
2020Multi-Speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network.
Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman
2020Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes.
Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari
2020Multi-Stream Attention-Based BLSTM with Feature Segmentation for Speech Emotion Recognition.
Yuya Chiba, Takashi Nose, Akinori Ito
2020Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.
Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
2020Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension.
Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li
2020Multi-Task Learning for Voice Related Recognition Tasks.
Ana Montalvo, José R. Calvo, Jean-François Bonastre
2020Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention.
Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim
2020Multi-Task Siamese Neural Network for Improving Replay Attack Detection.
Patrick von Platen, Fei Tao, Gökhan Tür
2020MultiSpeech: Multi-Speaker Text to Speech with Transformer.
Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin
2020Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages.
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz
2020Multilingual Jointly Trained Acoustic and Written Word Embeddings.
Yushi Hu, Shane Settle, Karen Livescu
2020Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages.
Hardik B. Sailor, Thomas Hain
2020Multilingual Speech Recognition with Self-Attention Structured Parameterization.
Yun Zhu, Parisa Haghani, Anshuman Tripathi, Bhuvana Ramabhadran, Brian Farris, Hainan Xu, Han Lu, Hasim Sak, Isabel Leal, Neeraj Gaur, Pedro J. Moreno, Qian Zhang
2020Multimodal Association for Speaker Verification.
Suwon Shon, James R. Glass
2020Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.
Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg
2020Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks.
Krishna D. N, Ankita Patil
2020Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity.
Utkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman, Pattie Maes
2020Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech.
Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff
2020Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning.
Katerina Papadimitriou, Gerasimos Potamianos
2020Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text.
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
2020Multimodal Target Speech Separation with Voice and Face References.
Leyuan Qu, Cornelius Weber, Stefan Wermter
2020Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous Speech.
Erik Edwards, Charles Dognin, Bajibabu Bollepalli, Maneesh Kumar Singh
2020NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement.
Feng Deng, Tao Jiang, Xiaorui Wang, Chen Zhang, Yan Li
2020NEC-TT Speaker Verification System for SRE'19 CTS Challenge.
Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda
2020NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Li Zhang, Jian Wu, Lei Xie
2020Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding.
Alex Peiró Lilja, Mireia Farrús
2020Neural Architecture Search for Keyword Spotting.
Tong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui
2020Neural Architecture Search on Acoustic Scene Classification.
Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang, Xiangxiang Chu
2020Neural Discriminant Analysis for Deep Speaker Embedding.
Lantian Li, Dong Wang, Thomas Fang Zheng
2020Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals.
Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Zhuo Zhang
2020Neural Homomorphic Vocoder.
Zhijun Liu, Kuan Chen, Kai Yu
2020Neural Language Modeling with Implicit Cache Pointers.
Ke Li, Daniel Povey, Sanjeev Khudanpur
2020Neural PLDA Modeling for End-to-End Speaker Verification.
Shreyas Ramoji, Prashant Krishnan V, Sriram Ganapathy
2020Neural Representations of Dialogical History for Improving Upcoming Turn Acoustic Parameters Prediction.
Simone Fuscone, Benoît Favre, Laurent Prévot
2020Neural Spatio-Temporal Beamformer for Target Speech Separation.
Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu
2020Neural Speech Completion.
Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
2020Neural Speech Decoding for Amyotrophic Lateral Sclerosis.
Debadatta Dash, Paul Ferrari, Angel W. Hernandez-Mulero, Daragh Heitzman, Sara G. Austin, Jun Wang
2020Neural Speech Separation Using Spatially Distributed Microphones.
Dongmei Wang, Zhuo Chen, Takuya Yoshioka
2020Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder.
Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim
2020Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System.
Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan
2020Neutral Tone in Changde Mandarin.
Zhenrui Zhang, Fang Hu
2020Neutralization of Voicing Distinction of Stops in Tohoku Dialects of Japanese: Field Work and Acoustic Measurements.
Ai Mizoguchi, Ayako Hashimoto, Sanae Matsui, Setsuko Imatomi, Ryunosuke Kobayashi, Mafuyu Kitahara
2020New Advances in Speaker Diarization.
Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory
2020Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement.
Haoyu Li, Junichi Yamagishi
2020Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention.
Yan Zhao, DeLiang Wang
2020Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Tao Wang, Xuefei Liu, Jianhua Tao, Jiangyan Yi, Ruibo Fu, Zhengqi Wen
2020Non-Intrusive Diagnostic Monitoring of Fullband Speech Quality.
Sebastian Möller, Tobias Hübschen, Thilo Michael, Gabriel Mittag, Gerhard Schmidt
2020Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.
Kate M. Knill, Linlin Wang, Yu Wang, Xixin Wu, Mark J. F. Gales
2020Non-Parallel Emotion Conversion Using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator.
Ravi Shankar, Jacob Sager, Archana Venkataraman
2020Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN.
Yanping Li, Dongxiang Xu, Yan Zhang, Yang Wang, Binbin Chen
2020Non-Parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks.
Minchuan Chen, Weijian Hou, Jun Ma, Shaojun Wang, Jing Xiao
2020Nonlinear ISA with Auxiliary Variables for Learning Speech Representations.
Amrith Setlur, Barnabás Póczos, Alan W. Black
2020Nonlinear Residual Echo Suppression Based on Multi-Stream Conv-TasNet.
Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu
2020Nonlinear Residual Echo Suppression Using a Recurrent Neural Network.
Lukas Pfeifenberger, Franz Pernkopf
2020Nonparallel Emotional Speech Conversion Using VAE-GAN.
Yuexin Cao, Zhengchen Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao
2020Nonparallel Training of Exemplar-Based Voice Conversion System Using INCA-Based Alignment Technique.
Hitoshi Suda, Gaku Kotani, Daisuke Saito
2020Now You're Speaking My Language: Visual Language Identification.
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
2020ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication.
Christian Bergler, Manuel Schmitt, Andreas Maier, Simeon Smeele, Volker Barth, Elmar Nöth
2020On Front-End Gain Invariant Modeling for Wake Word Spotting.
Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Naga Prasad Vitaladevuni
2020On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model.
Shubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Kumar Mehta
2020On Loss Functions and Recurrency Training for GAN-Based Speech Enhancement Systems.
Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li
2020On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition.
Magdalena Rybicka, Konrad Kowalczyk
2020On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data.
Imran A. Sheikh, Emmanuel Vincent, Irina Illina
2020On Synthesis for Supervised Monaural Speech Separation in Time Domain.
Jingjing Chen, Qirong Mao, Dong Liu
2020On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.
Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu
2020On the Robustness and Training Dynamics of Raw Waveform Models.
Erfan Loweimi, Peter Bell, Steve Renals
2020On the Usage of Multi-Feature Integration for Speaker Verification and Language Identification.
Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong
2020One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech.
Tomás Nekvinda, Ondrej Dusek
2020Ongoing Phonologization of Word-Final Voicing Alternations in Two Romance Languages: Romanian and French.
Mathilde Hutin, Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker
2020Online Blind Reverberation Time Estimation Using CRNNs.
Shuwen Deng, Wolfgang Mack, Emanuël A. P. Habets
2020Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis.
Li Li, Kazuhito Koishida, Shoji Makino
2020Online Monaural Speech Enhancement Using Delayed Subband LSTM.
Xiaofei Li, Radu Horaud
2020Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias.
Mufan Sang, Wei Xia, John H. L. Hansen
2020Optimization and Evaluation of an Intelligibility-Improving Signal Processing Approach (IISPA) for the Hurricane Challenge 2.0 with FADE.
Marc René Schädler
2020Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech.
Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong
2020POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise.
Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong Aik Lee
2020Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets.
Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass
2020Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion.
Jeno Szep, Salim Hariri
2020Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition.
Wei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He
2020Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments.
Haley Lepp, Gina-Anne Levow
2020Parkinson's Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients.
Sudarsana Reddy Kadiri, Rashmi Kethireddy, Paavo Alku
2020Partial AUC Optimisation Using Recurrent Neural Networks for Music Detection with Limited Training Data.
Pablo Gimeno, Victoria Mingote, Alfonso Ortega Giménez, Antonio Miguel, Eduardo Lleida
2020Peking Opera Synthesis via Duration Informed Attention Network.
Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
2020Perceptimatic: A Human Speech Perception Benchmark for Unsupervised Subword Modelling.
Juliette Millet, Ewan Dunbar
2020Perception and Production of Mandarin Initial Stops by Native Urdu Speakers.
Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang
2020Perception of Concatenative vs. Neural Text-To-Speech (TTS): Differences in Intelligibility in Noise and Language Attitudes.
Michelle Cohn, Georgia Zellou
2020Perception of English Fricatives and Affricates by Advanced Chinese Learners of English.
Yizhou Lan
2020Perception of Japanese Consonant Length by Native Speakers of Korean Differing in Japanese Learning Experience.
Kimiko Tsukada, Joo-Yeon Kim, Jeong-Im Han
2020Perception of Privacy Measured in the Crowd - Paired Comparison on the Effect of Background Noises.
Anna Leschanowsky, Sneha Das, Tom Bäckström, Pablo Pérez Zarazaga
2020Phase Based Spectro-Temporal Features for Building a Robust ASR System.
Anirban Dutta, Ashishkumar Prabhakar Gudmalwar, Ch. V. Rama Rao
2020Phase-Aware Music Super-Resolution Using Generative Adversarial Networks.
Shichao Hu, Bin Zhang, Beici Liang, Ethan Zhao, Simon Lui
2020Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
2020Phonetic Accommodation of L2 German Speakers to the Virtual Language Learning Tutor Mirabella.
Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner
2020Phonetic Entrainment in Cooperative Dialogues: A Case of Russian.
Alla Menshikova, Daniil Kocharov, Tatiana Kachkovskaia
2020Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge.
Claude Montacié, Marie-José Caraty
2020Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification.
Siqi Zheng, Yun Lei, Hongbin Suo
2020Phonological Features for 0-Shot Multilingual Speech Synthesis.
Marlene Staib, Tian Huey Teh, Alexandra Torresquintero, Devang S. Ram Mohan, Lorenzo Foglianti, Raphael Lenain, Jiameng Gao
2020Pitch Declination and Final Lowering in Northeastern Mandarin.
Ping Cui, Jianjing Kuang
2020PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss.
Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
2020Poetic Meter Classification Using i-Vector-MTF Fusion.
Rajeev Rajan, Aiswarya Vinod Kumar, Ben P. Babu
2020Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection.
Tianjiao Xu, Hui Zhang, Xueliang Zhang
2020Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction.
Shun-Chang Zhong, Bo-Hao Su, Wei Huang, Yi-Ching Liu, Chi-Chun Lee
2020Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting.
Théodore Bluche, Thibault Gisselbrecht
2020Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino
2020Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder.
JinHong Lu, Hiroshi Shimodaira
2020Prediction of Sleepiness Ratings from Voice by Man and Machine.
Mark A. Huckvale, András Beke, Mirei Ikushima
2020Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning.
Pavel Denisov, Ngoc Thang Vu
2020Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS.
Alexander Sorin, Slava Shechtman, Ron Hoory
2020Privacy Guarantees for De-Identifying Text Transformations.
David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow
2020Processes and Consequences of Co-Articulation in Mandarin V
Mingqiong Luo
2020Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning.
Longfei Yang, Kaiqi Fu, Jinsong Zhang, Takahiro Shinozaki
2020Prosodic Characteristics of Genuine and Mock (Im)polite Mandarin Utterances.
Chengwei Xu, Wentao Gu
2020Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit.
Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
2020Prosody and Breathing: A Comparison Between Rhetorical and Information-Seeking Questions in German and Brazilian Portuguese.
Jana Neitsch, Plínio A. Barbosa, Oliver Niebuhr
2020Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption.
Hongyin Luo, Shang-wen Li, James R. Glass
2020Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Lukasz Augustyniak, Piotr Szymanski, Mikolaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak
2020PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.
Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
2020Quantification of Transducer Misalignment in Ultrasound Tongue Imaging.
Tamás Gábor Csapó, Kele Xu
2020Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition.
Hieu Duy Nguyen, Anastasios Alexandridis, Athanasios Mouchtaris
2020Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda
2020Quaternion Neural Networks for Multi-Channel Distant Speech Recognition.
Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas D. Lane, Mohamed Morchid
2020RECOApy: Data Recording, Pre-Processing and Phonetic Transcription for End-to-End Speech-Based Applications.
Adriana Stan
2020Rapid Enhancement of NLP Systems by Acquisition of Data in Correlated Domains.
Tejas Udayakumar, Kinnera Saranu, Mayuresh Sanjay Oak, Ajit Ashok Saunshikhar, Sandip Shriram Bapat
2020Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong
2020Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.
Erfan Loweimi, Peter Bell, Steve Renals
2020Raw Speech Waveform Based Classification of Patients with ALS, Parkinson's Disease and Healthy Controls Using CNN-BLSTM.
Jhansi Mallela, Aravind Illa, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh
2020Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song
2020Real Time Speech Enhancement in the Waveform Domain.
Alexandre Défossez, Gabriel Synnaeve, Yossi Adi
2020Real-Time Single-Channel Deep Neural Network-Based Speech Enhancement on Edge Devices.
Nikhil Shankar, Gautam Shreedhar Bhat, Issa M. S. Panahi
2020Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
2020Recognising Emotions in Dysarthric Speech Using Typical Speech Data.
Lubna Alhinti, Stuart P. Cunningham, Heidi Christensen
2020Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning.
Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai
2020Recognize Mispronunciations to Improve Non-Native Acoustic Modeling Through a Phone Decoder Built from One Edit Distance Finite State Automaton.
Wei Chu, Yang Liu, Jianwei Zhou
2020Reconciliation of Multiple Corpora for Speech Emotion Recognition by Multiple Classifiers with an Adversarial Corpus Discriminator.
Zhi Zhu, Yoshinao Sato
2020Reformer-TTS: Neural Speech Synthesis with Reformer Network.
Hyeong Rae Ihm, Joun Yeop Lee, Byoung Jin Choi, Sung Jun Cheon, Nam Soo Kim
2020Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics.
Lin Zhang, Kiyoshi Honda, Jianguo Wei, Seiji Adachi
2020Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.
Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
2020Relative Positional Encoding for Speech Recognition and Direct Translation.
Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel
2020Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets.
Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo
2020Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition.
Md Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore
2020Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition.
Ashish R. Mittal, Samarth Bharadwaj, Shreya Khare, Saneem A. Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury
2020Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models.
Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb
2020Resource-Adaptive Deep Learning for Visual Speech Recognition.
Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson Da Silva Morais
2020Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Yang Ai, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling
2020Rhythmic Convergence in Canadian French Varieties?
Svetlana Kaminskaïa
2020Risk Forecasting from Earnings Calls Acoustics and Network Correlations.
Ramit Sawhney, Arshiya Aggarwal, Piyush Khanna, Puneet Mathur, Taru Jain, Rajiv Ratn Shah
2020Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition Without Length Bias.
Wei Zhou, Ralf Schlüter, Hermann Ney
2020Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments.
Dung N. Tran, Uros Batricevic, Kazuhito Koishida
2020Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations.
Purvi Agrawal, Sriram Ganapathy
2020Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020.
Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
2020S2IGAN: Speech-to-Image Generation via Adversarial Learning.
Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg
2020SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin
2020SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno
2020SEANet: A Multi-Modal Speech Enhancement Network.
Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek
2020SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning.
Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao
2020STC-Innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020.
Aleksei Gusev, Vladimir Volokhov, Alisa Vinogradova, Tseren Andzhukaev, Andrey Shulipa, Sergey Novoselov, Timur Pekhovsky, Alexander Kozlov
2020Scaling Processes of Clause Chains in Pitjantjatjara.
Rebecca Defina, Catalina Torres, Hywel Stoakes
2020Scaling Up Online Speech Recognition Using ConvNets.
Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
2020SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification.
Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukás Burget
2020Secondary Phonetic Cues in the Production of the Nasal Short-a System in California English.
Georgia Zellou, Rebecca Scarborough, Renee Kemp
2020Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.
Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung
2020Segment Aggregation for Short Utterances Speaker Verification Using Raw Waveforms.
Seung-Bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu
2020Segment-Level Effects of Gender, Nationality and Emotion Information on Text-Independent Speaker Verification.
Kai Li, Masato Akagi, Yibo Wu, Jianwu Dang
2020Self-Attention Encoding and Pooling for Speaker Recognition.
Pooyan Safari, Miquel India, Javier Hernando
2020Self-Attentive Similarity Measurement Strategies in Speaker Diarization.
Qingjian Lin, Yu Hou, Ming Li
2020Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix
2020Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery.
Saurabhchand Bhati, Jesús Villalba, Piotr Zelasko, Najim Dehak
2020Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement.
Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang
2020Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.
Felix Kreuk, Joseph Keshet, Yossi Adi
2020Self-Supervised Pre-Training with Acoustic Configurations for Replay Spoofing Detection.
Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu
2020Self-Supervised Representations Improve End-to-End Speech Translation.
Anne Wu, Changhan Wang, Juan Miguel Pino, Jiatao Gu
2020Self-Supervised Spoofing Audio Detection Scheme.
Ziyue Jiang, Hongcheng Zhu, Li Peng, Wenbing Ding, Yanzhen Ren
2020Self-Training for End-to-End Speech Translation.
Juan Miguel Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
2020Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR.
Xinyuan Zhou, Grandee Lee, Emre Yilmaz, Yanhua Long, Jiaen Liang, Haizhou Li
2020Semantic Complexity in End-to-End Spoken Language Understanding.
Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris
2020Semantic Mask for Transformer Based End-to-End Speech Recognition.
Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou
2020Semi-Supervised ASR by End-to-End Self-Training.
Yang Chen, Weiran Wang, Chao Wang
2020Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Zi-qiang Zhang, Yan Song, Jianshu Zhang, Ian McLoughlin, Li-Rong Dai
2020Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems.
Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara
2020Semi-Supervised Learning for Multi-Speaker Text-to-Speech Synthesis Using Discrete Speech Representation.
Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-yi Lee
2020Semi-Supervised Learning with Data Augmentation for End-to-End ASR.
Felix Weninger, Franco Mana, Roberto Gemello, Jesús Andrés-Ferrer, Puming Zhan
2020Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder.
Shogo Seki, Moe Takada, Tomoki Toda
2020Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations.
Anil Ramakrishna, Shrikanth Narayanan
2020Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss.
Yi Luo, Nima Mesgarani
2020Sequence-Level Self-Learning with Multiple Hypotheses.
Ken'ichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng
2020Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals.
Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen
2020Serialized Output Training for End-to-End Overlapped Speech Recognition.
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
2020Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing.
Zhenchao Lin, Ryo Takashima, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi
2020Should we Hard-Code the Recurrence Concept or Learn it Instead? Exploring the Transformer Architecture for Audio-Visual Speech Recognition.
George Sterpu, Christian Saam, Naomi Harte
2020Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions.
Santi Prieto, Alfonso Ortega Giménez, Iván López-Espejo, Eduardo Lleida
2020Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection.
Zhenchun Lei, Yingen Yang, Changhong Liu, Jihua Ye
2020Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition.
Shai Rozenberg, Hagai Aronowitz, Ron Hoory
2020Similarity-and-Independence-Aware Beamformer: Method for Target Source Extraction Using Magnitude Spectrogram as Reference.
Atsuo Hiroe
2020Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset.
Jack Deadman, Jon Barker
2020Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM.
Takuya Kishida, Shin Tsukamoto, Toru Nakashika
2020Singing Synthesis: With a Little Help from my Attention.
Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman
2020Singing Voice Extraction with Attention-Based Spectrograms Fusion.
Hao Shi, Longbiao Wang, Sheng Li, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, Hiroshi Seki
2020Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.
Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury
2020Single-Channel Blind Direct-to-Reverberation Ratio Estimation Using Masking.
Wolfgang Mack, Shuwen Deng, Emanuël A. P. Habets
2020Single-Channel Speech Enhancement by Subspace Affinity Minimization.
Dung N. Tran, Kazuhito Koishida
2020SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation Using Optimally Smoothed Spectral Mapping.
Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang
2020Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution.
Ximin Li, Xiaodong Wei, Xiaowei Qin
2020Smart Tube: A Biofeedback System for Vocal Training and Therapy Through Tube Phonation.
Naoko Kawamura, Tatsuya Kitamura, Kenta Hamada
2020SoapBox Labs Fluency Assessment Platform for Child Speech.
Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Gloria Montoya Gomez, Agape Deng, Arnaud Letondor, Niall Mullally, Adrian Hempel, Robert O'Regan, Qiru Zhou
2020SoapBox Labs Verification Platform for Child Speech.
Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Agape Deng, Arnaud Letondor, Robert O'Regan, Qiru Zhou
2020Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors.
Georgia Zellou, Michelle Cohn
2020Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
Wei Xue, Ying Tong, Chao Zhang, Guohong Ding, Xiaodong He, Bowen Zhou
2020Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition.
Mingxin Zhang, Tomohiro Tanaka, Wenxin Hou, Shengzhou Gao, Takahiro Shinozaki
2020SpEx+: A Complete Time Domain Speaker Extraction Network.
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
2020Sparse Mixture of Local Experts for Efficient Speech Enhancement.
Aswin Sivaraman, Minje Kim
2020Sparseness-Aware DOA Estimation with Majorization Minimization.
Masahito Togami, Robin Scheibler
2020Spatial Covariance Matrix Estimation for Reverberant Speech with Application to Speech Enhancement.
Ran Weisman, Vladimir Tourbabin, Paul Calamia, Boaz Rafaely
2020Spatial Resolution of Early Reflection for Speech and White Noise.
Xiaoli Zhong, Hao Song, Xuejie Liu
2020Speaker Adaptive Training for Speech Recognition Based on Attention-Over-Attention Mechanism.
Genshun Wan, Jia Pan, Qingran Wang, Jianqing Gao, Zhongfu Ye
2020Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning.
Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno
2020Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning.
Huaxin Wu, Genshun Wan, Jia Pan
2020Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions.
Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou
2020Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors.
Aravind Illa, Prasanta Kumar Ghosh
2020Speaker Dependent Acoustic-to-Articulatory Inversion Using Real-Time MRI of the Vocal Tract.
Tamás Gábor Csapó
2020Speaker Dependent Articulatory-to-Acoustic Mapping Using Real-Time MRI of the Vocal Tract.
Tamás Gábor Csapó
2020Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2.
Xueshuai Zhang, Wenchao Wang, Pengyuan Zhang
2020Speaker Discrimination in Humans and Machines: Effects of Speaking Style Variability.
Amber Afshan, Jody Kreiman, Abeer Alwan
2020Speaker Identification for Household Scenarios with Self-Attention and Adversarial Training.
Ruirui Li, Jyun-Yu Jiang, Xian Wu, Chu-Cheng Hsieh, Andreas Stolcke
2020Speaker Re-Identification with Speaker Dependent Speech Enhancement.
Yanpei Shi, Qiang Huang, Thomas Hain
2020Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations.
Wei Xia, John H. L. Hansen
2020Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network.
Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li
2020Speaker-Aware Linear Discriminant Analysis in Speaker Verification.
Naijun Zheng, Xixin Wu, Jinghua Zhong, Xunying Liu, Helen Meng
2020Speaker-Aware Monaural Speech Separation.
Jiahao Xu, Kun Hu, Chang Xu, Tran Duc Chung, Zhiyong Wang
2020Speaker-Conditional Chain Model for Speech Separation and Extraction.
Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu
2020Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input.
Kouichi Katsurada, Korin Richmond
2020Speaker-Utterance Dual Attention for Speaker and Utterance Verification.
Tianchi Liu, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li
2020Speaking Speed Control of End-to-End Speech Synthesis Using Sentence-Level Conditioning.
Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho
2020SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems.
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar
2020SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition.
Xingchen Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng
2020Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study.
Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna
2020Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices.
Michal Kosmider
2020Speech Clarity Improvement by Vocal Self-Training Using a Hearing Impairment Simulator and its Correlation with an Auditory Modulation Index.
Toshio Irino, Soichi Higashiyama, Hanako Yoshigi
2020Speech Driven Talking Head Generation via Attentional Landmarks Based Representation.
Wentao Wang, Yan Wang, Jianqing Sun, Qingsong Liu, Jiaen Liang, Teng Li
2020Speech Emotion Recognition 'in the Wild' Using an Autoencoder.
Vipula Dissanayake, Haimo Zhang, Mark Billinghurst, Suranga Nanayakkara
2020Speech Emotion Recognition with Discriminative Feature Learning.
Huan Zhou, Kai Liu
2020Speech Enhancement Based on Beamforming and Post-Filtering by Combining Phase Information.
Rui Cheng, Changchun Bao
2020Speech Enhancement with Stochastic Temporal Convolutional Networks.
Julius Richter, Guillaume Carbajal, Timo Gerkmann
2020Speech Pseudonymisation Assessment Using Voice Similarity Matrices.
Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans
2020Speech Rate Task-Specific Representation Learning from Acoustic-Articulatory Data.
Renuka Mannem, Hima Jyothi R., Aravind Illa, Prasanta Kumar Ghosh
2020Speech Recognition and Multi-Speaker Diarization of Long Conversations.
Huanru Henry Mao, Shuyang Li, Julian J. McAuley, Garrison W. Cottrell
2020Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation.
Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee
2020Speech Sentiment and Customer Satisfaction Estimation in Socialbot Conversations.
Yelin Kim, Joshua Levy, Yang Liu
2020Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss.
Ziqiang Shi, Rujie Liu, Jiqing Han
2020Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach.
Miguel Angrick, Christian Herff, Garett D. Johnson, Jerry J. Shih, Dean J. Krusienski, Tanja Schultz
2020Speech Transformer with Speaker Aware Persistent Memory.
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma
2020Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces.
Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow
2020Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation.
Won-Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim
2020Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks.
Masood S. Mortazavi
2020Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks.
Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng
2020Speech-to-Singing Conversion Based on Boundary Equilibrium GAN.
Da-Yi Wu, Yi-Hsuan Yang
2020SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering.
Yung-Sung Chuang, Chi-Liang Liu, Hung-yi Lee, Lin-Shan Lee
2020SpeechMix - Augmenting Deep Sound Recognition Using Hidden Space Interpolations.
Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney, Rajiv Ratn Shah
2020SpeedySpeech: Efficient Neural Speech Synthesis.
Jan Vainer, Ondrej Dusek
2020Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen
2020Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Rongxiu Zhong
2020Spoken Language 'Grammatical Error Correction'.
Yiting Lu, Mark J. F. Gales, Yu Wang
2020Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers.
Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco
2020Spot the Conversation: Speaker Diarisation in the Wild.
Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman
2020Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing.
Fuxiang Tao, Anna Esposito, Alessandro Vinciarelli
2020Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition.
Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins
2020Stacked 1D Convolutional Networks for End-to-End Small Footprint Voice Trigger Detection.
Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir
2020Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
Yuqin Lin, Longbiao Wang, Sheng Li, Jianwu Dang, Chenchen Ding
2020State Sequence Pooling Training of Acoustic Models for Keyword Spotting.
Kuba Lopatka, Tobias Bocklet
2020Statistical Testing on ASR Performance via Blockwise Bootstrap.
Zhe Liu, Fuchun Peng
2020Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments.
Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach
2020StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation.
Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michal Romaniuk
2020Stochastic Convolutional Recurrent Networks for Language Modeling.
Jen-Tzung Chien, Yu-Min Huang
2020Stochastic Curiosity Exploration for Dialogue Systems.
Jen-Tzung Chien, Po-Chien Hsu
2020Stochastic Talking Face Generation Using Latent Distribution Matching.
Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde
2020Strategies for End-to-End Text-Independent Speaker Verification.
Weiwei Lin, Man-Wai Mak, Jen-Tzung Chien
2020StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes.
Manish Sharma, Tom Kenter, Rob Clark
2020Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
2020Streaming Keyword Spotting on Mobile Devices.
Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirkó Visontai, Stella Laurenzo
2020Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing.
Abhinav Garg, Gowtham P. Vadisetti, Dhananjaya Gowda, Sichen Jin, Aditya Jayasimha, Youngho Han, Jiyeon Kim, Junmo Park, Kwangyoun Kim, SooYeon Kim, Young-Yoon Lee, Kyungbo Min, Chanwoo Kim
2020Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory.
Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang
2020Style Attuned Pre-Training and Parameter Efficient Fine-Tuning for Spoken Language Understanding.
Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-wen Li
2020Style Variation as a Vantage Point for Code-Switching.
Khyathi Raghavi Chandu, Alan W. Black
2020Sub-Band Knowledge Distillation Framework for Speech Enhancement.
Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li
2020Subband Kalman Filtering with DNN Estimated Parameters for Speech Enhancement.
Hongjiang Yu, Wei-Ping Zhu, Benoît Champagne
2020Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System.
Przemyslaw Falkowski-Gilski, Grzegorz Debita, Marcin Habrych, Bogdan Miedzinski, Przemyslaw Jedlikowski, Bartosz Polnik, Jan Wandzio, Xin Wang
2020Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition.
Egor Lakomkin, Jahn Heymann, Ilya Sklyar, Simon Wiesler
2020Successes, Challenges and Opportunities for Speech Technology in Conversational Agents.
Shehzad Mevawalla
2020Sum-Product Networks for Robust Automatic Speaker Identification.
Aaron Nicolson, Kuldip K. Paliwal
2020Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data.
Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, Sébastien Marcel
2020Surfboard: Audio Feature Extraction for Modern Machine Learning.
Raphael Lenain, Jack Weston, Abhishek Shivkumar, Emil Fristed
2020Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms.
Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien
2020Surgical Mask Detection with Deep Recurrent Phonetic Models.
Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave
2020THUEE System for NIST SRE19 CTS Challenge.
Ruyun Li, Tianyu Liang, Dandan Song, Yi Liu, Yangcheng Wu, Can Xu, Peng Ouyang, Xianwei Zhang, Xianhong Chen, Weiqiang Zhang, Shouyi Yin, Liang He
2020TMT: A Transformer-Based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-Aware Dialog.
Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
2020TTS Skins: Speaker Conversion via ASR.
Adam Polyak, Lior Wolf, Yaniv Taigman
2020Tackling the ADReSS Challenge: A Multimodal Approach to the Automated Recognition of Alzheimer's Dementia.
Matej Martinc, Senja Pollak
2020Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario.
Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko
2020Targeted Content Feedback in Spoken Language Learning and Assessment.
Xinhao Wang, Klaus Zechner, Christopher Hamill
2020Task-Oriented Dialog Generation with Enhanced Entity Representation.
Zhenhao He, Jiachun Wang, Jian Chen
2020Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation.
Jiaxing Liu, Zhilei Liu, Longbiao Wang, Yuan Gao, Lili Guo, Jianwu Dang
2020Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis.
Jason Fong, Jason Taylor, Simon King
2020Text-Independent Speaker Verification with Dual Attention Network.
Jingyu Li, Tan Lee
2020That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.
Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
2020The "Sound of Silence" in EEG - Cognitive Voice Activity Detection.
Rini A. Sharon, Hema A. Murthy
2020The Acoustic Realization of Mandarin Tones in Fast Speech.
Ping Tang, Shanpeng Li
2020The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li
2020The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02.
Qingjian Lin, Tingle Li, Ming Li
2020The Different Enhancement Roles of Covarying Cues in Thai and Mandarin Tones.
Nari Rhee, Jianjing Kuang
2020The Effect of Input on the Production of English Tense and Lax Vowels by Chinese Learners: Evidence from an Elementary School in China.
Mengrou Li, Ying Chen, Jie Cui
2020The Effect of Language Dominance on the Selective Attention of Segments and Tones in Urdu-Cantonese Speakers.
Yi Liu, Jinghong Ning
2020The Effect of Language Proficiency on the Perception of Segmental Foreign Accent.
Rubén Pérez Ramón, María Luisa García Lecumberri, Martin Cooke
2020The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge.
Anna Pompili, Thomas Rolland, Alberto Abad
2020The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks.
Björn W. Schuller, Anton Batliner, Christian Bergler, Eva-Maria Messner, Antonia F. de C. Hamilton, Shahin Amiriparian, Alice Baird, Georgios Rizos, Maximilian Schmitt, Lukas Stappen, Harald Baumeister, Alexis Deighton MacIntyre, Simone Hantke
2020The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results.
Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
2020The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li
2020The Implication of Sound Level on Spatial Selective Auditory Attention for Cochlear Implant Users: Behavioral and Electrophysiological Measurement.
Sara Akbarzadeh, Sungmin Lee, Chin-Tuan Tan
2020The Importance of Time-Frequency Averaging for Binaural Speaker Localization in Reverberant Environments.
Hanan Beit-On, Vladimir Tourbabin, Boaz Rafaely
2020The JD AI Speaker Verification System for the FFSVC 2020 Challenge.
Ying Tong, Wei Xue, Shanluo Huang, Fan Lu, Chao Zhang, Guohong Ding, Xiaodong He
2020The MSP-Conversation Corpus.
Luz Martinez-Lucas, Mohammed Abdelwahab, Carlos Busso
2020The Method of Random Directions Optimization for Stereo Audio Source Separation.
Oleg Golokolenko, Gerald Schuller
2020The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge.
Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen
2020The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted.
Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh
2020The Phonology and Phonetics of Kaifeng Mandarin Vowels.
Lei Wang
2020The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.
Andreas Nautsch, Jose Patino, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans
2020The TalTech Systems for the Short-Duration Speaker Verification Challenge 2020.
Tanel Alumäe, Jörgen Valk
2020The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020.
Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong
2020The XMUSPEECH System for the AP19-OLR Challenge.
Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong
2020The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
2020The Cognitive Status of Simple and Complex Models.
Janet B. Pierrehumbert
2020Time-Domain Target-Speaker Speech Separation with Waveform-Based Speaker Embedding.
Jianshu Zhao, Shengzhou Gao, Takahiro Shinozaki
2020TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids.
Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough
2020To BERT or not to BERT: Comparing Speech and Language-Based Approaches for Alzheimer's Disease Detection.
Aparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova
2020Tone Learning in Low-Resource Bilingual TTS.
Ruolan Liu, Xue Wen, Chunhui Lu, Xiao Chen
2020Tone Variations in Regionally Accented Mandarin.
Yanping Li, Catherine T. Best, Michael D. Tyler, Denis Burnham
2020Tongue and Lip Motion Patterns in Alaryngeal Speech.
Kristin J. Teplansky, Alan Wisler, Beiming Cao, Wendy Liang, Chad W. Whited, Ted Mau, Jun Wang
2020Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology.
Vikram Ramanarayanan, Oliver Roesler, Michael Neumann, David Pautler, Doug Habberstad, Andrew Cornish, Hardik Kothare, Vignesh Murali, Jackson Liscombe, Dirk Schnelle-Walka, Patrick L. Lange, David Suendermann-Oeft
2020Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech.
Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso
2020Towards Automatic Assessment of Voice Disorders: A Clinical Approach.
Purva Barche, Krishna Gurugubelli, Anil Kumar Vuppala
2020Towards Context-Aware End-to-End Code-Switching Speech Recognition.
Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell
2020Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders - Step 1: CNN Model-Based Phone Classification.
Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard
2020Towards Learning a Universal Non-Semantic Representation of Speech.
Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv
2020Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion.
Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma
2020Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals.
Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz
2020Towards Speech Robustness for Acoustic Scene Classification.
Shuo Liu, Andreas Triantafyllopoulos, Zhao Ren, Björn W. Schuller
2020Towards Universal Text-to-Speech.
Jingzhou Yang, Lei He
2020Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription.
Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
2020Towards a Comprehensive Assessment of Speech Intelligibility for Pathological Speech.
Wei Xue, Viviana Mendoza Ramos, Wieke Harmsen, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik
2020Towards an ASR Error Robust Spoken Language Understanding System.
Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss
2020Training Keyword Spotting Models on Non-IID Data with Federated Learning.
Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio López-Moreno, Rajiv Mathews
2020Training Speaker Enrollment Models by Network Optimization.
Victoria Mingote, Antonio Miguel, Alfonso Ortega Giménez, Eduardo Lleida
2020Transfer Learning Approaches for Streaming End-to-End Speech Recognition System.
Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li
2020Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li
2020Transfer Learning of Articulatory Information Through Phone Information.
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen
2020Transfer Learning of the Expressivity Using FLOW Metric Learning in Multispeaker Text-to-Speech Synthesis.
Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet
2020Transferring Source Style in Non-Parallel Voice Conversion.
Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng
2020Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
2020Transformer with Bidirectional Decoder for Speech Recognition.
Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin
2020Transformer-Based Long-Context End-to-End Speech Recognition.
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
2020Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury
2020Two Different Mechanisms of Movable Mandible for Vocal-Tract Model with Flexible Tongue.
Takayuki Arai
2020Two-Stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-Token Connectionist Temporal Classification.
In Young Park, Hong Kook Kim
2020U-Net Based Direct-Path Dominance Test for Robust Direction-of-Arrival Estimation.
Hao Wang, Kai Chen, Jing Lu
2020UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech.
Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed
2020Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis.
Tamás Gábor Csapó, Csaba Zainkó, László Tóth, Gábor Gosztolya, Alexandra Markó
2020Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus.
Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller
2020UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech.
Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan
2020Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization.
Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang
2020Understanding Racial Disparities in Automatic Speech Recognition: The Case of Habitual "be".
Joshua L. Martin, Kevin Tang
2020Understanding Self-Attention of Self-Supervised Audio Transformers.
Shu-Wen Yang, Andy T. Liu, Hung-yi Lee
2020Understanding the Effect of Voice Quality and Accent on Talker Similarity.
Anurag Das, Guanlong Zhao, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna
2020Universal Adversarial Attacks on Spoken Language Assessment Systems.
Vyas Raina, Mark J. F. Gales, Kate M. Knill
2020Universal Speech Transformer.
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma
2020Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders.
Mingjie Chen, Thomas Hain
2020Unsupervised Audio Source Separation Using Generative Priors.
Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias
2020Unsupervised Cross-Domain Singing Voice Conversion.
Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman
2020Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics.
Okko Räsänen, María Andrea Cruz Blandón
2020Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification.
Akhil Mathur, Nadia Berthouze, Nicholas D. Lane
2020Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training.
Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura
2020Unsupervised Feature Adaptation Using Adversarial Multi-Task Training for Automatic Evaluation of Children's Speech.
Richeng Duan, Nancy F. Chen
2020Unsupervised Learning for Sequence-to-Sequence Text-to-Speech for Low-Resource Languages.
Haitong Zhang, Yue Lin
2020Unsupervised Methods for Evaluating Speech Representations.
Michael Gump, Wei-Ning Hsu, James R. Glass
2020Unsupervised Regularization-Based Adaptive Training for Speech Recognition.
Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du
2020Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization.
Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Kazuyoshi Yoshii
2020Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling.
Siyuan Feng, Odette Scharenborg
2020Unsupervised Training of Siamese Networks for Speaker Verification.
Umair Khan, Javier Hernando
2020Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.
Leanne Nortje, Herman Kamper
2020Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
Xin Wang, Junichi Yamagishi
2020Using Silence MR Image to Synthesise Dynamic MRI Vocal Tract Data of CV.
Ioannis K. Douros, Ajinkya Kulkarni, Chrysanthi Dourou, Yu Xie, Jacques Felblinger, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie
2020Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network.
Jeng-Lin Li, Chi-Chun Lee
2020Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions.
Hengshun Zhou, Jun Du, Yanhui Tu, Chin-Hui Lee
2020Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its Severity.
Raghavendra Pappagari, Jaejin Cho, Laureano Moro-Velázquez, Najim Dehak
2020Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios.
Ankur Kumar, Sachin Singh, Dhananjaya Gowda, Abhinav Garg, Shatrughan Singh, Chanwoo Kim
2020Utterance Invariant Training for Hybrid Two-Pass End-to-End Speech Recognition.
Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Hejung Yang, Abhinav Garg, Sachin Singh, Jiyeon Kim, Mehul Kumar, Sichen Jin, Shatrughan Singh, Chanwoo Kim
2020Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones.
Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu
2020VCTUBE: A Library for Automatic Speech Data Annotation.
Seong Choi, Seunghoon Jeong, Jeewoo Yoon, Migyeong Yang, Minsam Ko, Eunil Park, Jinyoung Han, Munyoung Lee, Seonghee Lee
2020VOP Detection in Variable Speech Rate Condition.
Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna
2020VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture.
Da-Yi Wu, Yen-Hao Chen, Hung-yi Lee
2020Variable Frame Rate-Based Data Augmentation to Handle Speaking-Style Variability for Automatic Speaker Verification.
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan
2020Variation in Spectral Slope and Interharmonic Noise in Cantonese Tones.
Phil Rose
2020Vector Quantized Temporally-Aware Correspondence Sparse Autoencoders for Zero-Resource Acoustic Unit Discovery.
Batuhan Gündogdu, Bolaji Yusuf, Mansur Yesilbursa, Murat Saraclar
2020Vector-Based Attentive Pooling for Text-Independent Speaker Verification.
Yanfeng Wu, Chenkai Guo, Hongcan Gao, Xiaolei Hou, Jing Xu
2020Vector-Quantized Autoregressive Predictive Coding.
Yu-An Chung, Hao Tang, James R. Glass
2020Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.
Benjamin van Niekerk, Leanne Nortje, Herman Kamper
2020Very Short-Term Conflict Intensity Estimation Using Fisher Vectors.
Gábor Gosztolya
2020Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation.
Joon-Young Yang, Joon-Hyuk Chang
2020Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System.
Mandar Gogate, Kia Dashtipour, Amir Hussain
2020VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network.
Jinhyeok Yang, Junmo Lee, Young-Ik Kim, Hoon-Young Cho, Injung Kim
2020Vocal Markers from Sustained Phonation in Huntington's Disease.
Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi
2020Vocoder-Based Speech Synthesis from Silent Videos.
Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen
2020Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection.
Yefei Chen, Heinrich Dinkel, Mengyue Wu, Kai Yu
2020Voice Conversion Based Data Augmentation to Improve Children's Speech Recognition in Limited Data Scenario.
S. Shahnawazuddin, Nagaraj Adiga, Kunal Kumar, Aayushi Poddar, Waquar Ahmad
2020Voice Conversion Using Speech-to-Speech Neuro-Style Transfer.
Ehab A. AlBadawy, Siwei Lyu
2020Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda
2020VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition.
Quan Wang, Ignacio López-Moreno, Mert Saglam, Kevin W. Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
2020VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch.
Baihan Lin, Xinxin Zhang
2020Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect.
Yang Yue, Fang Hu
2020WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU.
Po-Chun Hsu, Hung-yi Lee
2020WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition.
Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han, Hongtao Song
2020Wake Word Detection with Alignment-Free Lattice-Free MMI.
Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
2020Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms.
Wei-Wei Lin, Man-Wai Mak
2020Weak-Attention Suppression for Transformer Based Speech Recognition.
Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer
2020Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification.
Yanpei Shi, Qiang Huang, Thomas Hain
2020What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information?
Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James R. Glass
2020What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS.
Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
2020Whisper Activity Detection Using CNN-LSTM Based Attention Pooling Network Trained for a Speaker Identification Task.
Abinay Reddy Naini, Malla Satyapriya, Prasanta Kumar Ghosh
2020Whisper Augmented End-to-End/Hybrid Speech Recognition System - CycleGAN Approach.
Prithvi R. R. Gudepu, Gowtham P. Vadisetti, Abhishek Niranjan, Kinnera Saranu, Raghava Sarma, M. Ali Basha Shaik, Periyasamy Paramasivam
2020Whistled Vowel Identification by French Listeners.
Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier
2020Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data.
Rosa González Hautamäki, Tomi Kinnunen
2020Word Error Rate Estimation Without ASR Output: e-WER2.
Ahmed Ali, Steve Renals
2020X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network.
Zining Zhang, Bingsheng He, Zhenjie Zhang
2020X-Vector Singular Value Modification and Statistical-Based Decomposition with Ensemble Regression Modeling for Speaker Anonymization System.
Candy Olivia Mawalim, Kasorn Galajit, Jessada Karnjana, Masashi Unoki
2020XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System.
Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou
2020g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset.
Kyubyong Park, Seanie Lee
2020iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning.
Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi
2020x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification.
Jesús Villalba, Yuekai Zhang, Najim Dehak