| 2017 | 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017 |
| 2017 | A context-aware speech recognition and understanding system for air traffic control domain. Youssef Oualil, Dietrich Klakow, György Szaszák, Ajay Srinivasamurthy, Hartmut Helmke, Petr Motlíček |
| 2017 | A hierarchical attention based model for off-topic spontaneous spoken response detection. Andrey Malinin, Kate M. Knill, Mark J. F. Gales |
| 2017 | Aalto system for the 2017 Arabic multi-genre broadcast challenge. Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo |
| 2017 | Acoustic-to-word model without OOV. Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong |
| 2017 | Adversarial manifold learning for speaker recognition. Jen-Tzung Chien, Kang-Ting Peng |
| 2017 | Adversarial training for data-driven speech enhancement without parallel corpus. Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani |
| 2017 | An embedded segmental K-means model for unsupervised segmentation and clustering of speech. Herman Kamper, Karen Livescu, Sharon Goldwater |
| 2017 | An investigation of multi-speaker training for WaveNet vocoder. Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda |
| 2017 | Attention-based Wav2Text with feature transfer learning. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura |
| 2017 | Automatic speech recognition of Arabic multi-genre broadcast media. Maryam Najafian, Wei-Ning Hsu, Ahmed Ali, James R. Glass |
| 2017 | Binaural processing for robust recognition of degraded speech. Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern |
| 2017 | Character-based units for unlimited vocabulary continuous speech recognition. Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo |
| 2017 | Comparison of multiple features and modeling methods for text-dependent speaker verification. Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson |
| 2017 | Composite embedding systems for ZeroSpeech2017 Track1. Hayato Shibata, Taku Kato, Takahiro Shinozaki, Shinji Watanabe |
| 2017 | Computational cost reduction of long short-term memory based on simultaneous compression of input and hidden state. Takashi Masuko |
| 2017 | Consistent DNN uncertainty training and decoding for robust ASR. Karan Nathwani, Emmanuel Vincent, Irina Illina |
| 2017 | Cracking the cocktail party problem by multi-beam deep attractor network. Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong |
| 2017 | Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks. Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara |
| 2017 | DBLSTM based multilingual articulatory feature extraction for language documentation. Markus Müller, Sebastian Stüker, Alex Waibel |
| 2017 | Deep learning methods for unsupervised acoustic modeling - Leap submission to ZeroSpeech challenge 2017. T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy |
| 2017 | Deep quaternion neural networks for spoken language understanding. Titouan Parcollet, Mohamed Morchid, Georges Linarès |
| 2017 | Denotation extraction for interactive learning in dialogue systems. Miroslav Vodolán, Filip Jurčíček |
| 2017 | Direct modeling of raw audio with DNNs for wake word detection. Ken'ichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal |
| 2017 | Dynamic time-aware attention to speaker roles and contexts for spoken language understanding. Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen |
| 2017 | Early and late integration of audio features for automatic video description. Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey |
| 2017 | End-to-end text-independent speaker verification with flexibility in utterance duration. Chunlei Zhang, Kazuhito Koishida |
| 2017 | Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context. Kévin Vythelingum, Yannick Estève, Olivier Rosec |
| 2017 | Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick L. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun |
| 2017 | Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. Kanishka Rao, Hasim Sak, Rohit Prabhavalkar |
| 2017 | Exploring neural transducers for end-to-end speech recognition. Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, Zhenyao Zhu |
| 2017 | Exploring the use of acoustic embeddings in neural machine translation. Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain |
| 2017 | Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation. Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li |
| 2017 | Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to ZeroSpeech 2017. Michael Heck, Sakriani Sakti, Satoshi Nakamura |
| 2017 | Future vector enhanced LSTM language model for LVCSR. Qi Liu, Yanmin Qian, Kai Yu |
| 2017 | Future word contexts in neural network language models. Xie Chen, X. Liu, Anton Ragni, Y. Wang, Mark J. F. Gales |
| 2017 | Gated convolutional networks based hybrid acoustic models for low resource speech recognition. Jian Kang, Wei-Qiang Zhang, Jia Liu |
| 2017 | Ground truth estimation of spoken English fluency score using decorrelation penalized low-rank matrix factorization. Hoon Chung, Yun-Kyung Lee, Jeon Gue Park |
| 2017 | Grounded language understanding for manipulation instructions using GAN-based classification. Komei Sugiura, Hisashi Kawai |
| 2017 | Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. Emiru Tsunoo, Ondřej Klejch, Peter Bell, Steve Renals |
| 2017 | Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora. Yao Qian, Keelan Evanini, Patrick L. Lange, Robert A. Pugh, Rutuja Ubale, Frank K. Soong |
| 2017 | Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array. Keisuke Nakamura, Randy Gomez |
| 2017 | Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow. Khe Chai Sim, Arun Narayanan, Tom Bagby, Tara N. Sainath, Michiel Bacchiani |
| 2017 | Incremental training and constructing the very deep convolutional residual network acoustic models. Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara, Hisashi Kawai |
| 2017 | Integrated speaker-adaptive speech synthesis. Moquan Wan, Gilles Degottex, Mark J. F. Gales |
| 2017 | Investigating native and non-native English classification and transfer effects using Legendre polynomial coefficient clustering. Rachel Rakov, Andrew Rosenberg |
| 2017 | Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu |
| 2017 | Investigation of transfer learning for ASR using LF-MMI trained neural networks. Pegah Ghahremani, Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur |
| 2017 | Iterative policy learning in end-to-end trainable task-oriented neural dialog models. Bing Liu, Ian R. Lane |
| 2017 | JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning. Vimal Manohar, Daniel Povey, Sanjeev Khudanpur |
| 2017 | Keyword spotting for Google Assistant using contextual speech recognition. Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar S. Aleksic |
| 2017 | Language diarization for semi-supervised bilingual acoustic model training. Emre Yilmaz, Mitchell McLaren, Henk van den Heuvel, David A. van Leeuwen |
| 2017 | Language independent end-to-end architecture for joint language identification and speech recognition. Shinji Watanabe, Takaaki Hori, John R. Hershey |
| 2017 | Language modeling with highway LSTM. Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy |
| 2017 | Language modeling with neural trans-dimensional random fields. Bin Wang, Zhijian Ou |
| 2017 | Lattice rescoring strategies for long short term memory language models in speech recognition. Shankar Kumar, Michael Nirschl, Daniel Niels Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix X. Yu |
| 2017 | Learning modality-invariant representations for speech and images. Kenneth Leidal, David Harwath, James R. Glass |
| 2017 | Learning speaker representation for neural network based multichannel speaker extraction. Kateřina Žmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani |
| 2017 | Leveraging native language speech for accent identification using deep Siamese networks. Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy |
| 2017 | Leveraging side information for speaker identification with the Enron conversational telephone speech collection. Ning Gao, Gregory Sell, Douglas W. Oard, Mark Dredze |
| 2017 | Listening while speaking: Speech chain by deep learning. Andros Tjandra, Sakriani Sakti, Satoshi Nakamura |
| 2017 | MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. Karel Veselý, Murali Karthick Baskar, Mireia Díez, Karel Beneš |
| 2017 | MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge. Suwon Shon, Ahmed Ali, James R. Glass |
| 2017 | Meeting recognition with asynchronous distributed microphone array. Shoko Araki, Nobutaka Ono, Keisuke Kinoshita, Marc Delcroix |
| 2017 | Minimally supervised written-to-spoken text normalization. Axel H. Ng, Kyle Gorman, Richard Sproat |
| 2017 | Mitigating the impact of speech recognition errors on chatbot using sequence-to-sequence model. Pin-Jung Chen, I-Hung Hsu, Yi Yao Huang, Hung-yi Lee |
| 2017 | Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. Takaaki Hori, Shinji Watanabe, John R. Hershey |
| 2017 | Multi-task ensembles with teacher-student training. Jeremy Heng Meng Wong, Mark J. F. Gales |
| 2017 | Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification. Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu |
| 2017 | Multilingual bottle-neck feature learning from untranscribed speech. Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li |
| 2017 | Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition. Bowen Shi, Karen Livescu |
| 2017 | Neural relevance-aware query modeling for spoken document retrieval. Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen |
| 2017 | Noise-robust exemplar matching for rescoring query-by-example search. Emre Yilmaz, Julien van Hout, Horacio Franco |
| 2017 | OneNet: Joint domain, intent, slot prediction for spoken language understanding. Young-Bum Kim, Sungjin Lee, Karl Stratos |
| 2017 | On lattice generation for large vocabulary speech recognition. David Rybach, Michael Riley, Johan Schalkwyk |
| 2017 | Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems. Eunwoo Song, Frank K. Soong, Hong-Goo Kang |
| 2017 | Personalized word representations carrying personalized semantics learned from social network posts. Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee |
| 2017 | Reducing the computational complexity for whole word models. Hagen Soltau, Hank Liao, Hasim Sak |
| 2017 | Scalable multi-domain dialogue state tracking. Abhinav Rastogi, Dilek Hakkani-Tür, Larry P. Heck |
| 2017 | Seeing and hearing too: Audio representation for video captioning. Shun-Po Chuang, Chia-Hung Wan, Pang-Chi Huang, Chi-Yu Yang, Hung-yi Lee |
| 2017 | Semi-supervised training strategies for deep neural networks. Matthew Gibson, Gary Cook, Puming Zhan |
| 2017 | Sequence training of DNN acoustic models with natural gradient. Adnan Haider, Philip C. Woodland |
| 2017 | Simplifying very deep convolutional neural network architectures for robust speech recognition. Joanna Rownicka, Steve Renals, Peter Bell |
| 2017 | Sparse representation of phonetic features for voice conversion with and without parallel data. Berrak Sisman, Haizhou Li, Kay Chen Tan |
| 2017 | Speaker-sensitive dual memory networks for multi-turn slot tagging. Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya |
| 2017 | Speech recognition challenge in the wild: Arabic MGB-3. Ahmed Ali, Stephan Vogel, Steve Renals |
| 2017 | Spoken language biomarkers for detecting cognitive impairment. Tuka Alhanai, Rhoda Au, James R. Glass |
| 2017 | Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. Lea Schönherr, Steffen Zeiler, Dorothea Kolossa |
| 2017 | Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li |
| 2017 | Streaming small-footprint keyword spotting using sequence-to-sequence models. Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw |
| 2017 | Subband WaveNet with overlapped single-sideband filterbanks. Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai |
| 2017 | Syllable-based acoustic modeling with CTC-sMBR-LSTM. Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro J. Moreno |
| 2017 | Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features. Julien van Hout, Vikramjit Mitra, Horacio Franco, Chris Bartels, Dimitra Vergyri |
| 2017 | The CMU entry to Blizzard Machine Learning Challenge. Pallavi Baljekar, Sai Krishna Rallabandi, Alan W. Black |
| 2017 | The USTC system for Blizzard Machine Learning Challenge 2017-ES2. Ya-Jun Hu, Li-Juan Liu, Chuang Ding, Zhen-Hua Ling, Li-Rong Dai |
| 2017 | The Blizzard Machine Learning Challenge 2017. Kei Sawada, Keiichi Tokuda, Simon King, Alan W. Black |
| 2017 | The iFLYTEK system for Blizzard Machine Learning Challenge 2017-ES1. Li-Juan Liu, Chuang Ding, Ya-Jun Hu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Si Wei |
| 2017 | The Zero Resource Speech Challenge 2017. Ewan Dunbar, Xuan-Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux |
| 2017 | Topic segmentation in ASR transcripts using bidirectional RNNs for change detection. Imran A. Sheikh, Dominique Fohr, Irina Illina |
| 2017 | Turbo fusion of magnitude and phase information for DNN-based phoneme recognition. Timo Lohrenz, Tim Fingscheidt |
| 2017 | UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech. Ahmet Emin Bulut, Qian Zhang, Chunlei Zhang, Fahimeh Bahmaninezhad, John H. L. Hansen |
| 2017 | Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions. T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy, V. Susheela Devi |
| 2017 | Unsupervised adaptation of student DNNs learned from teacher RNNs for improved ASR performance. Lahiru Samarakoon, Brian Mak |
| 2017 | Unsupervised adaptation with domain separation networks for robust speech recognition. Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong |
| 2017 | Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. Wei-Ning Hsu, Yu Zhang, James R. Glass |
| 2017 | Unwritten languages demand attention too! Word discovery with encoder-decoder models. Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier |
| 2017 | WERD: Using social text spelling variants for evaluating dialectal speech recognition. Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals |