| 2017 | "Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter Trackers. Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn W. Schuller |
| 2017 | 18th Annual Conference of the International Speech Communication Association, Interspeech 2017, Stockholm, Sweden, August 20-24, 2017. Francisco Lacerda |
| 2017 | 2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation. Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Lukás Burget, Jan Cernocký |
| 2017 | A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. Youssef Oualil, Dietrich Klakow |
| 2017 | A Comparative Evaluation of GMM-Free State Tying Methods for ASR. Tamás Grósz, Gábor Gosztolya, László Tóth |
| 2017 | A Comparison of Danish Listeners' Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences. Ocke-Schwen Bohn, Trine Askjær-Jørgensen |
| 2017 | A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation. Danny Websdale, Ben Milner |
| 2017 | A Comparison of Sentence-Level Speech Intelligibility Metrics. Alexander Kain, Max Del Giudice, Kris Tjaden |
| 2017 | A Comparison of Sequence-to-Sequence Models for Speech Recognition. Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly |
| 2017 | A Computational Model for Phonetically Responsive Spoken Dialogue Systems. Eran Raveh, Ingmar Steiner, Bernd Möbius |
| 2017 | A Contrast Function and Algorithm for Blind Separation of Audio Signals. Wei Gao, Roberto Togneri, Victor Sreeram |
| 2017 | A Data-Driven Approach for Perceptually Validated Acoustic Features for Children's Sibilant Fricative Productions. Patrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson |
| 2017 | A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. Jan Franzen, Tim Fingscheidt |
| 2017 | A Distribution Free Formulation of the Total Variability Model. Ruchir Travadi, Shrikanth S. Narayanan |
| 2017 | A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network. Xiaoke Qi, Jianhua Tao |
| 2017 | A Dual Source-Filter Model of Snore Audio for Snorer Group Classification. M. V. Achuth Rao, Shivani Yadav, Prasanta Kumar Ghosh |
| 2017 | A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold Simulation. Arvind Vasudevan, Victor Zappi, Peter Anderson, Sidney S. Fels |
| 2017 | A Fully Convolutional Neural Network for Speech Enhancement. Se Rim Park, Jinwon Lee |
| 2017 | A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech? Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen |
| 2017 | A Generative Model for Score Normalization in Speaker Recognition. Albert Swart, Niko Brümmer |
| 2017 | A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis. Srikanth Ronanki, Oliver Watts, Simon King |
| 2017 | A Mask Estimation Method Integrating Data Field Model for Speech Enhancement. Xianyun Wang, Changchun Bao, Feng Bao |
| 2017 | A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation. Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee |
| 2017 | A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Extraction. Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda |
| 2017 | A Mostly Data-Driven Approach to Inverse Text Normalization. Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Plátek, Donald McAllaster, Venki Nagesha |
| 2017 | A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation. Luc Ardaillon, Axel Roebel |
| 2017 | A Neural Parametric Singing Synthesizer. Merlijn Blaauw, Jordi Bonada |
| 2017 | A Neuro-Experimental Evidence for the Motor Theory of Speech Perception. Bin Zhao, Jianwu Dang, Gaoyan Zhang |
| 2017 | A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis. Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino |
| 2017 | A New Model of Final Lowering in Spontaneous Monologue. Kikuo Maekawa |
| 2017 | A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children's Language Environments. Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristià, Melanie Soderstrom, Mark VanDam, Han Sloetjes |
| 2017 | A Note Based Query By Humming System Using Convolutional Neural Network. Naziba Mostafa, Pascale Fung |
| 2017 | A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. Anna Moró, György Szaszák |
| 2017 | A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement. Yi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang |
| 2017 | A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese. Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang |
| 2017 | A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. Yuanyuan Zhang, Hongwei Ding |
| 2017 | A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability. Thomas Schatz, Rory Turnbull, Francis R. Bach, Emmanuel Dupoux |
| 2017 | A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings. Jan Svec, Josef V. Psutka, Lubos Smídl, Jan Trmal |
| 2017 | A Rescoring Approach for Keyword Search Using Lattice Context Information. Zhipeng Chen, Ji Wu |
| 2017 | A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator. Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis |
| 2017 | A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. G. Nisha Meenakshi, Prasanta Kumar Ghosh |
| 2017 | A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch Extraction. P. Gangamohan, B. Yegnanarayana |
| 2017 | A Semi-Polar Grid Strategy for the Three-Dimensional Finite Element Simulation of Vowel-Vowel Sequences. Marc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall |
| 2017 | A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains. Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramón Fernandez Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso |
| 2017 | A Signal Processing Approach for Speaker Separation Using SFF Analysis. Nivedita Chennupati, B. H. V. S. Narayana Murthy, B. Yegnanarayana |
| 2017 | A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract Formants. Yasufumi Uezu, Tokihiko Kaburagi |
| 2017 | A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic Inversion. Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil |
| 2017 | A Spectro-Temporal Demodulation Technique for Pitch Estimation. Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula |
| 2017 | A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. Alec Burmania, Carlos Busso |
| 2017 | A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng |
| 2017 | A System for Real Time Collaborative Transcription Correction. Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair |
| 2017 | A Thematicity-Based Prosody Enrichment Tool for CTS. Mónica Domínguez, Mireia Farrús, Leo Wanner |
| 2017 | A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes. Simon Stone, Peter Steiner, Peter Birkholz |
| 2017 | A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification. Yun Wang, Florian Metze |
| 2017 | A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking. Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier |
| 2017 | A Unified Numerical Simulation of Vowel Production That Comprises Phonation and the Emitted Sound. Niyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sánchez-Martín, Oriol Guasch, Sten Ternström |
| 2017 | ASR Error Management for Improving Spoken Language Understanding. Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori |
| 2017 | Accurate Synchronization of Speech and EGG Signal Using Phase Information. S. B. Sunil Kumar, K. Sreenivasa Rao, Tanumay Mandal |
| 2017 | Acoustic Analysis of Detailed Three-Dimensional Shape of the Human Nasal Cavity and Paranasal Sinuses. Tatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Koutaro Maki |
| 2017 | Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features. Yuanyuan Liu, Tan Lee, P. C. Ching, Thomas K. T. Law, Kathy Y. S. Lee |
| 2017 | Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam Sora. Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat |
| 2017 | Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents. Melanie Weirich, Adrian P. Simpson |
| 2017 | Acoustic Cues to the Singleton-Geminate Contrast: The Case of Libyan Arabic Sonorants. Amel Issa |
| 2017 | Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework. Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur |
| 2017 | Acoustic Evaluation of Nasality in Cerebellar Syndromes. Michal Novotný, Jan Rusz, K. Spálenka, Jirí Klempír, Dana Horáková, Evzen Ruzicka |
| 2017 | Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis. Qingming Tang, Weiran Wang, Karen Livescu |
| 2017 | Acoustic Modeling for Google Home. Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon |
| 2017 | Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization. Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre |
| 2017 | Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese. Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian |
| 2017 | Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features. Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H. L. Hansen, Taufiq Hasan |
| 2017 | Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati Speakers. Kiranpreet Nara |
| 2017 | Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman |
| 2017 | Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis. Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu |
| 2017 | Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best |
| 2017 | Adaptive Multichannel Dereverberation for Automatic Speech Recognition. Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose |
| 2017 | Addressing Code-Switching in French/Algerian Arabic Speech. Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel |
| 2017 | Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. Samuel Delalez, Christophe d'Alessandro |
| 2017 | Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan |
| 2017 | Adversarial Auto-Encoders for Speech Based Emotion Recognition. Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Y. Espy-Wilson |
| 2017 | Adversarial Network Bottleneck Features for Noise Robust Speaker Verification. Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo |
| 2017 | Aerodynamic Features of French Fricatives. Rosario Signorello, Sergio Hassid, Didier Demolin |
| 2017 | Alternative Approaches to Neural Network Based Speaker Verification. Anna Silnova, Lukás Burget, Jan Cernocký |
| 2017 | An 'End-to-Evolution' Hybrid Approach for Snore Sound Classification. Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn W. Schuller |
| 2017 | An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural Networks. Rahul Gupta, Saurabh Sahu, Carol Y. Espy-Wilson, Shrikanth S. Narayanan |
| 2017 | An Analysis of "Attention" in Sequence-to-Sequence Models. Rohit Prabhavalkar, Tara N. Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly |
| 2017 | An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates. Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte |
| 2017 | An Audio Based Piano Performance Evaluation Method Using Deep Neural Network Based Acoustic Modeling. Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu |
| 2017 | An Auditory Model of Speaker Size Perception for Voiced Speech Sounds. Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson |
| 2017 | An Automatically Aligned Corpus of Child-Directed Speech. Micha Elsner, Kiwako Ito |
| 2017 | An Avatar-Based System for Identifying Individuals Likely to Develop Dementia. Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen |
| 2017 | An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication. Khe Chai Sim, Arun Narayanan |
| 2017 | An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog. Bing Liu, Ian R. Lane |
| 2017 | An Entrained Rhythm's Frequency, Not Phase, Influences Temporal Sampling of Speech. Hans Rutger Bosker, Anne Kösem |
| 2017 | An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification. Xue Feng, Brigitte Richardson, Scott Amman, James R. Glass |
| 2017 | An Expanded Taxonomy of Semiotic Classes for Text Normalization. Daan van Esch, Richard Sproat |
| 2017 | An Exploration of Dropout with LSTMs. Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan |
| 2017 | An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer |
| 2017 | An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. Gaurav Fotedar, Prasanta Kumar Ghosh |
| 2017 | An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN. Tin Lay Nwe, Tran Huy Dat, Wen Zheng Terence Ng, Bin Ma |
| 2017 | An Investigation of Crowd Speech for Room Occupancy Estimation. Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le |
| 2017 | An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation. Sibo Tong, Philip N. Garner, Hervé Bourlard |
| 2017 | An Investigation of Emotion Dynamics and Kalman Filtering for Speech-Based Emotion Prediction. Zhaocheng Huang, Julien Epps |
| 2017 | An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression. Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah |
| 2017 | An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous German. Margaret Zellers, Antje Schweitzer |
| 2017 | An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer's Disease from Spoken Language. Sebastian Wankerl, Elmar Nöth, Stefan Evert |
| 2017 | An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley. T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma |
| 2017 | An RNN Model of Text Normalization. Richard Sproat, Navdeep Jaitly |
| 2017 | An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis. Xin Wang, Shinji Takaki, Junichi Yamagishi |
| 2017 | An Ultrasound Study of Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed Syllables. Marija Tabain, Richard Beare |
| 2017 | Analysis and Description of ABC Submission to NIST SRE 2016. Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya |
| 2017 | Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages. Ganesh Sivaraman, Carol Y. Espy-Wilson, Martijn Wieling |
| 2017 | Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, T. Metin Sezgin |
| 2017 | Analysis of Score Normalization in Multilingual Speaker Recognition. Pavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký |
| 2017 | Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence Positions. Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara |
| 2017 | Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. Unto K. Laine |
| 2017 | Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training. Tara N. Sainath, Vijayaditya Peddinti, Olivier Siohan, Arun Narayanan |
| 2017 | Apkinson - A Mobile Monitoring Solution for Parkinson's Disease. Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth |
| 2017 | Applications of the BBN Sage Speech Processing Platform. Ralf Meermeier, Sean Colbath |
| 2017 | Approaches for Neural-Network Language Model Adaptation. Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar |
| 2017 | Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings. Shao-Yen Tseng, Brian R. Baucom, Panayiotis G. Georgiou |
| 2017 | Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition. Mittul Singh, Youssef Oualil, Dietrich Klakow |
| 2017 | Approximating Phonotactic Input in Children's Linguistic Environments from Orthographic Transcripts. Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam |
| 2017 | Areal and Phylogenetic Features for Multilingual Speech Synthesis. Alexander Gutkin, Richard Sproat |
| 2017 | Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for. Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva |
| 2017 | Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. Amelia Jane Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda |
| 2017 | Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors. Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico |
| 2017 | Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan |
| 2017 | Attention Networks for Modeling Behaviors in Addiction Counseling. James Gibson, Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan |
| 2017 | Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging. Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley |
| 2017 | Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition. Yu Zhang, Pengyuan Zhang, Yonghong Yan |
| 2017 | Attentional Factors in Listeners' Uptake of Gesture Cues During Speech Processing. Raheleh Saryazdi, Craig G. Chambers |
| 2017 | Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech. Michael Neumann, Ngoc Thang Vu |
| 2017 | Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read Speech. Jürgen Trouvain, Frank Zimmerer |
| 2017 | Audio Classification Using Class-Specific Learned Descriptors. Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim |
| 2017 | Audio Content Based Geotagging in Multimedia. Anurag Kumar, Benjamin Elizalde, Bhiksha Raj |
| 2017 | Audio Replay Attack Detection Using High-Frequency Features. Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka |
| 2017 | Audio Replay Attack Detection with Deep Learning Frameworks. Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin |
| 2017 | Audio Scene Classification with Deep Recurrent Neural Networks. Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maaß, Radoslaw Mazur, Alfred Mertins |
| 2017 | Audiovisual Recalibration of Vowel Categories. Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen |
| 2017 | Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception. Wei Lai |
| 2017 | Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information. Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko |
| 2017 | Automatic Alignment Between Classroom Lecture Utterances and Slide Components. Masatoshi Tsuchiya, Ryo Minamiguchi |
| 2017 | Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label Sequences. David Escudero Mancebo, César González Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana |
| 2017 | Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results. Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn W. Schuller |
| 2017 | Automatic Construction of the Finnish Parliament Speech Corpus. André Mansikkaniemi, Peter Smit, Mikko Kurimo |
| 2017 | Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords. Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão |
| 2017 | Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides. Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa |
| 2017 | Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners. George Christodoulides, Mathieu Avanzi, Anne-Catherine Simon |
| 2017 | Automatic Measurement of Pre-Aspiration. Yaniv Sheena, Mísa Hejná, Yossi Adi, Joseph Keshet |
| 2017 | Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. Duc Le, Keli Licata, Emily Mower Provost |
| 2017 | Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech. Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier |
| 2017 | Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW. Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu |
| 2017 | Automatic Time-Frequency Analysis of Echolocation Signals Using the Matched Gaussian Multitaper Spectrogram. Maria Sandsten, Isabella Reinhold, Josefin Starkhammar |
| 2017 | Backstitch: Counteracting Finite-Sample Bias via Negative Steps. Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur |
| 2017 | Beyond the Listening Test: An Interactive Approach to TTS Evaluation. Joseph Mendelson, Matthew P. Aylett |
| 2017 | Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores. Andrew Rosenberg, Bhuvana Ramabhadran |
| 2017 | Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's Speech. Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland |
| 2017 | Bidirectional Modelling for Short Duration Language Identification. Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps |
| 2017 | Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus. Cédric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau |
| 2017 | Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets. Farhad Bin Siddique, Pascale Fung |
| 2017 | Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection. Fei Tao, Carlos Busso |
| 2017 | Binary Deep Neural Networks for Speech Recognition. Xu Xiang, Yanmin Qian, Kai Yu |
| 2017 | Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement. Ricard Marxer, Jon Barker |
| 2017 | Binaural Reverberant Speech Separation Based on Deep Neural Networks. Xueliang Zhang, DeLiang Wang |
| 2017 | Bob Speaks Kaldi. Milos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel |
| 2017 | Building ASR Corpora Using Eyra. Jón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir |
| 2017 | Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech. Omnia Abdo, Sherif M. Abdou, Mervat Fashal |
| 2017 | Building an ASR Corpus Using Althingi's Parliamentary Speeches. Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason |
| 2017 | CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion. Toru Nakashika |
| 2017 | CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube. Karima Abidi, Mohamed Amine Menacer, Kamel Smaïli |
| 2017 | CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances. Jinxi Guo, Usha Amrutha Nookala, Abeer Alwan |
| 2017 | CTC Training of Multi-Phone Acoustic Models for Speech Recognition. Olivier Siohan |
| 2017 | CTC in the Context of Generalized Full-Sum HMM Training. Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney |
| 2017 | Calibration Approaches for Language Detection. Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson |
| 2017 | Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology. Karen Jones, Stephanie M. Strassel, Kevin Walker, David Graff, Jonathan Wright |
| 2017 | Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech. Elizabeth Godoy, James R. Williamson, Thomas F. Quatieri |
| 2017 | Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition. Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost |
| 2017 | Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers. Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi |
| 2017 | Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin Chinese. Seth Wiener |
| 2017 | Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. Erfan Loweimi, Jon Barker, Thomas Hain |
| 2017 | Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions. Mandy Korpusik, Zachary Collins, James R. Glass |
| 2017 | ChunkitApp: Investigating the Relevant Units of Online Speech Processing. Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusová |
| 2017 | Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated Assessment. Andrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova |
| 2017 | Classification-Based Detection of Glottal Closure Instants from Speech Signals. Jindrich Matousek, Daniel Tihelka |
| 2017 | Clear Speech - Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners. Oliver Niebuhr |
| 2017 | ClockWork-RNN Based Architectures for Slot Filling. Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis |
| 2017 | Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. Marion Dohen, Benjamin Roustan |
| 2017 | Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment. Youna Ji, Jun Byun, Young-Cheol Park |
| 2017 | Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition. Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara |
| 2017 | Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition. Milana Milosevic, Ulrike Glavitsch |
| 2017 | Combining Residual Networks with LSTMs for Lipreading. Themos Stafylakis, Georgios Tzimiropoulos |
| 2017 | Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization. Guillaume Wisniewski, Hervé Bredin, Gregory Gelly, Claude Barras |
| 2017 | Comparing Human and Machine Errors in Conversational Speech Transcription. Andreas Stolcke, Jasha Droppo |
| 2017 | Comparing Languages Using Hierarchical Prosodic Analysis. Juraj Simko, Antti Suni, Katri Hiovain, Martti Vainio |
| 2017 | Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance Imaging. Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan |
| 2017 | Comparison of Decoding Strategies for CTC Acoustic Models. Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel |
| 2017 | Comparison of Modeling Target in LSTM-RNN Duration Model. Bo Chen, Jiahao Lai, Kai Yu |
| 2017 | Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. Shreyas Seshadri, Ulpu Remes, Okko Räsänen |
| 2017 | Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion. Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo |
| 2017 | Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra. Toru Nakashika, Shinji Takaki, Junichi Yamagishi |
| 2017 | Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment Interviews. Md. Nasir, Brian R. Baucom, Craig J. Bryan, Shrikanth S. Narayanan, Panayiotis G. Georgiou |
| 2017 | Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting. Ming Sun, David Snyder, Yixin Gao, Varun K. Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni |
| 2017 | Computational Analysis of Acoustic Descriptors in Psychotic Patients. Torsten Wörtwein, Tadas Baltrusaitis, Eugene Laksana, Luciana Pennant, Elizabeth S. Liebson, Dost Öngür, Justin T. Baker, Louis-Philippe Morency |
| 2017 | Computational Simulations of Temporal Vocalization Behavior in Adult-Child Interaction. Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson |
| 2017 | Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum Disorder. Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee |
| 2017 | Concatenative Resynthesis Using Twin Networks. Soumi Maiti, Michael I. Mandel |
| 2017 | Conditional Generative Adversarial Nets Classifier for Spoken Language Identification. Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai |
| 2017 | Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification. Daniel Michelsanti, Zheng-Hua Tan |
| 2017 | Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection. Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh |
| 2017 | Content Normalization for Text-Dependent Speaker Verification. Subhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, Marc Ferras |
| 2017 | Context Regularity Indexed by Auditory N1 and P2 Event-Related Potentials. Xiao Wang, Yanhui Zhang, Gang Peng |
| 2017 | Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis. Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson |
| 2017 | Conversing with Social Agents That Smile and Laugh. Catherine Pelachaud |
| 2017 | Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth |
| 2017 | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting. Sercan Ömer Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates |
| 2017 | Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li |
| 2017 | Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information. Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux |
| 2017 | Creak as a Feature of Lexical Stress in Estonian. Kätlin Aare, Pärtel Lippus, Juraj Simko |
| 2017 | Creaky Voice as a Function of Tonal Categories and Prosodic Boundaries. Jianjing Kuang |
| 2017 | Creating a Voice for MiRo, the World's First Commercial Biomimetic Robot. Roger K. Moore, Ben Mitchinson |
| 2017 | Critical Articulators Identification from RT-MRI of the Vocal Tract. Samuel Silva, António J. S. Teixeira |
| 2017 | Cross-Database Models for the Classification of Dysarthria Presence. Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel |
| 2017 | Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation. Yue Zhang, Felix Weninger, Björn W. Schuller |
| 2017 | Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking Styles. Plínio Almeida Barbosa, Sandra Madureira, Philippe Boula de Mareüil |
| 2017 | Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish. Pablo Brusco, Juan Manuel Pérez, Agustín Gravano |
| 2017 | Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations. Win Thuzar Kyaw, Yoshinori Sagisaka |
| 2017 | Cross-Speaker Variation in Voice Source Correlates of Focus and Deaccentuation. Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl |
| 2017 | Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions. Syeda Narjis Fatima, Engin Erzin |
| 2017 | Crowd-Sourced Design of Artificial Attentive Listeners. Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson |
| 2017 | Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. Victor Soto, Julia Hirschberg |
| 2017 | Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition. Shivesh Ranjan, Abhinav Misra, John H. L. Hansen |
| 2017 | DNN Bottleneck Features for Speaker Clustering. Jesús Jorrín, Paola García, Luis Buera |
| 2017 | DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances. Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng |
| 2017 | DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification. Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth |
| 2017 | DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface. Tamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó |
| 2017 | DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation. Nobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka |
| 2017 | Data Augmentation, Missing Feature Mask and Kernel Classification for Through-the-Wall Acoustic Surveillance. Tran Huy Dat, Wen Zheng Terence Ng, Yi Ren Leng |
| 2017 | Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science. Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam C. Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan |
| 2017 | Deep Activation Mixture Model for Speech Recognition. Chunyang Wu, Mark J. F. Gales |
| 2017 | Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions. Amit Das, Mark Hasegawa-Johnson, Karel Veselý |
| 2017 | Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition. Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu |
| 2017 | Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources. Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani |
| 2017 | Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages. Arun Baby, Jeena J. Prakash, S. Rupak Vignesh, Hema A. Murthy |
| 2017 | Deep Learning-Based Telephony Speech Recognition in the Wild. Kyu Jeong Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian R. Lane |
| 2017 | Deep Least Squares Regression for Speaker Adaptation. Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim |
| 2017 | Deep Neural Factorization for Speech Recognition. Jen-Tzung Chien, Chen Shen |
| 2017 | Deep Neural Network Embeddings for Text-Independent Speaker Verification. David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur |
| 2017 | Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines. Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh |
| 2017 | Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates. Heriberto Cuayáhuitl, Seunghak Yu |
| 2017 | Deep Speaker Embeddings for Short-Duration Speaker Verification. Gautam Bhattacharya, Jahangir Alam, Patrick Kenny |
| 2017 | Deep Speaker Feature Learning for Text-Independent Speaker Verification. Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang |
| 2017 | Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion. Jie Wu, Dong-Yan Huang, Lei Xie, Haizhou Li |
| 2017 | Depression Detection Using Automatic Transcriptions of De-Identified Speech. Paula Lopez-Otero, Laura Docío Fernández, Alberto Abad, Carmen García-Mateo |
| 2017 | Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior Features. Yun-Shao Lin, Chi-Chun Lee |
| 2017 | Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC). Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont |
| 2017 | Description of the Munich-Passau Snore Sound Corpus (MPSSC). Christoph Janott, Anton Batliner |
| 2017 | Description of the Upper Respiratory Tract Infection Corpus (URTIC). Jarek Krajewski, Sebastian Schnieder, Anton Batliner |
| 2017 | Detecting Overlapped Speech on Short Timeframes Using Deep Learning. Valentin Andrei, Horia Cucu, Corneliu Burileanu |
| 2017 | Detection of Mispronunciations and Disfluencies in Children Reading Aloud. Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão |
| 2017 | Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients. K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala |
| 2017 | Developing On-Line Speaker Diarization System. Dimitrios Dimitriadis, Petr Fousek |
| 2017 | Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion. Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda |
| 2017 | Dialect Perception by Older Children. Ewa Jacewicz, Robert Allen Fox |
| 2017 | Dialect Recognition Based on Unsupervised Bottleneck Features. Qian Zhang, John H. L. Hansen |
| 2017 | Dialogue as Collaborative Problem Solving. James Allen |
| 2017 | Direct Acoustics-to-Word Models for English Conversational Speech Recognition. Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo |
| 2017 | Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis. Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi |
| 2017 | Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis. Felipe Espic, Cassia Valentini-Botinhao, Simon King |
| 2017 | Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers. Ying Chen, Eric Pederson |
| 2017 | Disambiguate or not? - The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel Structures. Luying Hou, Bert Le Bruyn, René Kager |
| 2017 | Discovering Language in Marmoset Vocalization. Sakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy |
| 2017 | Discrete Duration Model for Speech Synthesis. Bo Chen, Tianling Bian, Kai Yu |
| 2017 | Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network. Duc Le, Zakaria Aldeneh, Emily Mower Provost |
| 2017 | Discriminative Autoencoders for Acoustic Modeling. Ming-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang |
| 2017 | Discussion. Björn W. Schuller, Anton Batliner |
| 2017 | Distilling Knowledge from an Ensemble of Models for Punctuation Prediction. Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li |
| 2017 | Does Posh English Sound Attractive? Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu |
| 2017 | Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering. Ignacio Viñals, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida |
| 2017 | Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker Verification. Md. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan |
| 2017 | Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning. Stefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gasic, Steve J. Young |
| 2017 | Domain-Specific Utterance End-Point Detection for Speech Recognition. Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister |
| 2017 | Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen |
| 2017 | Don't Count on ASR to Transcribe for You: Breaking Bias with Two Crowds. Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong |
| 2017 | Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification. Pierre-Michel Bousquet, Mickael Rouvier |
| 2017 | Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition. Taesup Kim, Inchul Song, Yoshua Bengio |
| 2017 | Dysprosody Differentiate Between Parkinson's Disease, Progressive Supranuclear Palsy, and Multiple System Atrophy. Jan Hlavnicka, Tereza Tykalová, Roman Cmejla, Jirí Klempír, Evzen Ruzicka, Jan Rusz |
| 2017 | Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter |
| 2017 | Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis. Tomás Boril, Pavel Sturm, Radek Skarnitzl, Jan Volín |
| 2017 | Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation. Pablo Arantes, Anders Eriksson, Suska Gutzeit |
| 2017 | Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals. Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro |
| 2017 | Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. Izumi Takiguchi |
| 2017 | Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. Rachael Tatman, Conner Kasten |
| 2017 | Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNs. Manu Airaksinen, Paavo Alku |
| 2017 | Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. Aharon Satt, Shai Rozenberg, Ron Hoory |
| 2017 | Efficient Knowledge Distillation from an Ensemble of Teachers. Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran |
| 2017 | Eigenvector-Based Speech Mask Estimation Using Logistic Regression. Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf |
| 2017 | Electrophysiological Correlates of Familiar Voice Recognition. Julien Plante-Hébert, Victor J. Boucher, Boutheina Jemel |
| 2017 | Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect. Brian Stasak, Julien Epps, Roland Goecke |
| 2017 | Eliciting Meaningful Units from Speech. Daniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin |
| 2017 | Embedding-Based Speaker Adaptive Training of Deep Neural Networks. Xiaodong Cui, Vaibhava Goel, George Saon |
| 2017 | Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App. Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung |
| 2017 | Emotion Category Mapping to Emotional Space by Cross-Corpus Emotion Labeling. Yoshiko Arimoto, Hiroki Mori |
| 2017 | Emotional Features for Speech Overlaps Classification. Olga Egorow, Andreas Wendemuth |
| 2017 | Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings. Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn W. Schuller |
| 2017 | Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech. Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto |
| 2017 | Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data. Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki |
| 2017 | Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu |
| 2017 | Empirical Exploration of Novel Architectures and Objectives for Language Models. Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon |
| 2017 | End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech. Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto |
| 2017 | End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini |
| 2017 | End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum. Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li |
| 2017 | End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling. Ma Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai |
| 2017 | End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition. Suyoun Kim, Ian R. Lane |
| 2017 | End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances. Chunlei Zhang, Kazuhito Koishida |
| 2017 | End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow. Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani |
| 2017 | Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition. Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko, Carolina Parada |
| 2017 | English Conversational Telephone Speech Recognition by Humans and Machines. George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall |
| 2017 | Enhanced Feature Extraction for Speech Detection in Media Audio. Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang |
| 2017 | Enhancing Backchannel Prediction Using Word Embeddings. Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel |
| 2017 | Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. Zhe Ji, Zhi-Yi Li, Peng Li, MaoBo An, Shengxiang Gao, Dan Wu, Faru Zhao |
| 2017 | Ensembles of Multi-Scale VGG Acoustic Models. Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura |
| 2017 | Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels. Zahra Rahimi, Anish Kumar, Diane J. Litman, Susannah Paletz, Mingzhi Yu |
| 2017 | Estimating Speaker Clustering Quality Using Logistic Regression. Yishai Cohen, Itshak Lapidot |
| 2017 | Estimation of Gap Between Current Language Models and Human Performance. Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow |
| 2017 | Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta |
| 2017 | Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. Tom Bäckström |
| 2017 | Evaluating Automatic Topic Segmentation as a Segment Retrieval Task. Abdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève |
| 2017 | Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions. Sofoklis Kakouros, Okko Räsänen, Paavo Alku |
| 2017 | Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary. José A. González, Lam Aun Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth |
| 2017 | Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. Nicanor García, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth |
| 2017 | Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception. Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco |
| 2017 | Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming Fashion. Adrien Daniel |
| 2017 | Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence. Gayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin |
| 2017 | Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. Roberto Font, Juan M. Espín, María José Cano |
| 2017 | Experiments in Character-Level Neural Network Models for Punctuation. William Gale, Sarangarajan Parthasarathy |
| 2017 | Exploiting Eigenposteriors for Semi-Supervised Training of DNN Acoustic Models with Sequence Discrimination. Pranay Dighe, Afsaneh Asaei, Hervé Bourlard |
| 2017 | Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy. Karel Mundnich, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan |
| 2017 | Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen |
| 2017 | Exploring Dynamic Measures of Stance in Spoken Interaction. Gina-Anne Levow, Richard A. Wright |
| 2017 | Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information. David Tavarez, Xabier Sarasola, Agustín Alonso, Jon Sánchez, Luis Serrano, Eva Navas, Inma Hernáez |
| 2017 | Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition. Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen |
| 2017 | Exploring Multidimensionality: Acoustic and Articulatory Correlates of Swedish Word Accents. Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald |
| 2017 | Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval. Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen |
| 2017 | Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition. Alan McCree, Gregory Sell, Daniel Garcia-Romero |
| 2017 | Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control. Markus Jochim |
| 2017 | Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results. Nikolaos Malandrakis, Ondrej Glembek, Shrikanth S. Narayanan |
| 2017 | Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition. Masakiyo Fujimoto |
| 2017 | Factorial Modeling for Effective Suppression of Directional Noise. Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie |
| 2017 | Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments. Joachim Fainberg, Steve Renals, Peter Bell |
| 2017 | Factors Affecting the Intelligibility of Low-Pass Filtered Speech. Lei Wang, Fei Chen |
| 2017 | Fast Neural Network Language Model Lookups at N-Gram Speeds. Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran |
| 2017 | Fast and Accurate OOV Decoder on High-Level Features. Yuri Y. Khokhlov, Natalia A. Tomashenko, Ivan Medennikov, Aleksei Romanenko |
| 2017 | Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. Xianliang Wang, YanHong Xiao, Xuan Zhu |
| 2017 | First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region. Péter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai |
| 2017 | Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali. Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig |
| 2017 | Focus Acoustics in Mandarin Nominals. Yu-Yin Hsu, Anqi Xu |
| 2017 | Forward-Backward Convolutional LSTM for Acoustic Modeling. Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani |
| 2017 | Frame and Segment Level Recurrent Neural Networks for Phone Classification. Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf |
| 2017 | Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang |
| 2017 | Functional Principal Component Analysis of Vocal Tract Area Functions. Jorge C. Lucero |
| 2017 | Gain Compensation for Fast i-Vector Extraction Over Short Duration. Kong-Aik Lee, Haizhou Li |
| 2017 | Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries. Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee |
| 2017 | Gaussian Prediction Based Attention for Online End-to-End Speech Recognition. Junfeng Hou, Shiliang Zhang, Li-Rong Dai |
| 2017 | Generalized Distillation Framework for Speaker Normalization. Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, Basil Abraham |
| 2017 | Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani |
| 2017 | Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis. Bajibabu Bollepalli, Lauri Juvela, Paavo Alku |
| 2017 | Generative Adversarial Network-Based Postfilter for STFT Spectrograms. Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi |
| 2017 | Global SNR Estimation of Speech Signals for Unknown Noise Conditions Using Noise Adapted Non-Linear Regression. Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan |
| 2017 | Global Syllable Vectors for Building TTS Front-End with Deep Learning. Jinfu Ni, Yoshinori Shiga, Hisashi Kawai |
| 2017 | Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays. Yang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson |
| 2017 | Glottal Opening and Strategies of Production of Fricatives. Benjamin Elie, Yves Laprie |
| 2017 | Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural Network. N. P. Narendra, Manu Airaksinen, Paavo Alku |
| 2017 | Glottal Source Features for Automatic Speech-Based Depression Assessment. Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke |
| 2017 | Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders. Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silén, Jakub Vít |
| 2017 | Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals. Masanori Morise |
| 2017 | Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj |
| 2017 | Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. Akshay Chandrashekaran, Ian R. Lane |
| 2017 | Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls. Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono |
| 2017 | Hierarchical Recurrent Neural Network for Story Segmentation. Emiru Tsunoo, Peter Bell, Steve Renals |
| 2017 | Highway-LSTM and Recurrent Highway Networks for Speech Recognition. Golan Pundak, Tara N. Sainath |
| 2017 | HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children. Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristià |
| 2017 | Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison. Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn |
| 2017 | How Does the Absence of Shared Knowledge Between Interlocutors Affect the Production of French Prosodic Forms? Amandine Michelas, Cecile Cau, Maud Champagne-Lavau |
| 2017 | How Long is Too Long? How Pause Features After Requests Affect the Perceived Willingness of Affirmative Answers. Lea S. Kohtz, Oliver Niebuhr |
| 2017 | How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. Giuseppina Turco, Karim Shoul, Rachid Ridouane |
| 2017 | Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog Interactions. Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft |
| 2017 | Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in Noise. Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen |
| 2017 | Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection. Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg |
| 2017 | Hyperarticulation of Corrections in Multilingual Dialogue Systems. Ivan Kraljevski, Diane Hirschfeld |
| 2017 | Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area. Nikitha K., Sishir Kalita, Vikram C. M., M. Pushpavathi, S. R. Mahadeva Prasanna |
| 2017 | IITG-Indigo System for NIST 2016 SRE Challenge. Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B. K., H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S. R. Mahadeva Prasanna |
| 2017 | ISCA Medal for Scientific Achievement. Haizhou Li |
| 2017 | Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions. Xu Li, Junfeng Li, Yonghong Yan |
| 2017 | Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software. Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister |
| 2017 | Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition. Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn W. Schuller |
| 2017 | Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder. Cong-Thanh Do, Yannis Stylianou |
| 2017 | Improved Codebook-Based Speech Enhancement Based on MBE Model. Qizheng Huang, Changchun Bao, Xianyun Wang |
| 2017 | Improved End-of-Query Detection for Streaming Speech Recognition. Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada |
| 2017 | Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search. Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani |
| 2017 | Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features. Shivesh Ranjan, John H. L. Hansen |
| 2017 | Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu |
| 2017 | Improved Subword Modeling for WFST-Based Speech Recognition. Peter Smit, Sami Virpioja, Mikko Kurimo |
| 2017 | Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech. Daniel V. Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan |
| 2017 | Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion. Waquar Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, Arun B. Samaddar |
| 2017 | Improving Computer Lipreading via DNN Sequence Discriminative Training Techniques. Kwanchiva Thangthai, Richard W. Harvey |
| 2017 | Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization. Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu |
| 2017 | Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball |
| 2017 | Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection. Zhuo Chen, Yan Huang, Jinyu Li, Yifan Gong |
| 2017 | Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models. Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee |
| 2017 | Improving Prediction of Speech Activity Using Multi-Participant Respiratory State. Marcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare |
| 2017 | Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data. Diego Castán, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez |
| 2017 | Improving Source Separation via Multi-Speaker Representations. Jeroen Zegers, Hugo Van hamme |
| 2017 | Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data. Achintya Kumar Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen |
| 2017 | Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. Peter Guzewich, Stephen A. Zahorian |
| 2017 | Improving Speaker-Independent Lipreading with Domain-Adversarial Training. Michael Wand, Jürgen Schmidhuber |
| 2017 | Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera |
| 2017 | Improving Speech Recognition by Revising Gated Recurrent Units. Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio |
| 2017 | Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps. Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon |
| 2017 | Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech. Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong |
| 2017 | Improving YANGsaf F0 Estimator with Adaptive Kalman Filter. Kanru Hua |
| 2017 | Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain Data. Bengt J. Borgström, Elliot Singer, Douglas A. Reynolds, Seyed Omid Sadjadi |
| 2017 | Incorporating Acoustic Features for Spontaneous Speech Driven Content Retrieval. Hiroto Tasaki, Tomoyosi Akiba |
| 2017 | Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification. Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee |
| 2017 | Increasing Recall of Lengthening Detection via Semi-Automatic Classification. Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner |
| 2017 | Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification. Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow |
| 2017 | Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection. Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah |
| 2017 | Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna |
| 2017 | Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level. Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André |
| 2017 | Inferring Stance from Prosody. Nigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castán, Elizabeth Shriberg, Andreas Tsiartas |
| 2017 | Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]. Takayuki Arai |
| 2017 | Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang |
| 2017 | Integrating the Talkamatic Dialogue Manager with Alexa. Staffan Larsson, Alexander Berman, Andreas Krona, Fredrik Kronlid |
| 2017 | Intelligibilities of Mandarin Chinese Sentences with Spectral "Holes". Yafan Chen, Yong Xu, Jun Yang |
| 2017 | Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for French. Antoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube |
| 2017 | Interaction and Transition Model for Speech Emotion Recognition in Dialogue. Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono |
| 2017 | Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding. Mohamed Morchid |
| 2017 | Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks. Ming Tu, Visar Berisha, Julie Liss |
| 2017 | Intonation Facilitates Prediction of Focus Even in the Presence of Lexical Tones. Martin Ho Kwan Ip, Anne Cutler |
| 2017 | Intonation of Contrastive Topic in Estonian. Heete Sahkai, Meelis Mihkla |
| 2017 | Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold. Heysem Kaya, Alexey A. Karpov |
| 2017 | Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. Xie Chen, Anton Ragni, Xunying Liu, Mark J. F. Gales |
| 2017 | Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction. Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu |
| 2017 | Investigating Scalability in Hierarchical Language Identification System. Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li |
| 2017 | Investigating the Effect of ASR Tuning on Named Entity Recognition. Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset |
| 2017 | It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge. Mark A. Huckvale, András Beke |
| 2017 | Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering. Parham Mokhtari, Hiroshi Ando |
| 2017 | Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. Vikram Ramanarayanan, David Suendermann-Oeft |
| 2017 | Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages. Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy |
| 2017 | Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev |
| 2017 | Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification. Hee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Ha-Jin Yu |
| 2017 | Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition. Fengpei Ge, Kehuang Li, Bo Wu, Sabato Marco Siniscalchi, Yonghong Yan, Chin-Hui Lee |
| 2017 | Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning. Srinivas Parthasarathy, Carlos Busso |
| 2017 | Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks. Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou |
| 2017 | Kinematic Signatures of Prosody in Lombard Speech. Stefan Benus, Juraj Simko, Mona Lehtinen |
| 2017 | L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts |
| 2017 | LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling. Miquel India, José A. R. Fonollosa, Javier Hernando |
| 2017 | Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding. Marco Dinarelli, Vedran Vukotic, Christian Raymond |
| 2017 | Large-Scale Domain Adaptation via Teacher-Student Learning. Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong |
| 2017 | Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings. Timo Baumann |
| 2017 | Laryngeal Articulation During Trumpet Performance: An Exploratory Study. Luis M. T. Jesus, Bruno Rocha, Andreia Hall |
| 2017 | Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. Lahiru Samarakoon, Brian Mak, Khe Chai Sim |
| 2017 | Learning Latent Representations for Speech Generation and Transformation. Wei-Ning Hsu, Yu Zhang, James R. Glass |
| 2017 | Learning Similarity Functions for Pronunciation Variations. Einat Naaman, Yossi Adi, Joseph Keshet |
| 2017 | Learning Weakly Supervised Multimodal Phoneme Embeddings. Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux |
| 2017 | Learning Word Vector Representations Based on Acoustic Counts. Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi |
| 2017 | Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. Christian Kroos, Mark D. Plumbley |
| 2017 | Leveraging Text Data for Word Segmentation for Underresourced Languages. Thomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach |
| 2017 | Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish Listeners. Adriana Hanulíková, Jenny Ekström |
| 2017 | Lexically Guided Perceptual Learning in Mandarin Chinese. L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler |
| 2017 | Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and Machines. Constantin Spille, Bernd T. Meyer |
| 2017 | Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification. Abhinav Misra, Shivesh Ranjan, John H. L. Hansen |
| 2017 | Locating Burst Onsets Using SFF Envelope and Phase Information. Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana |
| 2017 | Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen |
| 2017 | Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra. Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt |
| 2017 | Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. Masanori Morise, Genta Miyashita, Kenji Ozawa |
| 2017 | Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation. Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim |
| 2017 | MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated Speech. Ellen Marklund, Elísabet Eir Cortes, Johan Sjons |
| 2017 | Machine Assisted Analysis of Vowel Length Contrasts in Wolof. Elodie Gauthier, Laurent Besacier, Sylvie Voisin |
| 2017 | Manual and Automatic Transcriptions in Dementia Detection from Speech. Jochen Weiner, Mathis Engelbart, Tanja Schultz |
| 2017 | Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System Testing. Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo |
| 2017 | Matrix of Polynomials Model Based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling. Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang |
| 2017 | Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production. Gintare Grigonyte, Gerold Schneider |
| 2017 | Measuring Synchrony in Task-Based Dialogues. Justine Reverdy, Carl Vogel |
| 2017 | Mechanisms of Tone Sandhi Rule Application by Non-Native Speakers. Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang |
| 2017 | Mel-Cepstral Distortion of German Vowels in Different Information Density Contexts. Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius |
| 2017 | Mental Representation of Japanese Mora; Focusing on its Intrinsic Duration. Kosuke Sugai |
| 2017 | MetaLab: A Repository for Meta-Analyses on Language Development, and More. Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristià |
| 2017 | Metrics for Modeling Code-Switching Across Corpora. Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio |
| 2017 | Mind the Peak: When Museum is Temporarily Understood as Musical in Australian English. Katharina Zahner, Heather Kember, Bettina Braun |
| 2017 | Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech. Zhong Meng, Biing-Hwang Juang |
| 2017 | Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach. Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim |
| 2017 | Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi |
| 2017 | MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement. Robert Rehr, Timo Gerkmann |
| 2017 | MoPAReST - Mobile Phone Assisted Remote Speech Therapy Platform. Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu |
| 2017 | Modeling Categorical Perception with the Receptive Fields of Auditory Neurons. Chris Neufeld |
| 2017 | Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech Synthesis. Rodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu |
| 2017 | Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition. Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee |
| 2017 | Modelling the Informativeness of Non-Verbal Cues in Parent-Child Interaction. Mats Wirén, Kristina N. Björkenstam, Robert Östling |
| 2017 | Modifying Amazon's Alexa ASR Grammar and Lexicon - A Case Study. Hassan Alam, Aman Kumar, Manan Vyas, Tina Werner, Rachmat Hartono |
| 2017 | Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger |
| 2017 | Motion Analysis in Vocalized Surprise Expressions. Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro |
| 2017 | Multi-Channel Apollo Mission Speech Transcripts Calibration. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen |
| 2017 | Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions. Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan |
| 2017 | Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech. Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik |
| 2017 | Multi-Target Ensemble Learning for Monaural Speech Separation. Hui Zhang, Xueliang Zhang, Guanglai Gao |
| 2017 | Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson |
| 2017 | Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin Speech. Rong Tong, Nancy F. Chen, Bin Ma |
| 2017 | Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai |
| 2017 | Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu |
| 2017 | Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. Jia Dai, Wei Xue, Wenju Liu |
| 2017 | Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. Volha Petukhova, Manoj Raju, Harry Bunt |
| 2017 | Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. Dong-Yan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li |
| 2017 | Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector. Bing Yang, Hong Liu, Cheng Pang |
| 2017 | Multitask Learning with CTC and Segmental CRF for Speech Recognition. Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith |
| 2017 | Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu |
| 2017 | Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion. Benjamin Milde, Christoph Schmidt, Joachim Köhler |
| 2017 | Multiview Representation Learning via Deep CCA for Silent Speech Recognition. Myung Jong Kim, Beiming Cao, Ted Mau, Jun Wang |
| 2017 | Music Tempo Estimation Using Sub-Band Synchrony. Shreyan Chowdhury, Tanaya Guha, Rajesh M. Hegde |
| 2017 | Musical Speech: A New Methodology for Transcribing Speech Prosody. Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros |
| 2017 | Mylly - The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently. Mietta Lennes, Jussi Piitulainen, Martin Matthiesen |
| 2017 | NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. Eunah Cho, Jan Niehues, Alex Waibel |
| 2017 | NTCD-TIMIT: A New Database and Baseline for Noise-Robust Audio-Visual Speech Recognition. Ahmed Hussen Abdelaziz |
| 2017 | Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili. Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King |
| 2017 | Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah |
| 2017 | Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani |
| 2017 | Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition. Hagen Soltau, Hank Liao, Hasim Sak |
| 2017 | Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural Networks. Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani |
| 2017 | Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan |
| 2017 | Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting. Zhong Meng, Biing-Hwang Juang |
| 2017 | Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification. Abbas Khosravani, Mohammad Mehdi Homayounpour |
| 2017 | Nora the Empathetic Psychologist. Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung |
| 2017 | Novel Shifted Real Spectrum for Exact Signal Reconstruction. Meet H. Soni, Rishabh Tak, Hemant A. Patil |
| 2017 | Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni |
| 2017 | Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System. Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface |
| 2017 | Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition. Yosef A. Solewicz, Michael Jessen, David van der Vloed |
| 2017 | Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. Yu-Ren Chien, Michal Borský, Jón Guðnason |
| 2017 | Occupancy Detection in Commercial and Residential Environments Using Audio Signal. Shabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Dürichen, Zhe Feng |
| 2017 | Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks. Chong Min Lee, Su-Youn Yoon, Xinhao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini |
| 2017 | Off-Topic Spoken Response Detection with Word Embeddings. Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini |
| 2017 | On Building Mixed Lingual Speech Synthesis Systems. Sai Krishna Rallabandi, Alan W. Black |
| 2017 | On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee |
| 2017 | On Improving Acoustic Models for TORGO Dysarthric Speech Database. Neethu Mariam Joy, Srinivasan Umesh, Basil Abraham |
| 2017 | On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. Seyedmahdad Mirsamadi, John H. L. Hansen |
| 2017 | On the Duration of Mandarin Tones. Jing Yang, Yu Zhang, Aijun Li, Li Xu |
| 2017 | On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals. Hans-Günter Hirsch, Michael Gref |
| 2017 | On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling. Siyuan Feng, Tan Lee |
| 2017 | On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement. Tudor-Catalin Zorila, Yannis Stylianou |
| 2017 | On the Role of Temporal Variability in the Acquisition of the German Vowel Length Contrast. Felicitas Kleber |
| 2017 | On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure. Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen |
| 2017 | Online Adaptation of an Attention-Based Neural Network for Natural Language Generation. Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre |
| 2017 | Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks. Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka |
| 2017 | OpenMM: An Open-Source Multimodal Feature Extraction Tool. Michelle Renee Morales, Stefan Scherer, Rivka Levitan |
| 2017 | Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields. Valentin Barrière, Chloé Clavel, Slim Essid |
| 2017 | Optimized Time Series Filters for Detecting Laughter and Filler Events. Gábor Gosztolya |
| 2017 | Optimizing DNN Adaptation for Recognition of Enhanced Speech. Marco Matassoni, Alessio Brutti, Daniele Falavigna |
| 2017 | Optimizing Expected Word Error Rate via Sampling for Speech Recognition. Matt Shannon |
| 2017 | Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification. Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-Shan Lee |
| 2017 | PRAV: A Phonetically Rich Audio Visual Corpus. Abhishek Narwekar, Prasanta Kumar Ghosh |
| 2017 | Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification. Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki |
| 2017 | Parallel Neural Network Features for Improved Tandem Acoustic Modeling. Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney |
| 2017 | Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus. Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu |
| 2017 | Pashto Intonation Patterns. Luca Rognoni, Judith Bishop, Miriam Corris |
| 2017 | Perception and Acoustics of Vowel Nasality in Brazilian Portuguese. Luciana Marques, Rebecca Scarborough |
| 2017 | Perception and Production of Word-Final /ʁ/ in French. Cédric Gendrot |
| 2017 | Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima |
| 2017 | Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing. Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller |
| 2017 | Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice. Adrian P. Simpson, Riccarda Funk, Frederik Palmer |
| 2017 | PercyConfigurator - Perception Experiments as a Service. Christoph Draxler |
| 2017 | Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space. Yasunari Obuchi |
| 2017 | Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis. Nagaraj Adiga, S. R. Mahadeva Prasanna |
| 2017 | Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton |
| 2017 | Phone Duration Modeling for LVCSR Using Neural Networks. Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur |
| 2017 | Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition. Akshay Kalkunte Suresh, Srinivasa Raghavan K. M., Prasanta Kumar Ghosh |
| 2017 | Phoneme-Discriminative Features for Dysarthric Speech Conversion. Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki |
| 2017 | Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton |
| 2017 | Phonetic Restoration of Temporally Reversed Speech. Shiyu Wang, Fei Chen |
| 2017 | Phonological Complexity, Segment Rate and Speech Tempo Perception. Leendert Plug, Rachel Smith |
| 2017 | Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning. Vipul Arora, Aditi Lahiri, Henning Reetz |
| 2017 | Phonological Markers of Oxytocin and MDMA Ingestion. Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo A. Cecchi |
| 2017 | Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information. Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman |
| 2017 | Physically Constrained Statistical F Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura |
| 2017 | Pitch Convergence as an Effect of Perceived Attractiveness and Likability. Jan Michalsky, Heike Schoormann |
| 2017 | Polyglot and Speech Corpus Tools: A System for Representing, Integrating, and Querying Speech Corpora. Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger |
| 2017 | Predicting Automatic Speech Recognition Performance Over Communication Channels from Instrumental Speech Quality and Intelligibility Scores. Laura Fernández Gallardo, Sebastian Möller, John Beerends |
| 2017 | Predicting Epenthetic Vowel Quality from Acoustics. Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux |
| 2017 | Predicting Head Pose from Speech with a Conditional Variational Autoencoder. David Greenwood, Stephen D. Laycock, Iain A. Matthews |
| 2017 | Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio. Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani |
| 2017 | Prediction of Speech Delay from Acoustic Measurements. Jason Lilley, Madhavi Vedula Ratnagiri, H. Timothy Bunnell |
| 2017 | Principles for Learning Controllable TTS from Annotated and Latent Variation. Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi |
| 2017 | Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted Children. Wentao Gu, Jiao Yin, James J. Mahshie |
| 2017 | Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. Yujia Xiao, Frank K. Soong |
| 2017 | Progressive Neural Networks for Transfer Learning in Emotion Recognition. John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost |
| 2017 | Pronunciation Learning with RNN-Transducers. Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays |
| 2017 | Prosodic Analysis of Attention-Drawing Speech. Carlos Toshinori Ishi, Jun Arai, Norihiro Hagita |
| 2017 | Prosodic Event Recognition Using Convolutional Neural Networks with Context Information. Sabrina Stehwien, Ngoc Thang Vu |
| 2017 | Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized Statements. Ramiro H. Gálvez, Stefan Benus, Agustín Gravano, Marián Trnka |
| 2017 | Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification. Dean Luo, Ruxin Luo, Lixin Wang |
| 2017 | Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis. Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami |
| 2017 | Prosody Control of Utterance Sequence for Information Delivering. Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi |
| 2017 | Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora. Alp Öktem, Mireia Farrús, Leo Wanner |
| 2017 | QMDIS: QCRI-MIT Advanced Dialect Identification System. Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James R. Glass |
| 2017 | Qualitative Differences in L3 Learners' Neurophysiological Response to L1 versus L2 Transfer. Alejandra Keidel Fernández, Thomas Hörberg |
| 2017 | Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations. Titouan Parcollet, Mohamed Morchid, Georges Linarès |
| 2017 | Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings. Shane Settle, Keith D. Levin, Herman Kamper, Karen Livescu |
| 2017 | R Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl |
| 2017 | RNN-LDA Clustering for Feature Based DNN Adaptation. Xurong Xie, Xunying Liu, Tan Lee, Lan Wang |
| 2017 | Rapid Development of TTS Corpora for Four South African Languages. Daniel R. van Niekerk, Charl Johannes van Heerden, Marelie H. Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha |
| 2017 | Re-Inventing Speech - The Biological Way. Björn Lindblom |
| 2017 | Reading Validation for Pronunciation Evaluation in the Digitala Project. Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo |
| 2017 | Real Time Pitch Shifting with Formant Structure Preservation Using the Phase Vocoder. Michal Lenarczyk |
| 2017 | Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility. Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh |
| 2017 | Real-Time Reactive Speech Synthesis: Incorporating Interruptions. Mirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw |
| 2017 | Real-Time Speech Enhancement with GCC-NMF. Sean U. N. Wood, Jean Rouat |
| 2017 | Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson. Sean U. N. Wood, Jean Rouat |
| 2017 | Reanalyze Fundamental Frequency Peak Delay in Mandarin. Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang |
| 2017 | Recognizing Multi-Talker Speech with Permutation Invariant Training. Dong Yu, Xuankai Chang, Yanmin Qian |
| 2017 | Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping. Hasim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays |
| 2017 | Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition. Suwon Shon, Seongkyu Mun, Hanseok Ko |
| 2017 | Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System. Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku |
| 2017 | Reducing the Computational Complexity of Two-Dimensional LSTMs. Bo Li, Tara N. Sainath |
| 2017 | Relating Unsupervised Word Segmentation to Reported Vocabulary Acquisition. Elin Larsen, Alejandrina Cristià, Emmanuel Dupoux |
| 2017 | Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates. Charlotte Kouklia, Nicolas Audibert |
| 2017 | Remote Articulation Test System Based on WebRTC. Ikuyo Masuda-Katsuse |
| 2017 | Replay Attack Detection Using DNN for Channel Discrimination. Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland |
| 2017 | ResNet and Model Fusion for Automatic Spoofing Detection. Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu |
| 2017 | Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. Ian Williams, Petar S. Aleksic |
| 2017 | Reshaping the Transformed LF Model: Generating the Glottal Source from the Waveshape Parameter Rd. Christer Gobl |
| 2017 | Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition. Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee |
| 2017 | Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. Karel Benes, Murali Karthick Baskar, Lukás Burget |
| 2017 | Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish. Massimo Pettorino, Wentao Gu, Pawel Pólrola, Ping Fan |
| 2017 | Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated Signal. Kenichiro Miwa, Masashi Unoki |
| 2017 | Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice Assistants. Harish Arsikere, Sri Garimella |
| 2017 | Robust Source-Filter Separation of Speech Signal in the Phase Domain. Erfan Loweimi, Jon Barker, Oscar Saz-Torralba, Thomas Hain |
| 2017 | Robust Speech Recognition Based on Binaural Auditory Processing. Anjali Menon, Chanwoo Kim, Richard M. Stern |
| 2017 | Robust Speech Recognition via Anchor Word Representations. Brian John King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister |
| 2017 | Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahú, Richard M. Stern, Néstor Becerra Yoma |
| 2017 | Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human-Machine Dialog? Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft |
| 2017 | SEGAN: Speech Enhancement Generative Adversarial Network. Santiago Pascual, Antonio Bonafonte, Joan Serrà |
| 2017 | SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala |
| 2017 | SIAK - A Game for Foreign Language Pronunciation Learning. Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle J. Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo |
| 2017 | SLPAnnotator: Tools for Implementing Sign Language Phonetic Annotation. Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman |
| 2017 | Sampling-Based Speech Parameter Generation Using Moment-Matching Networks. Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari |
| 2017 | Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-Linguistic Factors in Large Corpora. Yaru Wu, Martine Adda-Decker, Cécile Fougeron, Lori Lamel |
| 2017 | Segment Level Voice Conversion with Recurrent Neural Networks. Miguel Varela Ramos, Alan W. Black, Ramón Fernandez Astudillo, Isabel Trancoso, Nuno Fonseca |
| 2017 | Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan |
| 2017 | Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities. Alexander Sorin, Slava Shechtman, Asaf Rendel |
| 2017 | Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features. Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain |
| 2017 | Semi-Supervised DNN Training with Word Selection for ASR. Karel Veselý, Lukás Burget, Jan Cernocký |
| 2017 | Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text. Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig |
| 2017 | Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control. Ajay Srinivasamurthy, Petr Motlícek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke |
| 2017 | Sequence to Sequence Modeling for User Simulation in Dialog Systems. Paul A. Crook, Alex Marin |
| 2017 | Sequence-to-Sequence Models Can Directly Translate Foreign Speech. Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen |
| 2017 | Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino |
| 2017 | Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence. Iona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner |
| 2017 | Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion. Seyed Hamidreza Mohammadi, Alexander Kain |
| 2017 | Similar Prosodic Structure Perceived Differently in German and English. Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler |
| 2017 | Similarity Learning Based Query Modeling for Keyword Search. Batuhan Gündogdu, Murat Saraclar |
| 2017 | Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee |
| 2017 | Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed "Fast" Speech. Mako Ishida |
| 2017 | Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition. Rainer Huber, Constantin Spille, Bernd T. Meyer |
| 2017 | Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference. Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui |
| 2017 | Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System. Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang |
| 2017 | Snore Sound Classification Using Image-Based Deep Spectrum Features. Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn W. Schuller |
| 2017 | Social Attractiveness in Dialogs. Antje Schweitzer, Natalie Lewandowski, Daniel Duran |
| 2017 | Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC. Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara |
| 2017 | Sociophonetic Realizations Guide Subsequent Lexical Access. Jonny Kim, Katie Drager |
| 2017 | Sounds of the Human Vocal Tract. Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan |
| 2017 | Soundtracing for Realtime Speech Adjustment to Environmental Conditions in 3D Simulations. Bartosz Ziólko, Tomasz Pedzimaz, Szymon Piotr Palka |
| 2017 | Spanish Sign Language Recognition with Different Topology Hidden Markov Models. Carlos D. Martínez-Hinarejos, Zuzanna Parcheta |
| 2017 | Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap. Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy |
| 2017 | Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors. Rama Doddipatla, Norbert Braunschweiler, Ranniery Maia |
| 2017 | Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks. Ruiqing Yin, Hervé Bredin, Claude Barras |
| 2017 | Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels. Sungrack Yun, Hye Jin Jang, Taesu Kim |
| 2017 | Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition. Yuyun Huang, Emer Gilmartin, Nick Campbell |
| 2017 | Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-Based Voice Conversion. Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi |
| 2017 | Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement. Zbynek Zajíc, Marek Hrúz, Ludek Müller |
| 2017 | Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern. Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan |
| 2017 | Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann |
| 2017 | Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares. Chen Chen, Jiqing Han, Yilin Pan |
| 2017 | Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures. Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani |
| 2017 | Speaker-Dependent WaveNet Vocoder. Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda |
| 2017 | Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRI. Keyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney S. Fels |
| 2017 | Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation. Arindam Jati, Panayiotis G. Georgiou |
| 2017 | Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs. Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku |
| 2017 | Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion. Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai |
| 2017 | Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. Vinay Kothapally, John H. L. Hansen |
| 2017 | Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space. Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai |
| 2017 | Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. Dongmei Wang, John H. L. Hansen |
| 2017 | Speech Enhancement Using Bayesian Wavenet. Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson |
| 2017 | Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino |
| 2017 | Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age. Cassia Valentini-Botinhao, Junichi Yamagishi |
| 2017 | Speech Processing Approach for Diagnosing Dementia in an Early Stage. Roozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian |
| 2017 | Speech Rate Comparison When Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task. Hayakawa Akira, Carl Vogel, Saturnino Luz, Nick Campbell |
| 2017 | Speech Recognition and Understanding on Hardware-Accelerated DSP. Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef G. Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing |
| 2017 | Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR. Purvi Agrawal, Sriram Ganapathy |
| 2017 | Speech Synthesis for Mixed-Language Navigation Instructions. Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black |
| 2017 | Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction. Oleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker |
| 2017 | Spoken Language Identification Using LSTM-Based Angular Proximity. Gregory Gelly, Jean-Luc Gauvain |
| 2017 | Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha |
| 2017 | Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective. Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn W. Schuller |
| 2017 | Stability of Prosodic Characteristics Across Age and Gender Groups. Jan Volín, Tereza Tykalová, Tomás Boril |
| 2017 | Statistical Voice Conversion with WaveNet-Based Waveform Generation. Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda |
| 2017 | Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt |
| 2017 | Stochastic Recurrent Neural Network for Speech Recognition. Jen-Tzung Chien, Chen Shen |
| 2017 | Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation. Takatomo Kano, Sakriani Sakti, Satoshi Nakamura |
| 2017 | Student-Teacher Training with Diverse Decision Tree Ensembles. Jeremy Heng Meng Wong, Mark J. F. Gales |
| 2017 | Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions. Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot |
| 2017 | Subband Selection for Binaural Speech Source Localization. Girija Ramesan Karthik, Prasanta Kumar Ghosh |
| 2017 | Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception. Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura |
| 2017 | Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement. Femke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen |
| 2017 | Symbol Sequence Search from Telephone Conversation. Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth Ward Church, Mark Drake |
| 2017 | Synthesis of VV Utterances from Muscle Activation to Sound with a 3D Model. Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch |
| 2017 | Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. Éva Székely, Joseph Mendelson, Joakim Gustafson |
| 2017 | Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. Ewald van der Westhuizen, Thomas Niesler |
| 2017 | System for Speech Transcription and Post-Editing in Microsoft Word. Askars Salimbajevs, Indra Ikauniece |
| 2017 | TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice. Atish Shankar Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, S. Aswin Shanmugam, M. Sasikumar, Hema A. Murthy |
| 2017 | Tacotron: Towards End-to-End Speech Synthesis. Yuxuan Wang, R. J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous |
| 2017 | Team ELISA System for DARPA LORELEI Speech Evaluation 2016. Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukás Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan |
| 2017 | Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English. Jia Ying, Christopher Carignan, Jason A. Shaw, Michael I. Proctor, Donald Derrick, Catherine T. Best |
| 2017 | Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging. Tanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan |
| 2017 | The 2016 NIST Speaker Recognition Evaluation. Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa P. Mason, Jaime Hernandez-Cordero |
| 2017 | The ABAIR Initiative: Bringing Spoken Irish into the Digital Space. Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl |
| 2017 | The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee |
| 2017 | The Acoustics of Word Stress in Czech as a Function of Speaking Style. Radek Skarnitzl, Anders Eriksson |
| 2017 | The Acquisition of Focal Lengthening in Stockholm Swedish. Anna Sara H. Romøren, Aoju Chen |
| 2017 | The Effect of Gesture on Persuasive Speech. Judith Peters, Marieke Hoetjes |
| 2017 | The Effect of Situation-Specific Non-Speech Acoustic Cues on the Intelligibility of Speech in Noise. Lauren Ward, Ben G. Shirley, Yan Tang, William J. Davies |
| 2017 | The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise. Chris Davis, Chee Seng Chong, Jeesun Kim |
| 2017 | The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds. Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson |
| 2017 | The Effects of Real and Placebo Alcohol on Deaffrication. Urban Zihlmann |
| 2017 | The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density. David M. Howcroft, Dietrich Klakow, Vera Demberg |
| 2017 | The Formant Dynamics of Long Close Vowels in Three Varieties of Swedish. Otto Ewald, Eva Liina Asu, Susanne Schötz |
| 2017 | The Frequency Range of "The Ling Six Sounds" in Standard Chinese. Aijun Li, Hua Zhang, Wen Sun |
| 2017 | The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016. Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Bin Ma, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah |
| 2017 | The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results. Stefan Steidl |
| 2017 | The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring. Björn W. Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou |
| 2017 | The Influence of Synthetic Voice on the Evaluation of a Virtual Character. João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell |
| 2017 | The Influence on Realization and Perception of Lexical Tones from Affricate's Aspiration. Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang |
| 2017 | The Kaldi OpenKWS System: Improving Low Resource Keyword Search. Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur |
| 2017 | The LENA System Applied to Swedish: Reliability of the Adult Word Count Estimate. Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund |
| 2017 | The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System. Pedro A. Torres-Carrasquillo, Fred Richardson, Shahan C. Nercessian, Douglas E. Sturim, William M. Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Sri Harish Reddy Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Réda Dehak |
| 2017 | The ModelTalker Project: A Web-Based Voice Banking Pipeline for ALS/MND Patients. H. Timothy Bunnell, Jason Lilley, Kathleen McGrath |
| 2017 | The Motivation and Development of MPAi, a Māori Pronunciation Aid. Catherine Inez Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, J. King |
| 2017 | The Opensesame NIST 2016 Speaker Recognition Evaluation System. Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao |
| 2017 | The Perception of Emotions in Noisified Nonsense Speech. Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn W. Schuller |
| 2017 | The Perception of English Intonation Patterns by German L2 Speakers of English. Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok |
| 2017 | The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials Study. Noémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano |
| 2017 | The Recognition of Compounds: A Computational Account. Louis ten Bosch, Lou Boves, Mirjam Ernestus |
| 2017 | The Relationship Between F0 Synchrony and Speech Convergence in Dyadic Interaction. Sankar Mukherjee, Alessandro D'Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino |
| 2017 | The Relationship Between the Perception and Production of Non-Native Tones. Kaile Zhang, Gang Peng |
| 2017 | The Relative Cueing Power of F0 and Duration in German Prominence Perception. Oliver Niebuhr, Jana Winkler |
| 2017 | The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls. Jordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo |
| 2017 | The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald Trump. Hans Rutger Bosker |
| 2017 | The STC Keyword Search System for OpenKWS 2016 Evaluation. Yuri Y. Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia A. Tomashenko, Alexander Zatvornitsky |
| 2017 | The Social Life of Setswana Ejectives. Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Justus Roux |
| 2017 | The Sound of Deception - What Makes a Speaker Credible? Anne Schröder, Simon Stone, Peter Birkholz |
| 2017 | The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent Contexts. Sergio I. Quiroz, Marzena Zygis |
| 2017 | Three Dimensions of Sentence Prosody and Their (Non-)Interactions. Michael Wagner, Michael McAuliffe |
| 2017 | Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition. Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida |
| 2017 | Tied Variational Autoencoder Backends for i-Vector Speaker Recognition. Jesús Villalba, Niko Brümmer, Najim Dehak |
| 2017 | Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings. Lukas Drude, Reinhold Haeb-Umbach |
| 2017 | Time Delay Histogram Based Speech Source Separation Using a Planar Array. Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan |
| 2017 | Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh |
| 2017 | Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula |
| 2017 | Time-Frequency Masking for Blind Source Separation with Preserved Spatial Cues. Shadi Pirhosseinloo, Kostas Kokkinakis |
| 2017 | Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions. Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen |
| 2017 | To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories. Hengguan Huang, Brian Mak |
| 2017 | To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language Generation. Neha Nayak, Dilek Hakkani-Tür, Marilyn A. Walker, Larry P. Heck |
| 2017 | To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation. Katrin Schweitzer, Michael Walsh, Antje Schweitzer |
| 2017 | Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data Approach. Christina Bergmann, Sho Tsuji, Alejandrina Cristià |
| 2017 | Topic Identification for Speech Without ASR. Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur |
| 2017 | Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis. Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura |
| 2017 | Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. Jan Chorowski, Navdeep Jaitly |
| 2017 | Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems. Angelika Maier, Julian Hough, David Schlangen |
| 2017 | Towards End-to-End Spoken Dialogue Systems with Turn Embeddings. Ali Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi |
| 2017 | Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World. Simone Hantke, Zixing Zhang, Björn W. Schuller |
| 2017 | Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution. Laura Fernández Gallardo, Benjamin Weiss |
| 2017 | Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task Learning. Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers |
| 2017 | Towards Zero-Shot Frame Semantic Parsing for Domain Scaling. Ankur Bapna, Gökhan Tür, Dilek Hakkani-Tür, Larry P. Heck |
| 2017 | Towards an Autarkic Embedded Cognitive User Interface. Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff |
| 2017 | Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling. Tamás Grósz, Gábor Gosztolya, László Tóth |
| 2017 | Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction. Qinyi Luo, Rahul Gupta, Shrikanth S. Narayanan |
| 2017 | Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages. Basil Abraham, Tejaswi Seeram, Srinivasan Umesh |
| 2017 | Trisyllabic Tone 3 Sandhi Patterns in Mandarin Produced by Cantonese Speakers. Jung-Yueh Tu, Janice Wing Sze Wong, Jih-Ho Cha |
| 2017 | Turbo Decoders for Audio-Visual Continuous Speech Recognition. Ahmed Hussen Abdelaziz |
| 2017 | Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents. Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro |
| 2017 | Turn-Taking Offsets and Dialogue Context. Peter A. Heeman, Rebecca Lunsford |
| 2017 | UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation. Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen |
| 2017 | Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling. Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani |
| 2017 | Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling. Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani |
| 2017 | Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages. Alexander Gutkin |
| 2017 | Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets. Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Domenico Batzu |
| 2017 | Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech Recordings. Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu |
| 2017 | Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification. Qiongqiong Wang, Takafumi Koshinaka |
| 2017 | Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification. Hardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil |
| 2017 | Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil |
| 2017 | Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications. Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty |
| 2017 | Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. Shohei Toyama, Daisuke Saito, Nobuaki Minematsu |
| 2017 | Use of Graphemic Lexicons for Spoken Language Assessment. Kate M. Knill, Mark J. F. Gales, Konstantinos Kyriakopoulos, Anton Ragni, Yu Wang |
| 2017 | Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED. Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen |
| 2017 | Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition. Weiwu Zhu |
| 2017 | Using Prosody to Classify Discourse Relations. Janine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner |
| 2017 | Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems. Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan |
| 2017 | Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data. Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg |
| 2017 | VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan |
| 2017 | Variational Recurrent Neural Networks for Speech Separation. Jen-Tzung Chien, Kuan-Ting Kuo |
| 2017 | Very Low Resource Radio Browsing for Agile Developmental and Humanitarian Monitoring. Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John A. Quinn, Thomas Niesler |
| 2017 | Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions. Andrea Bandini, Aravind Namasivayam, Yana Yunusova |
| 2017 | Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. Matthias Zöhrer, Franz Pernkopf |
| 2017 | Visible Vowels: A Tool for the Visualization of Vowel Variation. Wilbert Heeringa, Hans Van de Velde |
| 2017 | Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI. Kyori Suzuki, Ian Wilson, Hayato Watanabe |
| 2017 | Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED Talks. Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell |
| 2017 | Visually Grounded Learning of Keyword Prediction from Untranscribed Speech. Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu |
| 2017 | Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. Sasan Asadiabadi, Engin Erzin |
| 2017 | Vocal-Tract Model with Static Articulators: Lips, Teeth, Tongue, and More. Takayuki Arai |
| 2017 | Voice Conservation and TTS System for People Facing Total Laryngectomy. Markéta Juzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlícek |
| 2017 | Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities. Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari |
| 2017 | Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks. Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang |
| 2017 | Voice Disguise vs. Impersonation: Acoustic and Perceptual Measurements of Vocal Flexibility in Non Experts. Véronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies |
| 2017 | Voice-to-Affect Mapping: Inferences on Language Voice Baseline Settings. Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl |
| 2017 | Vowel Onset Point Detection Using Sonority Information. Bidisha Sharma, S. R. Mahadeva Prasanna |
| 2017 | Vowel and Consonant Sequences in three Bavarian Dialects of Austria. Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz |
| 2017 | Vowels in the Barunga Variety of North Australian Kriol. Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida |
| 2017 | VoxCeleb: A Large-Scale Speaker Identification Dataset. Arsha Nagrani, Joon Son Chung, Andrew Zisserman |
| 2017 | Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension. Yu Gu, Zhen-Hua Ling |
| 2017 | Waveform Patterns in Pitch Glides Near a Vocal Tract Resonance. Tiina Murtola, Jarmo Malinen |
| 2017 | Wavelet Speech Enhancement Based on Robust Principal Component Analysis. Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao |
| 2017 | Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels. Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran |
| 2017 | WebSubDub - Experimental System for Creating High-Quality Alternative Audio Track for TV Broadcasting. Martin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Jakub Vít, Daniel Tihelka |
| 2017 | Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source. Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li |
| 2017 | What Does the Speaker Embedding Encode? Shuai Wang, Yanmin Qian, Kai Yu |
| 2017 | What You See is What You Get Prosodically Less - Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction. Petra Wagner, Nataliya Bryhadyr |
| 2017 | What do Babies Hear? Analyses of Child- and Adult-Directed Speech. Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson |
| 2017 | What do Finnish and Central Bavarian Have in Common? Towards an Acoustically Based Quantity Typology. Markus Jochim, Felicitas Kleber |
| 2017 | What is the Relevant Population? Considerations for the Computation of Likelihood Ratios in Forensic Voice Comparison. Vincent Hughes, Paul Foulkes |
| 2017 | When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch. Lena F. Renner, Marcin Wlodarczak |
| 2017 | Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking. Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker |
| 2017 | Which Acoustic and Phonological Factors Shape Infants' Vowel Discrimination? Exploiting Natural Variation in InPhonDB. Sho Tsuji, Alejandrina Cristià |
| 2017 | Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech. Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain |
| 2017 | Zero Frequency Filter Based Analysis of Voice Disorders. Nagaraj Adiga, Vikram C. M., Keerthi Pullela, S. R. Mahadeva Prasanna |
| 2017 | Zero-Shot Learning Across Heterogeneous Overlapping Domains. Anjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister |
| 2017 | Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question Types. Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo |
| 2017 | i-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification. Zhili Tan, Man-Wai Mak |
| 2017 | i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition. Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka |
| 2017 | pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems. Hervé Bredin |