| 2012 | "Help Me, I Need More User Tests!" User Simulations as Supportive Tool in the Development Process of Spoken Dialogue Systems. Florian Kretzschmar, Sebastian Möller |
| 2012 | 13th Annual Conference of the International Speech Communication Association, INTERSPEECH 2012, Portland, Oregon, USA, September 9-13, 2012 |
| 2012 | A Bayesian Approach to Speaker Recognition Based on GMMs Using Multiple Model Structures. Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda |
| 2012 | A Case Study: Detecting Counselor Reflections in Psychotherapy for Addictions using Linguistic Features. Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan |
| 2012 | A Comparison of Classification Paradigms for Speaker Likeability Determination. Nicholas Cummins, Julien Epps, Jia Min Karen Kua |
| 2012 | A Continuous Prominence Score Based On Acoustic Features. Jean-Philippe Goldman, Mathieu Avanzi, Anne-Catherine Simon, Antoine Auchlin |
| 2012 | A Conversational Movie Search System Based on Conditional Random Fields. Jingjing Liu, Scott Cyphers, Panupong Pasupat, Ian McGraw, James R. Glass |
| 2012 | A Corpus-Based Study of Interruptions in Spoken Dialogue. Agustín Gravano, Julia Hirschberg |
| 2012 | A Correlational Discriminant Approach to Feature Extraction for Robust Speech Recognition. Vikrant Singh Tomar, Richard C. Rose |
| 2012 | A Data-driven Approach to Understanding Spoken Route Directions in Human-Robot Dialogue. Raveesh Meena, Gabriel Skantze, Joakim Gustafson |
| 2012 | A Discriminative Classification-Based Approach to Information State Updates for a Multi-Domain Dialog System. Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Ashley Fidler, Asli Celikyilmaz |
| 2012 | A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement. Shengkui Zhao, Douglas L. Jones |
| 2012 | A Feature Space Transformation Method for Personalization using Generalized I-Vector Clustering. Kaisheng Yao, Yifan Gong, Chaojun Liu |
| 2012 | A Frame Pruning Approach for Paralinguistic Recognition Tasks. Johannes Wagner, Florian Lingenfelser, Elisabeth André |
| 2012 | A HMM approach to residual estimation for high resolution voice conversion. Winston S. Percybrooks, Elliot Moore |
| 2012 | A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling. Yik-Cheung Tam, Paul Vozila |
| 2012 | An Initial Attempt on Task-Specific Adaptation for Deep Neural Network-based Large Vocabulary Continuous Speech Recognition. Yeming Xiao, Zhen Zhang, Shang Cai, Jielin Pan, Yonghong Yan |
| 2012 | A Natural In-Car Speech Interface to Internet Services Using Hybrid ASR. Hansjörg Hofmann, Ute Ehrlich, Klaus Bader, Ilona Nothelfer, André Berton |
| 2012 | A New Approach of Speaking Rate Modeling for Mandarin Speech Prosody. Chiao-Hua Hsieh, Chen-Yu Chiang, Yih-Ru Wang, Hsiu-Min Yu, Sin-Horng Chen |
| 2012 | A Non-Uniform Filterbank for Speaker Recognition. Jia Min Karen Kua, Tharmarajah Thiruvaran, Eliathamby Ambikairajah |
| 2012 | A Novel Confidence Measure Based on Context Consistency for Spoken Term Detection. Haiyang Li, Jiqing Han, Tieran Zheng, Guibin Zheng |
| 2012 | A Preliminary Study on Cross-Databases Emotion Recognition using the Glottal Features in Speech. Rui Sun, Elliot Moore II |
| 2012 | A Random, Semantically Appropriate Sentence Generator for Speaker Verification. Jason Lilley, Amanda Stent, Ilija Zeljkovic |
| 2012 | A Robust Unsupervised Arousal Rating Framework using Prosody with Cross-Corpora Evaluation. Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan |
| 2012 | A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese. Simone Ashby, Sílvia Barbosa, Silvia Brandão, José Pedro Ferreira, Maarten Janssen, Catarina Silva, Mário Eduardo Viaro |
| 2012 | A Self-Learning Assistive Vocal Interface Based on Vocabulary Learning and Grammar Induction. Jort F. Gemmeke, Janneke van de Loo, Guy De Pauw, Joris Driesen, Hugo Van hamme, Walter Daelemans |
| 2012 | A Sequential Bayesian Dialog Agent for Computational Ethnography. Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan |
| 2012 | A Simple Hybrid Acoustic / Morphologically-Constrained Technique for the Synthesis of Stop Consonants in Various Vocalic Contexts. Frédéric Berthommier, Laurent Girin, Louis-Jean Boë |
| 2012 | A Sparse Plus Low Rank Maximum Entropy Language Model. Brian Hutchinson, Mari Ostendorf, Maryam Fazel |
| 2012 | A Specialized WFST Approach for Class Models and Dynamic Vocabulary. Paul R. Dixon, Chiori Hori, Hideki Kashioka |
| 2012 | A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components. Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Kunio Kashino |
| 2012 | A Study of Mutual Information for GMM-Based Spectral Conversion. Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen |
| 2012 | A Study on Using Word-Level HMMs to Improve ASR Performance over State-of-the-Art Phone-Level Acoustic Modeling for LVCSR. I-Fan Chen, Chin-Hui Lee |
| 2012 | A Triple-Microphone Real-Time Speech Enhancement Algorithm Based on Approximate Array Analytical Solutions. Meng Yu, Ryan Ritch, Jack Xin |
| 2012 | A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition. Bo Li, Khe Chai Sim |
| 2012 | A Two-step NMF Based Algorithm for Single Channel Speech Separation. Shuo Wang, Wenjun Wu |
| 2012 | A Weighted Combination of Speech with Text-based Models for Arabic Diacritization. Aisha S. Azim, Xiaoxuan Wang, Khe Chai Sim |
| 2012 | A comparative study of adaptive, automatic recognition of disordered speech. Heidi Christensen, Stuart P. Cunningham, Charles Fox, Phil D. Green, Thomas Hain |
| 2012 | A factorized representation of FMLLR transform based on QR-decomposition. Shakti P. Rath, Martin Karafiát, Ondrej Glembek, Jan Cernocký |
| 2012 | A full-band adaptive harmonic representation of speech. Gilles Degottex, Yannis Stylianou |
| 2012 | A method of speaker identification based on phoneme mean F-ratio contribution. Songgun Hyon, Hongcui Wang, Chen Zhao, Jianguo Wei, Jianwu Dang |
| 2012 | A methodology for the study of rhythm in drummed forms of languages: application to Bora Manguaré of Amazon. Julien Meyer, Laure Dentel, Frank Seifart |
| 2012 | A new noise-tracking algorithm for generalizing binary time-frequency (T-F) masking to ratio masking. Shan Liang, Wei Jiang, Wenju Liu |
| 2012 | A signal-separation-based array postfilter for distant speech recognition. Rita Singh, Ken'ichi Kumatani, John W. McDonough, Chen Liu |
| 2012 | A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions. Germán Bordel, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Amparo Varona |
| 2012 | A speaker-role based approach for detecting politicians in TV broadcast news. Delphine Charlet, Géraldine Damnati |
| 2012 | A speech parameter generation algorithm using local variance for HMM-based speech synthesis. Vataya Chunwijitra, Takashi Nose, Takao Kobayashi |
| 2012 | A tutorial dialogue system with unrestricted spoken input. Peter Bell, Myroslava O. Dzikovska, Amy Isard |
| 2012 | Accelerated Batch Learning of Convex Log-linear Models for LVCSR. Simon Wiesler, Ralf Schlüter, Hermann Ney |
| 2012 | Accentual Transfer from Swiss-German to French. A Study of "Français Fédéral". Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin |
| 2012 | Accounting for Speech Rate in Spoken Word Recognition. David Cheng-Huan Li, Elsi Kaiser |
| 2012 | Acoustic Cues of Vowel Quality to Coda Nasal Perception in Southern Min. Ying Chen, Vsevolod Kapatsinski, Susan Guion-Anderson |
| 2012 | Acoustic Feature-based Non-scorable Response Detection for an Automated Speaking Proficiency Assessment. Je Hun Jeon, Su-Youn Yoon |
| 2012 | Acoustic Features for Classification Based Speech Separation. Yuxuan Wang, Kun Han, DeLiang Wang |
| 2012 | Acoustic and Data-driven Features for Robust Speech Activity Detection. Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A. Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas |
| 2012 | Acoustic and Perceptual Similarity in Coarticulatorily Nasalized Vowels. Rebecca Scarborough, Georgia Zellou |
| 2012 | Active Learning by Sparse Instance Tracking and Classifier Confidence in Acoustic Emotion Recognition. Zixing Zhang, Björn W. Schuller |
| 2012 | Addressing Confusions in Spoken Language in ESL Pronunciation Tutors. Oscar Saz, Maxine Eskénazi |
| 2012 | Advances in combined electro-optical palatography. Peter Birkholz, Philippe Daechert, Christiane Neuschaefer-Rube |
| 2012 | Advances in noise robust digit recognition using hybrid exemplar-based techniques. Jort F. Gemmeke, Hugo Van hamme |
| 2012 | Age Estimation from Telephone Speech using i-vectors. Mohamad Hasan Bahari, Mitchell McLaren, Hugo Van hamme, David A. van Leeuwen |
| 2012 | Aligning manifolds to model the earliest phonological abstraction in infant-caretaker vocal imitation. Andrew R. Plummer |
| 2012 | Amplitude Modulation Filters as Feature Sets for Robust ASR: Constant Absolute or Relative Bandwidth? Niko Moritz, Jörn Anemüller, Birger Kollmeier |
| 2012 | Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis. Zhengqi Wen, Jianhua Tao |
| 2012 | An Auditory Inspired Multimodal Framework for Speech Enhancement. Majid Mirbagheri, Sahar Akram, Shihab A. Shamma |
| 2012 | An Automatic Child-Directed Speech Detector for the Study of Child Language Development. Soroush Vosoughi, Deb Roy |
| 2012 | An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis. Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura |
| 2012 | An Information-Extraction Approach to Speech Analysis and Processing. Chin-Hui Lee |
| 2012 | An MRI study of the oral articulation of European Portuguese nasal vowels. Catarina Oliveira, Paula Martins, Samuel S. Silva, António J. S. Teixeira |
| 2012 | An On-Line, Cloud-Based Spanish-Spanish Sign Language Translation System. Javier Tejedor, Fernando J. López-Colino, Jordi Porta, José Colás |
| 2012 | An Online Generated Transducer to Increase Dialog Manager Coverage. Joaquin Planells, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra |
| 2012 | An alignment matching method to explore pseudosyllable properties across different corpora. Raymond W. M. Ng, Thomas Hain, Keikichi Hirose |
| 2012 | Analysis of Mimicry Speech. D. Gomathi, Sathya Adithya Thati, Karthik Venkat Sridaran, Bayya Yegnanarayana |
| 2012 | Analysis of Temporal Resolution in Frequency Domain Linear Prediction. Sriram Ganapathy, Hynek Hermansky |
| 2012 | Analysis of speaker clustering strategies for HMM-based speech synthesis. Rasmus Dall, Christophe Veaux, Junichi Yamagishi, Simon King |
| 2012 | Analysis of the Characteristics of Talk-show TV Programs. Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter |
| 2012 | Analysis of vocal tremor and jitter by empirical mode decomposition of glottal cycle length time series. Christophe Mertens, Francis Grenez, Jean Schoentgen |
| 2012 | Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis. Ranniery Maia |
| 2012 | Analyzing and Interpreting Automatically Learned Rules Across Dialects. Nancy F. Chen, Wade Shen, Joseph P. Campbell |
| 2012 | Anchor Models and WCCN Normalization For Speaker Trait Classification. Yazid Attabi, Pierre Dumouchel |
| 2012 | Annotation and Recognition of Personality Traits in Spoken Conversations from the AMI Meetings Corpus. Fabio Valente, Samuel Kim, Petr Motlícek |
| 2012 | Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition. Navdeep Jaitly, Patrick Nguyen, Andrew W. Senior, Vincent Vanhoucke |
| 2012 | Application of Structural Events Detected on ASR Outputs for Automated Speaking Assessment. Lei Chen, Su-Youn Yoon |
| 2012 | Applying multiview learning algorithms to human-human conversation classification. Sokol Koço, Cécile Capponi, Frédéric Béchet |
| 2012 | Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations. Hynek Boril, Abhijeet Sangwan, John H. L. Hansen |
| 2012 | Are Sparse Representations Rich Enough for Acoustic Modeling? Oriol Vinyals, Li Deng |
| 2012 | Articulatory Feature based Multilingual MLPs for Low-Resource Speech Recognition. Yanmin Qian, Jia Liu |
| 2012 | Articulatory Strategies in Obstruent Production in Mandarin Esophageal Speech. Fang Hu, Yungang Wu, Wen Xu, Demin Han |
| 2012 | Articulatory VCV Synthesis from EMA Data. Asterios Toutios, Shinji Maeda |
| 2012 | Articulatory differences between oral and nasal vowels based on the simulation of a speaker-adaptive articulatory model. Panying Rong, Ryan Shosted, David Kuehn |
| 2012 | Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods. Julián Andrés Valdés Vargas, Pierre Badin, Laurent Lamalle |
| 2012 | Assessing agreement level between forced alignment models with data from endangered language documentation corpora. Christian DiCanio, Hosung Nam, Douglas H. Whalen, H. Timothy Bunnell, Jonathan D. Amith, Rey Castillo García |
| 2012 | Assessment of Disordered Voices Using Empirical Mode Decomposition in the Log-Spectral Domain. Abdellah Kacha, Francis Grenez, Jean Schoentgen |
| 2012 | Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering. Zoraida Callejas, David Griol, Klaus-Peter Engelbrecht |
| 2012 | Asymmetries in the perception of synthesized speech. Anna C. Janska, Erich Schröger, Thomas Jacobsen, Robert A. J. Clark |
| 2012 | Audio and Contact Microphones for Cough Detection. Thomas Drugman, Jérôme Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit |
| 2012 | Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario. Martin Heckmann |
| 2012 | Audiovisual correlates of basic emotions in blind and sighted people. Marc Swerts, Kitty Leuverink, Madelène Munnik, Vera Nijveld |
| 2012 | Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study. Cyril Dubois, Rudolph Sock |
| 2012 | Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations. Christos Koniaris, Olov Engwall, Giampiero Salvi |
| 2012 | Auditory-visual speech to infants and adults: signals and correlations. Jeesun Kim, Chris Davis, Christine Kitamura |
| 2012 | Automated Dysarthria Severity Classification for Improved Objective Intelligibility Assessment of Spastic Dysarthric Speech. Milton Orlando Sarria-Paja, Tiago H. Falk |
| 2012 | Automatic Detection of High Vocal Effort in Telephone Speech. Jouni Pohjalainen, Tuomo Raitio, Hannu Pulakka, Paavo Alku |
| 2012 | Automatic Error Recovery for Pronunciation Dictionaries. Tim Schlippe, Sebastian Ochs, Ngoc Thang Vu, Tanja Schultz |
| 2012 | Automatic Measurement of Positive and Negative Voice Onset Time. Katharine Henry, Morgan Sonderegger, Joseph Keshet |
| 2012 | Automatic Phoneme Segmentation Using Auditory Attention Features. Ozlem Kalinli |
| 2012 | Automatic Pronunciation Error Detection Based on Extended Pronunciation Space Using the Unsupervised Clustering of Pronunciation Errors. Long Zhang, Haifeng Li |
| 2012 | Automatic Speech Segmentation Using Probabilistic Latent Component Modeling. Sayan Ghosh, T. V. Sreenivas |
| 2012 | Automatic Tone Assessment of Non-Native Mandarin Speakers. Jian Cheng |
| 2012 | Automatic Topology Generation of Glottal Source HMM. Akira Sasou |
| 2012 | Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts. Yuya Akita, Makoto Watanabe, Tatsuya Kawahara |
| 2012 | Automatic Vocabulary Adaptation Based on Semantic Similarity and Speech Recognition Confidence Measure. Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi |
| 2012 | Automatic detection of conflict escalation in spoken conversations. Samuel Kim, Sree Harsha Yella, Fabio Valente |
| 2012 | Automatic detection of hypernasal speech signals using nonlinear and entropy measurements. Juan Rafael Orozco-Arroyave, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Elmar Nöth |
| 2012 | Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions. Harish Arsikere, Gary K. F. Leung, Steven M. Lulich, Abeer Alwan |
| 2012 | Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations. Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen L. Stone, Carol Y. Espy-Wilson, Shihab A. Shamma |
| 2012 | Automatic transcription error recovery for Person Name Recognition. Richard Dufour, Géraldine Damnati, Delphine Charlet, Frédéric Béchet |
| 2012 | Automatic word naming recognition for treatment and assessment of aphasia. Alberto Abad, Anna Pompili, Ângela Costa, Isabel Trancoso |
| 2012 | Automating Crowd-supervised Learning for Spoken Language Systems. Ian McGraw, Scott Cyphers, Panupong Pasupat, Jingjing Liu, James R. Glass |
| 2012 | Average Spectrotemporal Structure of Continuous Speech Matches with the Frequency Resolution of Human Hearing. Okko Räsänen |
| 2012 | Bag-of-Audio-Words Approach for Multimedia Event Classification. Stephanie Pancoast, Murat Akbacak |
| 2012 | Based on Isolated Saliency or Causal Integration? Toward a Better Understanding of Human Annotation Process using Multiple Instance Learning and Sequential Probability Ratio Test. Chi-Chun Lee, Athanasios Katsamanis, Panayiotis G. Georgiou, Shrikanth S. Narayanan |
| 2012 | Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data. Alexander Krueger, Oliver Walter, Volker Leutnant, Reinhold Haeb-Umbach |
| 2012 | Bayesian Group Sparse Learning for Nonnegative Matrix Factorization. Jen-Tzung Chien, Hsin-Lung Hsieh |
| 2012 | Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion. Na Li, Yu Qiao |
| 2012 | Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios. Hannes Pessentheiner, Stefan Petrik, Harald Romsdorfer |
| 2012 | Bilinear Factor Analysis for iVector Based Speaker Verification. Yun Lei, Lukás Burget, Nicolas Scheffer |
| 2012 | Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments. Oldooz Hazrati, Jaewook Lee, Philipos C. Loizou |
| 2012 | Binaural Noise Reduction Using Frequency-Warped FIR Filters. Jorge I. Marin-Hurtado, David V. Anderson |
| 2012 | Boosting Classification Based Speech Separation Using Temporal Dynamics. Yuxuan Wang, DeLiang Wang |
| 2012 | C2H: A Computational Model of H&H-based Phonetic Contrast in Synthetic Speech. Mauro Nicolao, Javier Latorre, Roger K. Moore |
| 2012 | CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition. Sarah Al-Shareef, Thomas Hain |
| 2012 | Calibration of probabilistic age recognition. David A. van Leeuwen, Mohamad Hasan Bahari |
| 2012 | Caller Response Timing Patterns in Spoken Dialog Systems. Silke M. Witt |
| 2012 | Can litheners retune native categories acroth a thoneme boundary? Michael D. Tyler, Mona Faris |
| 2012 | Can modified casual speech reach the intelligibility of clear speech? Maria Koutsogiannaki, Michèle Pettinato, Cassie Mayo, Varvara Kandia, Yannis Stylianou |
| 2012 | Characterizing Covert Articulation in Apraxic Speech Using real-time MRI. Christina Hagedorn, Michael I. Proctor, Louis Goldstein, Maria Luisa Gorno-Tempini, Shrikanth S. Narayanan |
| 2012 | Children's Productions of Multi-Syllabic Lexical Stress Patterns in Different Prosodic Positions. Irina Shport |
| 2012 | Classification of Stressed Speech Using Physical Parameters Derived from Two-Mass Model. Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda |
| 2012 | Classifying Skewed Data: Importance Weighting to Optimize Average Recall. Andrew Rosenberg |
| 2012 | ClippyScript: A Programming Language for Multi-Domain Dialogue Systems. Frank Seide, Sean McDirmid |
| 2012 | Co-occurrence of reduced word forms in natural speech. Malte C. Viebahn, Mirjam Ernestus, James M. McQueen |
| 2012 | Coherent Topic Transition in a Conversational Agent. Daniel Macías-Galindo, Wilson Wong, Lawrence Cavedon, John Thangarajah |
| 2012 | Combination of Multiple Speech Dimensions for Automatic Assessment of Dysarthric Speech Intelligibility. Myung Jong Kim, Hoirin Kim |
| 2012 | Combination of Sparse Classification and Multilayer Perceptron for Noise-robust ASR. Yang Sun, Mathew M. Doss, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves |
| 2012 | Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation. Ramya Rasipuram, Mathew Magimai-Doss |
| 2012 | Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise. Felix Weninger, Martin Wöllmer, Björn W. Schuller |
| 2012 | Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech. Houwei Cao, Ragini Verma, Ani Nenkova |
| 2012 | Combining frame and segment based models for environmental sound classification. Pengfei Hu, Wenju Liu, Wei Jiang |
| 2012 | Combining multiple high quality corpora for improving HMM-TTS. Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate M. Knill, Masami Akamine |
| 2012 | Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech. Ali Alpan, Jean Schoentgen, Francis Grenez |
| 2012 | Compact Audio Representation for Event Detection in Consumer Media. Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan |
| 2012 | Comparative Analysis of Intensity between Native Speakers and Japanese Speakers of English. Tomoko Nariai, Kazuyo Tanaka, Tatsuya Kawahara |
| 2012 | Comparing different acoustic modeling techniques for multilingual boosting. David Imseng, John Dines, Petr Motlícek, Philip N. Garner, Hervé Bourlard |
| 2012 | Comparing transcription agreement on non-native English speech corpus between native and non-native annotators. Hyuksu Ryu, Sunhee Kim, Minhwa Chung |
| 2012 | Comparison of Grapheme-to-Phoneme Methods on Large Pronunciation Dictionaries and LVCSR Tasks. Stefan Hahn, Paul Vozila, Maximilian Bisani |
| 2012 | Compensating for Ageing and Quality variation in Speaker Verification. Finnian Kelly, Andrzej Drygajlo, Naomi Harte |
| 2012 | Compensation of Intrinsic Variability with Factor Analysis Modeling for Robust Speaker Verification. Sheng Chen, Mingxing Xu |
| 2012 | Complementary Phone Error Training. Frank Diehl, Philip C. Woodland |
| 2012 | Computational Modelling of the Recognition of Foreign-Accented Speech. Odette Scharenborg, Marijt J. Witteman, Andrea Weber |
| 2012 | Confidence Measures in Speech Emotion Recognition Based on Semi-supervised Learning. Jun Deng, Björn W. Schuller |
| 2012 | Confidence for Speaker Diarization using PCA Spectral Ratio. Orith Toledo-Ronen, Hagai Aronowitz |
| 2012 | Confidence measure for speech indexing based on Latent Dirichlet Allocation. Grégory Senay, Georges Linarès |
| 2012 | Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis. Xiang Yin, Zhen-Hua Ling, Ming Lei, Li-Rong Dai |
| 2012 | Consonantal space area in Children with a Cleft Palate: An acoustic Study. Marion Bechet, Fabrice Hirsch, Camille Fauth, Rudolph Sock |
| 2012 | Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification. Shuai Huang, Glen A. Coppersmith, Damianos G. Karakos |
| 2012 | Constrained Multichannel Speech Dereverberation. Meng Yu, Frank K. Soong |
| 2012 | Consumer-level multimedia event detection through unsupervised audio signal modeling. Byungki Byun, Ilseo Kim, Sabato Marco Siniscalchi, Chin-Hui Lee |
| 2012 | Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both? Zoltán Tüske, Ralf Schlüter, Hermann Ney, Martin Sundermeyer |
| 2012 | Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface. Thomas Hueber, Gérard Bailly, Bruce Denby |
| 2012 | Continuous Digit Recognition in Noise: Reservoirs can do an excellent job! Azarakhsh Jalalvand, Fabian Triefenbach, Jean-Pierre Martens |
| 2012 | Contrasting Cues to Verbal and Non-Verbal Backchannels in Multi-lingual Dyadic Rapport. Gina-Anne Levow, Susan Duncan |
| 2012 | Contrastive intonation in autism: The effect of speaker- and listener-perspective. Constantijn Kaland, Emiel Krahmer, Marc Swerts |
| 2012 | Contribution of Spectral Shapes to Tone Perception. Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato |
| 2012 | Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition. Gwénolé Lecorvé, Petr Motlícek |
| 2012 | Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization. Jürgen T. Geiger, Ravichander Vipperla, Simon Bozonnet, Nicholas W. D. Evans, Björn W. Schuller, Gerhard Rigoll |
| 2012 | Correlation Between Model-based Approximations of Grounding-related Cognition and User Judgments. Klaus-Peter Engelbrecht, Sebastian Möller |
| 2012 | Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers. Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki |
| 2012 | Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition. Ning Ma, Jon Barker |
| 2012 | Cries and Whispers - Classification of Vocal Effort in Expressive Speech. Nicolas Obin |
| 2012 | Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data. Sheng Li, Lan Wang |
| 2012 | Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition. Yanmin Qian, Jia Liu |
| 2012 | Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation. Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda |
| 2012 | Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training. Thomas Hueber, Atef Ben Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei |
| 2012 | Data-driven Posterior Features for Low Resource Speech Recognition Applications. Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky |
| 2012 | Decoding of Uncertain Features Using the Posterior Distribution of the Clean Data for Robust Speech Recognition. Ahmed Hussen Abdelaziz, Dorothea Kolossa |
| 2012 | Deep Architectures for Articulatory Inversion. Benigno Uria, Iain Murray, Steve Renals, Korin Richmond |
| 2012 | Demonstration of Advanced Multi-Modal, Network-Centric Communication Management Suite. Victor S. Finomore |
| 2012 | Dereverberation based on Wavelet Packet Filtering for Robust Automatic Speech Recognition. Tatsuya Kawahara, Randy Gomez |
| 2012 | Deriving conversation-based features from unlabeled speech for discriminative language modeling. Damianos G. Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraçlar, Daniel M. Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley |
| 2012 | Describing the development of intonational categories using a target-oriented parametric approach. Britta Lintfert, Bernd Möbius |
| 2012 | Descriptive Vocabulary Development for Degraded Speech. Dushyant Sharma, Gaston Hilkhuysen, Patrick A. Naylor, Nikolay D. Gaubitch, Mark A. Huckvale, Mike Brookes |
| 2012 | Designing a spoken language interface for a tutorial dialogue system. Peter Bell, Myroslava O. Dzikovska, Amy Isard |
| 2012 | Detecting Acronyms from Capital Letter Sequences in Spanish. Rubén San Segundo, Juan Manuel Montero, Verónica López-Ludeña, Simon King |
| 2012 | Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition. Zhizheng Wu, Chng Eng Siong, Haizhou Li |
| 2012 | Detecting Intelligibility by Linear Dimensionality Reduction and Normalized Voice Quality Hierarchical Features. Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu |
| 2012 | Detecting OOV Named-Entities in Conversational Speech. Rohit Kumar, Rohit Prasad, Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, David Stallard, Stavros Tsakalidis, Prem Natarajan |
| 2012 | Detecting System-directed Utterances using Dialogue-level Features. Kazunori Komatani, Akira Hirano, Mikio Nakano |
| 2012 | Detection and Positioning of Overlapped Sounds in a Room Environment. Rupayan Chakraborty, Climent Nadeu, Taras Butko |
| 2012 | Detection of Transition Segments in VCV Utterances for Estimation of the Place of Closure of Oral Stops for Speech Training. K. S. Nataraj, Prem C. Pandey |
| 2012 | Developing a Speech Activity Detection System for the DARPA RATS Program. Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas, Xinhui Zhou, Nima Mesgarani, Karel Veselý, Pavel Matejka |
| 2012 | Development and Evaluation of Automatic Punctuation for French and English Speech-to-Text. Jáchym Kolár, Lori Lamel |
| 2012 | Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation. Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino |
| 2012 | Diagnostic Prediction of Transmitted Speech Quality: A New Framework for Signal-based and Parametric Models. Sebastian Möller, Marcel Wältermann, Nicolas Côté |
| 2012 | Dialectal and generational variations in vowels in spontaneous speech. Robert Allen Fox, Ewa Jacewicz |
| 2012 | DiarTk: An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings. Deepu Vijayasenan, Fabio Valente |
| 2012 | Direction of Arrival Estimation Based on Subband Weighting for Noisy Conditions. Wei Xue, Wenju Liu |
| 2012 | Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation. Tomoki Koriyama, Takashi Nose, Takao Kobayashi |
| 2012 | Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives. Felix Weninger, Björn W. Schuller |
| 2012 | Discriminative Decision Function Based Scoring Method in Joint Factor Analysis for Speaker Verification. Chunyan Liang, Xiang Zhang, Lin Yang, Yonghong Yan |
| 2012 | Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation. Ting-Yao Hu, Yu Tsao, Lin-Shan Lee |
| 2012 | Discriminative Reranking for LVCSR Leveraging Invariant Structure. Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu |
| 2012 | Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech. Chao Weng, Biing-Hwang Juang, Daniel Povey |
| 2012 | Discriminative feature-space transforms using deep neural networks. George Saon, Brian Kingsbury |
| 2012 | Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks. Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu |
| 2012 | Discriminatively trained phoneme confusion model for keyword spotting. Panagiota Karanasou, Lukás Burget, Dimitra Vergyri, Murat Akbacak, Arindam Mandal |
| 2012 | Disentangling lexical, morphological, syntactic and semantic influences on German prominence - Evidence from a production study. Barbara Samlowski, Petra Wagner, Bernd Möbius |
| 2012 | Distance-Dependent Noise Reduction for Two-Channel Microphones. Thomas Fehér, Dietmar Richter, Oliver Jokisch, Rüdiger Hoffmann |
| 2012 | Duration of ambulatory monitoring needed to accurately estimate voice use. Daryush D. Mehta, Rebecca Woodbury Listfield, Harold A. Cheyne II, James T. Heaton, Shengran W. Feng, Matías Zañartu, Robert E. Hillman |
| 2012 | Dutch Automatic Speech Recognition on the Web: Towards a General Purpose System. Joris Pelemans, Kris Demuynck, Patrick Wambacq |
| 2012 | Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction. Xuancong Wang, Hwee Tou Ng, Khe Chai Sim |
| 2012 | Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition. Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose |
| 2012 | EFL Conversational Triads: Foreigner-directed Speech and Hyperarticulation. Hua-Li Jian, Richard Konopka |
| 2012 | Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition. Changhuai You, Haizhou Li, Bin Ma, Kong-Aik Lee |
| 2012 | Effect of Tongue Tip Trilling on the Glottal Excitation Source. Vinay Kumar Mittal, N. Dhananjaya, Bayya Yegnanarayana |
| 2012 | Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech. Maeva Garnier, Lucie Ménard, Gabrielle Richard |
| 2012 | Effect of noise type and level on focus related fundamental frequency changes. Martti Vainio, Daniel Aalto, Antti Suni, Anja Arnhold, Tuomo Raitio, Henri Seijo, Juhani Järvikivi, Paavo Alku |
| 2012 | Effect of prosodic changes on speech intelligibility. Catherine Mayo, Vincent Aubanel, Martin Cooke |
| 2012 | Effect of speech priors in single-channel speech-music separation for ASR. Cemil Demir, Ali Taylan Cemgil, Murat Saraçlar |
| 2012 | Effects of Dialectal Origin on Articulation Rate in French. Mathieu Avanzi, Pauline Dubosson, Sandra Schwab |
| 2012 | Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion. Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose |
| 2012 | Effects of stress and speech rate on vowel quality in Catalan and Spanish. Marianna Nadeu |
| 2012 | Effects of the availability of visual information and presence of competing conversations on speech production. Vincent Aubanel, Martin Cooke, Emma Foster, María Luisa García Lecumberri, Cassie Mayo |
| 2012 | Effects of visual speech information on native listener judgments of L2 consonant intelligibility. Saya Kawase, Yue Wang |
| 2012 | Efficient Beam Width Control to Suppress Excessive Speech Recognition Computation Time Based on Prior Score Range Normalization. Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi |
| 2012 | Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine. Jungsuk Kim, Jike Chong, Ian R. Lane |
| 2012 | Efficient Segmental Conditional Random Fields for One-Pass Phone Recognition. Yanzhang He, Eric Fosler-Lussier |
| 2012 | Efficient Structured Language Modeling for Speech Recognition. Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur |
| 2012 | Efficient VTS Adaptation Using Jacobian Approximation. Jinyu Li, Michael L. Seltzer, Yifan Gong |
| 2012 | Efficient multipulse approximation of speech excitation using the most singular manifold. Vahid Khanagha, Khalid Daoudi |
| 2012 | Emotion Recognition using Acoustic and Lexical Features. Viktor Rozgic, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad |
| 2012 | Emotional Speech: A Spectral Analysis. Pouria Fewzee, Fakhri Karray |
| 2012 | Emphatic segments and emphasis spread in Lebanese Arabic: a Real-time Magnetic Resonance Imaging Study. Assaf Israel, Michael I. Proctor, Louis Goldstein, Khalil Iskarous, Shrikanth S. Narayanan |
| 2012 | Employing Sentence Structure: Syntax Trees as Prosody Generators. Sarah Hoffmann, Beat Pfister |
| 2012 | Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition. Udhyakumar Nallasamy, Florian Metze, Tanja Schultz |
| 2012 | Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks. Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran |
| 2012 | Enhancing Speech Understanding in Spoken Dialogue Systems by Means of a New Frame-Correction Technique. Ramón López-Cózar, Zoraida Callejas, David Griol |
| 2012 | Enhancing Speech by Reconstruction from Robust Acoustic Features. Philip Harding, Ben Milner |
| 2012 | Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech. Petko Nikolov Petkov, W. Bastiaan Kleijn, Gustav Eje Henter |
| 2012 | Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition. Florian Müller, Alfred Mertins |
| 2012 | Ensemble Classifiers Using Unsupervised Data Selection for Speaker Recognition. Chien-Lin Huang, Chiori Hori, Hideki Kashioka, Bin Ma |
| 2012 | Enumerating Differences Between Various Communicative Functions for Purposes of Czech Expressive Speech Synthesis in Limited Domain. Martin Gruber |
| 2012 | Enumerative Algebraic Coding for ACELP. Tom Bäckström |
| 2012 | Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training. Yow-Bang Wang, Lin-Shan Lee |
| 2012 | Estimating Classifier Performance in Unknown Noise. Ehsan Variani, Hynek Hermansky |
| 2012 | Estimating Word-Stability During Incremental Speech Recognition. Ian McGraw, Alexander Gruenstein |
| 2012 | Estimating the Vocal-Tract Area Function From Formants Using a Sensitivity Function and Least Square. Tokihiko Kaburagi, Tetsuro Takano, Yuki Sakamoto |
| 2012 | Estimating the voice source in noise. Gang Chen, Yen-Liang Shue, Jody Kreiman, Abeer Alwan |
| 2012 | Estimation of Talker's Head Orientation Based on Discrimination of the Shape of Cross-power Spectrum Phase Coefficients. Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
| 2012 | Estimation of the vocal tract shape of nasals using a Bayesian scheme. Christian H. Kasess, Wolfgang Kreuzer, Ewald Enzinger, Nadja Kerschhofer-Puhalo |
| 2012 | EuskoParl: a speech and text Spanish-Basque parallel corpus. Alicia Pérez, José M. Alcaide, M. Inés Torres |
| 2012 | Evaluating NLP Features for Automatic Prediction of Language Impairment Using Child Speech Transcripts. Khairun-nisa Hassanali, Yang Liu, Thamar Solorio |
| 2012 | Evaluating Prosodic Processing for Incremental Speech Synthesis. Timo Baumann, David Schlangen |
| 2012 | Evaluation of Many-to-Many Alignment Algorithm by Automatic Pronunciation Annotation Using Web Text Mining. Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano |
| 2012 | Evaluation of a Sparse Representation-Based Classifier For Bird Phrase Classification Under Limited Data Conditions. Lee Ngee Tan, Kantapon Kaewtip, Martin L. Cody, Charles E. Taylor, Abeer Alwan |
| 2012 | Evaluation of a formant-based speech-driven lip motion generation. Carlos Toshinori Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita |
| 2012 | Event-based Video Retrieval Using Audio. Qin Jin, Peter Franz Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze |
| 2012 | Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise. Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani |
| 2012 | Exemplar-Based Sparse Representation for Language Recognition on I-Vectors. Bing Jiang, Yan Song, Wu Guo, Li-Rong Dai |
| 2012 | Expand CRF to Model Long Distance Dependencies in Prosodic Break Prediction. Jian Luan |
| 2012 | Exploiting Discriminative Point Process Models for Spoken Term Detection. Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas |
| 2012 | Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia. Sourish Chaudhuri, Rita Singh, Bhiksha Raj |
| 2012 | Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing. Gökhan Tür, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry P. Heck |
| 2012 | Exploring Discriminative Speech Trajectory Structures. Heyun Huang, Louis ten Bosch, Bert Cranen, Lou Boves |
| 2012 | Exploring Joint Equalization of Spatial-Temporal Contextual Statistics of Speech Features for Robust Speech Recognition. Hsin-Ju Hsieh, Jeih-weih Hung, Berlin Chen |
| 2012 | Exploring Off Time Nature for Speech Enhancement. Meng Yu, Jack Xin |
| 2012 | Exploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training. Langzhou Chen, Mark J. F. Gales, Vincent Wan, Javier Latorre, Masami Akamine |
| 2012 | Expressing Speaker's Intentions through Sentence-Final Intonations for Japanese Conversational Speech Synthesis. Kazuhiko Iwata, Tetsunori Kobayashi |
| 2012 | Extrinsic normalization for vocal tracts depends on the signal, not on attention. Matthias J. Sjerps, James M. McQueen, Holger Mitterer |
| 2012 | F0 and the Perception of Prominence. Tim Mahrt, Jennifer Cole, Margaret M. Fleck, Mark Hasegawa-Johnson |
| 2012 | Factor Analysis and Nuisance Attribute Projection Revisited. Lukás Machlica, Zbynek Zajíc |
| 2012 | Factored MLLR Adaptation Algorithm for HMM-based Expressive TTS. June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim |
| 2012 | Factored adaptation using a combination of feature-space and model-space transforms. Michael L. Seltzer, Alex Acero |
| 2012 | Feature Selection for Speaker Traits. Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen |
| 2012 | Feature extraction based on hearing system signal processing for robust large vocabulary speech recognition. Peter Li, Xie Sun |
| 2012 | Foreground Speech Segmentation using Zero Frequency Filtered Signal. Deepak K. T., Biswajit Dev Sarma, S. R. Mahadeva Prasanna |
| 2012 | From PVI to Perception: A Return to the Roots of Rhythm in Broadcast News. Matthew Benton |
| 2012 | Front-end Channel Compensation using Mixture-dependent Feature Transformations for i-Vector Speaker Recognition. Taufiq Hasan, John H. L. Hansen |
| 2012 | Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment. Maider Lehr, Emily Tucker Prud'hommeaux, Izhak Shafran, Brian Roark |
| 2012 | Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model. Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi |
| 2012 | GCC-PHAT based Head Orientation Estimation. Carlos Segura, Javier Hernando |
| 2012 | Gaussian Map based Acoustic Model Adaptation Using Untranscribed Data for Speech Recognition in Severely Adverse Environments. Wooil Kim, John H. L. Hansen |
| 2012 | Gaussian Mixture Gain Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. Emad M. Grais, Hakan Erdogan |
| 2012 | Gaze Patterns in Turn-Taking. Catharine Oertel, Marcin Wlodarczak, Jens Edlund, Petra Wagner, Joakim Gustafson |
| 2012 | Gendered sound symbolism and masking effects in speech processing. Molly Babel, Grant McGuire |
| 2012 | Genetic Algorithm Based Feature Selection for Speaker Trait Classification. Dongrui Wu |
| 2012 | Glottal Waveform Analysis of Physical Task Stress Speech. Keith W. Godin, Taufiq Hasan, John H. L. Hansen |
| 2012 | Glottal source shape parameter estimation using phase minimization variants. Stefan Huber, Axel Röbel, Gilles Degottex |
| 2012 | Goal-Oriented Auditory Scene Recognition. Kailash Patil, Mounya Elhilali |
| 2012 | Group Sparse Hidden Markov Models for Speech Recognition. Jen-Tzung Chien, Cheng-Chun Chiang |
| 2012 | Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition. Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen |
| 2012 | HMM Based Continuous EOG Recognition for Eye-input Speech Interface. Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha |
| 2012 | HMM-based speech synthesis using sub-band basis spectrum model. Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine |
| 2012 | Hearing Loss and the Use of Acoustic Cues in Phonetic Categorisation of Fricatives. Odette Scharenborg, Esther Janse |
| 2012 | Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models. Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee |
| 2012 | Heterogeneous Convolutive Non-Negative Sparse Coding. Dong Wang, Javier Tejedor |
| 2012 | Hidden Conditional Random Fields with M-to-N Alignments for Grapheme-to-Phoneme Conversion. Patrick Lehnen, Stefan Hahn, Vlad-Andrei Guta, Hermann Ney |
| 2012 | Hidden Markov Convolutive Mixture Model for Pitch Contour Analysis of Speech. Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama |
| 2012 | Hidden Markov Models as Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. Emad M. Grais, Hakan Erdogan |
| 2012 | Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data. Fanbo Meng, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai |
| 2012 | Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP. Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine |
| 2012 | Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition. Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan |
| 2012 | How Marni Helps English Language Learners Acquire Oral Reading Fluency. Ronald A. Cole, Daniel Bolaños, Wayne H. Ward, J. T. Carmer, Eric Borts, Edward Svirsky |
| 2012 | How consonants, dialect and speech rate affect vowel devoicing? Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto |
| 2012 | I-vectors and ILP clustering adapted to cross-show speaker diarization. Grégor Dupuy, Mickael Rouvier, Sylvain Meignier, Yannick Estève |
| 2012 | IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition. Jun Du, Qiang Huo |
| 2012 | Implementation of Computationally Efficient Real-Time Voice Conversion. Tomoki Toda, Takashi Muramatsu, Hideki Banno |
| 2012 | Implementation of Simple Spectral Techniques to Enhance the Intelligibility of Speech using a Harmonic Model. Daniel Erro, Yannis Stylianou, Eva Navas, Inma Hernáez |
| 2012 | Improve the Implementation of Pitch Features for Mandarin Digit String Recognition Task. Pei Ding, Liqiang He |
| 2012 | Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis. Hiroya Hashimoto, Keikichi Hirose, Nobuaki Minematsu |
| 2012 | Improved Model Selection for the ASR-Driven Binary Mask. William Hartmann, Eric Fosler-Lussier |
| 2012 | Improved Prediction of Japanese Word Accent Sandhi Using CRF. Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose |
| 2012 | Improved Speech Intelligibility with a Chimaera Hearing Aid Algorithm. Andrew Hines, Naomi Harte |
| 2012 | Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction. Paavo Alku, Jouni Pohjalainen, Martti Vainio, Anne-Maria Laukkanen, Brad H. Story |
| 2012 | Improvement in Automatic Pronunciation Scoring using Additional Basic Scores and Learning to Rank. Liang-Yu Chen, Jyh-Shing Roger Jang |
| 2012 | Improvements in Japanese Voice Search. Ken-ichi Iso, Edward Whittaker, Tadashi Emori, Junpei Miyake |
| 2012 | Improvements of the Beta-Order Minimum Mean-Square Error (MMSE) Spectral Amplitude Estimator using Chi Priors. Marek B. Trawicki, Michael T. Johnson |
| 2012 | Improving Discriminative Training for Robust Acoustic Models in Large Vocabulary Continuous Speech Recognition. Janne Pylkkönen, Mikko Kurimo |
| 2012 | Improving L1-Specific Phonological Error Diagnosis in Computer Assisted Pronunciation Training. Theban Stanley, Kadri Hacioglu |
| 2012 | Improving Recognition of Speaker States and Traits by Cumulative Evidence: Intoxication, Sleepiness, Age and Gender. Felix Weninger, Erik Marchi, Björn W. Schuller |
| 2012 | Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring. Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka, Paul R. Dixon |
| 2012 | Improving the Entropy Estimate of Neuronal Firings of Modeled Cochlear Nucleus Neurons. Andrea Grigorescu, Marek Rudnicki, Michael Isik, Werner Hemmert, Stefano Rini |
| 2012 | Indexing Raw Acoustic Features for Scalable Zero Resource Search. Aren Jansen, Benjamin Van Durme |
| 2012 | Inference of Critical Articulator Position for Fricative Consonants. Alexander Sepúlveda, Rodrigo Capobianco Guido, Germán Castellanos-Domínguez |
| 2012 | Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data. Ngoc Thang Vu, Wojtek Breiter, Florian Metze, Tanja Schultz |
| 2012 | Integrated Feature Normalization and Enhancement for robust Speaker Recognition using Acoustic Factor Analysis. Taufiq Hasan, John H. L. Hansen |
| 2012 | Integrating Adaptive Beam-forming and Auditory Features for Robust Large Vocabulary Speech Recognition. Xie Sun, Peter Li, Manli Zhu, Qiru Zhou |
| 2012 | Integrating Deep Neural Networks into Structural Classification Approach based on Weighted Finite-State Transducers. Yotaro Kubo, Takaaki Hori, Atsushi Nakamura |
| 2012 | Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization. Yun-Nung Chen, Florian Metze |
| 2012 | Integrating Stress Information in Large Vocabulary Continuous Speech Recognition. Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier |
| 2012 | Intelligibility classification of pathological speech using fusion of multiple high level descriptors. Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth S. Narayanan |
| 2012 | Intelligibility of speech spoken in noise/reverberation for older adults in reverberant environments. Nao Hodoshima, Takayuki Arai, Kiyohiro Kurisu |
| 2012 | Inter-gestural timing in French nasal vowels: A comparative study of (Liège, Tournai) Northern French vs. (Marseille, Toulouse) Southern French. Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies |
| 2012 | Interactions Between Turn-taking Gaps, Disfluencies and Social Obligation. Rebecca Lunsford, Peter A. Heeman, Jan P. H. van Santen |
| 2012 | Interactive Spoken Content Retrieval with Different Types of Actions Optimized By a Markov Decision Process. Tsung-Hsien Wen, Hung-yi Lee, Lin-Shan Lee |
| 2012 | Interplay between verbal response latency and physiology of children with autism during ECA interactions. Theodora Chaspari, Chi-Chun Lee, Shrikanth S. Narayanan |
| 2012 | Interspeech Pathology Challenge: Investigations into Speaker and Sentence Specific Effects. Anthony P. Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran |
| 2012 | Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. Aren Jansen, Samuel Thomas, Hynek Hermansky |
| 2012 | Intrinsic velocity differences of lip and jaw movements: preliminary results. Peter Birkholz, Phil Hoole |
| 2012 | Inventory-Based Audio-Visual Speech Enhancement. Dorothea Kolossa, Robert M. Nickel, Steffen Zeiler, Rainer Martin |
| 2012 | Inverting the Point Process Model for Fast Phonetic Keyword Search. Keith Kintzley, Aren Jansen, Kenneth Church, Hynek Hermansky |
| 2012 | Investigating Performance of the Discriminative Methods for Long-Term Speaker Adaptation. Danning Jiang, Dimitri Kanevsky, Vaibhava Goel, Yong Qin |
| 2012 | Investigating syllabic prominence with Conditional Random Fields and Latent-Dynamic Conditional Random Fields. Francesco Cutugno, Enrico Leone, Bogdan Ludusan, Antonio Origlia |
| 2012 | Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR. M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney |
| 2012 | Is 'not bad' good enough? Aspects of unknown voices' likability. Benjamin Weiss, Felix Burkhardt |
| 2012 | Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation. Daniel Erro, Eva Navas, Inma Hernáez |
| 2012 | Joint Decoding for Speech Recognition and Semantic Tagging. Anoop Deoras, Ruhi Sarikaya, Gökhan Tür, Dilek Hakkani-Tür |
| 2012 | Joint Pitch-Analysis Formant-Synthesis framework for CS recovery of speech. Srikanth Raj Chetupally, Thippur V. Sreenivas |
| 2012 | Judging temporal onset differences for concurrent vowels: Results for young, middle-aged, and older adults. Daniel Fogerty, Diane Kewley-Port, Larry E. Humes |
| 2012 | KNNDIST: A Non-Parametric Distance Measure for Speaker Segmentation. Seyed Hamidreza Mohammadi, Hossein Sameti, Mahsa Sadat Elyasi Langarani, Amirhossein Tavanaei |
| 2012 | Knowledge-Based Word Lattice Rescoring in a Dynamic Context. Todd Shore, Friedrich Faubel, Hartmut Helmke, Dietrich Klakow |
| 2012 | LSTM Neural Networks for Language Modeling. Martin Sundermeyer, Ralf Schlüter, Hermann Ney |
| 2012 | Language Modeling for Voice-Enabled Social TV Using Tweets. Junlan Feng, Bernard Renger |
| 2012 | Language differences in the perceptual weight of prominence-lending properties. Bistra Andreeva, William J. Barry, Magdalena Wolska |
| 2012 | Large Scale Hierarchical Neural Network Language Models. Hong-Kwang Kuo, Ebru Arisoy, Ahmad Emami, Paul Vozila |
| 2012 | Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks. Dong Yu, Li Deng, Frank Seide |
| 2012 | Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog. Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry P. Heck |
| 2012 | Learning an Artificial F0-Contour for ALT Speech. Anna Katharina Fuchs, Martin Hagmüller |
| 2012 | Lenition of /d/ in spontaneous Spanish and Catalan. Miguel Simonet, José Ignacio Hualde, Marianna Nadeu |
| 2012 | Less errors with TTS? A dictation experiment with foreign language learners. Thomas Pellegrini, Ângela Costa, Isabel Trancoso |
| 2012 | Leveraging Social Annotation for Topic Language Model Adaptation. Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka |
| 2012 | Lexical Story Co-Segmentation of Chinese Broadcast News. Wei Feng, Xuecheng Nie, Liang Wan, Lei Xie, Jianmin Jiang |
| 2012 | Lexical-phonetic automata for spoken utterance indexing and retrieval. Julien Fayolle, Murat Saraçlar, Fabienne Moreau, Christian Raymond, Guillaume Gravier |
| 2012 | Likability Classification - A Not so Deep Neural Network Approach. Raymond Brueckner, Björn W. Schuller |
| 2012 | Local-feature-map Integration Using Convolutional Neural Networks for Music Genre Classification. Toru Nakashika, Christophe Garcia, Tetsuya Takiguchi |
| 2012 | Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition. José A. González, Antonio M. Peinado, Angel M. Gomez, Ning Ma |
| 2012 | Longer Features: They do a speech detector good. T. J. Tsai, Nelson Morgan |
| 2012 | Low latency combination of parallelized single-pass LVCSR systems. Fethi Bougares, Mickael Rouvier, Yannick Estève, Georges Linarès |
| 2012 | Low-SNR, Speaker-Dependent Speech Enhancement using GMMs and MFCCs. Laura E. Boucheron, Phillip L. De Leon |
| 2012 | Low-rank Audio Signal Classification Under Soft Margin and Trace Norm Constraints. Ziqiang Shi, Tieran Zheng, Jiqing Han, Shiwen Deng |
| 2012 | MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors. Keith Kintzley, Aren Jansen, Hynek Hermansky |
| 2012 | Making Conversational Vowels More Clear. Seyed Hamidreza Mohammadi, Alexander Kain, Jan P. H. van Santen |
| 2012 | Mask Estimation and Refinement for MFT-based Robust Speaker Verification. Yali Zhao, Lei Xie, Zhonghua Fu |
| 2012 | Maximising objective speech intelligibility by local f0 modulation. Julián Villegas, Martin Cooke |
| 2012 | Maximum Entropy Language Model Adaptation for Mobile Speech Input. Tanel Alumäe, Kaarel Kaljurand |
| 2012 | Maximum F1-Score Discriminative Training for Automatic Mispronunciation Detection in Computer-Assisted Language Learning. Huang Hao, Jianming Wang, Halidan Abudureyimu |
| 2012 | Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition. Seyed Omid Sadjadi, Taufiq Hasan, John H. L. Hansen |
| 2012 | Meaning inhibition and sentence processing in Chinese: Evidence from negative priming. Michael C. W. Yip |
| 2012 | Measuring prosodic alignment in cooperative task-based conversations. Khiet P. Truong, Dirk Heylen |
| 2012 | Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise. Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King |
| 2012 | Methodological Issues in Assessing Perceptual Representation of Consonant Sounds in Thai. Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai |
| 2012 | Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition. Ken'ichi Kumatani, Bhiksha Raj, Rita Singh, John W. McDonough |
| 2012 | Mitigating Effects of Recording Condition Mismatch in Speaker Recognition Using Partial Least Squares. Jeremiah Remus, Jenniffer Estrada, Stephanie A. C. Schuckers |
| 2012 | Mixed probabilistic and deterministic dependency parsing. Christophe Cerisara, Alejandra Lorenzo |
| 2012 | Mixture Component Clustering for Efficient Speaker Verification. Richard D. McClanahan, Phillip L. De Leon |
| 2012 | Model-Based Approaches for Degraded Channel Modelling in Robust ASR. Mark J. F. Gales, Federico Flego |
| 2012 | Model-based Duration-difference Approach on Accent Evaluation of L2 Learner. Chatchawarn Hansakunbuntheung, Ananlada Chotimongkol, Sumonmas Thatphithakkul, Patcharika Chootrakool |
| 2012 | Model-based Single-Channel Dereverberation in Noisy Acoustical Environments. Xulei Bao, Jie Zhu |
| 2012 | Model-based approaches to adaptive training in reverberant environments. Yongqiang Wang, Mark J. F. Gales |
| 2012 | Modeling Cue Trading in Human Word Recognition. Louis ten Bosch, Odette Scharenborg |
| 2012 | Modeling Pause-Duration for Style-Specific Speech Synthesis. Alok Parlikar, Alan W. Black |
| 2012 | Modeling source-tract interaction in speech production: Voicing onset vs. vowel height after a voiceless obstruent. Jorge C. Lucero, Laura L. Koenig, Susanne Fuchs |
| 2012 | Modeling spoken language acquisition with a generic cognitive architecture for associative learning. Okko Räsänen, Heikki Rasilo, Unto K. Laine |
| 2012 | Modeling the Creaky Excitation for Parametric Speech Synthesis. Thomas Drugman, John Kane, Christer Gobl |
| 2012 | Modelling a Noisy-channel for Voice Conversion Using Articulatory Features. Bajibabu Bollepalli, Alan W. Black, Kishore Prahallad |
| 2012 | Modelling pause duration as a function of contextual length. David Doukhan, Albert Rilliard, Sophie Rosset, Christophe d'Alessandro |
| 2012 | Modulation Spectrum Analysis for Speaker Personality Trait Recognition. Alexei Ivanov, Xin Chen |
| 2012 | Modulation domain blind source separation for noisy speech mixture. Yi Zhang, Yunxin Zhao |
| 2012 | More on the Normalization of Syllable Prominence Ratings. Christopher Sappok, Denis Arnold |
| 2012 | Morpheme Level Feature-based Language Models for German LVCSR. Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney |
| 2012 | Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification. Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt |
| 2012 | N-gram FST Indexing for Spoken Term Detection. Chao Liu, Dong Wang, Javier Tejedor |
| 2012 | Nasal Coarticulation and Contrastive Stress. Georgia Zellou, Rebecca Scarborough |
| 2012 | Nasality from Moroccan Arabic Nasal and Pharyngeal Consonants: Patterns of Airflow and Nasalance. Georgia Zellou |
| 2012 | Nativeness Classification with Suprasegmental Features on the Accent Group Level. Mahnoosh Mehrabani, Joseph Tepperman, Emily Nava |
| 2012 | Naturalness Judgement of Prosodic Variation of Japanese Utterances with Prosody Modified Stimuli. Chiharu Tsurutani, Shunichi Ishihara |
| 2012 | Noise Compensation for Subspace Gaussian Mixture Models. Liang Lu, K. K. Chin, Arnab Ghoshal, Steve Renals |
| 2012 | Noise Robust Pitch Tracking by Subband Autocorrelation Classification. Byung Suk Lee, Daniel P. W. Ellis |
| 2012 | Non-auditory cognitive capabilities in computational modeling of early language acquisition. Okko Räsänen |
| 2012 | Normalization of Text Messages Using Character- and Phone-based Machine Translation Approaches. Chen Li, Yang Liu |
| 2012 | Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems. Marc René Schädler, Birger Kollmeier |
| 2012 | Novel Approach to Live Captioning Through Re-speaking: Tailoring Speech Recognition to Re-speaker's Needs. Ales Prazák, Zdenek Loose, Jan Trmal, Josef V. Psutka, Josef Psutka |
| 2012 | Novel Metrics of Speech Rhythm for the Assessment of Emotion. Fabien Ringeval, Mohamed Chetouani, Björn W. Schuller |
| 2012 | OOV Word Detection using Hybrid Models with Mixed Types of Fragments. Long Qin, Alexander I. Rudnicky |
| 2012 | Objective Child Vocal Development Measurement with Naturalistic Daylong Audio Recording. Dongxin Xu, Jill Gilkerson, Jeffery Richards |
| 2012 | Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability. Linfang Wang, Lijuan Wang, Yan Teng, Zhe Geng, Frank K. Soong |
| 2012 | Objective, Subjective and Linguistic Roads to Perceptual Prominence - How are they compared and why? Petra Wagner, Fabio Tamburini, Andreas Windmann |
| 2012 | Obtaining prominence judgments from naïve listeners - Influence of rating scales, linguistic levels and normalisation. Denis Arnold, Petra Wagner, Bernd Möbius |
| 2012 | On Speaker-Independent Personality Perception and Prediction from Speech. Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli |
| 2012 | On the Dynamics of Overlap in Multi-Party Conversation. Kornel Laskowski, Mattias Heldner, Jens Edlund |
| 2012 | On the Modeling of Voiceless Stop Sounds of Speech using Adaptive Quasi-Harmonic Models. George P. Kafentzis, Olivier Rosec, Yannis Stylianou |
| 2012 | On the Role of Binary Mask Pattern in Automatic Speech Recognition. Arun Narayanan, DeLiang Wang |
| 2012 | On the Use of Non-Linear Polynomial Kernel SVMs in Language Recognition. Sibel Yaman, Jason W. Pelecanos, Mohamed Kamal Omar |
| 2012 | On the Use of Spectral and Iterative Methods for Speaker Diarization. Stephen Shum, Najim Dehak, Jim Glass |
| 2012 | On the acoustics of overlapping laughter in conversational speech. Khiet P. Truong, Jürgen Trouvain |
| 2012 | On the assessment of audiovisual cues to speaker confidence by preteens with typical development (TD) and a-typical development (AD). Marc Swerts, Cees de Bie |
| 2012 | On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. Jens Edlund, Mattias Heldner, Joakim Gustafson |
| 2012 | On the use of Machine Learning Methods for Speech and Voicing Classification. Philip Harding, Ben Milner |
| 2012 | On-the-fly Topic Adaptation for YouTube Video Transcription. Kapil Thadani, Fadi Biadsy, Daniel M. Bikel |
| 2012 | Online Story Segmentation of Multilingual Streaming Broadcast News. Amit Srivastava, Saurabh Khanwalkar, Gretchen Markiewicz, Guruprasad Saikumar |
| 2012 | Open-Vocabulary Retrieval of Spoken Content with Shorter/Longer Queries Considering Word/Subword-based Acoustic Feature Similarity. Hung-yi Lee, Po-Wei Chou, Lin-Shan Lee |
| 2012 | Optimised spectral weightings for noise-dependent speech intelligibility enhancement. Yan Tang, Martin Cooke |
| 2012 | Optimization of Dialog Strategies using Automatic Dialog Simulation and Statistical Dialog Management Techniques. Zoraida Callejas, Ramón López-Cózar |
| 2012 | Optimization-Based Control for the Extended Baum-Welch Algorithm. Janne Pylkkönen, Mikko Kurimo |
| 2012 | Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity. Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano |
| 2012 | Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform. Jonathan William Dennis, Tran Huy Dat, Engsiong Chng |
| 2012 | PLDA Modeling in I-Vector and Supervector Space for Speaker Verification. Ye Jiang, Kong-Aik Lee, Zhenmin Tang, Bin Ma, Anthony Larcher, Haizhou Li |
| 2012 | PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification. Themos Stafylakis, Patrick Kenny, Mohammed Senoussaoui, Pierre Dumouchel |
| 2012 | Parallel Training for Deep Stacking Networks. Li Deng, Brian Hutchinson, Dong Yu |
| 2012 | Parallel combination of multilingual speech streams for improved ASR. João Miranda, João Paulo Neto, Alan W. Black |
| 2012 | Paraphrastic Language Models. Xunying Liu, Mark J. F. Gales, Philip C. Woodland |
| 2012 | Patrol Team Language Identification System for DARPA RATS P1 Evaluation. Pavel Matejka, Oldrich Plchot, Mehdi Soufifar, Ondrej Glembek, Luis Fernando D'Haro, Karel Veselý, Frantisek Grézl, Jeff Z. Ma, Spyros Matsoukas, Najim Dehak |
| 2012 | Pauses and respiratory markers of the structure of book reading. Gérard Bailly, Cécilia Gouvernayre |
| 2012 | Perceived prosodic boundaries in Taiwanese and their acoustic correlates. Grace Kuo |
| 2012 | Perception of Pitch Contours among Native Tone Listeners. Ratree Wayland, Donruethai Laphasradakul, Edith Kaan, Cao Rui |
| 2012 | Perception of Synthetic Speech in Adult Users of Cochlear Implants. Kyoko Nagao, Mark Paullin, James B. Polikoff, Jason Lilley, H. Timothy Bunnell |
| 2012 | Perception of the moraic obstruent /Q/: a cross-linguistic study. Makiko Sadakata, Mizuki Shingai, Alex Brandmeyer, Kaoru Sekiyama |
| 2012 | Perceptual Assimilation of Arabic Voiceless Fricatives by English Monolinguals. Michael D. Tyler, Sarah Fenwick |
| 2012 | Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech. Nanette Veilleux, Jonathan Barnes, Alejna Brugos, Stefanie Shattuck-Hufnagel |
| 2012 | Perceptual Importance of the Phase Related Information in Speech. Ibon Saratxaga, Inma Hernáez, Michael Pucher, Eva Navas, Iñaki Sainz |
| 2012 | Perceptual Learning of /f/-/s/ by Older Listeners. Odette Scharenborg, Esther Janse, Andrea Weber |
| 2012 | Perceptual compensation for the effects of reverberation on consonant identification: A comparison of human and machine performance. Guy J. Brown, Amy V. Beeston, Kalle J. Palomäki |
| 2012 | Performance Comparison of Intrusive Objective Speech Intelligibility and Quality Metrics for Cochlear Implant Users. João Felipe Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, Tiago H. Falk |
| 2012 | Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling. Erinç Dikici, Arda Çelebi, Murat Saraclar |
| 2012 | PermA and Balloon: Tools for string alignment and text processing. Uwe D. Reichel |
| 2012 | Personality traits detection using a parallelized modified SFFS algorithm. Clément Chastagnol, Laurence Devillers |
| 2012 | Phase estimation for signal reconstruction in single-channel source separation. Pejman Mowlaee, Rahim Saeidi, Rainer Martin |
| 2012 | Phone Adaptive Training for Speaker Diarization. Simon Bozonnet, Ravichander Vipperla, Nicholas W. D. Evans |
| 2012 | Phone recognition in critical bands using sub-band temporal modulations. Feipeng Li, Sri Harish Reddy Mallidi, Hynek Hermansky |
| 2012 | Phoneme Class Based Adaptation for Mismatch Acoustic Modeling of Distant Noisy Speech. Seçkin Uluskan, John H. L. Hansen |
| 2012 | Phoneme resistance during speech-in-speech comprehension. Léo Varnet, Julien Meyer, Michel Hoen, Fanny Meunier |
| 2012 | Phonetic Foreignization of Mandarin for Dubbing in Imported Western Movies. Luying Hou, Yuan Jia, Aijun Li |
| 2012 | Phonological complexity and vocabulary size in 30-month-old Swedish children. Ulrika Marklund, Ulla Sundberg, Iris-Corinna Schwarz, Francisco Lacerda |
| 2012 | Phonology & the Interpretation of Fine Phonetic Detail in Berlin German. Stefanie Jannedy, Melanie Weirich |
| 2012 | Phonotactic Language Recognition Using MLP Features. Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel |
| 2012 | Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. Luis Fernando D'Haro, Ondrej Glembek, Oldrich Plchot, Pavel Matejka, Mehdi Soufifar, Ricardo de Córdoba, Jan Cernocký |
| 2012 | Phrasal Cohort Based Unsupervised Discriminative Language Modeling. Puyang Xu, Brian Roark, Sanjeev Khudanpur |
| 2012 | Phrase Boundary Assignment from Text in Multiple Domains. Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran |
| 2012 | Physiological and acoustic study of word initial post-lexical gemination in Moroccan Arabic. Chakir Zeroual, Diamantis Gafos, Phil Hoole, John H. Esling |
| 2012 | Pipelined Back-Propagation for Context-Dependent Deep Neural Networks. Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide |
| 2012 | Pitch Estimation Based on Long Frame Harmonic Model and Short Frame Average Correlation Coefficient. Dongmei Wang, Philipos C. Loizou |
| 2012 | Pitch and Intonation Contribution to Speakers' Traits Classification. Claude Montacié, Marie-José Caraty |
| 2012 | Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns. Julien Meyer |
| 2012 | Pitch range control of Japanese boundary pitch movements. Yosuke Igarashi, Hanae Koiso |
| 2012 | Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis. Zhengqi Wen, Hideki Kawahara, Jianhua Tao |
| 2012 | Plagiarism Detection in Polyphonic Music using Monaural Signal Separation. Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj |
| 2012 | PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds. Jun Ogata, Masataka Goto |
| 2012 | Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals. Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang |
| 2012 | Portability of Semantic Annotations for Fast Development of Dialogue Corpora. Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier |
| 2012 | Posterior-Scaled MPE: Novel Discriminative Training Criteria. Markus Nußbaum-Thom, Zoltán Tüske, Georg Heigold, Ralf Schlüter, Hermann Ney |
| 2012 | Power Mean Pyramid Scores for Summarization Evaluation. Sameer Maskey, Andrew Rosenberg |
| 2012 | Practice and feedback in L2 speaking: an evaluation of the DISCO CALL system. Catia Cucchiarini, Joost van Doremalen, Helmer Strik |
| 2012 | Predictability affects vowel dispersion and dynamics in the Buckeye Corpus. Michael McAuliffe, Molly Babel |
| 2012 | Predicting Character-Appropriate Voices for a TTS-based Storyteller System. Erica Greene, Taniya Mishra, Patrick Haffner, Alistair Conkie |
| 2012 | Predicting Likability of Speakers with Gaussian Processes. Dingchao Lu, Fei Sha |
| 2012 | Prediction of Turn-Taking by Combining Prosodic and Eye-Gaze Information in Poster Conversations. Tatsuya Kawahara, Takuma Iwatate, Katsuya Takanashi |
| 2012 | Preference-learning based Inverse Reinforcement Learning for Dialog Control. Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami |
| 2012 | ProTK: An Improved Prosody Toolkit. Jacob Okamoto, Serguei V. S. Pakhomov, Elizabeth Shriberg, Andreas Stolcke |
| 2012 | Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition. Xiangang Li, Dan Su, Zaihu Pang, Xihong Wu |
| 2012 | Production and Perception of Focus in PFC and non-PFC Languages: Comparing Beijing Mandarin and Hainan Tsat. Bei Wang, Chenxia Li, Qian Wu, Xiaxia Zhang, Baofeng Wang, Yi Xu |
| 2012 | Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams. Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen |
| 2012 | Pronunciation quality evaluation of sentences by combining word based scores. Jorge Wuth, Néstor Becerra Yoma, Leopoldo Benavides, Hiram Vivanco |
| 2012 | Proper Name Splicing in Computer Games with TTS. Blaise Potard, Matthew P. Aylett, Christopher J. Pidcock |
| 2012 | Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues. Diane J. Litman, Heather Friedberg, Katherine Forbes-Riley |
| 2012 | Prosodic Entrainment in an Information-Driven Dialog System. Andrew Fandrianto, Maxine Eskénazi |
| 2012 | Prosodic Marking of Continuation versus Completion in Children's Narratives. Melissa A. Redford, Laura Dilley, Jessica Gamache, Elizabeth Wieland |
| 2012 | Prosodic Realization of Focus in Statement and Question in Tibetan (Lhasa Dialect). Xiaxia Zhang, Bei Wang, Qian Wu, Yi Xu |
| 2012 | Prosodic context-based analysis of disfluencies. Helena Moniz, Fernando Batista, Isabel Trancoso, Ana Isabel Mata |
| 2012 | Prosodic measurements and question types in the Spontal corpus of Swedish dialogues. Sofia Strömbergsson, Jens Edlund, David House |
| 2012 | Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis. Alexander Sorin, Slava Shechtman, Vincent Pollet |
| 2012 | Q-Gaussian based spectral subtraction for robust speech recognition. Hilman Ferdinandus Pardede, Koichi Shinoda, Koji Iwano |
| 2012 | Quality Analysis of Macroprosodic F0 Dynamics in Text-to-Speech Signals. Christoph Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller |
| 2012 | Quantitative Analysis of Pitch in Speech of Children with Neurodevelopmental Disorders. Géza Kiss, Jan P. H. van Santen, Emily Tucker Prud'hommeaux, Lois M. Black |
| 2012 | Query-by-Example using Speaker Content Graphs. William M. Campbell, Elliot Singer |
| 2012 | RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases. Anthony Larcher, Kong-Aik Lee, Bin Ma, Haizhou Li |
| 2012 | Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition. Zoi Roupakia, Anton Ragni, Mark J. F. Gales |
| 2012 | Real-Time Lecture Transcription using ASR for Czech Hearing Impaired or Deaf Students. Petr Cerva, Jan Silovský, Jindrich Zdánský, Jan Nouza, Jirí Málek |
| 2012 | Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment. Nitya Tiwari, Prem C. Pandey, Pandurangarao N. Kulkarni |
| 2012 | Real-time Visualization of English Pronunciation on an IPA Chart Based on Articulatory Feature Extraction. Yurie Iribe, Takurou Mori, Kouichi Katsurada, Goh Kawai, Tsuneo Nitta |
| 2012 | Recurrent Neural Networks for Noise Reduction in Robust ASR. Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng |
| 2012 | Relative Importance of Temporal Envelope and Fine Structure Cues in Low- and High-Order Harmonic Regions for Mandarin Lexical-tone Recognition. Guangting Mai |
| 2012 | Residual Phase Cepstrum Coefficients with Application to Cross-lingual Speaker Verification. Michael T. Johnson, Jianglin Wang |
| 2012 | Resonator-based creaky voice detection. Thomas Drugman, John Kane, Christer Gobl |
| 2012 | Rethinking The Corpus: Moving towards Dynamic Linguistic Resources. Andrew Rosenberg |
| 2012 | Robust Event Detection From Spoken Content In Consumer Domain Videos. Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan |
| 2012 | Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum. Md. Jahangir Alam, Patrick Kenny, Douglas D. O'Shaughnessy |
| 2012 | Robust Pitch Estimation Using l1-regularized Maximum Likelihood Estimation. Feng Huang, Tan Lee |
| 2012 | Robust Tracking for Automatic Reading Tutors. Emre Yilmaz, Dirk Van Compernolle, Hugo Van hamme |
| 2012 | Robust phoneme recognition based on biomimetic speech contours. Michael A. Carlin, Kailash Patil, Sridhar Krishna Nemala, Mounya Elhilali |
| 2012 | Robust triphone mapping for acoustic modeling. Milos Cernak, David Imseng, Hervé Bourlard |
| 2012 | Role of Prosody in Automatic Modality Recognition of Bangla Speech. Anal Warsi, Tulika Basu, Debasis Mazumdar |
| 2012 | Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization. Brian Kingsbury, Tara N. Sainath, Hagen Soltau |
| 2012 | Search Space Pruning Based on Anticipated Path Recombination in LVCSR. David Nolden, Ralf Schlüter, Hermann Ney |
| 2012 | Selection of TDOA Parameters for MDM Speaker Diarization. Beatriz Martínez-González, José Manuel Pardo, Julián D. Echeverry-Correa, José A. Vallejo-Pinto, Roberto Barra-Chicote |
| 2012 | Semi-Blind Model Adaptation using Piece-wise Energy Decay Curve for Large Reverberant Environments. Abdul Waheed Mohammed, Marco Matassoni, Hari Krishna Maganti, Maurizio Omologo |
| 2012 | Semi-Supervised Methods for Improving Keyword Search of Unseen Terms. Scott Novotney, Ivan Bulyko, Richard M. Schwartz, Sanjeev Khudanpur, Owen Kimball |
| 2012 | Sentence Detection Using Multiple Annotations. Ann Lee, James R. Glass |
| 2012 | Sibilant Speech Detection in Noise. Sira Gonzalez, Mike Brookes |
| 2012 | Similar Speaker Selection Technique Based on Distance Metric Learning with Perceptual Voice Quality Similarity. Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno |
| 2012 | Similarities in fundamental frequency in infant speech segmentation models. Ellen Marklund, Francisco Lacerda, Iris-Corinna Schwarz, Ulla Sundberg |
| 2012 | Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition. Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney |
| 2012 | Smile with a smile. Hugo Quené, Will Schuerman |
| 2012 | Sparse Bayesian Factor Analysis for Stereo-based Stochastic Mapping. Xiaodong Cui, Mohamed Afify, George Saon, Vaibhava Goel |
| 2012 | Sparse Probabilistic Linear Discriminant Analysis for Speaker Verification. Hai Yang, Chunyan Liang, Yunfei Xu, Lin Yang, Yonghong Yan |
| 2012 | Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space. Seong-Jun Hahm, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura |
| 2012 | Speaker Clustering for a Mixture of Singing and Reading. Mahnoosh Mehrabani, John H. L. Hansen |
| 2012 | Speaker Clustering in Emotion Recognition. Ni Ding, Julien Epps |
| 2012 | Speaker Discrimination Ability of Glottal Waveform Features. Juan F. Torres, Elliot Moore |
| 2012 | Speaker Independent Single Channel Source Separation using Sinusoidal Features. Shivesh Ranjan, Karen L. Payton, Pejman Mowlaee |
| 2012 | Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network. Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan |
| 2012 | Speaker Recognition for Children's Speech. Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin J. Russell, Peter Jancovic, Michael J. Carey |
| 2012 | Speaker Verification Using Neighborhood Preserving Embedding. Chunyan Liang, Jinchao Yang, Lin Yang, Yonghong Yan |
| 2012 | Speaker diarization of overlapping speech based on silence distribution in meeting recordings. Sree Harsha Yella, Fabio Valente |
| 2012 | Speaker idiosyncratic rhythmic features in the speech signal. Volker Dellwo, Adrian Leemann, Marie-José Kolly |
| 2012 | Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise. Shigeki Matsuda, Naoya Ito, Kosuke Tsujino, Hideki Kashioka, Shigeki Sagayama |
| 2012 | Speaker-adaptive visual speech synthesis in the HMM-framework. Dietmar Schabus, Michael Pucher, Gregor Hofer |
| 2012 | Spectral Intersections for Non-Stationary Signal Separation. Trausti T. Kristjansson, Thad Hughes |
| 2012 | Speech Activity Detection for Noisy Data Using Adaptation Techniques. Mohamed Omar |
| 2012 | Speech Data Clustering Based on Phoneme Error Trend for Unsupervised Acoustic Model Adaptation. Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi |
| 2012 | Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation. Michael Carlin, Nicolas Malyska, Thomas F. Quatieri |
| 2012 | Speech Enhancement With Bivariate Gamma Model. Atanu Saha, Tetsuya Shimamura |
| 2012 | Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments. Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis |
| 2012 | Speech Enhancement for Android (SEA): A Speech Processing Demonstration Tool for Android Based Smart Phones and Tablets. Roger Chappel, Kuldip K. Paliwal |
| 2012 | Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis. Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li |
| 2012 | Speech Production-Perception Relationships in Children with Speech Delay. Kyoko Nagao, Mark Paullin, Vilena Livinsky, James B. Polikoff, Linda D. Vallino, Thierry G. Morlet, N. Carolyn Schanen, H. Timothy Bunnell |
| 2012 | Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment. Kyohei Odani, Longbiao Wang, Atsuhiko Kai |
| 2012 | Speech and speaker separation in human auditory cortex. Nima Mesgarani, Edward Chang |
| 2012 | Speech factorization for HMM-TTS based on cluster adaptive training. Javier Latorre, Vincent Wan, Mark J. F. Gales, Langzhou Chen, K. K. Chin, Kate M. Knill, Masami Akamine |
| 2012 | Speech modeling and processing by low-dimensional dynamic glottal models. Carlo Drioli, Andrea Calanca |
| 2012 | Speech restoration based on deep learning autoencoder with layer-wised pretraining. Xugang Lu, Shigeki Matsuda, Chiori Hori, Hideki Kashioka |
| 2012 | Speech synthesis using a non-maximally decimated filter bank for embedded systems. Nobuyuki Nishizawa, Tsuneo Kato |
| 2012 | Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression. Tudor-Catalin Zorila, Varvara Kandia, Yannis Stylianou |
| 2012 | Speech/Nonspeech Segmentation in Web Videos. Ananya Misra |
| 2012 | SpeechMark: Landmark Detection Tool for Speech Analysis. Suzanne Boyce, Harriet J. Fell, Joel MacAuslan |
| 2012 | Spelling as a Complementary Strategy for Speech Recognition. Keith Vertanen, Per Ola Kristensson |
| 2012 | Spoken Dialogs With a Virtual Science Tutor. Wayne H. Ward, Daniel Bolaños, Ronald A. Cole |
| 2012 | Spoken Document Clustering Using Word Confusion Networks. Shajith Ikbal, Sachindra Joshi, Ashish Verma, Om D. Deshmukh |
| 2012 | Spoken Inquiry Discrimination Using Bag-of-Words for Speech-Oriented Guidance System. Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano |
| 2012 | Spontaneous-Speech Acoustic-Prosodic Features of Children with Autism and the Interacting Psychologist. Daniel Bone, Matthew P. Black, Chi-Chun Lee, Marian E. Williams, Pat Levitt, Sungbok Lee, Shrikanth S. Narayanan |
| 2012 | Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. Federico Alegre, Ravichander Vipperla, Nicholas W. D. Evans |
| 2012 | Study of Different Backends in a State-Of-the-Art Language Recognition System. Mikel Peñagarikano, Amparo Varona, Mireia Díez, Luis Javier Rodríguez-Fuentes, Germán Bordel |
| 2012 | Study of the Effect of I-vector Modeling on Short and Mismatch Utterance Duration for Speaker Verification. Achintya Kumar Sarkar, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre |
| 2012 | Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription. Jan Silovský, Petr Cerva, Jindrich Zdánský, Jan Nouza |
| 2012 | Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition. Weifeng Li, Hervé Bourlard |
| 2012 | Subspace Gaussian Mixture Models Based on Noise Compensation for Speech Recognition. Mohamed Bouallegue, Driss Matrouf, Georges Linarès, Mickael Rouvier |
| 2012 | Subspace-Based Feature Representation and Learning for Language Recognition. Yu-Chin Shih, Hung-Shin Lee, Hsin-Min Wang, Shyh-Kang Jeng |
| 2012 | Subword speech recognition for detection of unseen words. Ivan Bulyko, Jose Herrero, Chris Mihelich, Owen Kimball |
| 2012 | Supervector LDA: A New Approach to Reduced-Complexity I-vector Language Recognition. Alan McCree, Bengt J. Borgstrom |
| 2012 | Supervised Spoken Document Summarization jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine. Hung-yi Lee, Yu-Yu Chou, Yow-Bang Wang, Lin-Shan Lee |
| 2012 | Supervised and unsupervised Web-based language model domain adaptation. Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlícek |
| 2012 | Supervized Mixture of PLDA Models for Cross-Channel Speaker Verification. Konstantin Simonchik, Timur Pekhovsky, Andrey Shulipa, Anton Afanasyev |
| 2012 | Syllable perception depends on tone perception. Iris Chuoying Ouyang, Khalil Iskarous |
| 2012 | Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech. Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi |
| 2012 | Synthetic References for Template-based ASR using posterior features. Serena Soldo, Mathew Magimai-Doss, Hervé Bourlard |
| 2012 | Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis. Phillip L. De Leon, Bryan Stewart, Junichi Yamagishi |
| 2012 | Synthetic correction of deviant speech - children's perception of phonologically modified recordings of their own speech. Sofia Strömbergsson |
| 2012 | TDOA Estimation for Multiple Speakers in Underdetermined Case. Mariem Bouafif, Zied Lachiri |
| 2012 | Temporal and Situational Context Modeling for Improved Dominance Recognition in Meetings. Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll |
| 2012 | Temporal entrainment in overlapped speech: Cross-linguistic study. Marcin Wlodarczak, Juraj Simko, Petra Wagner |
| 2012 | Text-To-Speech Intelligibility Across Speech Rates. Ann K. Syrdal, H. Timothy Bunnell, Susan R. Hertz, Taniya Mishra, Murray F. Spiegel, Corine A. Bickley, Deborah Rekart, Matthew J. Makashay |
| 2012 | Text-dependent pathological voice detection. Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel M. F. Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black |
| 2012 | The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear. Natalie Fecher |
| 2012 | The 2011 NIST Language Recognition Evaluation. Craig S. Greenberg, Alvin F. Martin, Mark A. Przybocki |
| 2012 | The Automatic Assessment of Non-native Prosody: Combining Classical Prosodic Analysis with Acoustic Modelling. Florian Hönig, Tobias Bocklet, Korbinian Riedhammer, Anton Batliner, Elmar Nöth |
| 2012 | The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance. Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel, Alberto Abad, David Martínez González, Jesús Antonio Villalba López, Alfonso Ortega, Eduardo Lleida |
| 2012 | The EHU Systems for the NIST 2011 Language Recognition Evaluation. Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Mireia Díez, Germán Bordel |
| 2012 | The Effect of Spectral Estimator on Common Spectral Measures for Sibilant Fricatives. Patrick Reidy, Mary E. Beckman |
| 2012 | The Effect of Use of Drugs on Speaker's Fundamental Frequency and Formants. Andrey N. Raev, Yuri Matveev, Tatiana Goloshchapova |
| 2012 | The Effects of Lexical Tones and Nasal Coda /-n/ to Sadness in Taiwan Hakka. Shao-Ren Lyu |
| 2012 | The F0 fall delay of lexical pitch accent in Japanese Infant-directed speech. Yoko Saikachi, Mafuyu Kitahara, Ken'ya Nishikawa, Ai Kanato, Reiko Mazuka |
| 2012 | The IIIT-H Indic Speech Databases. Kishore Prahallad, Naresh Kumar Elluru, Venkatesh Keri, Rajendran S, Alan W. Black |
| 2012 | The INTERSPEECH 2012 Speaker Trait Challenge. Björn W. Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss |
| 2012 | The Intelligibility of Lombard Speech: Communicative setting matters. Michael Fitzpatrick, Jeesun Kim, Chris Davis |
| 2012 | The Role of Creaky Voice in Mandarin Tone 2 and Tone 3 Perception. Rui Cao, Ratree Wayland, Edith Kaan |
| 2012 | The Role of Score Calibration in Speaker Recognition. George R. Doddington |
| 2012 | The Speech Recognition Virtual Kitchen: An Initial Prototype. Florian Metze, Eric Fosler-Lussier |
| 2012 | The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training. Xiaojun Qian, Helen M. Meng, Frank K. Soong |
| 2012 | The effect of dichotic processing on the perception of binaural cues. Akiko Amano-Kusumoto, Justin M. Aronoff, Motokuni Itoh, Sigfrid D. Soli |
| 2012 | The entropy of intoxicated speech - lexical creativity and heavy tongues. Uwe D. Reichel |
| 2012 | The log-Gabor method: speech classification using spectrogram image analysis. Harm Buisman, Eric O. Postma |
| 2012 | The processes underlying two frequent casual speech phenomena in Dutch: A production experiment. Iris Hanique, Mirjam Ernestus |
| 2012 | The production and perception of Estonian quantity degrees by native and non-native speakers. Lya Meister, Einar Meister |
| 2012 | Tied-State Mixture Language Model for WFST-based Speech Recognition. Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka |
| 2012 | Time Delay Estimation for Speech Signal Based on FOC-Spectrum. Hong Liu, Xiaofei Li |
| 2012 | Toward an Optimum Feature Set and HMM Model Parameters for Automatic Phonetic Alignment of Spontaneous Speech. Montri Karnjanadecha, Stephen A. Zahorian |
| 2012 | Towards Automated Annotation of Audio and Video Recordings by Application of Advanced Web-services. Przemyslaw Lenkiewicz, Dieter Van Uytvanck, Peter Wittenburg, Sebastian Drude |
| 2012 | Towards Empirical Dialog-State Modeling and its Use in Language Modeling. Nigel G. Ward, Alejandro Vega |
| 2012 | Towards Glottal Source Controllability in Expressive Speech Synthesis. Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan Manuel Montero |
| 2012 | Towards Hierarchical Prosodic Prominence Generation in TTS Synthesis. Leonardo Badino, Robert A. J. Clark |
| 2012 | Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features. Yangyang Shi, Pascal Wiggers, Catholijn M. Jonker |
| 2012 | Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization. Jaime Lorenzo-Trueba, Beatriz Martínez-González, Roberto Barra-Chicote, Verónica López-Ludeña, Javier Ferreiros, Junichi Yamagishi, Juan Manuel Montero |
| 2012 | Training Deep Nets with Imbalanced and Unlabeled Data. Jeffrey Berry, Ian R. Fasel, Luciano Fadiga, Diana Archangeli |
| 2012 | Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS. Ji He, Yao Qian, Frank K. Soong, Sheng Zhao |
| 2012 | Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy. Korin Richmond, Steve Renals |
| 2012 | Uncertainty driven Compensation of Multi-Stream MLP Acoustic Models for Robust ASR. Ramón Fernandez Astudillo, Alberto Abad, João Paulo Neto |
| 2012 | Unconstrained Speech Separation by Composition of Longest Segments. Ji Ming, Ramji Srinivasan, Danny Crookes |
| 2012 | Unsupervised Acoustic Analyses of Normal and Lombard Speech, with Spectral Envelope Transformation to Improve Intelligibility. Elizabeth Godoy, Yannis Stylianou |
| 2012 | Unsupervised Deep Belief Features for Speech Translation. Sameer Maskey, Bowen Zhou |
| 2012 | Unsupervised NAP Training Data Design for Speaker Recognition. Hanwu Sun, Bin Ma |
| 2012 | Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast. Johann Poignant, Hervé Bredin, Viet Bac Le, Laurent Besacier, Claude Barras, Georges Quénot |
| 2012 | Unveiling the Acoustic Properties that Describe the Valence Dimension. Carlos Busso, Tauhidur Rahman |
| 2012 | Using Bayesian Networks to find relevant context features for HMM-based speech synthesis. Heng Lu, Simon King |
| 2012 | Using Blob Detection in Missing Feature Linear-Frequency Cepstral Coefficients for Robust Sound Event Recognition. Yi Ren Leng, Tran Huy Dat |
| 2012 | Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders. Christophe Veaux, Junichi Yamagishi, Simon King |
| 2012 | Using Prominence and Phrasing Predictions to Improve Weighted Dictionary Pronunciation Models. Andrew Rosenberg |
| 2012 | Using Quality Ratings to Predict Modality Choice in Multimodal Systems. Ina Wechsung, Klaus-Peter Engelbrecht, Sebastian Möller |
| 2012 | Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence. Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge |
| 2012 | Using Sparse Classification Outputs as Feature Observations for Noise-robust ASR. Yang Sun, Bert Cranen, Jort F. Gemmeke, Louis ten Bosch, Lou Boves, Mathew M. Doss |
| 2012 | Using Sub-word-level Information for Confidence Estimation with Conditional Random Field Models. Matthew Stephen Seigel, Philip C. Woodland |
| 2012 | Using Time-Synchronous Phone Co-occurrences in a SVM-Phonotactic Dialect Recognition System. Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel, Mireia Díez |
| 2012 | Using broad phonetic classes to guide search in automatic speech recognition. Stefan Ziegler, Bogdan Ludusan, Guillaume Gravier |
| 2012 | Using context-free grammars for embedded speech recognition with Weighted Finite-State Transducers. Frank Duckhorn, Rüdiger Hoffmann |
| 2012 | Using i-Vector Space Model for Emotion Recognition. Rui Xia, Yang Liu |
| 2012 | Using magnetic resonance to image the pharynx during Arabic speech: Static and dynamic aspects. Ryan Shosted, Bradley P. Sutton, Abbas Benmamoun |
| 2012 | Using spectral measures to differentiate Mandarin and Korean sibilant fricatives. Jeffrey Kallay, Jeffrey J. Holliday |
| 2012 | Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech. Emma Jokinen, Paavo Alku, Martti Vainio |
| 2012 | Utilizing Markov Chain Monte Carlo (MCMC) Method for Improved Glottal Inverse Filtering. Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku |
| 2012 | Verifying Session Level Pronunciation Accuracy in a Speech Therapy Application. Shou-Chun Yin, Richard C. Rose, Yun Tang |
| 2012 | VisArtico: a visualization tool for articulatory data. Slim Ouni, Loic Mangeonjean, Ingmar Steiner |
| 2012 | Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments. David Escudero Mancebo, Eva Estebas-Vilaplana |
| 2012 | Vocal Tremor Measurement Based on Autocorrelation of Contours. Markus Brückl |
| 2012 | Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity. Thomas F. Quatieri, Nicolas Malyska |
| 2012 | Voice Activity Detection Using Speech Recognizer Feedback. Kit Thambiratnam, Weiwu Zhu, Frank Seide |
| 2012 | Voice Production Mechanisms of Vibrato in Noh. Ikuyo Yoshinaga, Jiangping Kong |
| 2012 | Voice Query Refinement. Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk |
| 2012 | Voice source analysis using biomechanical modeling and glottal inverse filtering. Alan Pinheiro, Tuomo Raitio, Danyane Gomes, Paavo Alku |
| 2012 | Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis. Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi |
| 2012 | Vowels Produced by Sliding Three-tube Model with Different Lengths. Takayuki Arai |
| 2012 | Ways to Implement Global Variance in Statistical Speech Synthesis. Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj |
| 2012 | Where did I go wrong?: Identifying troublesome segments for speaker diarization systems. Mary Tai Knox, Nikki Mirghafori, Gerald Friedland |
| 2012 | Where to associate stressed additive particles? Evidence from speech prosody. Bettina Braun |
| 2012 | White Listing and Score Normalization for Keyword Spotting of Noisy Speech. Bing Zhang, Richard M. Schwartz, Stavros Tsakalidis, Long Nguyen, Spyros Matsoukas |
| 2012 | Whole-Word Recognition from Articulatory Movements for Silent Speech Interfaces. Jun Wang, Ashok Samal, Jordan R. Green, Frank Rudzicz |
| 2012 | Wideband Parametric Speech Synthesis Using Warped Linear Prediction. Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku |
| 2012 | Word Discovery with Beta Process Factor Analysis. Niklas Vanhainen, Giampiero Salvi |
| 2012 | Word Prominence Detection using Robust yet Simple Prosodic Features. Taniya Mishra, Vivek Kumar Rangarajan Sridhar, Alistair Conkie |
| 2012 | Word Relevance Modeling for Speech Recognition. Kuan-Yu Chen, Hao-Chin Chang, Berlin Chen, Hsin-Min Wang |
| 2012 | Sparse banded precision matrices for low resource speech recognition. Weibin Zhang, Pascale Fung |