In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The models treated in depth include the widely used exponential-family distributions and the hidden Markov model. We present a detailed study that unifies the common objective functions for discriminative learning in speech recognition: maximum mutual information (MMI), minimum classification error (MCE), and minimum phone/word error (MPE/MWE). With rigorous mathematical analysis, we show that these objectives share a common rational-function form. This form allows the growth transformation (or extended Baum-Welch) optimization framework to be used for discriminative learning of model parameters. Beyond the necessary background and tutorial material, we include technical details on deriving the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors show that discriminative learning can yield better speech recognition performance than conventional parameter learning. We also cover major algorithmic implementation issues of practical significance, so that practitioners can carry the theory from the earlier part of the book directly into engineering practice.
Table of Contents: Introduction and Background / Statistical Speech Recognition: A Tutorial / Discriminative Learning: A Unified Objective Function / Discriminative Learning Algorithm for Exponential-Family Distributions / Discriminative Learning Algorithm for Hidden Markov Model / Practical Implementation of Discriminative Learning / Selected Experimental Results / Epilogue / Major Symbols Used in the Book and Their Descriptions / Mathematical Notation / Bibliography
About the Author
Xiaodong He received his bachelor's degree from Tsinghua University, Beijing, China, in 1996. He earned his master's degree from the Chinese Academy of Sciences in 1999 and his doctoral degree from the University of Missouri-Columbia in 2003. He joined the Speech and Natural Language group of Microsoft in 2003 and moved to the Natural Language Processing group of Microsoft Research, Redmond, WA, in 2006, where he currently serves as a researcher. His research areas include statistical machine learning, automatic speech recognition, natural language processing, machine translation, signal processing, nonnative speech processing, and human-computer interaction. In these areas, he has authored or coauthored more than 30 refereed papers in leading international conferences and journals. He has filed more than 10 U.S. or international patents in speech recognition, language processing, and machine translation. He has served as a reviewer for major conferences and journals in speech recognition, natural language processing, signal processing, and pattern recognition, and on the program committees of various conferences in these areas. He is a member of ACL, IEEE, ISCA, and Sigma Xi.