BIO: Hynek Hermansky is the Julian S. Smith Professor of Electrical Engineering and the Director of the Center for Language and Speech Processing at the Johns Hopkins University in Baltimore, Maryland, and a Research Professor at the Brno University of Technology, Czech Republic. His main research interests are in bio-inspired speech processing. He has been working in speech research for over 30 years, previously as a Director of Research at the IDIAP Research Institute, Martigny, and a Titular Professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland; a Professor and Director of the Center for Information Processing at OHSU in Portland, Oregon; a Senior Member of Research Staff at U S WEST Advanced Technologies in Boulder, Colorado; a Research Engineer at Panasonic Technologies in Santa Barbara, California; a Research Fellow at the University of Tokyo; and an Assistant Professor at the Brno University of Technology, Czech Republic. He is a Fellow of the IEEE, a Fellow of the International Speech Communication Association, and an External Fellow of the International Computer Science Institute in Berkeley, California. He is also the holder of the 2013 International Speech Communication Association Medal for Scientific Achievement, a Member of the Board of the International Speech Communication Association, and a Member of the Editorial Board of Speech Communication. He was the General Chair of the 2013 ICASSP Workshop on Automatic Speech Recognition and Understanding, a Member of the Organizing Committee of the 2011 ICASSP in Prague, Technical Chair of the 1998 ICASSP in Seattle, and an Associate Editor for the IEEE Transactions on Speech and Audio Processing. He holds 10 US patents and has authored or co-authored over 250 papers in reviewed journals and conference proceedings. Prof. Hermansky holds a Dr.Eng. degree from the University of Tokyo and a Dipl. Ing. degree from the Brno University of Technology, Czech Republic.
ABSTRACT: There has been a great deal of work over the last decade on problems where features are assumed to lie on a low-dimensional manifold and where this manifold can be approximated using an adjacency graph computed from available data. The data characterizations obtained from these manifold-based techniques have been applied to feature space dimensionality reduction, data visualization, classifier regularization, and semi-supervised learning. This presentation investigates how these approaches can be applied to acoustic modeling in speech processing. Feature representations derived from speech are generally assumed to lie on a possibly nonlinear, low-dimensional submanifold of the ambient parameter space. We will demonstrate how preserving local neighborhood relationships among speech vectors under various transformations improves overall performance in automatic speech recognition (ASR) and spoken term detection (STD) applications.
After introducing this class of manifold-based acoustic modeling techniques, several applications will be presented. First, manifold-based discriminative feature space dimensionality reduction for ASR is presented as a technique for maximizing interclass separability while preserving local neighborhood relationships. Second, manifold regularization of deep network training is presented as a means of improving the performance of DNN-based acoustic models for ASR. Finally, an approach based on semi-supervised manifold learning is presented for verifying spoken term hypotheses in spoken term detection using graph spectral clustering. The performance of all of these techniques is evaluated using publicly available speech corpora taken from meetings, read newspaper, and connected digit utterance task domains.
BIO: Richard Rose is an Associate Professor of Electrical and Computer Engineering (ECE) at McGill University in Montreal, Quebec, and a Research Scientist at Google in New York, NY. He served as Associate Chair and Graduate Program Director of ECE at McGill from 2012 through 2014. His major area of research is speech and language processing. He has published over 150 articles in refereed international journals and conference proceedings. Before coming to McGill in 2004, Prof. Rose was a senior member of technical staff at AT&T Labs Research, where he contributed to AT&T's speech-enabled services. Prof. Rose is an IEEE Fellow. His professional service has included General Chair of the IEEE Automatic Speech Recognition and Understanding Workshop, membership in the IEEE Speech Technical Committee, elected membership on the IEEE Signal Processing Society Board of Governors, and associate editorships of the IEEE Transactions on Speech and Audio Processing and the IEEE Transactions on Audio, Speech, and Language Processing.