2016 Proceedings Online Now >>

Save the Date! Join us again next year: July 3, 2017

Keynote Speakers

Li Deng,
Research Manager and Chief Scientist of AI,
Microsoft Research, USA
TITLE: Deep Learning Revolution: Speech Recognition and Beyond

Abstract: Deep learning has profoundly reshaped the landscape of speech recognition (since 2010) and image understanding (since 2012), two major areas of artificial intelligence. Over roughly the past two years, this rapid progress in machine perception has extended to a number of more challenging areas of artificial intelligence. These new areas are central to the cognitive functions in human intelligence, including natural language, reasoning, attention, memory, knowledge, action, and decision making, many of which involve the analysis of sequential signals and of other forms of structured information expressed as symbolic entities and their relations. This keynote lecture will provide an overview of the recent history and current status of deep learning research, as well as its industrial deployment, in speech recognition and other select areas of machine perception and cognition. The lecture will end with a discussion on the challenges and directions for the future development of deep learning aiming to reach brain-like competence. One future direction comprises ways to exploit millions of hours of unlabeled audio data for unsupervised speech recognition. This novel type of unsupervised learning is enabled by the use of ultra-strong priors on the label sequences in a data-dependent manner via utterance-level adaptation. The technique includes the current fully-supervised deep speech recognition, based on cross-entropy or maximum mutual information learning with paired audio-label data, as a special case, where the data-dependent delta distribution on the labels is replaced by the smooth prior distribution using strong language models trained without acoustic data. The same unsupervised learning principle can be generalized to other machine perception and cognition tasks.

Bio: Li Deng received a Ph.D. from the University of Wisconsin-Madison. He was an assistant professor and then a tenured full professor at the University of Waterloo, Ontario, Canada during 1989-1999. Immediately afterward he joined Microsoft Research, Redmond, USA as a Principal Researcher, where he currently directs the R&D of the Deep Learning Technology Center, which he founded in early 2014. Dr. Deng’s current activities are centered on business-critical applications involving big data analytics, natural language text, semantic modeling, speech, image, and multimodal signals. Outside his main responsibilities, Dr. Deng’s research interests lie in solving fundamental problems of machine learning, artificial and human intelligence, cognitive and neural computation with their biological connections, and multimodal signal/information processing. In addition to over 70 granted patents and over 300 scientific publications in leading journals and conferences, Dr. Deng has authored or co-authored 5 books, including his 2 latest books: Deep Learning: Methods and Applications (NOW Publishers, 2014) and Automatic Speech Recognition: A Deep-Learning Approach (Springer, 2015), both with English and Chinese editions. Dr. Deng is a Fellow of the IEEE, the Acoustical Society of America, and the ISCA. He served on the Board of Governors of the IEEE Signal Processing Society. More recently, he was the Editor-In-Chief for the IEEE Signal Processing Magazine and for the IEEE/ACM Transactions on Audio, Speech, and Language Processing; he also served as a general chair of ICASSP and area chair of NIPS. Dr. Deng’s technical work in industry-scale deep learning and AI has impacted various areas of information processing, especially Microsoft speech products and text- and big-data related products/services.
His work helped initiate the resurgence of (deep) neural networks in the modern big-data, big-compute era, and has been recognized by several awards, including the 2013 IEEE SPS Best Paper Award and the 2015 IEEE SPS Technical Achievement Award “for outstanding contributions to deep learning and to automatic speech recognition.”

Haizhou Li,
 Research Director, Institute for Infocomm Research (I2R), A*STAR, Singapore

TITLE: Speech Synthesis Perfects Everyone’s Singing

Abstract: Singing is more expressive than speaking. While singing is popular, singing well is nontrivial. This is especially true for songs that require high vocal skills. A singer needs to overcome two challenges among others - to sing in the right tune and at the correct rhythm. Even professional singers need intensive practice to perfect their vocal skills and to proficiently present particular singing styles, such as vibrato and resonance tuning. In this talk, we will discuss the basics of Singing Synthesis and I2R’s Speech2Singing system that converts speech into perfect singing. Similar to Photoshop that perfects graphics, Speech2Singing helps perfect singing vocals.

Bio: Haizhou Li received his B.Sc., M.Sc., and Ph.D. degrees in electrical and electronic engineering from South China University of Technology, Guangzhou, China in 1984, 1987, and 1990, respectively. Prof. Li is currently the Research Director of the Institute for Infocomm Research (I2R), and an adjunct Professor at the National University of Singapore.

Industry Perspective

Jeff Adams,
 CEO & Founder of Cobalt Speech and Language

TITLE: How Market-driven Speech Research Led to a Breakthrough in Noisy ASR

Abstract: When Cobalt was founded in 2014, we intentionally did not choose a particular area of speech technology to specialize in.  Instead, we wanted to let our (potential) customers tell us what they needed, and let the market take us where it wanted us.  Over the last 2 years, we have learned a lot about what the market wants.  In some cases it has been predictable, and in other cases it has been fairly surprising.  In this presentation, I will discuss one of those novel applications in some detail, where we were asked to develop a 2-channel ASR model to incorporate the VocalZoom optical microphone, in addition to a standard acoustic channel.  I will show how we built a hybrid DNN recognizer that reduced WER in noisy conditions by nearly 60% relative.

Bio: Jeff Adams has been managing top-level speech & language technology research for more than 20 years, at Kurzweil AI, Nuance / Dragon, Yap, and Amazon, before founding Cobalt Speech & Language.  He led the teams that developed the core technology for Dragon NaturallySpeaking, Yap Voicemail, and Amazon Echo. Now, Cobalt’s team of elite speech scientists & engineers build custom applications to meet the needs of their clients.  Jeff is the author of 25 patents and several published research papers.

Satellite Event | Unsupervised Learning Seminar Keynote

Christian Hennig,
 Department of Statistical Science, University College London

TITLE: Cluster Validation: How to Think and What to Do?

Abstract: Cluster analysis is about finding groups in data. There are many cluster analysis methods and on most datasets clusterings from different methods will not agree. Cluster validation concerns the evaluation of the quality of a clustering. This is often used for comparing different clusterings on a dataset, stemming from different methods or with different parameters such as the number of clusters.

There are many aspects of cluster validity. Some of these aspects are mostly informal, such as the question whether a clustering makes substantive sense, and the visual evaluation of a clustering. There are also various measurements for cluster validity. Often these are used in such a way that the validity of the whole clustering is measured by a single number. But the quality of a clustering is rather multivariate; within-cluster homogeneity, between-cluster separation, representation of cluster members by a centroid object or stability could be measured, and what is most important depends on the aim of clustering.

In this presentation I will give an overview of techniques for cluster validation, focusing particularly on a number of new measurements of different aspects of cluster validity. I will also discuss the issue of what the "true clusters" that we want to find are, and how this depends on the specific application and on the aims and concepts of the researcher, so that these can be connected to specific techniques for cluster validation.

Bio: Christian Hennig is Senior Lecturer at the Department of Statistical Science, University College London. Previous affiliations were the Seminar fuer Statistik, ETH Zuerich and the Faculty of Mathematics, University of Hamburg. He is currently Secretary of the International Federation of Classification Societies. His main research interests are cluster analysis, philosophy of statistics, robust statistics, multivariate analysis, data visualization, and model selection. His work covers theoretical and applied statistics as well as the foundations and the philosophical background of statistics. He has published more than 50 papers in journals including the Annals of Statistics, the Journal of the Royal Statistical Society, the Journal of the American Statistical Association, and Foundations of Science. He has been an invited speaker and session organiser at many conferences, including the ISI 2015 in Rio de Janeiro and the IFCS 2006 in Ljubljana. He is Associate Editor of four journals in the area of statistics and data analysis. He has given statistical advice to more than 100 clients from a wide range of application areas.

Platinum Sponsor

VocalZoom


In Collaboration with



ACLP - Afeka Center for Language Processing