On Architectural Issues of Neural Networks in Speech Recognition
published: July 31, 2016, recorded: July 2016, views: 1494
Recently, artificial neural networks (ANNs) have dramatically improved the performance of speech recognition systems. There have been more than 25 years of extensive research on neural networks in speech recognition. Despite this huge effort, a number of issues concerning the architecture of ANN-based systems for speech recognition remain open. Examples of such issues are: 1) Unlike the hybrid approach, which replaces the emission probability function by an ANN, a direct approach is possible that models the posterior probability of the state sequence of (phonetic) labels directly, without using the generative concepts of classical hidden Markov models (HMMs). 2) In the CTC approach (connectionist temporal classification), the HMM is simplified by using only a single label per phoneme (or per character in handwriting recognition). The CTC training criterion sums the posterior probabilities of all frame-level label sequences that are consistent with the reference transcription. 3) Recently there have been so-called attention-based approaches that replace the conventional HMM formalism by a recurrent neural network. In all three cases, we are faced with the question of how these ANN-based approaches compare with the conventional discriminative framework of hybrid HMMs. We will discuss the advantages and disadvantages of these approaches in more detail.
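To make the CTC criterion in point 2 more concrete, the following is a minimal sketch (not taken from the lecture) of how the sum over frame-level label sequences can be computed with a forward recursion. It assumes the usual CTC convention of an extra blank symbol, here at index 0, and frame-wise posteriors given as plain lists; the function names `ctc_forward`, `collapse`, and `ctc_brute_force` are hypothetical and chosen for illustration only.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol (an assumption of this sketch)


def ctc_forward(posteriors, labels, blank=BLANK):
    """Sum the probabilities of all frame-level paths that collapse to `labels`.

    posteriors[t][k] is the network's posterior for symbol k at frame t.
    """
    # Interleave blanks with the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start in the leading blank or the first label.
    alpha = [0.0] * size
    alpha[0] = posteriors[0][ext[0]]
    if size > 1:
        alpha[1] = posteriors[0][ext[1]]

    for t in range(1, len(posteriors)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                 # stay in the same position
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * posteriors[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def collapse(path, blank=BLANK):
    """Merge repeated symbols, then drop blanks (the CTC collapsing rule)."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_brute_force(posteriors, labels, blank=BLANK):
    """Reference value: enumerate every path explicitly (tiny inputs only)."""
    num_frames, num_symbols = len(posteriors), len(posteriors[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=num_frames):
        if collapse(path, blank) == list(labels):
            prob = 1.0
            for t, k in enumerate(path):
                prob *= posteriors[t][k]
            total += prob
    return total
```

The brute-force function exists only to check the recursion on toy inputs: it enumerates every frame-level path, whereas the forward recursion obtains the same sum in time linear in the number of frames. In training, the negative logarithm of this sum would typically be minimised.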