Recurrent Neural Networks
published: Aug. 23, 2016, recorded: August 2016
This lecture will cover recurrent neural networks, the key ingredient in the deep learning toolbox for handling sequential computation and modelling sequences. It will start by explaining how gradients can be computed (by considering the time-unfolded graph) and how different architectures can be designed to summarize a sequence, generate a sequence by ancestral sampling in a fully-observed directed model, or learn to map a vector to a sequence, a sequence to a sequence (of the same or different length), or a sequence to a vector. The issue of long-term dependencies, why it arises, and what has been proposed to alleviate it will be a core subject of the discussion in this lecture. This includes changes in the architecture and initialization, as well as how to properly characterize the architecture in terms of recurrent or feedforward depth and its ability to create shortcuts or fast propagation of gradients in the unfolded graph. Open questions regarding the limitations of training by maximum likelihood (teacher forcing) and ideas towards making learning online (not requiring backprop through time) will also be discussed.
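As a rough illustration of computing gradients on the time-unfolded graph, the following is a minimal sketch and is not taken from the lecture or its slides. It assumes a scalar tanh recurrence h_t = tanh(w * h_{t-1} + u * x_t + b) and a squared-error loss on the final state only; the names `forward` and `bptt` are hypothetical and chosen for this example.

```python
import math

def forward(w, u, b, xs, h0=0.0):
    """Unfold the recurrence over the input sequence and return every state."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x + b))
    return hs

def bptt(w, u, b, xs, target, h0=0.0):
    """Loss on the final state and its gradients, walking the unfolded graph backwards."""
    hs = forward(w, u, b, xs, h0)
    loss = 0.5 * (hs[-1] - target) ** 2
    dw = du = db = 0.0
    dh = hs[-1] - target                  # dLoss/dh_T
    for t in reversed(range(len(xs))):
        da = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh at step t+1
        dw += da * hs[t]                  # the shared weight collects a term per step
        du += da * xs[t]
        db += da
        dh = w * da                       # pass the gradient to the previous state
    return loss, (dw, du, db)
```

In this scalar sketch each backward step multiplies the gradient by w * (1 - h_t^2), so over many steps the product can shrink or grow geometrically, which is one simple way to see where the long-term dependency issue comes from.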
Download slides: deeplearning2016_bengio_neural_networks_01.pdf (23.4 MB)