Recurrent Neural Networks
Published on Aug 23, 2016 · 42,808 views
This lecture will cover recurrent neural networks, the key ingredient in the deep learning toolbox for handling sequential computation and modelling sequences. It will start by explaining how gradient…
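As a minimal sketch of the sequential computation the description refers to, here is a vanilla RNN forward pass (the function name, dimensions, and weights are illustrative, not taken from the lecture):

```python
import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, bh):
    """Vanilla RNN over a sequence: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh)."""
    h = h0
    states = []
    for x in x_seq:
        # the same weights are reused at every time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        states.append(h)
    return states

# toy dimensions: input size 3, hidden size 2, sequence length 4
rng = np.random.default_rng(0)
Wxh = rng.normal(size=(2, 3)) * 0.1
Whh = rng.normal(size=(2, 2)) * 0.1
bh = np.zeros(2)
xs = [rng.normal(size=3) for _ in range(4)]
states = rnn_forward(xs, np.zeros(2), Wxh, Whh, bh)
```

The key point the chapter list builds on is that `Whh` is applied repeatedly, which is what makes gradients through long sequences vanish or explode.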
Chapter list
Recurrent Neural Networks (00:00)
Recurrent Neural Networks - 1 (01:04)
Recurrent Neural Networks - 2 (09:17)
Generative RNNs (09:23)
Conditional Distributions (11:10)
Maximum Likelihood = Teacher Forcing (13:23)
Ideas to reduce the train/generate mismatch in teacher forcing (17:32)
Increasing the Expressive Power of RNNs with more Depth (20:51)
Bidirectional RNNs, Recursive Nets, Multidimensional RNNs, etc. (23:24)
Multiplicative Interactions (27:18)
Learning Long-Term Dependencies with Gradient Descent is Difficult (28:04)
Simple Experiments from 1991 while I was at MIT (28:34)
How to store 1 bit? (30:20)
Robustly storing 1 bit in the presence of bounded noise (34:19)
Storing Reliably - Vanishing gradients (35:09)
Vanishing or Exploding Gradients (36:44)
Why it hurts gradient-based learning (37:40)
Vanishing Gradients in Deep Nets are Different from the Case in RNNs (47:01)
To store information robustly the dynamics must be contractive (48:36)
RNN Tricks (48:41)
Dealing with Gradient Explosion by Gradient Norm Clipping (48:44)
Conference version (1993) of the 1994 paper by the same authors (51:50)
Fighting the vanishing gradient: LSTM & GRU (53:04)
Delays & Hierarchies to Reach Farther (54:50)
Fast Forward 20 years: Attention Mechanisms for Memory Access (57:25)
Large Memory Networks: Sparse Access Memory for Long-Term Dependencies (57:54)
Designing the RNN Architecture (58:47)
It makes a difference (01:00:36)
Near-Orthogonality to Help Information Propagation (01:00:43)
Variational Generative RNNs (01:01:31)
Variational Hierarchical RNNs for Dialogue Generation (01:03:28)
Variational Hierarchical RNN Results - Twitter (01:04:01)
Other Fully-Observed Neural Directed Graphical Models (01:04:02)
Neural Auto-Regressive Models (01:04:20)
NADE: Neural AutoRegressive Density Estimator (01:07:29)
Pixel RNNs (01:09:01)
Forward Computation of the Gradient (01:11:57)
Montreal Institute for Learning Algorithms (01:22:35)
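One of the RNN tricks in the chapters above, gradient norm clipping (around 48:44), can be sketched as follows; the function name and threshold are illustrative, not code from the lecture:

```python
import numpy as np

def clip_gradient_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        # shrink all gradients by the same factor, preserving their direction
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = sqrt(9 + 16 + 144) = 13
clipped = clip_gradient_norm(grads, max_norm=1.0)
```

The point of clipping the global norm rather than each element is that an exploding gradient keeps its direction and only loses its pathological magnitude.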