Recurrent Neural Networks (RNNs)
Published on Jul 27, 2017
21336 Views
Chapter list
Recurrent Neural Networks (00:00)
Recurrent Neural Networks - 1 (00:02)
Recurrent Neural Networks - 2 (10:10)
Generative RNNs (11:54)
Conditional Distributions (12:28)
Maximum Likelihood = Teacher Forcing (15:25)
Ideas to reduce the train/generate mismatch in teacher forcing (18:15)
Multiplicative Interactions (26:02)
Bidirectional RNNs, Recursive Nets, Multidimensional RNNs, etc. (26:38)
Increasing the Expressive Power of RNNs with more Depth (26:38)
Learning Long-Term Dependencies with Gradient Descent is Difficult (27:27)
Simple Experiments from 1991 while I was at MIT (27:54)
Robustly storing 1 bit in the presence of bounded noise (30:45)
Storing Reliably - Vanishing gradients (32:24)
Vanishing or Exploding Gradients (34:01)
Why it hurts gradient-based learning (34:46)
Vanishing Gradients in Deep Nets are Different from the Case in RNNs (38:18)
To store information robustly the dynamics must be contractive (39:49)
RNN Tricks (40:44)
How to store 1 bit? (42:00)
Dealing with Gradient Explosion by Gradient Norm Clipping (42:41)
Conference version (1993) of the 1994 paper by the same authors (44:11)
Fighting the vanishing gradient: LSTM & GRU (49:12)
Fast Forward 20 years: Attention Mechanisms for Memory Access (52:06)
Large Memory Networks: Sparse Access Memory for Long-Term Dependencies (54:32)
Attention Mechanism for Deep Learning (56:39)
End-to-End Machine Translation with Recurrent Nets and Attention Mechanism (01:00:18)
Google-Scale NMT Success (01:00:29)
Pointing the Unknown Words (01:02:33)
It makes a difference (01:06:00)
Designing the RNN Architecture (01:06:24)
Near-Orthogonality to Help Information Propagation (01:06:41)
Variational Generative RNNs (01:07:20)
Variational Hierarchical RNNs for Dialogue Generation (01:08:01)
VHRNN Results – Twitter Dialogues (01:08:22)
Other Fully-Observed Neural Directed Graphical Models (01:08:23)
Neural Auto-Regressive Models (01:08:24)
NADE: Neural AutoRegressive Density Estimator (01:11:36)
Pixel RNNs (01:12:19)
Forward Computation of the Gradient (01:12:53)
Delays & Hierarchies to Reach Farther (01:24:50)