Beyond Seq2Seq with Augmented RNNs

Published on Aug 23, 2016 · 20,760 views

Sequence-to-sequence models in their most basic form, following an encoder-decoder paradigm, compressively encode the source sequence into a single vector representation and decode this representation into a target sequence.
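To make the bottleneck the description refers to concrete, here is a minimal sketch (not code from the talk) of a vanilla RNN encoder-decoder in numpy: the encoder folds an arbitrarily long source sequence into one fixed-size vector, and the decoder must generate the entire target from that vector alone. All names, dimensions, and initialisations below are illustrative assumptions; training is omitted.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 8  # assumed toy dimensions

# Encoder parameters (randomly initialised for illustration).
W_xh = rng.normal(scale=0.1, size=(d_hid, d_in))
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))
# Decoder parameters.
U_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))
W_hy = rng.normal(scale=0.1, size=(d_out, d_hid))

def encode(xs):
    """Fold an arbitrary-length source sequence into ONE vector."""
    h = np.zeros(d_hid)
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h  # the bottleneck: all source information must fit here

def decode(h, steps):
    """Unroll the decoder from the single encoded vector."""
    ys = []
    for _ in range(steps):
        h = np.tanh(U_hh @ h)
        ys.append(W_hy @ h)
    return ys

source = [rng.normal(size=d_in) for _ in range(5)]
summary = encode(source)           # fixed size, however long the source is
target = decode(summary, steps=5)
print(summary.shape, len(target))  # (16,) 5

Note that summary has the same size whether the source has 5 tokens or 5,000, which is exactly the capacity limitation that the attention and external-memory mechanisms covered later in the talk are designed to relieve.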

Chapter list

00:00 Beyond Sequence to Sequence with Augmented RNNs
02:22 The plan
04:06 The Bottleneck
04:06 Some Preliminaries: RNNs - 1
04:50 Some Preliminaries: RNNs - 2
05:27 Some Obvious RNN Applications
06:40 Transduction with Conditional Models - 1
07:50 Transduction with Conditional Models - 2
07:57 Sequence to Sequence Mapping with RNNs - 1
08:41 Sequence to Sequence Mapping with RNNs - 2
08:54 Sequence to Sequence Mapping with RNNs - 3
09:34 A Simple Encoder-Decoder Model
09:45 Deep LSTMs for Translation
10:37 Learning to Execute
11:30 The Bottleneck for Simple RNNs
14:22 Limitations of RNNs: A Computational Perspective
14:38 Computational Hierarchy
16:53 RNNs and Turing Machines - 1
19:07 RNNs and Turing Machines - 2
24:22 RNNs and Finite State Machines - 1
29:21 RNNs and Finite State Machines - 2
30:36 Why more than FSM?
32:55 Untitled
33:30 Questions? - 1
33:41 RNNs Revisited
34:11 RNNs: More API than Model - 1
34:36 RNNs: More API than Model - 2
35:00 RNNs: More API than Model - 3
35:08 RNNs: More API than Model - 4
35:26 RNNs: More API than Model - 5
36:25 RNNs: More API than Model - 6
36:55 The Controller-Memory Split - 1
37:54 The Controller-Memory Split - 2
38:25 Attention: ROM
38:33 Attention
39:25 Attention (Early Fusion)
40:05 RNN: X ⨉ P → Y ⨉ N
41:47 Attention (Late Fusion)
42:49 ROM for Encoder-Decoder Models - 1
44:36 RNN: X ⨉ P → Y ⨉ N
44:50 ROM for Encoder-Decoder Models - 2
44:51 Skipping the bottleneck - 1
45:07 Skipping the bottleneck - 2
45:41 Recognizing Textual Entailment (RTE)
46:40 Word-by-Word Attention
47:21 Girl + Boy = Kids
48:15 Large-scale Supervised Reading Comprehension
48:58 Machine Reading with Attention
49:23 Example QA Heatmap
49:58 Attention Summary
51:45 Questions? - 2
52:34 Untitled
52:46 Untitled
52:48 The Controller-Memory Split
53:15 Controlling a Neural Stack
53:45 Controller API
54:49 Controller + Stack Interaction
57:55 Example: A Continuous Stack
01:03:07 Synthetic Transduction Tasks
01:03:53 Synthetic ITG Transduction Tasks
01:11:54 Rapid Convergence
01:12:10 Differentiable Stacks / Queues / Etc
01:13:03 Results
01:13:44 Neural PDA Summary
01:15:36 Register Machines: RAM
01:15:59 Computational Hierarchy
01:16:10 Attention as ROM
01:16:28 Register Memory as RAM
01:16:44 Neural RAM: General Idea
01:19:39 RNN: X ⨉ P → Y ⨉ N (an example of)
01:20:33 Extensions
01:20:35 Relation to actual Turing Machines
01:23:18 Register Machines and NLU
01:23:31 Conclusions
01:25:23 Thanks for listening!