From Language Modelling to Machine Translation
Published on Sep 13, 2015 · 6984 views
Chapter list
From Language Modelling to Machine Translation (00:00)
Language models: the traditional view - 1 (00:36)
Language models: the traditional view - 2 (02:21)
History: cryptography (04:23)
N-gram language models (05:28)
The Traditional Markov Chain (06:30)
Estimating N-Gram Probabilities (07:30)
How good is a LM? (08:15)
Comparison 1–4-Gram (12:00)
Unseen N-Grams (12:56)
Add-One Smoothing (13:39)
Add-α Smoothing (14:37)
Example: 2-Grams in Europarl (14:54)
Good-Turing Smoothing (16:33)
Good-Turing for 2-Grams in Europarl (18:06)
Back-Off - 1 (18:56)
Back-Off - 2 (19:53)
Back-Off with Good-Turing Smoothing (20:59)
Diversity of Predicted Words (21:47)
Diversity of Histories (22:56)
Evaluation (23:57)
Provisional Summary - 1 (26:01)
Provisional Summary - 2 (26:45)
Neural language models (28:33)
Log-linear models for classification (29:11)
A simple log-linear (tri-gram) language model - 1 (30:23)
A simple log-linear (tri-gram) language model - 2 (31:56)
Learning the features: the log-bilinear language model - 1 (32:24)
Learning the features: the log-bilinear language model - 2 (33:14)
Learning the features: the log-bilinear language model - 3 (33:58)
Learning the features: the log-bilinear language model - 4 (34:22)
Learning the features: the log-bilinear language model - 5 (34:33)
Learning the features: the log-bilinear language model - 6 (34:48)
Adding non-linearities: the neural language model - 2 (35:28)
Infinite context: a recurrent neural language model - 1 (35:47)
Infinite context: a recurrent neural language model - 2 (37:35)
Infinite context: a recurrent neural language model - 3 (37:43)
LSTM LM (39:01)
Infinite context: a recurrent neural language model (41:33)
Deep LSTM LM - 1 (41:45)
Deep LSTM LM - 2 (42:27)
Deep LSTM LM - 3 (42:31)
Efficiency - 1 (44:14)
Efficiency - 2 (46:03)
Efficiency - 3 (47:23)
Efficiency - 4 (48:51)
Efficiency - 5 (49:35)
Efficiency - 6 (50:08)
Comparison with traditional n-gram LMs (50:20)
Learning better representations for rich morphology - 1 (51:32)
Learning better representations for rich morphology - 2 (51:36)
Learning representations directly (51:36)
Intro to MT (51:37)
Intro to MT: Language Divergence (52:39)
Models of translation (53:29)
MT History - 1 (54:02)
MT History - 2 (54:57)
Parallel Corpora - 1 (55:40)
Parallel Corpora - 2 (56:35)
MT History: Statistical MT at IBM - 1 (57:38)
MT History: Statistical MT at IBM - 2 (58:26)
Models of translation - 1 (59:38)
Models of translation - 2 (01:00:32)
IBM Model 1: The first translation attention model! (01:00:46)
Models of translation - 1 (01:02:27)
Models of translation - 2 (01:02:45)
Models of translation - 3 (01:02:47)
Models of translation - 4 (01:02:53)
Models of translation - 5 (01:02:56)
Models of translation - 6 (01:03:21)
Encoder-Decoders - 1 (01:03:54)
Encoder-Decoders - 2 (01:05:00)
Encoder-Decoders: A naive additive model - 1 (01:06:13)
Encoder-Decoders: A naive additive model - 2 (01:06:52)
Encoder-Decoders: A naive additive model - 3 (01:06:53)
Encoder-Decoders: A naive additive model - 4 (01:06:55)
Encoder-Decoders: A naive additive model - 5 (01:07:11)
Encoder-Decoders: A naive additive model - 6 (01:07:28)
Encoder-Decoders: A naive additive model - 7 (01:07:31)
Encoder-Decoders: A naive additive model - 8 (01:07:33)
Encoder-Decoders: A naive additive model - 9 (01:07:35)
Encoder-Decoders: A naive additive model - 10 (01:07:42)
Encoder-Decoders: A naive additive model - 11 (01:07:49)
Encoder-Decoders: A naive additive model - 12 (01:07:56)
Encoder-Decoders: A naive additive model - 13 (01:08:05)
Recurrent Encoder-Decoders for MT - 1 (01:10:05)
Recurrent Encoder-Decoders for MT - 2 (01:11:49)
Recurrent Encoder-Decoders for MT - 3 (01:12:33)
Attention Models for MT - 1 (01:13:46)
Attention Models for MT - 2 (01:14:41)
Attention Models for MT - 3 (01:16:34)
Attention Models for MT - 4 (01:16:44)
Attention Models for MT - 5 (01:16:48)
Attention Models for MT - 6 (01:16:57)
Montreal WMT Bleu Scores (01:18:40)
Issues, advantages and the future of MT (01:25:22)