A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Published on 2011-05-06

6192 Views

Stephane Ross

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning.

AISTATS 2011 - Ft. Lauderdale

Related categories

On-line Learning

Presentation

Reduction of Imitation Learning to No-Regret Online Learning 00:00

Imitation Learning (1) 00:14

Imitation Learning (2) 00:54

Example Scenario 01:20

Supervised Training Procedure 02:04

Poor Performance in Practice 02:45

# Mistakes Grows Quadratically in T! 04:09

Reduction-Based Approach & Analysis 05:29

Previous Work: Forward Training 06:37

Previous Work: SMILe 08:27

DAgger: Dataset Aggregation (1) 09:53

DAgger: Dataset Aggregation (2) 10:06

DAgger: Dataset Aggregation (3) 10:09

DAgger: Dataset Aggregation (4) 10:18

DAgger: Dataset Aggregation (5) 10:50

DAgger: Dataset Aggregation (6) 10:56

DAgger: Dataset Aggregation (7) 11:01

DAgger: Dataset Aggregation (8) 11:13

DAgger: Dataset Aggregation (9) 11:22

Online Learning (1) 11:55

Online Learning (2) 12:14

Online Learning (3) 12:21

Online Learning (4) 12:24

Online Learning (5) 12:28

Online Learning (6) 12:54

DAgger as Online Learning (1) 13:37

DAgger as Online Learning (2) 14:08

DAgger as Online Learning (3) 14:45

Theoretical Guarantees of DAgger (1) 14:58

Theoretical Guarantees of DAgger (2) 15:25

Theoretical Guarantees of DAgger (3) 16:01

Theoretical Guarantees of DAgger (4) 16:44

Experiments: 3D Racing Game 17:11

DAgger Test-Time Execution 17:26

Average Falls/Lap 18:08

Experiments: Super Mario Bros 18:30

Test-Time Execution (missing video) 19:23

Average Distance/Stage 20:18

Conclusion 20:57

Questions 21:53