Learning About Sensorimotor Data
Published on Jan 25, 2012 · 5925 views
Temporal-difference (TD) learning of reward predictions underlies both reinforcement-learning algorithms and the standard dopamine model of reward-based learning in the brain. This confluence of computational…
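The abstract centers on TD learning of predictions. As a minimal illustration of the TD(0) update it refers to, here is a sketch of tabular TD(0) prediction on the classic 1D random-walk task; the environment, step size, and function names are illustrative assumptions, not material from the talk.

```python
import numpy as np

# Minimal sketch of tabular TD(0) prediction on the classic 1D random walk.
# Everything here (environment, hyperparameters, names) is an illustrative
# assumption, not code from the lecture.

def random_walk_episode(n_states=5, start=3, rng=None):
    """One episode: states 1..n_states; stepping past state n_states
    terminates with reward 1, stepping below state 1 terminates with 0."""
    rng = rng if rng is not None else np.random.default_rng()
    s, steps = start, []
    while 1 <= s <= n_states:
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == n_states + 1 else 0.0
        steps.append((s, r, s_next))
        s = s_next
    return steps

def td0_prediction(episodes, n_states=5, alpha=0.1, gamma=1.0):
    """TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    V = np.zeros(n_states + 2)          # indices 0 and n_states+1 are terminal
    for episode in episodes:
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]   # the TD error
            V[s] += alpha * delta
    return V

rng = np.random.default_rng(0)
episodes = [random_walk_episode(rng=rng) for _ in range(2000)]
print(td0_prediction(episodes)[1:-1].round(2))  # true values: 1/6 .. 5/6
```

The learned values approach the true success probabilities 1/6 through 5/6; the same one-step bootstrapped update, generalized beyond reward, is what the talk's general value functions and Horde demons build on.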
Chapter list
00:00 Learning About Sensorimotor Data
01:19 Outline
01:44 Intelligence - 1
02:15 Intelligence - 2
03:41 Examples of stuff to know
04:10 The Sensorimotor View
05:42 It’s hard to implement the Sensorimotor View well
07:42 Robot experiments
07:46 The iRobot Create
08:41 “Wall ahead” is a sensorimotor fact
10:20 Predicting: Will rolling forward soon result in a bump?
12:29 Predicting right and left bumps
13:37 Strategy
15:51 Temporal-difference (TD) learning
17:17 TD Learning in Engineering and Biology
19:02 TD is in no way specific to reward
19:32 The Horde Architecture
21:29 The Critterbot
22:19 Infrared-sensor data and predictions
23:28 Scaling up - 1
24:28 Scaling up - 2
26:26 Learning is fast enough
27:03 Conclusions from robot experiments
28:02 The Horde-of-demons architecture
28:03 The Horde Architecture
28:16 Inside a GTD(λ) Demon
31:21 General value functions as a language for multi-step predictive questions - 1
32:11 General value functions as a language for multi-step predictive questions - 2
33:36 General value functions as a language for multi-step predictive questions - 3
35:26 General value functions as a language for multi-step predictive questions - 4
35:57 General value functions - Fundamental or idiosyncratic?
39:14 Remarks on gradient-TD algorithms
39:24 TD with FA
40:30 TD and GD: Headlines
41:38 TD(0) can diverge: A simple example
41:44 TD with FA: Non-GD solutions?
43:01 The Gradient-TD Family
43:34 Gradient-TD convergence theorem
43:47 TD vs Gradient-TD
43:49 My message in one sentence
44:41 Further frontiers
44:45 Thank you for your attention