Learning About Sensorimotor Data
Published on Jan 25, 2012 · 5926 views
Temporal-difference (TD) learning of reward predictions underlies both reinforcement-learning algorithms and the standard dopamine model of reward-based learning in the brain. This confluence of computational…
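To make the TD idea from the abstract concrete, below is a minimal tabular TD(0) sketch for reward prediction on the classic five-state random walk. This is an illustrative assumption, not code from the lecture; the environment, the step size `alpha`, the discount `gamma`, and all variable names are invented for the example.

```python
import random

# Minimal tabular TD(0) sketch (illustrative; not from the lecture).
# Random walk over states 0..4: episodes start in the middle and end
# at either boundary, with reward 1 only for exiting on the right.
N_STATES = 5
alpha, gamma = 0.1, 1.0      # step size and discount (assumed values)
V = [0.0] * N_STATES         # value estimates: predicted future reward

for episode in range(1000):
    s = N_STATES // 2
    while True:
        s_next = s + random.choice([-1, 1])
        done = s_next < 0 or s_next >= N_STATES
        r = 1.0 if s_next >= N_STATES else 0.0
        target = r if done else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])   # TD error drives the update
        if done:
            break
        s = s_next

print([round(v, 2) for v in V])  # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

Each update moves V[s] toward the one-step bootstrapped target r + γV(s′). As the chapter "TD is in no way specific to reward" below emphasizes, the same TD error can drive predictions of arbitrary sensorimotor signals, which is the idea behind the Horde architecture discussed in the talk.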
Chapter list
Learning About Sensorimotor Data (00:00)
Outline (01:19)
Intelligence - 1 (01:44)
Intelligence - 2 (02:15)
Examples of stuff to know (03:41)
The Sensorimotor View (04:10)
It’s hard to implement the Sensorimotor View well (05:42)
Robot experiments (07:42)
The iRobot Create (07:46)
“Wall ahead” is a sensorimotor fact (08:41)
Predicting: Will rolling forward soon result in a bump? (10:20)
Predicting right and left bumps (12:29)
Strategy (13:37)
Temporal-difference (TD) learning (15:51)
TD Learning in Engineering and Biology (17:17)
TD is in no way specific to reward (19:02)
The Horde Architecture (19:32)
The Critterbot (21:29)
Infrared-sensor data and predictions (22:19)
Scaling up - 1 (23:28)
Scaling up - 2 (24:28)
Learning is fast enough (26:26)
Conclusions from robot experiments (27:03)
The Horde-of-demons architecture (28:02)
The Horde Architecture (28:03)
Inside a GTD(λ) Demon (28:16)
General value functions as a language for multi-step predictive questions - 1 (31:21)
General value functions as a language for multi-step predictive questions - 2 (32:11)
General value functions as a language for multi-step predictive questions - 3 (33:36)
General value functions as a language for multi-step predictive questions - 4 (35:26)
General value functions - Fundamental or idiosyncratic? (35:57)
Remarks on gradient-TD algorithms (39:14)
TD with FA (39:24)
TD and GD: Headlines (40:30)
TD(0) can diverge: A simple example (41:38)
TD with FA: Non-GD solutions? (41:44)
The Gradient-TD Family (43:01)
Gradient-TD convergence theorem (43:34)
TD vs Gradient-TD (43:47)
My message in one sentence (43:49)
Further frontiers (44:41)
Thank you for your attention (44:45)