Direct Policy Ranking with Robot Data Streams

Published on Nov 30, 2011 (3468 views)

Many machine learning approaches in robotics, based on reinforcement learning, inverse optimal control or direct policy learning, critically rely on robot simulators. This paper investigates a simulat…

Chapter list

Preference-based Policy Learning (00:00)
Setting (00:28)
Motivations (01:33)
State of the art - 1 (02:08)
Issues in RL (03:02)
State of the art - 2 (03:49)
Issues in IRL (04:20)
Preference-based Policy Learning (04:32)
Outline - 1 (05:15)
Policy Return Estimate - 1 (05:37)
Policy Return Estimate - 2 (06:21)
Behavioral representation (07:23)
Exploration/Exploitation (08:27)
Self-training (09:24)
Preference-based Policy Learning: PPL Algorithm (10:38)
Outline - 2 (11:16)
Experimental goal and setting (11:20)
The maze problem (13:05)
Synchronized exploration (14:07)
Outline - 3 (14:56)
Preference Policy Learning (14:59)
Future work (15:44)
References (16:52)