Autonomous Exploration in Reinforcement Learning

Published on 2012-01-25, 4428 views

Peter Auer

One of the striking differences between current reinforcement learning algorithms and early human learning is that animals and infants appear to explore their environments with autonomous purpose, in

New Frontiers in Model Order Selection

Presentation

00:00 Autonomous Exploration in Reinforcement Learning

00:18 Motivation

01:54 Evaluation of autonomous exploration algorithms - 01

02:16 Evaluation of autonomous exploration algorithms - 02

03:24 Evaluation of autonomous exploration algorithms - 03

05:55 Evaluation of autonomous exploration algorithms - 04

06:48 Learning to navigate

09:15 Reaching all states that are reachable in L steps - 01

09:19 Reaching all states that are reachable in L steps - 02

10:19 Excluding unreachable states - 01

11:02 Excluding unreachable states - 02

11:10 Excluding unreachable states (counterexample)

13:53 Excluding intermediate states - 01

14:44 Excluding intermediate states - 02

18:54 Reinforcement learning - 01

19:14 Reinforcement learning - 02

19:29 Discounted and undiscounted rewards - 01

20:00 Discounted and undiscounted rewards - 02

20:42 PAC-MDP bounds for discounted rewards - 01

21:33 PAC-MDP bounds for discounted rewards - 02

21:40 Regret bounds

23:34 PAC-MDP bounds from regret bounds

24:55 Optimistic policies for regret bounds in RL - 01

26:00 Optimistic policies for regret bounds in RL - 02

26:28 Intuition about optimistic policies

27:07 Consistent MDPs - 01

28:10 Consistent MDPs - 02

29:22 Main quantities in the proof - 01

30:55 Main quantities in the proof - 02

32:15 Summing over episodes (discounted UCRL)

32:18 Optimistic algorithm for autonomous exploration

33:52 Analysis (1)

35:00 Analysis (2)

36:03 Analysis (3): Consistent MDPs

36:05 Summary

37:26 (Why) is autonomous exploration useful?