Thompson Sampling for Learning Parameterized Markov Decision Processes

Published on Aug 20, 2015 (1726 views)

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, obser

Chapter list

Thompson Sampling for Learning Parameterized Markov Decision Processes  00:00
Online Reinforcement Learning - 1  00:02
Online Reinforcement Learning - 2  00:15
Online Reinforcement Learning - 3  00:19
Online Reinforcement Learning - 4  00:28
Online Reinforcement Learning - 5  00:34
Online Reinforcement Learning - 6  00:38
Online Reinforcement Learning - 7  00:42
Online Reinforcement Learning - 8  00:44
Online Reinforcement Learning - 9  00:45
Online Reinforcement Learning - 10  00:47
Online Reinforcement Learning - 11  00:48
Online Reinforcement Learning - 12  00:49
Online Reinforcement Learning - 13  00:50
Online Reinforcement Learning - 14  00:51
Online Reinforcement Learning - 15  00:51
Online Reinforcement Learning - 16  00:53
Online Reinforcement Learning - 17  00:54
Online Reinforcement Learning - 18  00:54
Online Reinforcement Learning - 19  00:55
Online Reinforcement Learning - 20  00:56
Thompson Sampling [Thompson 1933] - 1  01:36
Thompson Sampling - 1  01:57
Thompson Sampling - 2  02:01
Thompson Sampling - 3  02:16
Main Result - 1  02:27
Main Result - 2  02:53