
Using upper confidence bounds to control exploration and exploitation

Published on Feb 25, 2007 · 6482 views

Chapter list

Using upper confidence bounds to control exploration and exploitation (00:01)
Contents (00:09)
Exploration vs. Exploitation (01:15)
Exploration vs. Exploitation: Some Applications (02:08)
Bandit Problems – “Optimism in the Face of Uncertainty” (03:06)
Parametric Bandits [Lai & Robbins] (05:32)
Bounds (07:16)
UCB1 Algorithm (Auer et al., 2002) (08:26)
TITLE (10:41)
Bandits in Continuous Time (11:02)
Formal framework (13:00)
Evaluating allocation rules (policies) (14:24)
Gain, action values and regret (16:19)
Model-based UCB (19:11)
Algorithm (22:07)
Regret bound (24:52)
Key proposition (28:37)
Open problems (29:40)
Levente Kocsis, Remi Munos (31:06)
Bandits with large action-spaces (31:23)
Structure helps! (31:46)
UCT: Upper Confidence based Tree search (32:51)
Example (t=1) (33:13)
Example (t=2) (34:18)
Example (t=3) (34:45)
Example (t=4) (35:04)
What is the next time a suboptimal action is sampled? (35:17)
UCT variations (36:01)
UCT variations (37:29)
Theoretical results (37:53)
Planning in MDPs: Sailing (39:16)
Planning in MDPs: Sailing (39:24)
Planning in MDPs: Sailing (39:32)
Results in games (40:22)
Thank you! (41:02)
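The lecture's central object, the UCB1 index rule (Auer et al., 2002, listed at 08:26), can be sketched in a few lines. This is a minimal illustration, not the speaker's code: it assumes Bernoulli-reward arms with hypothetical success probabilities, and the function name `ucb1` and its parameters are invented for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch on Bernoulli arms (arm_means are assumed
    true success probabilities; rewards are simulated, not real data)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: play each arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i);
            # the bonus term shrinks as an arm is sampled more often,
            # implementing "optimism in the face of uncertainty".
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

With a reasonable horizon, the pull counts concentrate on the best arm while every arm is still sampled at a logarithmic rate, which is the exploration/exploitation trade-off the lecture title refers to. UCT (32:51) applies this same index recursively at each node of a search tree.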