On-line Trading of Exploration and Exploitation
Pascal

Using upper confidence bounds to control exploration and exploitation

author: Csaba Szepesvari, Department of Computing Science, University of Alberta
Slides
0:01 Using upper confidence bounds to control exploration and exploitation
0:09 Contents
1:15 Exploration vs. Exploitation
2:08 Exploration vs. Exploitation: Some Applications
3:06 Bandit Problems – “Optimism in the Face of Uncertainty”
5:32 Parametric Bandits [Lai&Robbins]
7:16 Bounds
8:26 UCB1 Algorithm (Auer et al., 2002)
10:41 TITLE
11:02 Bandits in Continuous Time
13:00 Formal framework
14:24 Evaluating allocation rules (policies)
16:19 Gain, action values and regret
19:11 Model-based UCB
22:07 Algorithm
24:52 Regret bound
28:37 Key proposition
29:40 Open problems
31:06 Levente Kocsis, Remi Munos
31:23 Bandits with large action-spaces
31:46 Structure helps!
32:51 UCT Upper Confidence based Tree search
33:13 Example (t=1)
34:18 Example (t=2)
34:45 Example (t=3)
35:04 Example (t=4)
35:17 What is the next time a suboptimal action is sampled?
36:01 UCT variations
37:29 UCT variations
37:53 Theoretical results
39:16 Planning in MDPs: Sailing
39:24 Planning in MDPs: Sailing
39:32 Planning in MDPs: Sailing
40:22 Results in games
41:02 Thank you!
