Models for Trading Exploration and Exploitation using Upper Confidence Bounds
Published on Feb 25, 2007 · 3678 views
Chapter list
Models for Trading (00:00)
Overview (00:39)
The bandit problem with linear side information (01:25)
Goal (03:08)
Relation to other models (04:42)
Results (06:54)
Remarks (09:16)
Algorithm: Using upper confidence bounds (09:53)
Why does this work? (11:28)
Algorithm: Using upper confidence bounds (12:58)
Why does this work? (13:12)
Algorithm: Using upper confidence bounds (13:35)
Why does this work? (15:40)
Bounding the widths of the confidence intervals (16:05)
Random bandit problem (cont.) (17:55)
Random bandit problem (cont.) (20:30)
Random bandit problem: Improved bounds (22:13)
Random bandit problem: Improved bounds (23:08)
Random bandit problem: Improved bounds (23:20)
Random bandit problem: Improved bounds (23:41)
Random bandit problem: Improved bounds (23:47)
Random bandit problem: Improved bounds (24:10)
Random bandit problem: Improved bounds (24:29)
Random bandit problem (cont.) (26:20)
Linear side information (27:26)
Calculating the variance (31:25)
Optimizing the regret (32:21)
Linear side information (33:20)
Optimizing the regret (33:58)
Calculating (34:20)
Optimizing the regret (35:50)
Calculating (35:58)
Reinforcement learning (37:41)
Calculating (37:55)
Reinforcement learning (40:09)
Motivation for online reinforcement learning (43:05)
Discounted and undiscounted returns (44:31)
Regret (46:48)
Episodic reinforcement learning (47:41)
The algorithm: Upper confidence bounds again (49:18)
Why it works (1) (53:17)
Why it works (2) (55:07)
Why it works (3) (56:04)
Conclusion (59:46)
auer-slides_Page_34 (01:02:28)
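The chapters above center on bandit algorithms that pick arms via upper confidence bounds. As a point of reference, here is a minimal sketch of the standard UCB1 rule (empirical mean plus a sqrt(2 ln t / n) exploration bonus); this is the well-known baseline algorithm, not necessarily the exact variant with linear side information presented in the talk, and the function names are illustrative only:

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Run UCB1 on a list of arms; reward_fns[i]() returns a reward in [0, 1].

    Returns how often each arm was played over `horizon` rounds.
    """
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k   # number of times each arm has been played
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean + exploration bonus sqrt(2 ln t / n_i).
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Two Bernoulli arms with means 0.2 and 0.8: the bonus shrinks for
# well-sampled arms, so play concentrates on the better arm over time.
counts = ucb1([lambda: float(random.random() < 0.2),
               lambda: float(random.random() < 0.8)], horizon=2000)
```

The exploration bonus guarantees that the confidence interval of every arm keeps shrinking, which is the mechanism the "Why does this work?" and "Bounding the widths of the confidence intervals" chapters analyze.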