Models for Trading Exploration and Exploitation using Upper Confidence Bounds

Published on Feb 4, 2025
3682 Views

Presentation

Models for Trading 00:00
Overview 00:39
The bandit problem with linear side information 01:25
Goal 03:08
Relation to other models 04:42
Results 06:54
Remarks 09:16
Algorithm: Using upper confidence bounds 09:53
Why does this work? 11:28
Bounding the widths of the confidence intervals 16:05
Random bandit problem (cont.) 17:55
Random bandit problem: Improved bounds 22:13
Linear side information 27:26
Calculating the variance 31:25
Optimizing the regret 32:21
Calculating 34:20
Reinforcement learning 37:41
Motivation for online reinforcement learning 43:05
Discounted and undiscounted returns 44:31
Regret 46:48
Episodic reinforcement learning 47:41
The algorithm: Upper confidence bounds again 49:18
Why it works (1) 53:17
Why it works (2) 55:07
Why it works (3) 56:04
Conclusion 59:46
auer-slides_Page_34 01:02:28