Models for Trading Exploration and Exploitation using Upper Confidence Bounds

Published on 2007-02-25, 3685 views

Workshop on Principled Methods 2005 - London

Models for Trading 00:00

Overview 00:39

The bandit problem with linear side information 01:25

Goal 03:08

Relation to other models 04:42

Results 06:54

Remarks 09:16

Algorithm: Using upper confidence bounds 09:53

Why does this work? 11:28

Bounding the widths of the confidence intervals 16:05

Random bandit problem (cont.) 17:55

Random bandit problem: Improved bounds 22:13

Linear side information 27:26

Calculating the variance 31:25

Optimizing the regret 32:21

Calculating 34:20

Reinforcement learning 37:41

Motivation for online reinforcement learning 43:05

Discounted and undiscounted returns 44:31

Regret 46:48

Episodic reinforcement learning 47:41

The algorithm: Upper confidence bounds again 49:18

Why it works (1) 53:17

Why it works (2) 55:07

Why it works (3) 56:04

Conclusion 59:46

auer-slides_Page_34 01:02:28