Use of variance estimation in the multi-armed bandit problem
Published on Feb 25, 2007
An important aspect of most decision-making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of th…
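The title refers to using variance estimates when deciding which arm to play. As a rough illustration only, here is a minimal Python sketch of a variance-aware upper-confidence index in the style of UCB-V, as we understand that index from the published bandit literature. The index form, the constants `zeta = 1.2` and `c = 1`, the Bernoulli reward model, and the names `ucbv_index` and `run_ucbv` are assumptions for this sketch. They are not details taken from the talk.

```python
import math
import random


def ucbv_index(mean, var, s, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence index (assumed UCB-V-style form).

    mean, var: empirical mean and (biased) empirical variance of the arm
    s: number of times the arm has been played so far
    t: current round (t >= 2, so log(t) > 0)
    b: assumed upper bound on rewards, which lie in [0, b]
    zeta, c: assumed exploration constants (hypothetical defaults)
    """
    e = zeta * math.log(t)  # exploration function E = zeta * log(t)
    return mean + math.sqrt(2.0 * var * e / s) + 3.0 * c * b * e / s


def run_ucbv(arm_probs, horizon, seed=0):
    """Simulate Bernoulli arms under the index above; return per-arm play counts."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k      # number of plays of each arm
    sums = [0.0] * k      # sum of observed rewards per arm
    sq_sums = [0.0] * k   # sum of squared rewards per arm (for the variance)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: play every arm once
        else:
            indices = []
            for i in range(k):
                mean = sums[i] / counts[i]
                # Biased empirical variance; clamp tiny negative rounding errors.
                var = max(sq_sums[i] / counts[i] - mean * mean, 0.0)
                indices.append(ucbv_index(mean, var, counts[i], t))
            arm = max(range(k), key=lambda i: indices[i])  # optimistic choice
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return counts


counts = run_ucbv([0.2, 0.8], horizon=5000)
print(counts)
```

In this sketch the width of the confidence term scales with the empirical variance, so an arm whose rewards vary little has its bonus shrink faster than under a variance-blind bound such as UCB1's `sqrt(2 log t / s)`. The `3 b E / s` term is meant to guard against underestimating the variance from only a few samples. With a large gap between the two arms, as in the call above, one would expect the better arm to receive most of the plays.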
Chapter list
00:01 Use of variance estimation in the multi-armed bandit problem
00:20 Outline
00:41 The multi-armed bandit problem
01:33 Notation (1/2)
02:32 Notation (2/2)
05:45 UCB policies
06:27 UCB policies (cont.)
07:51 Bernstein-type inequalities
09:39 Sketch of the proof
11:03 Sketch of the proof (cont.)
11:33 Definition
12:10 Definition (cont.)
12:28 Definition (cont.)
13:37 A deviation inequality for the number of plays of non-optimal arms
14:03 A deviation inequality for the number of plays of non-optimal arms (cont.)
14:48 A deviation inequality for the number of plays of non-optimal arms (cont.)
14:56 Cumulative regret bounds
18:16 Cumulative regret bounds (cont.)
18:55 Cumulative regret bounds (cont.)
19:49 Discussion on the 1/n-UCB policy
20:21 Discussion on the 1/n-UCB policy (cont.)
20:27 Discussion on the 1/n-UCB policy (cont.)
21:15 Definition
21:30 Expected cumulative regret bound
22:10 Expected cumulative regret bound (cont.)
22:47 Sketch of the proof (1/3)
23:15 Sketch of the proof (2/3)
23:37 Sketch of the proof (2/3)
23:53 Sketch of the proof (3/3)
24:12 Conclusion