Multiarmed Bandits and Partial Monitoring: Exploration and Exploitation using Upper Confidence Bounds

Published on Feb 25, 2007 · 3,961 views

Chapter list

MULTIARMED BANDITS (00:00)
THE BANDIT PROBLEM (01:36)
FINITE-TIME REGRET (05:07)
HORIZON-DEPENDENT REWARD DISTRIBUTIONS (13:02)
HORIZON-DEP. REWARD DISTRIBUTIONS (CONT.) (16:55)
THE NONSTOCHASTIC BANDIT PROBLEM (22:21)
THE NONSTOCHASTIC BANDIT PROBLEM (27:12)
A NEARLY OPTIMAL RANDOMIZED POLICY (29:33)
THE NONSTOCHASTIC BANDIT PROBLEM (30:15)
A NEARLY OPTIMAL RANDOMIZED POLICY (31:00)
PROOF 1/2 (37:42)
A NEARLY OPTIMAL RANDOMIZED POLICY (39:16)
PROOF 1/2 (39:20)
PROOF 2/2 (39:35)
PROOF 1/2 (39:41)
PROOF 2/2 (39:47)
A NEARLY OPTIMAL RANDOMIZED POLICY (40:15)
PROOF 2/2 (40:21)
A NEARLY OPTIMAL RANDOMIZED POLICY (40:56)
PROOF 2/2 (41:18)
PROOF 1/2 (42:03)
A POINTWISE BOUND (42:05)
REGRET BOUNDS (43:20)
VARIANCE PROBLEM (47:12)
REGRET BOUNDS (47:23)
VARIANCE PROBLEM (47:28)
REGRET BOUNDS THAT HOLD W.H.P. (48:17)
COMPETING AGAINST ARBITRARY POLICIES (49:43)
TRACKING REGRET (51:33)
A BOUND ON THE TRACKING REGRET (51:51)
PARTIAL MONITORING (57:48)
FORECASTING A SEQUENCE (57:53)
PREDICTION WITH EXPERT ADVICE (01:00:25)
MULTIARMED BANDIT (01:01:20)
PARTIAL MONITORING (01:02:23)
EXAMPLES: APPLE TASTING (01:04:12)
EXAMPLES: LABEL EFFICIENT FORECASTING (01:06:26)
EXAMPLES: DYNAMIC PRICING (01:07:53)
CONTROLLING THE REGRET (01:10:23)
THE GENERAL FORECASTER FOR PARTIAL MONITORING (01:12:55)
CONTROLLING THE REGRET (01:13:10)
REGRET BOUNDS (01:13:16)
LOWER BOUNDS (01:14:03)
EXAMPLES: APPLE TASTING (01:14:24)
EXAMPLES: LABEL EFFICIENT FORECASTING (01:14:38)
EXAMPLES: DYNAMIC PRICING (01:14:47)
EXAMPLES: LABEL EFFICIENT FORECASTING (01:15:44)
CONTROLLING THE REGRET (01:17:01)
LOWER BOUNDS (01:17:24)
A STRATEGY FOR REVEALING ACTIONS (01:18:14)
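
As a quick illustration of the upper-confidence-bound idea named in the lecture title, below is a minimal UCB1-style sketch in Python. It is not taken from the slides: the class name, the exploration bonus sqrt(2 ln t / n), and the Bernoulli reward simulation are illustrative assumptions only.

```python
import math
import random


class UCB1:
    """Minimal UCB1-style index policy (illustrative sketch, not the lecture's exact algorithm)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of pulls per arm
        self.means = [0.0] * n_arms  # empirical mean reward per arm

    def select_arm(self):
        # Pull each arm once before trusting the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        # Index = empirical mean + exploration bonus sqrt(2 ln t / n_arm).
        indices = [m + math.sqrt(2.0 * math.log(t) / n)
                   for m, n in zip(self.means, self.counts)]
        return max(range(len(indices)), key=indices.__getitem__)

    def update(self, arm, reward):
        # Incrementally update the empirical mean of the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical Bernoulli bandit with three arms; arm 2 is best.
    probs = [0.3, 0.5, 0.7]
    policy = UCB1(len(probs))
    total = 0.0
    rounds = 10_000
    for _ in range(rounds):
        arm = policy.select_arm()
        reward = 1.0 if random.random() < probs[arm] else 0.0
        policy.update(arm, reward)
        total += reward
    print("average reward:", total / rounds)
```

The exploration bonus shrinks as an arm is pulled more often, so under-sampled arms keep getting revisited while the empirical best arm is exploited, which is the exploration/exploitation trade-off the lecture's stochastic-bandit chapters discuss.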