Multiarmed Bandits and Partial Monitoring: Exploration and Exploitation using Upper Confidence Bounds

Published on Feb 4, 2025 · 3966 views

Presentation

MULTIARMED BANDITS 00:00
THE BANDIT PROBLEM 01:36
FINITE-TIME REGRET 05:07
HORIZON-DEPENDENT REWARD DISTRIBUTIONS 13:02
HORIZON-DEP. REWARD DISTRIBUTIONS (CONT.) 16:55
THE NONSTOCHASTIC BANDIT PROBLEM 22:21
A NEARLY OPTIMAL RANDOMIZED POLICY 29:33
PROOF 1/2 37:42
PROOF 2/2 39:35
A POINTWISE BOUND 42:05
REGRET BOUNDS 43:20
VARIANCE PROBLEM 47:12
REGRET BOUNDS THAT HOLD W.H.P. 48:17
COMPETING AGAINST ARBITRARY POLICIES 49:43
TRACKING REGRET 51:33
A BOUND ON THE TRACKING REGRET 51:51
PARTIAL MONITORING 57:48
FORECASTING A SEQUENCE 57:53
PREDICTION WITH EXPERT ADVICE 01:00:25
MULTIARMED BANDIT 01:01:20
EXAMPLES: APPLE TASTING 01:04:12
EXAMPLES: DYNAMIC PRICING 01:07:53
CONTROLLING THE REGRET 01:10:23
THE GENERAL FORECASTER FOR PARTIAL MONITORING 01:12:55
LOWER BOUNDS 01:14:03
EXAMPLES: LABEL EFFICIENT FORECASTING 01:14:38
A STRATEGY FOR REVEALING ACTIONS 01:18:14
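The talk's title and opening sections concern the stochastic bandit problem and upper-confidence-bound policies. As a minimal illustration of that idea (a standard UCB1 index sketch, not code taken from the talk's slides; the function name `ucb1` and the two-arm test setup are my own), each arm's index is its empirical mean plus an exploration bonus that shrinks as the arm is pulled more often:

```python
import math

def ucb1(reward_fns, horizon):
    """Run the UCB1 index policy over `horizon` rounds.

    reward_fns: list of zero-argument callables, one per arm,
    each returning a reward in [0, 1].
    Returns (total reward collected, pull counts per arm).
    """
    k = len(reward_fns)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB index: empirical mean + sqrt(2 ln t / n_i) exploration bonus
            arm = max(range(k),
                      key=lambda i: means[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean update
        total += r
    return total, counts
```

With two deterministic arms paying 1 and 0, the policy concentrates its pulls on the better arm while still sampling the other occasionally, which is the exploration/exploitation trade-off the lecture's regret bounds quantify.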