Reinforcement Learning

Exploration Scavenging

author: Alexander Strehl, Yahoo! Research

Description

We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evaluation can be impossible if the exploration policy chooses actions based on the side information provided at each time step. We then propose and prove the correctness of a principled method for policy evaluation which works when this is not the case, even when the exploration policy is deterministic, as long as each action is explored sufficiently often. We apply this general technique to the problem of offline evaluation of internet advertising policies. Although our theoretical results hold only when the exploration policy chooses ads independent of side information, an assumption that is typically violated by commercial systems, we show how clever uses of the theory provide non-trivial and realistic applications. We also provide an empirical demonstration of the effectiveness of our techniques on real ad placement data.
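
The abstract does not spell out the estimator itself. As a minimal sketch, assume it takes the form of summing, over all logged rounds in which the evaluated policy agrees with the logged action, the observed reward divided by the number of times that action appears in the log. The names `scavenged_value`, `LoggedRound`, and the toy log below are hypothetical and chosen for illustration only; the sketch relies on the stated assumption that the exploration policy picked actions without looking at the side information.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# One logged round: (context, action taken by the exploration policy, observed reward).
LoggedRound = Tuple[Hashable, Hashable, float]


def scavenged_value(log: Sequence[LoggedRound],
                    policy: Callable[[Hashable], Hashable]) -> float:
    """Estimate the expected per-round reward of `policy` from logged data.

    Assumption: the exploration policy chose actions independently of the
    context, and every action `policy` can pick appears in the log.
    """
    # T_a: number of logged rounds in which each action a was taken.
    counts = Counter(action for _, action, _ in log)
    total = 0.0
    for context, action, reward in log:
        # Only rounds where the evaluated policy agrees with the logged
        # action contribute, each weighted by 1 / T_a.
        if policy(context) == action:
            total += reward / counts[action]
    return total


# Toy log from a deterministic exploration policy: ad 0 for the first four
# rounds, ad 1 for the last four, regardless of context.
toy_log = [
    ("sports", 0, 1.0), ("news", 0, 0.0), ("sports", 0, 1.0), ("news", 0, 0.0),
    ("sports", 1, 0.0), ("news", 1, 1.0), ("sports", 1, 0.0), ("news", 1, 1.0),
]


def matched_policy(context: Hashable) -> int:
    """Hypothetical policy that shows ad 0 for sports and ad 1 for news."""
    return 0 if context == "sports" else 1


def always_zero(context: Hashable) -> int:
    """Hypothetical policy that always shows ad 0."""
    return 0


print(scavenged_value(toy_log, matched_policy))
print(scavenged_value(toy_log, always_zero))
```

In this toy log the exploration policy is deterministic, yet both ads are shown equally often across both contexts, which is the situation the abstract describes as workable. If the logged action depended on the context, the per-action averages would no longer be representative and the estimate could be arbitrarily wrong.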

Slides
0:00 Exploration Scavenging
0:10 Offline Policy Learning Problem
1:23 Formalization
4:13 Importance-weighting approach
5:46 Outline
6:53 Estimator
9:52 Multiple Policies
11:33 Impossibility Example
13:07 Overcoming Determinism
14:05 Outline
14:54 Internet Advertising Application
15:25 Attention Decay Coefficients
17:41 Empirical Results
18:48 Evaluation on Yahoo!’s data set.
20:15 Conclusion
20:44 Thanks for Listening
21:12 Questions
