Exploration Scavenging
Description
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evaluation can be impossible if the exploration policy chooses actions based on the side information provided at each time step. We then propose and prove the correctness of a principled method for policy evaluation which works when this is not the case, even when the exploration policy is deterministic, as long as each action is explored sufficiently often. We apply this general technique to the problem of offline evaluation of internet advertising policies. Although our theoretical results hold only when the exploration policy chooses ads independent of side information, an assumption that is typically violated by commercial systems, we show how clever uses of the theory provide non-trivial and realistic applications. We also provide an empirical demonstration of the effectiveness of our techniques on real ad placement data.
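The description above can be made concrete with a small sketch. It assumes the estimator takes the form of summing, over logged rounds where the evaluated policy agrees with the logged action, the observed reward divided by the number of times that action was explored. This page does not spell the formula out, so treat that form as an assumption. The names `scavenge_value`, `log`, and `policy` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple

# One logged round: (context, action taken by the exploration policy, observed reward).
LogEntry = Tuple[Hashable, Hashable, float]


def scavenge_value(log: Iterable[LogEntry],
                   policy: Callable[[Hashable], Hashable]) -> float:
    """Estimate the expected per-round reward of `policy` from logged data.

    Assumption (not stated on this page): the exploration policy picked its
    actions without looking at the context, so the rounds in which a given
    action was taken form an unbiased sample of contexts.
    """
    rounds = list(log)
    # T_a: how many times each action was explored in the log.
    times_explored = Counter(action for _, action, _ in rounds)

    estimate = 0.0
    for context, action, reward in rounds:
        # Only rounds where the evaluated policy agrees with the log count.
        if policy(context) == action:
            estimate += reward / times_explored[action]
    return estimate


# Toy log: a deterministic exploration policy alternates actions 0 and 1
# regardless of context; the reward is 1 when the action equals the context.
toy_log = []
for t in range(8):
    context = (t // 2) % 2
    action = t % 2
    toy_log.append((context, action, 1.0 if action == context else 0.0))

print(scavenge_value(toy_log, lambda x: x))  # policy that matches the context
print(scavenge_value(toy_log, lambda x: 0))  # policy that always plays action 0
```

Actions that the exploration policy never chose contribute nothing to the sum, which is one way to see why each action must be explored sufficiently often. If the logged actions depended on the context, the per-action averages would be taken over a biased sample of contexts and the estimate could be arbitrarily wrong.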
| Slides | |
| 0:00 | Exploration Scavenging |
| 0:10 | Offline Policy Learning Problem |
| 1:23 | Formalization |
| 4:13 | Importance-weighting approach |
| 5:46 | Outline |
| 6:53 | Estimator |
| 9:52 | Multiple Policies |
| 11:33 | Impossibility Example |
| 13:07 | Overcoming Determinism |
| 14:05 | Outline |
| 14:54 | Internet Advertising Application |
| 15:25 | Attention Decay Coefficients |
| 17:41 | Empirical Results |
| 18:48 | Evaluation on Yahoo!’s data set. |
| 20:15 | Conclusion |
| 20:44 | Thanks for Listening |
| 21:12 | Questions |