Exploration Scavenging
Published on Aug 06, 2008
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evaluation can be impossible …
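For orientation, here is a minimal sketch of offline policy evaluation in the contextual bandit setting. It uses a standard inverse propensity scoring (IPS) estimator, which is *not* the method presented in this talk. It assumes the logging (exploration) policy's action probabilities are known and nonzero for every action the target policy picks. The names `ips_estimate` and `simulate_uniform_log`, and the toy reward rule (reward 1 when the action equals the context), are hypothetical illustrations.

```python
import random
from typing import Callable, List, Tuple

# A logged record: (context, action taken, observed reward, probability the
# logging policy assigned to that action). Probabilities are ASSUMED known.
LoggedRecord = Tuple[int, int, float, float]


def ips_estimate(records: List[LoggedRecord],
                 target_policy: Callable[[int], int]) -> float:
    """IPS estimate of the average reward of a deterministic target policy.

    Only records where the logged action matches the target policy's choice
    contribute; each is reweighted by 1 / (logging probability).
    """
    if not records:
        raise ValueError("need at least one logged record")
    total = 0.0
    for context, action, reward, prob in records:
        if target_policy(context) == action:
            total += reward / prob
    return total / len(records)


def simulate_uniform_log(n: int, num_actions: int,
                         seed: int = 0) -> List[LoggedRecord]:
    """Hypothetical toy log: a uniform logging policy that ignores the context."""
    rng = random.Random(seed)
    records: List[LoggedRecord] = []
    for _ in range(n):
        context = rng.randrange(num_actions)
        action = rng.randrange(num_actions)  # chosen independently of context
        reward = 1.0 if action == context else 0.0  # toy reward rule
        records.append((context, action, reward, 1.0 / num_actions))
    return records


if __name__ == "__main__":
    log = simulate_uniform_log(20000, num_actions=2, seed=1)
    print(ips_estimate(log, lambda x: x))  # close to 1.0 (true value)
    print(ips_estimate(log, lambda x: 0))  # close to 0.5 (true value)
```

The toy logging policy picks actions without looking at the context, so every target policy's choices are explored regularly and the estimates land near the true values (1.0 for the identity policy, 0.5 for the constant policy) up to sampling noise.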