
Exploration Scavenging
Published on 2008-08-06, 3049 views
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evaluation can be impossible …
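The problem above is estimating how well a target policy would perform, using only (context, action, reward) records logged while a different policy was running. The following is a minimal sketch of one simple estimator for this setting. It is not the authors' code, and it relies on a stated assumption: the logging policy chose actions independently of the context. Under that assumption, each logged reward whose action matches the target policy's choice is reweighted by the inverse empirical frequency of that action. If the logging policy's choices depend on the context, this reweighting is not justified. The function `estimate_policy_value` and the `LoggedStep` record format are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical log record: (context x, action a chosen by the logging policy, observed reward r).
LoggedStep = Tuple[Hashable, Hashable, float]


def estimate_policy_value(
    log: Sequence[LoggedStep],
    target_policy: Callable[[Hashable], Hashable],
) -> float:
    """Estimate the average per-step reward target_policy would have earned.

    Assumption (not verified here): the logging policy picked actions
    independently of the context, so the empirical frequency of each
    action stands in for the probability that it was logged.
    """
    if not log:
        raise ValueError("log must be non-empty")
    n = len(log)
    # Empirical frequency of each action in the log.
    counts = Counter(a for _, a, _ in log)
    total = 0.0
    for x, a, r in log:
        if target_policy(x) == a:
            # Only steps where the target policy agrees with the logged action
            # contribute; weight them by 1 / (count(a) / n).
            total += r / (counts[a] / n)
    return total / n


if __name__ == "__main__":
    log = [("x1", "a", 1.0), ("x2", "b", 0.0), ("x1", "a", 1.0), ("x2", "b", 1.0)]
    print(estimate_policy_value(log, lambda x: "a"))  # 1.0
    print(estimate_policy_value(log, lambda x: "b"))  # 0.5
```

The estimate is built from logged steps alone, so no new interaction is needed. Its variance grows when the target policy favors actions that were rarely logged, because those steps receive large weights.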