Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Published on Dec 05, 2015
1684 Views

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recomm
