Off-policy Learning for Multiple Loggers

Published on Feb 18, 2024 | 7 Views

It is well known that historical logs are used for evaluating and learning policies in interactive systems, e.g., recommendation, search, and online advertising. Since direct online policy learning
