Off-policy Learning for Multiple Loggers
Published on Feb 18, 2024
Historical logs are widely used to evaluate and learn policies in interactive systems such as recommendation, search, and online advertising. Since direct online policy learning