Off-policy Model-based Learning under Unknown Factored Dynamics
Published on Sep 27, 2015
1748 Views
Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use. But how can we prove superiority without testing the new po