High Confidence Policy Improvement
Published on Dec 05, 2015
1563 Views
We present a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy that it proposes, and which has no hyper-parameter that requires expert