Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs
Published on Aug 26, 2009
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an emp