
Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs

Published on Aug 26, 2009 · 3069 Views

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an emp
