
Active Reinforcement Learning
Published on 2008-08-06, 5235 views
When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, the agent can compute an optimal policy by planning alone, without any interaction with the environment. However, exact transit