Active Reinforcement Learning
Published on Aug 06, 2008
5231 Views
When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, the agent can obtain the optimal policy without any interaction with the environment. However, exact transit
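To illustrate the first claim of the abstract (that with known transition probabilities and rewards, an optimal policy can be computed with no interaction with the environment), here is a minimal sketch using value iteration. Value iteration is a standard planning method chosen here for illustration, not necessarily the one used in the lecture. The two-state MDP, the discount factor `gamma`, and the names `value_iteration`, `P`, and `R` are all hypothetical.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Plan in a fully known MDP (assumed layout, for illustration only).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the expected immediate reward for action a in state s.
    Returns the state values and a greedy policy.
    """
    n_states = len(P)

    def q_value(V, s, a):
        # One-step lookahead using the known model.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once a full sweep barely changes the values
            break

    policy = [
        max(range(len(P[s])), key=lambda a: q_value(V, s, a))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical toy MDP: state 1 pays reward 1 for staying (action 0);
# from state 0, action 1 reaches state 1 with probability 0.8.
P = [
    [[(0, 1.0)], [(1, 0.8), (0, 0.2)]],  # state 0: action 0, action 1
    [[(1, 1.0)], [(0, 1.0)]],            # state 1: action 0, action 1
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

V, policy = value_iteration(P, R)
print(policy)
```

No samples from the environment are drawn anywhere in this sketch: the model `P` and `R` is the only input, which is the situation the abstract describes as the known-MDP case.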