Active Reinforcement Learning

Published on Aug 06, 2008. 5231 views.

When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, the agent can obtain the optimal policy without any interaction with the environment. However, exact transit
