Separated Trust Regions Policy Optimization Method

Published on 2020-03-02 · 31 Views

Luobao Zou

In this work, we propose a moderate policy update method for reinforcement learning, which encourages the agent to explore more boldly in early episodes but to update the policy more cautiously. Based on

KDD 2019 - Anchorage
