Action-Gap Phenomenon in Reinforcement Learning

author: Amir-massoud Farahmand, Vector Institute
published: Sept. 6, 2012,   recorded: December 2011,   views: 2985




Many reinforcement learning practitioners have observed that an agent's performance often comes very close to optimal even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy \hat{\pi} with respect to an action-value function \hat{Q}, the performance loss E[V^*(X) - V^{\hat{\pi}}(X)] is upper bounded by O(\|\hat{Q} - Q^*\|_\infty^{1+\zeta}), in which \zeta \geq 0 is the parameter quantifying the action-gap regularity. For \zeta > 0, our results indicate a smaller performance loss than previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
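The phenomenon described in the abstract can be reproduced on a very small problem. The following is a minimal sketch, not the paper's experiment: the three-state MDP, its rewards and transitions, the uniform noise model, the uniform state distribution used for the expectation, and all function names are assumptions made up for illustration. It perturbs Q^* to obtain \hat{Q}, acts greedily with respect to \hat{Q}, and compares the sup-norm error of \hat{Q} with the resulting performance loss.

```python
import random

# Hypothetical toy MDP, invented for illustration (not from the paper):
# 3 states, 2 actions, deterministic transitions, discount factor GAMMA.
GAMMA = 0.9
NEXT = [[0, 1], [0, 2], [2, 0]]                  # NEXT[s][a] = next state
REWARD = [[0.0, 0.5], [0.0, 1.0], [2.0, 0.0]]    # REWARD[s][a] = reward
STATES, ACTIONS = range(3), range(2)


def optimal_q(iters=500):
    """Approximate Q* by repeatedly applying the Bellman optimality operator."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(iters):
        q = [[REWARD[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in ACTIONS]
             for s in STATES]
    return q


def greedy(q):
    """Greedy policy: in each state, pick the action with the largest Q-value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in STATES]


def policy_value(pi, iters=500):
    """Approximate V^pi by iterating the Bellman operator of the fixed policy pi."""
    v = [0.0 for _ in STATES]
    for _ in range(iters):
        v = [REWARD[s][pi[s]] + GAMMA * v[NEXT[s][pi[s]]] for s in STATES]
    return v


def experiment(eps, seed=0):
    """Return (sup-norm error of Q_hat, average loss of its greedy policy).

    Q_hat is Q* plus independent uniform noise in [-eps, eps] (an assumed
    noise model). The loss is averaged over a uniform state distribution.
    """
    rng = random.Random(seed)
    q_star = optimal_q()
    q_hat = [[q + rng.uniform(-eps, eps) for q in row] for row in q_star]
    sup_err = max(abs(q_hat[s][a] - q_star[s][a])
                  for s in STATES for a in ACTIONS)
    v_star = [max(row) for row in q_star]
    v_pi = policy_value(greedy(q_hat))
    loss = sum(v_star[s] - v_pi[s] for s in STATES) / len(STATES)
    return sup_err, loss


if __name__ == "__main__":
    for eps in (0.1, 0.5, 2.0, 5.0):
        sup_err, loss = experiment(eps)
        print(f"eps={eps}: sup-norm error={sup_err:.3f}, loss={loss:.3f}")
```

In this toy MDP the smallest gap between the two optimal action values is 1.76, so any \hat{Q} whose sup-norm error is below half of that selects the optimal action in every state and incurs essentially zero loss, however inaccurate its values are. Larger perturbations may or may not flip the greedy action, depending on the noise drawn.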

See Also:

Download slides: nips2011_farahmand_actiongap_01.pdf (1.5 MB)

Download article: nips2011_0138.pdf (330.6 KB)
