Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains
Description
Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult -- if not impossible -- to apply them within structured domains, in which, e.g., there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach -- called NPPG -- that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
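The key idea in the abstract, a policy built as a weighted sum of regression models fitted stage by stage to point-wise functional gradients, can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' NPPG implementation: the two-state, two-action one-step task, the `TableRegressor` stand-in for an off-the-shelf regression learner, the Gibbs (softmax) policy over the summed potential, and the fixed step size are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy task (an assumption, not from the lecture):
# two states, two actions, one-step reward, uniform state distribution.
STATES = [0, 1]
ACTIONS = [0, 1]
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


class TableRegressor:
    """Stand-in for an off-the-shelf regressor: mean target per (state, action)."""

    def fit(self, examples):
        sums, counts = defaultdict(float), defaultdict(int)
        for key, target in examples:
            sums[key] += target
            counts[key] += 1
        self.means = {key: sums[key] / counts[key] for key in sums}
        return self

    def predict(self, key):
        return self.means.get(key, 0.0)


def potential(models, step, s, a):
    """Potential Psi(s, a): weighted sum of all regression models grown so far."""
    return sum(step * m.predict((s, a)) for m in models)


def policy(models, step, s):
    """Gibbs (softmax) policy over the potentials of state s (assumed form)."""
    psi = [potential(models, step, s, a) for a in ACTIONS]
    top = max(psi)  # subtract the maximum for numerical stability
    weights = [math.exp(p - top) for p in psi]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(models, step):
    """Expected one-step reward under the uniform state distribution."""
    total = 0.0
    for s in STATES:
        probs = policy(models, step, s)
        total += sum(p * REWARD[(s, a)] for a, p in zip(ACTIONS, probs))
    return total / len(STATES)


def boost(n_stages=50, step=1.0):
    """Stage-wise optimization: each stage fits a regressor to the gradient."""
    models = []
    for _ in range(n_stages):
        examples = []
        for s in STATES:
            probs = policy(models, step, s)
            value = sum(p * REWARD[(s, a)] for a, p in zip(ACTIONS, probs))
            for a, p in zip(ACTIONS, probs):
                # Point-wise functional gradient of the expected reward with
                # respect to Psi(s, a) for a softmax policy; the constant
                # state-probability factor is folded into the step size.
                examples.append(((s, a), p * (REWARD[(s, a)] - value)))
        models.append(TableRegressor().fit(examples))
    return models
```

Starting from the uniform policy, `boost()` returns the list of fitted stages and `expected_reward(models, 1.0)` evaluates the resulting policy. Swapping `TableRegressor` for a regression tree or a relational learner would change only the `fit`/`predict` step, which is the sense in which such a scheme does not depend on the representation of states and actions.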
| Time | Slide |
| 0:00 | Non-Parametric Policy Gradient |
| 0:36 | Take Away Message |
| 1:28 | Overview |
| 2:04 | Reinforcement Learning |
| 2:56 | World Value |
| 3:48 | (steady) State Distribution |
| 4:24 | Value Functions |
| 5:00 | Direct Policy Learning |
| 6:08 | Policy Gradients with Function Approximation |
| 7:15 | Non-Parametric Policy Gradient |
| 8:42 | Functional Gradient Boosting |
| 9:49 | Functional Gradient Boosting (2) |
| 11:35 | In Practice |
| 12:08 | Local Evaluation |
| 12:59 | Gradient Tree Boosting |
| 14:09 | Some Results |
| 20:21 | Future Work |
| 21:15 | Summary |
| 21:58 | The End! |
| 22:35 | - Questions |



