Practical RL: Representation, interaction, synthesis, and mortality (PRISM)

author: Peter Stone, Department of Computer Science, University of Texas at Austin
published: July 28, 2015, recorded: June 2015




When scaling up Reinforcement Learning (RL) to large continuous domains with imperfect representations and hierarchical structure, we often try applying algorithms that are proven to converge in small finite domains, and then just hope for the best. This talk will advocate instead designing algorithms that adhere to the constraints, and indeed take advantage of the opportunities, that come with the problem at hand. Drawing on several different research threads within the Learning Agents Research Group at UT Austin, I will touch on four types of issues that arise from these constraints and opportunities: 1) Representation - choosing the algorithm for the problem’s representation and adapting the representation to fit the algorithm; 2) Interaction - with other agents and with human trainers; 3) Synthesis - of different algorithms for the same problem and of different concepts in the same algorithm; and 4) Mortality - dealing with the constraint that when the environment is large relative to the number of action opportunities available, one cannot explore exhaustively. Within this context, I will focus on two specific RL approaches: the TEXPLORE algorithm for real-time, sample-efficient reinforcement learning for robots, and layered learning, a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. TEXPLORE has been implemented and tested on a full-size, fully autonomous robot car, and layered learning was the key deciding factor in our RoboCup 2014 3D simulation league championship.

See Also:

Download slides: rldm2015_stone_practical_rl_01.pdf (2.2 MB)
