Imitation Learning and Purposeful Prediction: Probabilistic and Non-probabilistic Methods
published: Jan. 19, 2010, recorded: December 2009, views: 4503
Programming robot behavior remains a challenging task. While it is often easy to define a desired behavior abstractly, or even to demonstrate it, designing a controller that embodies that behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many "behavioral cloning" approaches that use the classical tools of supervised learning (e.g., decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. Classical statistics and supervised machine learning exist in a vacuum: the predictions these algorithms make are explicitly assumed not to affect the world in which they operate. In practice, robotic systems are often built on sophisticated planning algorithms that efficiently reason far into the future; consequently, discarding these planners in favor of a supervised learning approach often leads to myopic, poor-quality robot performance.
While planning algorithms have succeeded in many real-world applications, ranging from legged locomotion to outdoor unstructured navigation, they rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually designed and programmed by hand. Recently, our group has developed a set of techniques that learn these functions from human demonstration. These algorithms take an Inverse Optimal Control (IOC) approach: they find a cost function for which planned behavior mimics an expert's demonstration. I'll discuss these imitation learning methods, both probabilistic and non-probabilistic. I'll focus on the Principle of Causal Maximum Entropy, which generalizes the classical Maximum Entropy Principle, widely used in fields including physics, statistics, and computer vision, to problems of decision making and control.
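To make the IOC idea concrete, here is a minimal sketch using the classical (non-causal) maximum-entropy model over a small, enumerated set of candidate paths: each path has probability proportional to exp(-cost), and cost is assumed to be linear in path features. Learning the cost weights by maximum likelihood reduces to matching the expert's average feature counts. The path names, the features `rough_km` and `exposed_km`, and the function names are hypothetical illustrations, not the methods presented in the lecture.

```python
import math

# Hypothetical toy data: each candidate path is summarized by a feature
# vector [rough_km, exposed_km]; cost(path) = theta . features(path).
PATHS = {
    "ridge": [3.0, 1.0],
    "valley": [1.0, 3.0],
    "road": [1.0, 1.0],
}

def path_distribution(theta):
    """MaxEnt distribution over paths: P(path) proportional to exp(-cost(path))."""
    costs = {p: sum(t * f for t, f in zip(theta, feats)) for p, feats in PATHS.items()}
    lowest = min(costs.values())  # shift costs for numerical stability
    weights = {p: math.exp(-(c - lowest)) for p, c in costs.items()}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}

def expected_features(theta):
    """Feature counts expected under the current MaxEnt path distribution."""
    probs = path_distribution(theta)
    return [sum(probs[p] * PATHS[p][k] for p in PATHS) for k in range(len(theta))]

def learn_cost_weights(demos, lr=0.5, iters=500):
    """Gradient ascent on the log-likelihood of the demonstrated paths.

    The gradient with respect to theta is E_P[f] - (mean demonstrated f), so at
    the optimum the model's expected features match the expert's.
    """
    dim = len(next(iter(PATHS.values())))
    demo_mean = [sum(PATHS[d][k] for d in demos) / len(demos) for k in range(dim)]
    theta = [0.0] * dim
    for _ in range(iters):
        model_mean = expected_features(theta)
        theta = [t + lr * (m - d) for t, m, d in zip(theta, model_mean, demo_mean)]
    return theta

if __name__ == "__main__":
    theta = learn_cost_weights(["road", "road", "ridge", "valley"])
    print([round(t, 4) for t in theta])  # prints [0.3466, 0.3466], i.e. ln(2)/2 each
```

With these demonstrations the learned model assigns probability 0.5 to "road" and 0.25 to each of the other paths, reproducing the expert's path frequencies. Enumerating paths only works for tiny problems; a practical system would compute the same expectations with a planner or dynamic programming over a state space.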
This generalization allows MaxEnt to be applied to a new class of problems, including Inverse Optimal Control and activity forecasting. The approach also clarifies the close connections between probabilistic inference and optimal control. I'll consider case studies in forecasting the activities of drivers and pedestrians, as well as imitation learning for robotic locomotion and rough-terrain navigation. These case studies highlight key challenges in applying the algorithms in practical settings that use state-of-the-art planners and are constrained by efficiency requirements and imperfect expert demonstrations.
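One common way to compute a maximum causal entropy policy for a given cost in a small tabular decision problem is soft value iteration. Its backups use a log-sum-exp over actions in place of the usual max, which is one concrete form of the inference/control connection mentioned above. Propagating the start state forward under the resulting stochastic policy gives a forecast of where the agent is likely to be at each step. The sketch below assumes a hypothetical three-state MDP with made-up costs and transition probabilities; the function and variable names are illustrative, not taken from the lecture.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def soft_value_iteration(transitions, cost, horizon):
    """Finite-horizon soft Bellman backups.

    transitions[s][a] is a list of (probability, next_state) pairs and
    cost[s][a] is the immediate cost. Returns policies[t][s][a].
    """
    n_states = len(transitions)
    v_next = [0.0] * n_states  # soft value after the final step
    policies = []
    for _ in range(horizon):
        # Q(s,a) = -cost(s,a) + E_{s'}[V(s')]: an expectation over the
        # stochastic outcome, not a soft maximum over it.
        q = [[-cost[s][a] + sum(p * v_next[s2] for p, s2 in transitions[s][a])
              for a in range(len(transitions[s]))]
             for s in range(n_states)]
        # V(s) = log sum_a exp Q(s,a): a "soft" maximum over actions.
        v = [logsumexp(q_s) for q_s in q]
        # pi(a|s) = exp(Q(s,a) - V(s)), which sums to 1 over actions.
        policies.append([[math.exp(q_sa - v[s]) for q_sa in q[s]]
                         for s in range(n_states)])
        v_next = v
    policies.reverse()  # backups ran backward in time; index 0 is the first step
    return policies

def forecast_visitation(transitions, policies, start):
    """Forecast the state distribution at each step under the stochastic policy."""
    n_states = len(transitions)
    d = [0.0] * n_states
    d[start] = 1.0
    history = [d]
    for pi in policies:
        d_next = [0.0] * n_states
        for s in range(n_states):
            for a, p_a in enumerate(pi[s]):
                for p, s2 in transitions[s][a]:
                    d_next[s2] += d[s] * p_a * p
        d = d_next
        history.append(d)
    return history

# Hypothetical 3-state example: 0 = start, 1 = midway, 2 = absorbing goal.
# Action 0 is a "safe" deterministic step; action 1 is a costlier "risky"
# jump that reaches the goal with probability 0.8 and otherwise stays put.
TRANSITIONS = [
    [[(1.0, 1)], [(0.8, 2), (0.2, 0)]],
    [[(1.0, 2)], [(0.8, 2), (0.2, 1)]],
    [[(1.0, 2)]],
]
COST = [[1.0, 1.5], [1.0, 1.5], [0.0]]

if __name__ == "__main__":
    policies = soft_value_iteration(TRANSITIONS, COST, horizon=5)
    history = forecast_visitation(TRANSITIONS, policies, start=0)
    print("P(safe | midway, t=0):", round(policies[0][1][0], 3))
    print("P(at goal) per step:", [round(d[2], 3) for d in history])
```

The design choice that makes this "causal" is that the soft value takes a plain expectation over next states and a log-sum-exp only over actions, so the agent is not credited with steering random outcomes it cannot control. Replacing that expectation with a log-sum-exp would instead give the trajectory-level (non-causal) MaxEnt model, which is optimistic under stochastic dynamics.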