Detecting Actions, Poses, and Objects with Relational Phraselets
chairman: Michael J. Black, Max Planck Institute for Intelligent Systems, Max Planck Institute
chairman: Ivan Laptev, INRIA - The French National Institute for Research in Computer Science and Control
published: Nov. 12, 2012, recorded: October 2012, views: 5034
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model selfocclusions and interactions. Poselets and Visual Phrases address this limitation, but do so at the expense of requiring a large set of templates. We combine all three approaches with a compositional model that is flexible enough to model detailed articulations but still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume test images are labeled with a person, and instead present results for action detection in an unlabeled image. Notably, for each detection, our model reports back a detailed description including an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action Classification challenge that shows our unified model advances the state-of-the-art for detection, action classification, and articulated pose estimation.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !