Learning How to Move and Where to Look from Unlabeled Video
published: Aug. 23, 2017, recorded: February 2017, views: 1499
The status quo in visual recognition is to learn from batches of unrelated Web photos labeled by human annotators. Yet cognitive science tells us that perception develops in the context of acting and moving in the world, and without intensive supervision. How can unlabeled video augment computational visual learning? I'll describe our recent work exploring how a system can learn effective representations by watching unlabeled video. First, we consider how the ego-motion signals accompanying a video provide a valuable cue during learning, allowing the system to internalize the link between “how I move” and “what I see.” Next, I explore how the temporal coherence of video permits new forms of invariant feature learning, whether by capturing how object-centric regions evolve over time or by representing higher-order consistency in visual changes. Incorporating these ideas into various recognition tasks, we demonstrate the power of learning from ongoing, unlabeled visual observations, in some cases even overtaking traditional, heavily supervised approaches. Finally, I examine how, simply from having seen unlabeled human-taken videos, a system can learn to mimic human videographer tendencies, automatically creating normal field-of-view video out of unedited 360-degree panoramas.
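One way to make the temporal-coherence idea concrete is a "slowness" objective: features of temporally adjacent frames should stay close, while features of distant frames should be at least a margin apart. The sketch below is an illustrative assumption, not the method presented in the talk; the function name `slowness_loss`, the margin value, and the toy feature vectors are all hypothetical.

```python
import numpy as np


def slowness_loss(feat_a, feat_b, is_neighbor, margin=1.0):
    """Contrastive temporal-coherence loss for one pair of frame features.

    Hypothetical helper (not from the talk): pulls features of temporally
    adjacent frames together and pushes features of non-adjacent frames
    at least `margin` apart.
    """
    a = np.asarray(feat_a, dtype=float)
    b = np.asarray(feat_b, dtype=float)
    dist = np.linalg.norm(a - b)  # Euclidean distance in feature space
    if is_neighbor:
        # Adjacent frames: penalize any distance between their features.
        return float(dist ** 2)
    # Non-adjacent frames: penalize only if they are closer than the margin.
    return float(max(0.0, margin - dist) ** 2)


# Toy 2-D "features" standing in for the output of a learned encoder.
frame_t = np.array([0.0, 0.0])
frame_t1 = np.array([0.3, 0.4])   # next frame, distance 0.5 from frame_t
frame_far = np.array([3.0, 4.0])  # distant frame, distance 5.0 from frame_t

near = slowness_loss(frame_t, frame_t1, is_neighbor=True)
far = slowness_loss(frame_t, frame_far, is_neighbor=False)
too_close = slowness_loss(frame_t, frame_t1, is_neighbor=False)
```

In a real system the features would come from a network trained by minimizing this loss over many frame pairs sampled from unlabeled video; here the vectors are fixed only to show how the two cases behave.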