Automated Character Annotation in Multimedia
published: Feb. 14, 2008, recorded: February 2008
We describe progress in automatically identifying characters in films and TV series using their detected faces together with readily available annotation in the form of subtitles and transcripts. We describe how the subtitles and transcript can be aligned to give weak supervision on the characters present in a shot (as well as on the actions, emotions, locations, etc.). The supervision is weak because of correspondence problems and because the character may not be visible. The visual problem of face recognition is challenging because faces appear in images at various sizes and poses, and also vary considerably in expression. Fortunately, videos contain multiple face examples of each person in a form that can easily be associated automatically using straightforward visual tracking. These multiple examples reduce the ambiguity of recognition. We show that the text supervision can be strengthened by speaker detection. Although the labelling is still incomplete and noisy, it is then sufficient to learn visual models for recognition and to achieve successful character identification. This is joint work with Mark Everingham and Josef Sivic.
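To make the subtitle and transcript alignment step concrete, here is a minimal Python sketch. The abstract does not say which alignment method is used, so this substitutes the standard library's `difflib.SequenceMatcher` as a stand-in. It assumes, hypothetically, that subtitles arrive as `(start, end, text)` tuples carrying timing but no speaker, and that the transcript arrives as `(speaker, text)` tuples carrying speaker names but no timing. The function name `label_subtitles_with_speakers` and both data formats are illustrative assumptions, not part of the lecture.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def _words(text):
    # Lowercase and keep only letter/apostrophe runs so punctuation does not block matches.
    return re.findall(r"[a-z']+", text.lower())


def label_subtitles_with_speakers(subtitles, transcript):
    """Attach a speaker name to each timed subtitle line.

    subtitles:  list of (start_sec, end_sec, text)  -- assumed format
    transcript: list of (speaker, text)             -- assumed format
    Returns a list of (start_sec, end_sec, speaker_or_None, text).
    """
    # Flatten both sources into word sequences, remembering each word's source line.
    sub_words, sub_line = [], []
    for i, (_, _, text) in enumerate(subtitles):
        for w in _words(text):
            sub_words.append(w)
            sub_line.append(i)

    tr_words, tr_line = [], []
    for j, (_, text) in enumerate(transcript):
        for w in _words(text):
            tr_words.append(w)
            tr_line.append(j)

    # Align the two word sequences (difflib is a stand-in for the real aligner).
    matcher = SequenceMatcher(None, sub_words, tr_words, autojunk=False)

    # Each matched word votes for the speaker of the transcript line it came from.
    votes = [Counter() for _ in subtitles]
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            speaker = transcript[tr_line[b + k]][0]
            votes[sub_line[a + k]][speaker] += 1

    labelled = []
    for (start, end, text), v in zip(subtitles, votes):
        # No matched words means no speaker can be proposed for this line.
        speaker = v.most_common(1)[0][0] if v else None
        labelled.append((start, end, speaker, text))
    return labelled


if __name__ == "__main__":
    subtitles = [
        (1.0, 2.5, "Where are you going?"),
        (3.0, 4.0, "To the library."),
        (4.5, 6.0, "I'll come with you."),
    ]
    transcript = [
        ("ALICE", "Where are you going?"),
        ("BOB", "To the library."),
        ("ALICE", "I will come with you."),
    ]
    for row in label_subtitles_with_speakers(subtitles, transcript):
        print(row)
```

The output says only who is speaking during a time interval. It does not say which face in the shot belongs to that speaker, or whether the speaker is on screen at all, which is why such labels count as weak supervision.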