Automated Character Annotation in Multimedia
Description
We describe progress in automatically identifying characters in films and TV series, using their detected faces together with readily available annotation in the form of subtitles and transcripts. We show how the subtitles and transcript can be aligned to give weak supervision on the characters present in a shot (as well as on the actions, emotions, locations, etc.). The supervision is weak because the correspondence between names and faces is ambiguous and because the named character may not be visible. The visual problem of face recognition is challenging because faces appear in images at various sizes and poses, and also vary considerably in expression. Fortunately, videos contain multiple face examples of each person, and these can be associated automatically using straightforward visual tracking. These multiple examples reduce the ambiguity of recognition. We show that the text supervision can be strengthened by speaker detection. Although the labelling is still incomplete and noisy, it is then sufficient to learn visual models for recognition and to achieve successful character identification. This is joint work with Mark Everingham and Josef Sivic.
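The subtitle/transcript alignment mentioned above can be illustrated with a small sketch. A transcript gives speaker names but no timing, while subtitles give timing but no speaker names; aligning the two word sequences lets each timed subtitle inherit a speaker. The code below is only a minimal illustration of dynamic time warping over words, not the implementation used in the lecture: the function names (`tokenize`, `align_words`, `label_subtitles`), the 0/1 word-mismatch cost, and the majority vote per subtitle are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a line and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def align_words(script_words, subtitle_words):
    """Dynamic time warping over two word lists.

    Returns the (script_index, subtitle_index) pairs on a minimum-cost path.
    The local cost is 0 for identical words and 1 otherwise (an assumption).
    """
    n, m = len(script_words), len(subtitle_words)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0.0 if script_words[i - 1] == subtitle_words[j - 1] else 1.0
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def label_subtitles(script, subtitles):
    """Attach a speaker name to each timed subtitle.

    script:    list of (speaker, line) pairs, no timing.
    subtitles: list of (start_sec, end_sec, text) triples, no speaker.
    Returns a list of (start_sec, end_sec, speaker_or_None).
    """
    script_words, word_speaker = [], []
    for speaker, line in script:
        for w in tokenize(line):
            script_words.append(w)
            word_speaker.append(speaker)

    sub_words, word_sub = [], []
    for k, (_, _, text) in enumerate(subtitles):
        for w in tokenize(text):
            sub_words.append(w)
            word_sub.append(k)

    # Each exactly matching word pair votes for a speaker of its subtitle.
    votes = [Counter() for _ in subtitles]
    for i, j in align_words(script_words, sub_words):
        if script_words[i] == sub_words[j]:
            votes[word_sub[j]][word_speaker[i]] += 1

    labelled = []
    for (start, end, _), v in zip(subtitles, votes):
        speaker = v.most_common(1)[0][0] if v else None
        labelled.append((start, end, speaker))
    return labelled
```

As a usage example with invented dialogue and hypothetical speaker names, calling `label_subtitles([("ALICE", "What are you doing here?"), ("BOB", "I could ask you the same question.")], [(10.0, 11.5, "What are you doing here?"), (11.8, 13.0, "I could ask you the same")])` assigns the first subtitle to ALICE and the second to BOB, even though the second subtitle drops a word. The result says only who is speaking during each interval, not which face on screen belongs to that speaker, which is why the supervision remains weak.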
| Time | Slide |
| 0:00 | Automated Character Annotation in Multimedia |
| 0:18 | The Objective - 1 |
| 0:44 | The Objective - 2 |
| 1:08 | Multimedia (Vision and Text) Approach |
| 1:52 | The Need |
| 2:57 | Outline |
| 4:02 | Names and Faces in the News |
| 5:08 | Weak Supervision from Text |
| 5:18 | Running Example: Use Episodes from Buffy the Vampire Slayer |
| 5:56 | Textual Annotation: Subtitles/Closed-Captions |
| 6:33 | Textual Annotation: Script |
| 7:07 | Alignment by Dynamic Time Warping |
| 7:18 | Subtitle/Script Alignment |
| 8:01 | Virtually Free Source of Annotation |
| 8:27 | Ambiguity |
| 10:23 | Face Representation and Matching |
| 10:28 | Why This is Difficult: Uncontrolled Viewing Conditions |
| 10:55 | Matching Faces - 1 |
| 11:09 | Matching Faces - 2 |
| 12:06 | The Benefits of Video |
| 12:39 | Three Steps |
| 12:56 | Obtaining Sets of Faces Using Tracking within Shots |
| 12:57 | Face Detection |
| 13:29 | "Tracking" by Face Detection |
| 13:44 | Face Association |
| 14:21 | Connecting Face Detections Temporally |
| 14:46 | Face Association |
| 14:56 | Example Face Tracks |
| 15:22 | Face Vector Representation |
| 15:23 | Matching Faces |
| 15:51 | Detect Face Features for Rectification |
| 16:08 | Eyes/Nose/Mouth Detectors |
| 16:15 | Constellation Like Appearance/Shape Model |
| 16:24 | Face Normalization |
| 17:21 | Representing Faces |
| 17:31 | SIFT Descriptor |
| 17:44 | Face Feature Vector - Summary |
| 17:58 | Matching Face Sets - 1 |
| 18:00 | Matching Face Sets - 2 |
| 18:12 | Matching Face Sets - 3 |
| 18:28 | Matching Face Sets within a Shot |
| 18:51 | Example: Buffy the Vampire Slayer |
| 20:04 | Raw Face Detections |
| 20:37 | Face Tubes (Tracking Only) |
| 21:15 | Intra-Shot Matching |
| 21:17 | Face Tubes (Tracking Only) |
| 21:41 | Intra-Shot Matching |
| 22:24 | Ambiguity Again |
| 23:01 | Speaker Detection - 1 |
| 23:20 | Speaker Detection - 2 |
| 24:15 | Correct "Non-Speaking" Classifications |
| 24:37 | Error in Speaker Classification |
| 24:58 | Resolved Ambiguity |
| 25:40 | Semi-Supervised Learning |
| 26:07 | Exemplar Extraction |
| 26:37 | Classification by Exemplar Sets |
| 27:22 | "Refusal to Predict" |
| 27:54 | Experiments |
| 28:18 | Example Results - 1 |
| 28:39 | Example Results - 2 |
| 28:48 | Precision/Recall |
| 29:33 | Example Video |
| 31:08 | Quantitative Results |
| 31:33 | Using an SVM Classifier – Noisy Labels |
| 32:45 | Classification Results (Inter-Episode) |
| 32:47 | Extensions |
| 32:48 | Improving Coverage – Beyond Frontal Faces |
| 32:58 | Feature Localization & Speaker Detection |
| 33:08 | Profile Speaker Detection |
| 33:41 | Questions |