MUSCLE Conference joint with VITALAS Conference

Automated Character Annotation in Multimedia

author: Andrew Zisserman, University of Oxford

Description

We describe progress in automatically identifying characters in films and TV series using their detected faces together with readily available annotation in the form of subtitles and transcripts. We describe how the subtitles and transcript can be aligned to give weak supervision on the characters present in a shot (as well as on the actions, emotions, locations, etc.). The supervision is weak because of correspondence problems and because a character may not be visible. Face recognition is visually challenging because faces appear in images at various sizes and poses and vary considerably in expression. Fortunately, videos contain multiple face examples of each person, which can easily be associated automatically using straightforward visual tracking. These multiple examples reduce the ambiguity of recognition. We show that the text supervision can be strengthened by speaker detection. Although the labelling is still incomplete and noisy, it is then sufficient to learn visual models for recognition and to achieve successful character identification. This is joint work with Mark Everingham and Josef Sivic.
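
The subtitle/transcript alignment mentioned above can be illustrated with a small sketch. Subtitles carry timing but no speaker names, while a script carries speaker names but no timing; aligning their words lets the names be transferred onto the timed subtitles. The code below is only an assumption about how such an alignment might look: it uses a plain word-level dynamic-programming alignment (edit-distance style, in the spirit of dynamic time warping), not the talk's actual implementation. The function names, the data layout, and the toy dialogue with speakers ALICE and BOB are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Subtitle = Tuple[float, float, str]  # (start_sec, end_sec, text), no speaker
ScriptLine = Tuple[str, str]         # (speaker, text), no timing


def tokenize(text: str) -> List[str]:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def align_words(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Monotonic minimum-cost alignment; returns pairs (i, j) with a[i] == b[j]."""
    n, m = len(a), len(b)
    # cost[i][j] = minimum edit cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrack from the end, collecting exact word matches.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1          # substitution
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1                       # word only in the subtitles
        else:
            j -= 1                       # word only in the script
    pairs.reverse()
    return pairs


def label_subtitles(
    subs: List[Subtitle], script: List[ScriptLine]
) -> List[Tuple[float, float, Optional[str], str]]:
    """Attach to each timed subtitle the speaker that most of its words align to."""
    sub_words, sub_owner = [], []
    for k, (_, _, text) in enumerate(subs):
        for w in tokenize(text):
            sub_words.append(w)
            sub_owner.append(k)
    script_words, script_speaker = [], []
    for speaker, text in script:
        for w in tokenize(text):
            script_words.append(w)
            script_speaker.append(speaker)
    votes: List[Dict[str, int]] = [dict() for _ in subs]
    for i, j in align_words(sub_words, script_words):
        v = votes[sub_owner[i]]
        v[script_speaker[j]] = v.get(script_speaker[j], 0) + 1
    labelled = []
    for (start, end, text), v in zip(subs, votes):
        speaker = max(v, key=v.get) if v else None  # None: no word aligned
        labelled.append((start, end, speaker, text))
    return labelled


# Hypothetical toy data, invented for illustration only.
subs = [
    (10.0, 12.5, "Where did you go last night?"),
    (12.6, 14.0, "I went out for a walk."),
    (14.2, 15.0, "Fine."),
]
script = [
    ("ALICE", "Where did you go last night?"),
    ("BOB", "Went out for a walk."),
    ("ALICE", "Fine."),
]
labelled = label_subtitles(subs, script)
```

Each entry of `labelled` pairs a time interval with a proposed speaker name. As the description notes, this only says who is speaking during the interval, not which detected face (if any) belongs to that speaker, so the resulting supervision remains weak.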

Slides
0:00 Automated Character Annotation in Multimedia
0:18 The Objective - 1
0:44 The Objective - 2
1:08 Multimedia (Vision and Text) Approach
1:52 The Need
2:57 Outline
4:02 Names and Faces in the News
5:08 Weak Supervision from Text
5:18 Running Example: Use Episodes from Buffy the Vampire Slayer
5:56 Textual Annotation: Subtitles/Closed-Captions
6:33 Textual Annotation: Script
7:07 Alignment by Dynamic Time Warping
7:18 Subtitle/Script Alignment
8:01 Virtually Free Source of Annotation
8:27 Ambiguity
10:23 Face Representation and Matching
10:28 Why This is Difficult: Uncontrolled Viewing Conditions
10:55 Matching Faces - 1
11:09 Matching Faces - 2
12:06 The Benefits of Video
12:39 Three Steps
12:56 Obtaining Sets of Faces Using Tracking within Shots
12:57 Face Detection
13:29 "Tracking" by Face Detection
13:44 Face Association
14:21 Connecting Face Detections Temporally
14:46 Face Association
14:56 Example Face Tracks
15:22 Face Vector Representation
15:23 Matching Faces
15:51 Detect Face Features for Rectification
16:08 Eyes/Nose/Mouth Detectors
16:15 Constellation Like Appearance/Shape Model
16:24 Face Normalization
17:21 Representing Faces
17:31 SIFT Descriptor
17:44 Face Feature Vector - Summary
17:58 Matching Face Sets - 1
18:00 Matching Face Sets - 2
18:12 Matching Face Sets - 3
18:28 Matching Face Sets within a Shot
18:51 Example: Buffy the Vampire Slayer
20:04 Raw Face Detections
20:37 Face Tubes (Tracking Only)
21:15 Intra-Shot Matching
21:17 Face Tubes (Tracking Only)
21:41 Intra-Shot Matching
22:24 Ambiguity Again
23:01 Speaker Detection - 1
23:20 Speaker Detection - 2
24:15 Correct "Non-Speaking" Classifications
24:37 Error in Speaker Classification
24:58 Resolved Ambiguity
25:40 Semi-Supervised Learning
26:07 Exemplar Extraction
26:37 Classification by Exemplar Sets
27:22 "Refusal to Predict"
27:54 Experiments
28:18 Example Results - 1
28:39 Example Results - 2
28:48 Precision/Recall
29:33 Example Video
31:08 Quantitative Results
31:33 Using an SVM Classifier – Noisy Labels
32:45 Classification Results (Inter-Episode)
32:47 Extensions
32:48 Improving Coverage – Beyond Frontal Faces
32:58 Feature Localization & Speaker Detection
33:08 Profile Speaker Detection
33:41 Questions