Dynamic Bayesian Networks for Multimodal Interaction
published: Feb. 25, 2007, recorded: June 2005, views: 1405
Dynamic Bayesian networks (DBNs) offer a natural extension of classical hidden Markov models and become especially relevant when temporal data contains higher-order structure, multiple modalities, or multi-person interaction. We describe several instantiations of dynamic Bayesian networks that are useful for modeling temporal phenomena spanning audio, video, and haptic channels in single-person, two-person, and multi-person activity. These include input-output hidden Markov models, switched Kalman filters and, most generally, dynamical systems trees (DSTs). We use them to learn audio-video interaction in social activities, video interaction in multi-person game playing, and haptic-video interaction in robotic laparoscopy. Model parameters are estimated from data in an unsupervised setting using generalized expectation maximization methods. Once trained, the models can predict, synthesize, and classify various types of rich multimodal human activity. We show experiments in gesture interaction, audio-video conversation, football game playing, and surgical drill evaluation.
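To make one of the model families named above concrete, here is a minimal sketch of the forward recursion for a discrete input-output hidden Markov model, in which an observed input at each step conditions both the state transition and the emission. It is an illustration under stated assumptions, not the implementation used in the lecture: the function names, the table layout, and every probability in the toy parameters are hypothetical.

```python
from itertools import product


def iohmm_likelihood(inputs, outputs, init, trans, emit):
    """Forward recursion for a discrete input-output HMM.

    Assumed (hypothetical) table layout:
      init[i]        = P(s_1 = i)
      trans[u][i][j] = P(s_t = j | s_{t-1} = i, u_t = u)
      emit[u][j][y]  = P(y_t = y | s_t = j, u_t = u)
    Returns P(outputs | inputs).
    """
    n = len(init)
    # alpha[j] = P(y_1..y_t, s_t = j | u_1..u_t), started at t = 1.
    alpha = [init[j] * emit[inputs[0]][j][outputs[0]] for j in range(n)]
    for u, y in zip(inputs[1:], outputs[1:]):
        alpha = [
            emit[u][j][y] * sum(alpha[i] * trans[u][i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


def brute_force_likelihood(inputs, outputs, init, trans, emit):
    """Same quantity by summing over every hidden state path (for checking)."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(outputs)):
        p = init[path[0]] * emit[inputs[0]][path[0]][outputs[0]]
        for t in range(1, len(outputs)):
            p *= trans[inputs[t]][path[t - 1]][path[t]]
            p *= emit[inputs[t]][path[t]][outputs[t]]
        total += p
    return total


# Toy parameters, made up for illustration: 2 states, 2 inputs, 2 outputs.
INIT = [0.6, 0.4]
TRANS = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions when the input is 0
    [[0.5, 0.5], [0.7, 0.3]],  # transitions when the input is 1
]
EMIT = [
    [[0.8, 0.2], [0.3, 0.7]],  # emissions when the input is 0
    [[0.6, 0.4], [0.1, 0.9]],  # emissions when the input is 1
]

if __name__ == "__main__":
    u_seq, y_seq = [0, 1, 0], [1, 1, 0]
    print(iohmm_likelihood(u_seq, y_seq, INIT, TRANS, EMIT))
    print(brute_force_likelihood(u_seq, y_seq, INIT, TRANS, EMIT))
```

With a single input symbol the tables reduce to those of an ordinary hidden Markov model, which is one way to see the input-output variant as an extension of it. In an expectation maximization setting, a forward pass of this kind would typically be paired with a backward pass to compute the posterior state statistics needed for parameter updates.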