Parameter Learning Using Approximate MAP Inference
published: Jan. 19, 2010, recorded: December 2009, views: 3977
In recent years, machine learning has seen the development of a series of parameter-learning algorithms that avoid estimating the partition function and instead rely on accurate approximate MAP inference. Within this framework, we consider two new topics.
In the first part, we discuss parameter learning in a semi-supervised scenario. Specifically, we focus on a region-based scene segmentation model that explains an image in terms of its underlying regions (sets of connected pixels that provide discriminative features) and their semantic labels (such as sky, grass, or foreground). While it is easy to obtain a (partial) ground-truth labeling for the pixels of a training image, it is not possible for a human annotator to provide us with the best set of regions (those that result in the most discriminative features). To address this issue, we develop a novel iterative MAP inference algorithm that selects the best subset of regions from a large dictionary using convex relaxations. We use our algorithm to "complete" the ground-truth labeling (i.e. infer the regions), which allows us to employ the highly successful max-margin training regime. We compare our approach with state-of-the-art methods and demonstrate significant improvements.
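The idea of driving parameter learning with MAP inference can be sketched in simplified structured-perceptron form (a close relative of the max-margin training mentioned above): run MAP inference under the current weights, and update toward the ground-truth features and away from the features of the MAP assignment. The toy per-position model below, and the names `phi` and `map_infer`, are illustrative assumptions, not the talk's actual segmentation model.

```python
# Sketch of MAP-inference-driven learning (structured-perceptron style).
# The feature map and inference routine are toy stand-ins: here MAP
# factorizes per position, whereas the talk's setting needs an
# approximate combinatorial solver.

def phi(x, y):
    """Toy joint feature map: counts of (input token, label) pairs."""
    feats = {}
    for xi, yi in zip(x, y):
        feats[(xi, yi)] = feats.get((xi, yi), 0) + 1
    return feats

def map_infer(w, x, labels):
    """Exact MAP for this toy model: pick the best label per position."""
    return [max(labels, key=lambda y: w.get((xi, y), 0.0)) for xi in x]

def perceptron_update(w, x, y_true, labels, lr=1.0):
    """If the MAP assignment is wrong, move weights toward the truth
    and away from the current MAP assignment."""
    y_hat = map_infer(w, x, labels)
    if y_hat != y_true:
        for f, v in phi(x, y_true).items():
            w[f] = w.get(f, 0.0) + lr * v
        for f, v in phi(x, y_hat).items():
            w[f] = w.get(f, 0.0) - lr * v
    return w

# Toy usage: learn that 'a' maps to label 0 and 'b' to label 1.
w = {}
data = [(['a', 'b', 'a'], [0, 1, 0]), (['b', 'a'], [1, 0])]
for _ in range(5):
    for x, y in data:
        w = perceptron_update(w, x, y, labels=[0, 1])
```

The key point mirrored from the talk: only a MAP (argmax) call is ever needed, never the partition function.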
In the second part, we discuss a new learning framework for general log-linear models based on contrastive objectives. A contrastive objective considers a set of "interesting" assignments and attempts to push up the probability of the correct instantiation at the expense of the other interesting assignments. In contrast to our approach, related methods such as pseudo-likelihood and contrastive divergence compare the correct instantiation only to nearby instantiations, which can be problematic when there is a high-scoring instantiation far away from the correct one. We present some of the theoretical properties and practical advantages of our method, including the ability to learn a log-linear model using only (approximate) MAP inference. We also show results of applying our method to some simple synthetic examples, where it significantly outperforms pseudo-likelihood.
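The mechanics of a contrastive objective can be sketched as follows: normalize the log-linear score of the correct assignment only over a small contrast set, rather than over the full (intractable) partition function, and ascend the resulting gradient. In this minimal sketch the contrast set is hand-picked; in the talk's method the "interesting" assignments would come from (approximate) MAP inference. All function names and the toy features are assumptions for illustration.

```python
import math

# Sketch of a contrastive objective for a log-linear model p(y) ∝ exp(w·phi(y)).
# Features are sparse dicts; the contrast set below is a toy assumption.

def score(w, feats):
    """Log-linear score w·phi(y)."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def contrastive_logprob(w, true_feats, contrast_set):
    """log-probability of the correct assignment, normalized only over
    {y*} ∪ contrast_set instead of the full partition function."""
    scores = [score(w, true_feats)] + [score(w, f) for f in contrast_set]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[0] - log_z

def contrastive_step(w, true_feats, contrast_set, lr=0.5):
    """One gradient-ascent step: push up the correct instantiation at the
    expense of the other assignments in the contrast set."""
    all_feats = [true_feats] + contrast_set
    scores = [score(w, f) for f in all_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    # gradient = phi(y*) - E over the contrastive distribution of phi(y)
    grad = dict(true_feats)
    for e, feats in zip(exps, all_feats):
        for f, v in feats.items():
            grad[f] = grad.get(f, 0.0) - (e / z) * v
    for f, g in grad.items():
        w[f] = w.get(f, 0.0) + lr * g
    return w

# Toy usage: the correct assignment carries feature 'a', the contrastive
# alternative carries 'b'; a few steps separate their scores.
true_feats = {'a': 1.0}
contrast_set = [{'b': 1.0}]
w = {}
for _ in range(20):
    w = contrastive_step(w, true_feats, contrast_set)
```

Note that the contrastive distribution plays the role of the model distribution, so the gradient has the familiar "observed features minus expected features" form, but the expectation is cheap because the set is small.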