Latent Variable Sparse Bayesian Models
published: May 6, 2009, recorded: April 2009, views: 5676
A variety of practical approaches have recently been introduced for performing estimation and inference using linear models with sparse priors on the unknown coefficients, a process that can have wide-ranging implications in diverse areas such as model selection and compressive sensing. While not always derived or marketed as such, many of these methods can be viewed as arising from Bayesian models capitalizing on latent structure, expressible via hyperparameters, inherent in sparse distributions. Here we focus on four such strategies: (i) standard MAP estimation, (ii) hyperparameter MAP estimation, also called evidence maximization or empirical Bayes, (iii) variational Bayes using a factorial posterior, and (iv) local variational approximation using convex lower bounding. All of these approaches can be used to compute tractable posterior approximations to the underlying full distribution; however, the exact nature of these approximations is frequently unclear and so it is a challenging task to determine which strategy and sparse prior are appropriate. Rather than justifying such selections using the credibility of the full Bayesian model as is sometimes done, we base evaluations on the actual underlying cost functions that emerge from each method. To this end we discuss a common, unifying objective function that encompasses all of the above and then assess its properties with respect to representative applications such as finding maximally sparse (i.e., minimal L0 quasi-norm) representations. This objective function can be expressed in either coefficient space or hyperparameter space, a duality that facilitates direct comparisons between seemingly disparate approaches and naturally leads to theoretical insights and useful optimization strategies such as reweighted L1 and L2 minimization. 
This perspective also suggests extensions of the sparse linear model, including alternative likelihood functions (e.g., for classification) and more general sparse priors applicable to covariance component estimation, group selection, and the incorporation of explicit coefficient constraints (e.g., non-negativity). Several examples related to neuroimaging and compressive sensing will be considered.
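The reweighted L2 minimization mentioned in the abstract can be illustrated with a small, generic sketch. It is not taken from the lecture: the function name `reweighted_l2`, the parameters `lam`, `eps` and `eps_min`, the update `gamma = x**2 + eps`, and the geometric annealing of `eps` are all assumptions, chosen to follow the common iteratively reweighted least-squares pattern. Each pass solves a weighted ridge problem in coefficient space, then refreshes one variance-like weight per coefficient, which plays the role of a hyperparameter.

```python
import numpy as np

def reweighted_l2(Phi, y, lam=1e-6, n_iter=100, eps=1.0, eps_min=1e-10):
    """Generic iteratively reweighted L2 sketch for sparse recovery.

    Hypothetical helper, not the lecture's algorithm. Each pass minimizes
    ||y - Phi x||^2 + lam * sum_i x_i^2 / gamma_i for fixed gamma, then
    sets gamma_i = x_i^2 + eps and shrinks eps (assumed schedule).
    """
    n, m = Phi.shape
    gamma = np.ones(m)   # per-coefficient weights (hyperparameter-like)
    x = np.zeros(m)
    for _ in range(n_iter):
        PG = Phi * gamma  # same as Phi @ diag(gamma), via broadcasting
        # Closed-form weighted ridge solution:
        # x = diag(gamma) Phi^T (lam I + Phi diag(gamma) Phi^T)^{-1} y
        x = gamma * (Phi.T @ np.linalg.solve(lam * np.eye(n) + PG @ Phi.T, y))
        gamma = x**2 + eps             # small coefficients get small weights
        eps = max(0.7 * eps, eps_min)  # slowly sharpen the sparsity penalty
    return x

# Synthetic, noiseless demo: 3 nonzero coefficients, 20 measurements, 40 unknowns.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 40))
x_true = np.zeros(40)
x_true[[3, 17, 31]] = [2.0, -3.0, 1.5]
y = Phi @ x_true

x_hat = reweighted_l2(Phi, y)
print("estimated support:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```

The linear system is solved in the measurement dimension (20 by 20 here) rather than the coefficient dimension, which is the cheaper choice when there are fewer measurements than unknowns. The data in the demo are synthetic and serve only to exercise the function.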