Sparse Methods for Machine Learning: Theory and Algorithms

author: Francis R. Bach, INRIA - SIERRA project-team
published: Jan. 19, 2010, recorded: December 2009, views: 4199

Slides
0:00 Sparse methods for machine learning: Theory and algorithms
0:56 Supervised learning and regularization
2:38 Regularizations (1)
2:58 Regularizations (2)
3:43 ℓ2 vs. ℓ1 - Gaussian hare vs. Laplacian tortoise
5:14 Lasso - Two main recent theoretical results (1)
5:53 Lasso - Two main recent theoretical results (2)
6:26 Going beyond the Lasso (1)
7:13 Going beyond the Lasso (2)
7:43 Going beyond the Lasso (3)
7:58 Going beyond the Lasso (4)
8:38 Sparse methods for machine learning, Outline
9:03 Why do ℓ1-norm constraints lead to sparsity?
10:26 ℓ1-norm regularization (linear setting)
11:26 A review of nonsmooth convex analysis and optimization
11:53 Optimality conditions for smooth optimization: Zero gradient (1)
12:58 Optimality conditions for smooth optimization: Zero gradient (2)
13:32 Directional derivatives - convex functions on ℝᵖ
15:43 Optimality conditions for convex functions
16:08 Directional derivatives for ℓ1-norm regularization
16:48 Optimality conditions for ℓ1-norm regularization
19:54 First-order methods for convex optimization on ℝᵖ, Smooth optimization
24:08 First-order methods for convex optimization on ℝᵖ, Nonsmooth optimization
25:16 Counter-example: Coordinate descent for nonsmooth objectives
26:13 Counter-example (Bertsekas, 1995): Steepest descent for nonsmooth objectives
26:53 Sparsity-inducing norms: Using the structure of the problem
28:47 Cheap (and not dirty) algorithms for all losses (1)
29:03 Cheap (and not dirty) algorithms for all losses (2)
29:08 Cheap (and not dirty) algorithms for all losses (3)
32:48 Special case of square loss (1)
33:23 Special case of square loss (2)
34:28 Optimality conditions for the sign vector s (Lasso)
35:29 Homotopy methods for the square loss (Markowitz, 1956; Osborne et al., 2000; Efron et al., 2004)
37:08 Piecewise linear paths
38:36 Algorithms for ℓ1-norms (square loss): Gaussian hare vs. Laplacian tortoise
39:23 Additional methods - Softwares
40:28 Sparse methods for machine learning, Outline
40:53 Theoretical results - Square loss
41:48 Model selection consistency (Lasso) (1)
42:23 Model selection consistency (Lasso) (2)
44:07 Model selection consistency (Lasso) (3)
44:58 Adaptive Lasso and concave penalization
48:18 Bolasso (Bach, 2008a)
48:18 High-dimensional inference, Going beyond exact support recovery
48:28 Model selection consistency of the Lasso/Bolasso
49:17 High-dimensional inference, Variable selection without computational limits
52:59 High-dimensional inference, Variable selection with orthogonal design (1)
54:33 High-dimensional inference, Variable selection with orthogonal design (2)
55:45 High-dimensional inference (Lasso)
56:52 Mutual incoherence (uniform low correlations)
58:27 Restricted eigenvalue conditions
60:03 Checking sufficient conditions

Description

Regularization by the L1-norm has attracted a lot of interest in recent years in statistics, machine learning and signal processing. In the context of least-squares linear regression, the problem is usually referred to as the Lasso or basis pursuit. Much of the early effort has been dedicated to algorithms that solve the optimization problem efficiently, either through first-order methods, or through homotopy methods that lead to the entire regularization path (i.e., the set of solutions for all values of the regularization parameter) at the cost of a single matrix inversion. A well-known property of regularization by the L1-norm is the sparsity of the solutions, i.e., it leads to loading vectors with many zeros, and thus performs model selection on top of regularization. Recent work has looked closely at the model consistency of the Lasso: if we know that the data were generated from a sparse loading vector, does the Lasso actually recover the sparsity pattern when the number of observations grows? Moreover, how many irrelevant variables could we consider while still being able to correctly infer the relevant ones?

The objective of the tutorial is to give a unified overview of the recent contributions of sparse convex methods to machine learning, both in terms of theory and algorithms. The course will be divided into three parts. In the first part, the focus will be on the regular L1-norm and variable selection, introducing key algorithms and key theoretical results. Then, several more structured machine learning problems will be discussed, on vectors (second part) and matrices (third part), such as multi-task learning, sparse principal component analysis, multiple kernel learning and sparse coding.
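
To illustrate the first-order methods and the sparsity property mentioned in this description, here is a minimal sketch of proximal gradient descent (often called ISTA) for the Lasso. It is not code from the tutorial. It assumes the objective (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1, and the function names, toy data, and step size are all illustrative choices. The toy design has orthogonal columns, so a step size of 1 is safe here; in general the step should not exceed the inverse of the largest eigenvalue of (1/n) X^T X.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(X, y, lam, step, n_iter=500):
    """Proximal gradient for (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1.

    Assumed scaling of the objective; other references use different constants.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals r = Xw - y.
        r = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth (square-loss) part: (1/n) * X^T r.
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step followed by coordinate-wise soft-thresholding.
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(p)]
    return w


# Hypothetical toy data: three orthogonal columns, response driven by the first.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
y = [2.1, 1.9, 2.0, 2.0]

w = lasso_ista(X, y, lam=0.1, step=1.0)
print(w)
```

On this toy problem the first coefficient is shrunk from about 2.0 to about 1.9, while the two irrelevant coefficients are set exactly to zero, which is the model-selection effect the description refers to.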
