Support vector machine loss with l1 penalty

author: Sara van de Geer, ETH Zurich
published: Feb. 25, 2007,   recorded: October 2004,   views: 5912

We consider an i.i.d. sample from (X, Y), where X is a feature and Y a binary label, say with values +1 or -1. We use a high-dimensional linear approximation of the regression of Y on X, together with the support vector machine (hinge) loss and an l1 penalty on the regression coefficients. This procedure does not depend on the (unknown) noise level or on the (unknown) sparseness of approximations of the Bayes rule, yet its prediction error is smaller for smaller noise levels and/or sparser approximations. Thus, it adapts to unknown properties of the underlying distribution. In an example, we show that, up to terms logarithmic in the sample size, the procedure attains minimax rates for the excess risk.
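The estimator described above minimizes the empirical hinge loss plus an l1 penalty on the coefficients. As a minimal illustration (not part of the lecture, and with step size, penalty level, and data all chosen for demonstration only), the following pure-Python sketch fits such an estimator by subgradient descent:

```python
def hinge_l1_fit(X, y, lam=0.05, lr=0.05, epochs=200):
    """Subgradient descent on
        (1/n) * sum_i max(0, 1 - y_i <w, x_i>)  +  lam * ||w||_1,
    i.e. the SVM (hinge) loss with an l1 penalty on the coefficients.
    X: list of feature vectors, y: labels in {+1, -1}.
    All hyperparameters are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Subgradient of the averaged hinge loss.
        g = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # point violates the margin
                for j in range(d):
                    g[j] -= yi * xi[j] / n
        # Add a subgradient of the l1 penalty and take a step.
        for j in range(d):
            g[j] += lam * (1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0)
            w[j] -= lr * g[j]
    return w


# Toy data: the label depends only on the first coordinate; the other
# two coordinates are noise, so the l1 penalty should keep them small.
X = [(1.0, 0.2, -0.1), (2.0, -0.3, 0.4), (-1.0, 0.1, 0.3),
     (-2.0, -0.2, -0.4), (1.5, 0.5, 0.0), (-1.5, 0.0, 0.5)]
y = [1, 1, -1, -1, 1, -1]
w = hinge_l1_fit(X, y)
```

On this toy data the fitted coefficient vector concentrates on the first (informative) coordinate, matching the adaptivity-to-sparseness property discussed in the abstract.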
