Support vector machines loss with l1 penalty

author: Sara van de Geer, ETH Zurich
published: Feb. 25, 2007, recorded: October 2004, views: 122

Description

We consider an i.i.d. sample from (X, Y), where X is a feature and Y is a binary label taking the values +1 or -1. We use a high-dimensional linear approximation of the regression of Y on X, together with support vector machine loss and an l1 penalty on the regression coefficients. This procedure does not depend on the (unknown) noise level or on the (unknown) sparseness of approximations of the Bayes rule, yet its prediction error is smaller for smaller noise levels and/or sparser approximations. It thus adapts to unknown properties of the underlying distribution. In an example, we show that, up to terms logarithmic in the sample size, the procedure yields minimax rates for the excess risk.
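
To make the objective concrete, the following is a minimal sketch of support vector machine (hinge) loss with an l1 penalty on the coefficients of a linear score. It is an illustration only, not the procedure analyzed in the lecture: the function names, the toy data, the tuning value `lam`, and the proximal subgradient solver are all assumptions introduced here, and the abstract does not say how the estimator is computed or tuned.

```python
from typing import List, Sequence


def hinge(margin: float) -> float:
    """Support vector machine (hinge) loss: max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


def linear_score(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear approximation f_beta(x) = sum_j beta_j * x_j."""
    return sum(b * xj for b, xj in zip(beta, x))


def penalized_hinge_risk(beta, X, y, lam: float) -> float:
    """Empirical hinge risk plus lam times the l1 norm of beta."""
    n = len(X)
    empirical = sum(hinge(yi * linear_score(beta, xi)) for xi, yi in zip(X, y)) / n
    return empirical + lam * sum(abs(b) for b in beta)


def soft_threshold(v: float, t: float) -> float:
    """Proximal map of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit(X, y, lam: float, step: float = 0.1, n_iter: int = 500) -> List[float]:
    """Illustrative proximal subgradient iterations (an assumed solver)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            # A sample contributes a subgradient only when its margin is below 1.
            if yi * linear_score(beta, xi) < 1.0:
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        # Subgradient step on the hinge part, then soft-thresholding for the l1 part.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


# Toy data (hypothetical): the first feature determines the label,
# the second feature carries no information about it.
X = [[2.0, 0.1], [1.5, -0.1], [-2.0, 0.1], [-1.5, -0.1]]
y = [1, 1, -1, -1]
beta_hat = fit(X, y, lam=0.1)
```

On this toy data the l1 penalty keeps the coefficient of the uninformative second feature at zero, which is the kind of sparsity the penalty is meant to encourage. The fixed step size is chosen for simplicity, so the iterates hover near a minimizer rather than converging exactly.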
