Support vector machines loss with l1 penalty
published: Feb. 25, 2007, recorded: October 2004, views: 5911
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
We consider an i.i.d. sample from (X,Y), where X is a feature and Y a binary label, say with values +1 or -1. We use a high-dimensional linear approximation of the regression of Y on X and support vector machine loss with l1 penalty on the regression coefficients. This procedure does not depend on the (unknown) noise level or on the (unknown) sparseness of approximations of Bayes rule, but nevertheless its prediction error is smaller for smaller noise levels and/or sparser approximations. Thus, it adapts to unknown properties of the underlying distribution. In an example, we show that up to terms logarithmic in the sample size, the procedure yields minimax rates for the excess risk.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !