Large scale multiclass classification based on linear optimization

author: Sandor Szedmak, School of Electronics and Computer Science, University of Southampton
published: Feb. 25, 2007,   recorded: May 2004,   views: 3320

One of the hardest tasks in image classification is to find a method applicable to large-scale multi-class problems in which both the sample size and the number of features are huge. Linear discriminant analysis, a classical method for multi-class classification introduced by Fisher (1936) [4], has recently played an important role in the machine learning community. Kernelized versions of this method are discussed in several papers; however, they generally deal only with the two-class case. Bartlett (1938) [2] recognised that there is a strong relationship between the Fisher discriminant and Canonical Correlation Analysis, and this statement is valid for the multi-class case as well. Building on this work, Barker et al. (2003) [1] and Rosipal et al. (2003) [10] discuss the details of this relationship and show the appropriate kernel approach to the problem. Using Canonical Correlation for multi-class classification on large-scale problems suffers from the numerical difficulty of solving the generalised eigenvalue problem that provides the optimum. We present an analogous classification procedure based on linear optimisation which is able to extend the range of solvable problem sizes and to give sparse solutions. Our method exploits the relationship between the L1-norm SVM and the boosting approach, presented by Bennett et al. (2000) [3], Mangasarian (1999) [5] and Meir et al. (2003) [6]. Additionally, the formulation based on the soft-margin SVM can solve the problem when the number of features is less than the number of observations in a given sample.
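To illustrate the linear-optimisation idea the abstract builds on, here is a minimal sketch of the two-class L1-norm soft-margin SVM written as a linear program. This is an assumption-laden illustration of the general technique (splitting the weight vector as w = u - v with u, v >= 0 makes the L1 objective linear), not the lecture's multi-class formulation; the toy data and the choice of C = 1 are made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Toy linearly separable two-class data (hypothetical example data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape
C = 1.0  # soft-margin trade-off, chosen arbitrarily here

# L1-norm SVM:  min ||w||_1 + C * sum(xi)
#               s.t. y_i (w . x_i + b) >= 1 - xi_i,  xi_i >= 0.
# Split w = u - v with u, v >= 0 so the objective is linear.
# Variable vector: [u (d), v (d), b (1), xi (n)].
c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])

# Margin constraints rewritten for linprog's A_ub x <= b_ub form:
#   -y_i (u - v) . x_i - y_i b - xi_i <= -1
A_ub = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None], -np.eye(n)])
b_ub = -np.ones(n)
bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:d] - res.x[d:2 * d]   # recover the sparse weight vector
b = res.x[2 * d]
pred = np.sign(X @ w + b)
```

Because the L1 penalty is linear, the LP optimum tends to sit at a vertex of the feasible region, which is what yields the sparse solutions mentioned above; the same reformulation trick scales to the multi-class setting by stacking one such constraint block per class.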
