Subspace, Latent Structure and Feature Selection techniques: Statistical and Optimisation perspectives Workshop
Pascal

What is the Optimal Number of Features? A learning theoretic perspective

author: Amir Navot, Hebrew University of Jerusalem

Description

In this paper we discuss feature selection for supervised learning from the standpoint of statistical machine learning. We ask which subset of features leads to the best classification accuracy. If the statistical model is known, or if the number of training samples is unlimited, any additional feature can only improve the accuracy. However, we show explicitly that when the training set is finite, using all the features may be suboptimal, even if all the features are independent and carry information about the label. We analyze one setting analytically and show how feature selection can increase accuracy. We also find the optimal number of features as a function of the training set size for a few specific examples. This perspective differs from the common approach, which focuses on the probability that a specific algorithm will pick a completely irrelevant or redundant feature.
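The claim that more informative features can hurt with a finite training set can be illustrated numerically. The sketch below is loosely in the spirit of the "Two Gaussians" setting listed in the slides, but its details are assumptions, not taken from the talk: labels y are ±1 with equal probability, x | y ~ N(y·μ, I), the hypothetical mean profile is μ_j = 1/√j (every feature independent and informative), and the classifier is sign(μ̂ᵀx) on the first n features, where μ̂ is the average of y_i·x_i over m training samples. With μ known, the error Φ(−‖μ_n‖) only falls as n grows. With μ̂ estimated, the estimation noise grows like n/m, so the error has an interior minimum. The function names (`bayes_error`, `plugin_error`, `simulated_error`) are hypothetical.

```python
import math
import random


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mu(j: int) -> float:
    """Hypothetical mean shift of feature j (1-based): class means are +/- 1/sqrt(j)."""
    return 1.0 / math.sqrt(j)


def bayes_error(n: int) -> float:
    """Error of sign(mu^T x) on the first n features when mu is known: Phi(-||mu_n||)."""
    norm_sq = sum(mu(j) ** 2 for j in range(1, n + 1))
    return phi(-math.sqrt(norm_sq))


def plugin_error(n: int, m: int) -> float:
    """Heuristic error of sign(mu_hat^T x) when mu_hat is estimated from m samples.

    Replaces mu_hat^T mu and ||mu_hat||^2 by their expectations,
    ||mu_n||^2 and ||mu_n||^2 + n/m (an approximation, not an exact formula).
    """
    norm_sq = sum(mu(j) ** 2 for j in range(1, n + 1))
    return phi(-norm_sq / math.sqrt(norm_sq + n / m))


def simulated_error(n: int, m: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same classifier's expected test error.

    Each coordinate of mu_hat is mu_j + g / sqrt(m) with g ~ N(0, 1), the exact
    distribution of the mean of m i.i.d. values y_i * x_ij ~ N(mu_j, 1).
    Given mu_hat, the test error is exactly Phi(-mu_hat^T mu / ||mu_hat||).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        dot = 0.0
        sq = 0.0
        for j in range(1, n + 1):
            h = mu(j) + rng.gauss(0.0, 1.0) / math.sqrt(m)
            dot += h * mu(j)
            sq += h * h
        total += phi(-dot / math.sqrt(sq))
    return total / trials


if __name__ == "__main__":
    # Optimal number of features (under the plug-in approximation) per training set size m.
    for m in (4, 16, 64):
        best_n = min(range(1, 401), key=lambda n: plugin_error(n, m))
        print(f"m={m}: best n ~ {best_n}, "
              f"simulated error there {simulated_error(best_n, m, trials=500):.3f}, "
              f"with all 400 features {simulated_error(400, m, trials=500):.3f}")
```

Under these assumptions, the best n grows with m, which matches the qualitative point of the abstract. The plug-in formula is used only to locate the optimum cheaply. The Monte Carlo estimate is exact in distribution for this toy model, up to sampling noise.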

Slides
0:01 What is The Optimal Number of Features? A learning theoretic Perspective
0:18 What is Feature Selection?
1:09 Reasons to do Feature Selection
2:10 The Questions
2:42 Two Gaussians - Problem Setting
4:18 Problem Setting – Cont.
5:42 Illustration
7:23 Result
8:07 Solving for Specific 
10:33 Solving for Specific  - Cont.
11:40 Problem Setting – Cont.
11:59 Solving for Specific  - Cont.
12:13 Proof
14:24 Proof
14:49 Proof – Cont.
15:53 Proof – Cont.
16:37 “Empirical Proof” of the Lemma
17:25 Linear SVM Error (averaged over 200 repeats, c=0.01, using Gavin Cawley’s toolbox)
19:34 Conclusions
21:09 What is The Optimal Number of Features? A learning theoretic Perspective
24:30 Proof – Cont.
