Dimensionality Reduction by Feature Selection in Machine Learning

author: Dunja Mladenić, Artificial Intelligence Laboratory, Jožef Stefan Institute
published: Feb. 25, 2007,   recorded: February 2005,   views: 2005

Slides

0:01 Dimensionality Reduction by Feature Selection in Machine Learning
0:15 Reasons for dimensionality reduction
0:59 Approaches to dimensionality reduction
3:02 Example for the problem
4:05 Search for feature subset
4:37 Feature subset selection
5:18 Approaches to feature subset selection
6:50 Filtering
7:41 Filters: Distribution-based [Koller & Sahami 1996]
8:23 Filters: Relief [Kira & Rendell 1992]
9:32 Filters: FOCUS [Almuallim & Dietterich 1991]
11:00 Illustration of FOCUS
11:37 Filters: Random [Liu & Setiono 1996]
12:43 Filters: MDL-based [Pfahringer 1995]
13:30 Wrapper
14:52 Wrappers: Instance-based learning
15:42 Wrappers: Decision tree induction
16:48 Metric-based model selection
18:56 Embedded
19:15 Embedded
20:19 Embedded: in filters [Cardie 1993]
21:03 Simple Filtering
21:43 Feature subset selection on text data – commonly used methods
25:08 Scoring individual feature
28:46 Influence of feature selection on the classification performance
29:41 Illustration of feature selection
30:22 Illustration on 5 datasets from Yahoo! hierarchy using Naïve Bayes [Mladenic & Grobelnik 2003]
32:25 CrossEntropy
34:28 Rank of the correct category in the list of all categories; F2-measure, combining precision and recall with emphasis on recall; Ctgs – number of categories looking promising (testing example needs to be class…
39:18 Illustration on Reuters-2000 Data [Brank et al 2002]
40:03 Experiments with Naïve Bayes Classifier
41:08 Average number of nonzero components per vector instead of the overall no. of features
41:11 Experiments with Perceptron Classifier
41:46 Experiments with the Linear SVM Classifier
42:30 Discussion: Using discarded features can help
44:19 Discussion

Description

Dimensionality reduction is a commonly used step in machine learning, especially when dealing with a high-dimensional feature space. The original feature space is mapped onto a new space of reduced dimensionality, and the examples to be used by machine learning algorithms are represented in that new space. The mapping is usually performed by selecting a subset of the original features, by constructing new features, or both. This presentation deals with the first approach, feature subset selection. We provide a brief overview of the feature subset selection techniques that are commonly used in machine learning and give a more detailed description of feature subset selection used in machine learning on text data. The performance of some methods used in document categorization is illustrated by an experimental comparison on real-world data collected from the Web.
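
As a rough illustration of feature subset selection on text, the following minimal sketch scores each word independently and keeps only the highest-scoring ones, then represents every document using just those words. Information gain is used as the scoring measure purely as an assumption; the toy corpus and all function names are invented for this example and are not taken from the lecture.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Score every term on its own and keep the k best (ties: alphabetical)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: (-information_gain(docs, labels, t), t),
    )
    return ranked[:k]


def project(doc, selected):
    """Represent a document using only the selected features."""
    return doc & set(selected)


# Hypothetical toy corpus: each document is the set of words it contains.
docs = [
    {"goal", "match", "team", "the"},
    {"team", "score", "the", "win"},
    {"stock", "market", "the", "price"},
    {"market", "trade", "the", "price"},
]
labels = ["sport", "sport", "finance", "finance"]

selected = select_features(docs, labels, k=3)
reduced = [project(d, selected) for d in docs]
print(selected)
print(reduced)
```

Because each word is scored without reference to any learning algorithm, this corresponds to a filtering approach; a word such as "the", which occurs in every document, carries no information about the class and receives a score of zero.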
