Feature Selection Stability Assessment based on the Jensen-Shannon Divergence
Published: Oct. 3, 2011; recorded: September 2011
Feature selection and ranking techniques play an important role in the analysis of high-dimensional data. In particular, their stability becomes crucial when feature importance is later studied in order to better understand the underlying process. The fact that a small change in the dataset may affect the outcome of the feature selection/ranking algorithm has long been overlooked in the literature. We propose an information-theoretic approach, using the Jensen-Shannon divergence, to assess this stability (or robustness). Unlike other measures, this new metric is suitable for different algorithm outcomes: full ranked lists, partial sublists (top-k lists), and the least studied partial ranked lists. This generalized metric measures the disagreement among a whole set of lists of the same size, following a probabilistic approach, and can give more importance to differences that appear at the top of the list. We illustrate it and compare it with popular metrics such as the Spearman rank correlation and Kuncheva’s index, both on artificially generated feature selection/ranking outcomes and on a spectral fat dataset with different filter-based feature selectors.
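To make the idea concrete, here is a minimal Python sketch of measuring disagreement among several full rankings with the generalized Jensen-Shannon divergence. It is an illustration under stated assumptions, not necessarily the formulation used in the lecture: the abstract does not specify how a ranking is turned into a probability distribution, so the mapping below (probability proportional to 1/rank) is an assumption chosen only because it puts more weight on the top of the list. The function names are hypothetical, and any normalization of the divergence into a stability score is omitted.

```python
import math


def rank_to_distribution(ranks):
    """Map ranks (1 = most important) to probabilities.

    ASSUMPTION: probability proportional to 1/rank. The abstract does not
    state the mapping; this one simply emphasizes top-ranked features.
    ranks[i] is the rank assigned to feature i.
    """
    weights = [1.0 / r for r in ranks]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(distributions):
    """Generalized Jensen-Shannon divergence with equal weights.

    Computed as the average KL divergence from each distribution to
    their mean; it is 0 when all distributions coincide.
    """
    m = len(distributions)
    n = len(distributions[0])
    mean = [sum(d[i] for d in distributions) / m for i in range(n)]
    return sum(kl_divergence(d, mean) for d in distributions) / m


def ranking_disagreement(rankings):
    """Disagreement among a set of full rankings of the same features."""
    return js_divergence([rank_to_distribution(r) for r in rankings])


if __name__ == "__main__":
    base = [1, 2, 3, 4, 5]
    top_swap = [2, 1, 3, 4, 5]      # the two best features exchanged
    bottom_swap = [1, 2, 3, 5, 4]   # the two worst features exchanged
    print(ranking_disagreement([base, base]))
    print(ranking_disagreement([base, top_swap]))
    print(ranking_disagreement([base, bottom_swap]))
```

Under this assumed mapping, swapping the two top-ranked features produces a larger divergence than swapping the two bottom-ranked ones, which mirrors the property described above of giving more importance to differences at the top of the list.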
Download slides: ecmlpkdd2011_alaiz_rodriguez_feauture_01.pdf (1.8 MB)