On the Stratification of Multi-Label Data
published: Oct. 3, 2011, recorded: September 2011, views: 3616
Stratified sampling is a sampling method that takes into account the existence of disjoint groups within a population and produces samples where the proportion of these groups is maintained. In single-label classification tasks, groups are differentiated based on the value of the target variable. In multi-label learning tasks, however, where there are multiple target variables, it is not clear how stratified sampling could/should be performed. This paper investigates stratification in the multi-label data context. It considers two stratification methods for multi-label data and empirically compares them along with random sampling on a number of datasets and based on a number of evaluation criteria. The results reveal some interesting conclusions with respect to the utility of each method for particular types of multi-label datasets.
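To make the idea of proportion-preserving samples concrete, here is a minimal Python sketch of stratified fold assignment. It is an illustration only and is not taken from the paper or the slides: the function name `stratified_folds` and its parameters are hypothetical. For the multi-label case it assumes that each distinct combination of labels is treated as its own group, which is just one possible way to define groups and may not match either of the two methods the paper compares.

```python
from collections import defaultdict
import random


def stratified_folds(group_keys, k, seed=0):
    """Split example indices into k folds so that each group is spread
    as evenly as possible across the folds.

    group_keys: one hashable key per example (hypothetical input format).
    Returns a list of k lists of example indices.
    """
    rng = random.Random(seed)

    # Collect the indices of the examples belonging to each group.
    by_group = defaultdict(list)
    for idx, key in enumerate(group_keys):
        by_group[key].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # running offset so leftovers of small groups rotate over folds
    for members in by_group.values():
        rng.shuffle(members)
        # Deal the group's members out to the folds round-robin.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


if __name__ == "__main__":
    # Single-label case: the group is simply the class value.
    labels = ["a"] * 6 + ["b"] * 3
    for fold in stratified_folds(labels, k=3):
        print(sorted(labels[i] for i in fold))

    # Multi-label case (assumption: group = the whole set of labels).
    label_sets = [{"x"}, {"x"}, {"x", "y"}, {"x", "y"}]
    keys = [frozenset(s) for s in label_sets]
    for fold in stratified_folds(keys, k=2):
        print([sorted(keys[i]) for i in fold])
```

Grouping by the full label combination can produce many groups with only one or two examples when a dataset has many labels, in which case the split behaves much like random sampling. This illustrates why the abstract describes the multi-label case as not clear-cut.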
Download slides: ecmlpkdd2011_tsoumakas_stratification_01.pdf (667.6 KB)