Category Detection Using Hierarchical Mean Shift

author: Weng-Keen Wong, School of Electrical Engineering and Computer Science, Oregon State University
published: Aug. 14, 2009,   recorded: June 2009,   views: 4343


Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can help address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user to label a query data point under an existing category or declare the query data point to belong to a previously undiscovered category. The goal of category detection is to bring to the user's attention a representative data point from each category in the data in as few queries as possible. In a data set with imbalanced categories, the main challenge is in identifying the rare categories or anomalies; hence, the task is often referred to as {\it rare} category detection. We present a new approach to rare category detection based on hierarchical mean shift. In our approach, a hierarchy is created by repeatedly applying mean shift with an increasing bandwidth on the data. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of the dataset properties such as the total number of categories or the prior probabilities of the categories. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques.

See Also:

Download slides icon Download slides: kdd09_wong_cduhms_01.ppt (1.2┬áMB)

Help icon Streaming Video Help

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: