SRMC '07 - Tübingen
Pascal

Stability and Resampling Methods for Clustering

Description

Model assessment is one of the most crucial aspects of statistical data analysis. In data clustering in particular, it is difficult to devise reasonable tools for this purpose; the most prominent example is the problem of choosing the number k of clusters to construct. Stability-based methods and resampling methods have become a popular choice for model selection in clustering, as documented by the wealth of literature on this topic. The basic rationale of these approaches is that valid models should be reproducible under perturbation or resampling of the data. If models turn out to be highly unstable, the inferred solution does not seem to be a generally valid model, or at least seems to have missed some important aspects of the data.
Many scientists report that stability and resampling methods work well for clustering model selection. Moreover, for supervised learning there is a wealth of literature proving that stable classification algorithms have good generalization performance. On the other hand, it has recently been claimed that stability methods for clustering can be misleading and do not necessarily work the way people believe they do. There is still an ongoing debate on how those results should be interpreted. Nevertheless, many researchers working on clustering stability agree that a theoretical understanding of stability methods in clustering is lacking. In particular, it seems unclear in which situations stability works and what mechanism makes it a successful tool in those situations.
This lack of understanding is the motivation for holding a workshop on stability and resampling methods for clustering. We plan to hold a rather small workshop for specialists working on stability questions for clustering, or on stability-related questions in other areas of computer science or mathematics. We want to have a small number of invited talks and to dedicate a considerable amount of time to discussions. We hope that combining the expertise of people working on different aspects of stability and resampling will lead to a deeper understanding of this tool and its role in clustering.
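
To make the rationale described above concrete (valid models should be reproducible under resampling of the data), here is a minimal sketch of one possible stability score for choosing k. It is only an illustration under stated assumptions, not a method prescribed by the workshop: the choice of k-means as the clustering algorithm, the Rand index as the agreement measure, the extension of each clustering to all points by nearest centroid, and all names (`kmeans`, `nearest`, `rand_index`, `stability`, `two_blobs`) are hypothetical choices made for this example.

```python
import random
from itertools import combinations


def nearest(p, centers):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm with random initial centroids; returns k centroids."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous centroid.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_pairs=10, frac=0.8, seed=0):
    """Average agreement between clusterings fitted on pairs of random subsamples."""
    rng = random.Random(seed)
    m = int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        centers_a = kmeans(rng.sample(points, m), k, rng)
        centers_b = kmeans(rng.sample(points, m), k, rng)
        # Assumption: extend both clusterings to the full data set by nearest
        # centroid, so that they can be compared on the same points.
        labels_a = [nearest(p, centers_a) for p in points]
        labels_b = [nearest(p, centers_b) for p in points]
        scores.append(rand_index(labels_a, labels_b))
    return sum(scores) / len(scores)


def two_blobs(n_per_blob=30, seed=1):
    """Synthetic toy data: two well-separated Gaussian blobs in the plane."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (20.0, 20.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in blob_centers for _ in range(n_per_blob)]


if __name__ == "__main__":
    data = two_blobs()
    for k in (2, 3, 5):
        print(k, round(stability(data, k), 3))
```

A score close to 1 for some k means that the partition is largely reproduced across subsamples. In view of the debate described above, such scores should be read with caution: a high score does not by itself establish that the corresponding k is the right choice.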
