published: April 28, 2010, recorded: March 2010, views: 5303
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Clustering is a classical problem which looks for important segments in an unstructured data set. In general, this is an ill-posed problem. A common approach is to consider the data set as a sample of an unknown probability distribution function on some underlying space. Clustering then becomes a problem of understanding the behaviour of the distribution function.
In this talk, I will introduce persistence-based clustering. Under some mild assumptions, the algorithm comes with a variety of strong theoretical guarantees. In particular, it provably approximates the structure of the underlying distribution function even when underlying space is only approximately known. The approach is based heavily on persistent homology (also refered to as topological persistence), a relatively recent development in the area of computational topology. It is precisely this framework which makes many of the proofs possible. The talk will include a general introduction to persistence so no prior knowledge is expected. On the practical side, the algorithm is efficient, both in memory size and running time, so it can handle large, high dimensional data sets quickly. Finally, it provides visual feedback in addition to the clusters, something which is particularly useful when the data sets cannot be visualized.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !