ICML 2007 - The 24th Annual International Conference on Machine Learning
PASCAL

Graph Clustering With Network Structure Indices

author: Matthew J. Rattigan, University of Massachusetts Amherst

Description

Graph clustering has become ubiquitous in the study of relational data sets. We examine two simple algorithms: a new graphical adaptation of the k-medoids algorithm and the Girvan-Newman method based on edge betweenness centrality. We show that they can be effective at discovering the latent groups or communities that are defined by the link structure of a graph. However, both approaches rely on prohibitively expensive computations, given the size of modern relational data sets. Network structure indices (NSIs) are a proven technique for indexing network structure and efficiently finding short paths. We show how incorporating NSIs into these graph clustering algorithms can overcome these complexity limitations. We also present promising quantitative and qualitative evaluations of the modified algorithms on synthetic and real data sets.
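
As a rough illustration of the k-medoids idea mentioned above, the following is a minimal sketch and not the author's implementation. It assumes an unweighted, undirected graph stored as an adjacency dictionary, uses hop count as the distance, and computes exact distances with breadth-first search; that all-pairs step is the expensive part that the talk proposes replacing with NSI-based distance estimates. The names `bfs_distances`, `assign` and `graph_k_medoids`, the tie-breaking rule, and the `max_iter` parameter are all hypothetical choices made for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def assign(nodes, medoids, dist):
    """Assign each node to its nearest medoid (ties go to the smaller medoid id)."""
    inf = float("inf")
    clusters = {m: [] for m in medoids}
    for u in nodes:
        nearest = min(medoids, key=lambda m: (dist[m].get(u, inf), m))
        clusters[nearest].append(u)
    return clusters


def graph_k_medoids(adj, initial_medoids, max_iter=20):
    """Alternate between assigning nodes and re-picking medoids until stable."""
    inf = float("inf")
    nodes = sorted(adj)
    # Exact all-pairs distances via one BFS per node (assumption: the graph
    # is small enough for this; an NSI would approximate these instead).
    dist = {u: bfs_distances(adj, u) for u in nodes}
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(nodes, medoids, dist)
        # New medoid of a cluster: the member with the smallest total
        # distance to all other members of that cluster.
        new_medoids = [
            min(
                clusters[m],
                key=lambda c: (sum(dist[c].get(u, inf) for u in clusters[m]), c),
            )
            for m in medoids
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(nodes, medoids, dist)


# Toy graph (made up for this example): two triangles joined by one edge.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
toy_medoids, toy_clusters = graph_k_medoids(toy_graph, initial_medoids=[0, 5])
```

The result depends on the initial medoids, which is one reason a practical version would likely use several random restarts.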

Slides
0:00 Graph clustering with network structure indices
0:14 Chris Farley
0:54 Data clustering - 1
1:28 Data clustering - 2
2:05 Graph division & k-means/medoids
3:02 The next 18 minutes…
3:35 Girvan-Newman algorithm
4:53 Original k-means algorithm
5:10 Original k-medoids algorithm
5:26 Graph k-medoids algorithm
6:09 Graph k-medoids with synthetic data - 1
6:42 Clustering difficulty
7:48 Clustering accuracy
8:54 Challenge 1: Clustering instability
9:56 Graph k-medoids with synthetic data - 2
10:04 Challenge 2: Centrality measures require path information and pathfinding is expensive - 1
10:25 A network structure index (NSI)
11:30 Challenge 2: Centrality measures require path information and pathfinding is expensive - 2
11:54 K-medoids efficiency with NSIs
12:36 Graph k-medoids with synthetic data - 3
13:37 Girvan-Newman with synthetic data
14:49 Real data: Cora - 1
15:29 Real data: Cora - 2
16:37 Real data: IMDb - 1
16:52 Real data: IMDb - 2
17:44 Conclusion
18:21 Questions
