event thumbnail image
The 25th International Conference on Machine Learning (ICML 2008)

Fast Incremental Proximity Search in Large Graphs

author: Purnamrita Sarkar, +Machine Learning Department; School of Computer Science; Carnegie Mellon University

Description

In this paper we investigate two aspects of ranking problems on large graphs. First, we augment the deterministic pruning algorithm in Sarkar and Moore (2007) with sampling techniques to compute approximately correct rankings with high probability under random walk based proximity measures at query time. Second, we prove some surprising locality properties of these proximity measures by examining the short term behavior of random walks. The proposed algorithm can answer queries on the fly without caching any information about the entire graph. We present empirical results on a 600,000 node author-word-citation graph from the Citeseer domain on a single CPU machine where the average query processing time is around 4 seconds. We present quantifiable link prediction tasks. On most of them our techniques outperform Personalized Pagerank, a well-known diffusion based proximity measure.

You might be experiencing some problems with Your Video player.
Slides
0:00 Fast Incremental Proximity Search in Large Graphs
0:00 Content-based search in databases {1,2}
0:56 Proximity measures on graphs
1:37 Outline - Ranking Problem on graphs
1:54 Random walks (1)
2:00 Random walks (2)
2:02 Random walks (3)
2:06 Random walks (4)
2:22 Hitting time from i to j (1)
2:36 Hitting time from i to j (2)
2:45 Hitting time from i to j (3)
2:52 Random walk-based proximity measures
3:13 Hitting & Commute times: Drawbacks Predictive power
3:58 A better proximity measure based on truncated random walks
4:47 Un-truncated vs truncated hitting time from a node
5:41 Truncated Hitting & Commute times
6:20 Nearest neighbors of a node: Computational complexity of truncated measures
7:34 Outline - Previous work
7:37 GRANCH: computing hitting time to a node
8:20 GRANCH: computing hitting time from a node (1)
8:29 GRANCH: computing hitting time from a node (2)
8:35 GRANCH: Pros & Cons
9:17 Outline - Proposed work
9:25 Proposed work
10:26 Proposed work: sketch
11:30 Outline - Theoretical justification
11:33 Number of nodes with hitting time ≤ τ from the query
12:24 Number of nodes with hitting time ≤ τ to the query (1)
13:09 Number of nodes with hitting time ≤ τ to the query (2)
13:38 Outline - Algorithm sketch
13:48 Algorithm sketch : Combining sampling and dynamic programming
14:48 Sampling for computing hitting time from a node (1)
14:57 Sampling for computing hitting time from a node (2)
14:58 Sampling for computing hitting time from a node (3)
14:59 Sampling for computing hitting time from a node (4)
15:00 Sampling for computing hitting time from a node (5)
15:01 Sampling for computing hitting time from a node (6)
15:26 Sampling for computing hitting time from a node (7)
15:54 Sample complexity bounds
16:51 Outline - Experimental results
16:56 Citeseer graph: Average query processing time
17:33 Keyword-author-citation graphs
18:19 Word Task (1)
18:32 Word Task (2)
18:53 Author Task
19:04 Word Task
19:42 Author Task
20:09 Conclusion
20:40 Acknowledgement
21:03 Conclusion

Lecture rating

People found this lecture:
Worth seeing
because it is:
 Valuable and informative
Well presented
Easily understandable
Acceptably recorded
You need to login to cast your vote.

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment: