Fast Incremental Proximity Search in Large Graphs

author:Purnamrita Sarkar, Machine Learning Department, School of Computer Science, Carnegie Mellon University
published: Aug. 1, 2008,   recorded: July 2008,   views: 207
You might be experiencing some problems with Your Video player.

Slides

Slides
0:00 Fast Incremental Proximity Search in Large Graphs
0:00 Content-based search in databases {1,2}
0:56 Proximity measures on graphs
1:37 Outline - Ranking Problem on graphs
1:54 Random walks (1)
2:00 Random walks (2)
2:02 Random walks (3)
2:06 Random walks (4)
2:22 Hitting time from i to j (1)
2:36 Hitting time from i to j (2)
2:45 Hitting time from i to j (3)
2:52 Random walk-based proximity measures
3:13 Hitting & Commute times: Drawbacks Predictive power
3:58 A better proximity measure based on truncated random walks
4:47 Un-truncated vs truncated hitting time from a node
5:41 Truncated Hitting & Commute times
6:20 Nearest neighbors of a node: Computational complexity of truncated measures
7:34 Outline - Previous work
7:37 GRANCH: computing hitting time to a node
8:20 GRANCH: computing hitting time from a node (1)
8:29 GRANCH: computing hitting time from a node (2)
8:35 GRANCH: Pros & Cons
9:17 Outline - Proposed work
9:25 Proposed work
10:26 Proposed work: sketch
11:30 Outline - Theoretical justification
11:33 Number of nodes with hitting time ≤ τ from the query
12:24 Number of nodes with hitting time ≤ τ to the query (1)
13:09 Number of nodes with hitting time ≤ τ to the query (2)
13:38 Outline - Algorithm sketch
13:48 Algorithm sketch : Combining sampling and dynamic programming
14:48 Sampling for computing hitting time from a node (1)
14:57 Sampling for computing hitting time from a node (2)
14:58 Sampling for computing hitting time from a node (3)
14:59 Sampling for computing hitting time from a node (4)
15:00 Sampling for computing hitting time from a node (5)
15:01 Sampling for computing hitting time from a node (6)
15:26 Sampling for computing hitting time from a node (7)
15:54 Sample complexity bounds
16:51 Outline - Experimental results
16:56 Citeseer graph: Average query processing time
17:33 Keyword-author-citation graphs
18:19 Word Task (1)
18:32 Word Task (2)
18:53 Author Task
19:04 Word Task
19:42 Author Task
20:09 Conclusion
20:40 Acknowledgement
21:03 Conclusion

Related content

Visitors who watched this lecture also watched...

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.

Description

In this paper we investigate two aspects of ranking problems on large graphs. First, we augment the deterministic pruning algorithm in Sarkar and Moore (2007) with sampling techniques to compute approximately correct rankings with high probability under random walk based proximity measures at query time. Second, we prove some surprising locality properties of these proximity measures by examining the short term behavior of random walks. The proposed algorithm can answer queries on the fly without caching any information about the entire graph. We present empirical results on a 600,000 node author-word-citation graph from the Citeseer domain on a single CPU machine where the average query processing time is around 4 seconds. We present quantifiable link prediction tasks. On most of them our techniques outperform Personalized Pagerank, a well-known diffusion based proximity measure.

Link this page  

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: