Fast Incremental Proximity Search in Large Graphs
Description
In this paper we investigate two aspects of ranking problems on large graphs. First, we augment the deterministic pruning algorithm in Sarkar and Moore (2007) with sampling techniques to compute approximately correct rankings with high probability under random walk based proximity measures at query time. Second, we prove some surprising locality properties of these proximity measures by examining the short term behavior of random walks. The proposed algorithm can answer queries on the fly without caching any information about the entire graph. We present empirical results on a 600,000 node author-word-citation graph from the Citeseer domain on a single CPU machine where the average query processing time is around 4 seconds. We present quantifiable link prediction tasks. On most of them our techniques outperform Personalized Pagerank, a well-known diffusion based proximity measure.
| Slides | |
| 0:00 | Fast Incremental Proximity Search in Large Graphs |
| 0:00 | Content-based search in databases {1,2} |
| 0:56 | Proximity measures on graphs |
| 1:37 | Outline - Ranking Problem on graphs |
| 1:54 | Random walks (1) |
| 2:00 | Random walks (2) |
| 2:02 | Random walks (3) |
| 2:06 | Random walks (4) |
| 2:22 | Hitting time from i to j (1) |
| 2:36 | Hitting time from i to j (2) |
| 2:45 | Hitting time from i to j (3) |
| 2:52 | Random walk-based proximity measures |
| 3:13 | Hitting & Commute times: Drawbacks Predictive power |
| 3:58 | A better proximity measure based on truncated random walks |
| 4:47 | Un-truncated vs truncated hitting time from a node |
| 5:41 | Truncated Hitting & Commute times |
| 6:20 | Nearest neighbors of a node: Computational complexity of truncated measures |
| 7:34 | Outline - Previous work |
| 7:37 | GRANCH: computing hitting time to a node |
| 8:20 | GRANCH: computing hitting time from a node (1) |
| 8:29 | GRANCH: computing hitting time from a node (2) |
| 8:35 | GRANCH: Pros & Cons |
| 9:17 | Outline - Proposed work |
| 9:25 | Proposed work |
| 10:26 | Proposed work: sketch |
| 11:30 | Outline - Theoretical justification |
| 11:33 | Number of nodes with hitting time ≤ τ from the query |
| 12:24 | Number of nodes with hitting time ≤ τ to the query (1) |
| 13:09 | Number of nodes with hitting time ≤ τ to the query (2) |
| 13:38 | Outline - Algorithm sketch |
| 13:48 | Algorithm sketch : Combining sampling and dynamic programming |
| 14:48 | Sampling for computing hitting time from a node (1) |
| 14:57 | Sampling for computing hitting time from a node (2) |
| 14:58 | Sampling for computing hitting time from a node (3) |
| 14:59 | Sampling for computing hitting time from a node (4) |
| 15:00 | Sampling for computing hitting time from a node (5) |
| 15:01 | Sampling for computing hitting time from a node (6) |
| 15:26 | Sampling for computing hitting time from a node (7) |
| 15:54 | Sample complexity bounds |
| 16:51 | Outline - Experimental results |
| 16:56 | Citeseer graph: Average query processing time |
| 17:33 | Keyword-author-citation graphs |
| 18:19 | Word Task (1) |
| 18:32 | Word Task (2) |
| 18:53 | Author Task |
| 19:04 | Word Task |
| 19:42 | Author Task |
| 20:09 | Conclusion |
| 20:40 | Acknowledgement |
| 21:03 | Conclusion |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !





