Efficient SimRank Computation via Linearization
published: Oct. 7, 2014, recorded: August 2014, views: 2347
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
SimRank, proposed by Jeh and Widom, provides a good similarity measure that has been successfully used in numerous applications. While there are many algorithms proposed for computing SimRank, their computational costs are very high.
In this paper, we propose a framework for computing SimRank accurately. Our framework is based on the novel technique, linearized SimRank, which provides efficient algorithms for the single-pair, single-source, and all-pairs SimRank problems, respectively. More specifically, our framework consists of two phases, a preprocessing phase and a query phase. In the preprocessing phase, we estimate a parameter for a given graph and a desired accuracy by Monte Carlo simulation, and then in the query phase, we efficiently solve SimRank problems using this preprocessed parameter. Our algorithms are efficient in both time and space. The preprocessing phase is performed in nearly linear time and the query phase requires only linear time for single-pair and single-source problems; furthermore, the space complexity is linear for both preprocessing and query phase.
We conducted experiments to evaluate our algorithms. For small networks (n ≤ 1,000,000), our algorithm required only a few minutes for preprocessing, and then answered single-pair and single-source queries in 100 and 300 milliseconds, respectively. All-pairs computation was performed in a few days. For large networks (n ≥ 40,000,000), our algorithm required a few hours for preprocessing, and then answered single-pair and single-source queries approximately in 10 seconds and in a half minutes, respectively.
In comparison to previous studies, for the all-pairs problem, our algorithm uses significantly less memory than any existing algorithm; thus it scales for much larger networks. For the single-pair and the single-source problems, our algorithm requires much fewer random samples than any existing Monte Carlo based algorithm for the same accuracy solution.
Download slides: kdd2014_maehara_simrank_computation_01.pdf (112.5 KB)
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !