Scalable Link Mining and Analysis on Information Networks

author: Philip S. Yu, Department of Computer Science, College of Engineering, The University of Illinois at Chicago
published: Sept. 18, 2009, recorded: July 2009, views: 247

Slides
0:00 Scalable Link Mining and Analysis on Information Networks
0:19 Information Networks
2:46 Ubiquitous Graphs and Networks
3:26 Talk Outline - 1
3:47 Data Integration, Cleaning and Validation in Information Networks
4:47 Object Reconciliation by Link Analysis
6:09 Challenges of Object Distinction
7:54 Entity Distinction: The “Wei Wang” Challenge in DBLP
10:08 The DISTINCT Methodology
11:20 Neighbor Tuples
11:51 Similarity 1: Link-Based Similarity
12:20 Example of Random Walk
12:44 Similarity 2: Neighborhood Similarity
13:27 Real Cases
14:26 Distinguishing Different “Wei Wang”s
15:21 Truth Validation by Information Network Analysis
16:39 Conflicting Information on the Web
19:26 Our Problem Setting
21:50 Basic Heuristics for Problem Solving
23:25 Overview of the TruthFinder Method
24:06 Analogy to Authority-Hub Analysis
25:54 Inference on Trustworthiness
26:30 Computation Model: t(w) and s(f)
27:09 Experiments: Finding Truth of Facts
29:03 Experiments: Trustable Info Providers
29:55 Summary: Data Integration, Cleaning & Truth Validation by Infonet Analysis
30:42 Talk Outline - 2
30:52 Network Summary and Compression
32:52 Why OLAP Info. Networks?
34:17 Two Kinds of OLAP in Information Networks
36:23 Informational OLAP
38:32 Topological OLAP
39:40 Measures in Infonet OLAP
40:32 Network OLAP Operations
41:32 Measure Classification
42:44 Optimizations
44:11 Talk Outline - 3
44:18 Link-Based Clustering: Start from Link-Based Similarity (SimRank)
45:10 Link-Based Similarities: SimRank
45:40 Observation 1: Hierarchical Structures
46:35 Observation 2: Distribution of Similarity
47:30 Our Data Structure: SimTree
47:49 Similarity Defined by SimTree
48:52 Overview of LinkClus
49:23 Initialization of SimTrees
50:07 (continued)
50:31 Updating Similarities Between Nodes
51:03 Adjusting SimTree Structures
51:11 Complexity
51:25 Empirical Study
51:32 Experiment Setup
51:47 Accuracy
52:23 Email Dataset
52:54 - Questions

Description

With the ubiquity of information networks and their broad applications, there have been numerous studies on the construction, online analytical processing, and mining of information networks in multiple disciplines, including social network analysis, the World Wide Web, database systems, data mining, machine learning, and networked communication and information systems. Algorithms such as PageRank and HITS were developed in the late 1990s to exploit links among Web pages to discover authoritative pages and hubs. Links have also been widely used in citation analysis and social network analysis. However, there is a lack of systematic treatment of how to fully exploit the power of links in scalable data analysis. In this talk, the power of links is examined in detail to improve the effectiveness and efficiency of typical data analysis tasks, including information integration, online analytical processing, and other interesting data mining tasks, especially in multi-relational databases and/or World Wide Web environments.
