event thumbnail image
The 5th International Workshop on Mining and Learning with Graphs
Pascal

Mining, Indexing, and Searching Graphs in Large Data Sets

author: Jiawei Han , University of Illinois

Description

Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in Web, social network analysis, and bioinformatics. However, mining and searching large graphs in graph databases is challenging due to the presence of an exponential number of frequent subgraphs. In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching of graphs in large databases. We introduce gSpan and CloseGraph, two efficient methods for mining frequent graph patterns in graph databases. Then we introduce constraint-based graph mining methods. Further, we introduce a graph indexing method, gIndex, and a graph approximate searching method, grafil, both taking advantages of frequent graph mining to construct a compact but highly effective graph index and perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive datasets but also claim broad applications in other fields, including DB/OS systems and software engineering.

You might be experiencing some problems with Your Video player.
Slides
0:00 Mining, Indexing & Searching Graphs in Large Data Sets
0:47 Research Papers Covered in this Talk
2:21 Graph, Graph, Everywhere
2:33 Why Graph Mining and Searching?
3:44 Outline
4:15 Graph Pattern Mining
5:09 Example: Frequent Subgraphs
6:11 Frequent Subgraph Mining Approaches
6:47 Properties of Graph Mining Algorithms
7:53 Apriori-Based Approach
8:40 Pattern Growth-Based Span and Pruning
9:51 gSpan(Yan and Han ICDM’02)
10:29 DFS Code
11:00 Graph Pattern Explosion Problem
12:32 Closed Frequent Graphs
13:35 CLOSEGRAPH (Yan & Han, KDD’03)
13:53 Experimental Result
14:10 Discovered Patterns
14:18 Number of Patterns: Frequent vs. Closed
15:05 Runtime: Frequent vs. Closed
15:30 Outline
15:42 Constraint-Based Graph Pattern Mining
17:07 Pattern Pruning vs. Data Pruning
18:11 Pruning Properties Overview
18:57 Pruning Pattern Search Space
21:48 Pruning Data Space (I): Pattern-Separable D-Antimonotonicity
23:48 Pruning Data Space (II): Pattern-Inseparable D-Antimonotonicity
25:03 Graph Constraints: A General Picture
25:54 Outline
26:02 Graph Search: Querying Graph Databases
27:46 Scalability Issue
29:16 Indexing Strategy
30:01 Framework
30:20 Cost Analysis
32:13 Path-Based Approach
32:44 Problems of Path-Based Approach
33:52 gIndex: Indexing Graphs by Data Mining
34:48 IDEAS: Indexing with Two Constraints
35:11 Why Discriminative Subgraphs?
35:22 Discriminative Structures
37:45 Why Frequent Structures?
39:58 Experimental Setting
40:04 Experiments: Index Size
40:42 Experiments: Answer Set Size
41:24 Experiments: Incremental Maintenance
41:41 Outline
41:44 Structure Similarity Search
42:29 Some “Straightforward” Methods
42:43 Index: Precise vs. Approximate Search
43:52 Substructure Similarity Measure
45:17 Substructure Similarity Measure
45:23 Intuition: Feature-Based Similarity Search
45:36 Feature-Graph Matrix
46:49 Edge Relaxation – Feature Misses
46:55 Query Processing Framework
47:21 Performance Study
47:37 Comparison of the Three Algorithms
48:40 Outline
48:49 Graph Search vs. Graph Containment Search
51:14 Example: Graph Search vs. Graph Containment Search
52:04 Different Philosophies in Two Searches
53:24 Contrast Features for C-Search Pruning
53:50 The Basic Framework
54:58 Cost Analysis
55:17 Feature Selection
55:29 Feature-Graph Matrix
56:18 Contrast Graph Matrix
56:58 Training by the Query Log
58:12 Maximum Coverage with Cost
58:23 The Basic Containment Search Index
58:38 The Bottom-Up Hierarchical Index
58:43 The Top-Down Hierarchical Index
58:54 Experiment Setting
59:23 Chemical Descriptor Search
60:07 Hierarchical Indices
60:25 Object Recognition Search
60:38 Conclusions
61:24 Pictures for data mining lab

Lecture rating

People found this lecture:
Worth seeing
because it is:
 Valuable and informative
Well presented
Easily understandable
Acceptably recorded
You need to login to cast your vote.

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Reviews and comments:

Comment1 Ed Garcia, February 22, 2008 at 5:05 p.m.:

At the same time VideoLectures is a great initiative,
it is frustrating...
Simply I don't have a full and pleasure experience to watch any video over there...
Although this ones seems great, interesting,
and it surely is, but
it freeze for 3 to 15 seconds all the time... from here where I am: Brazil!
And you don't give me an option to upload it.
I'm really bored because I'd like and need to watch that...

Thank you.


Write your own review or comment: