Mining, Indexing, and Searching Graphs in Large Data Sets
Description
Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in Web, social network analysis, and bioinformatics. However, mining and searching large graphs in graph databases is challenging due to the presence of an exponential number of frequent subgraphs.
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching of graphs in large databases. We introduce gSpan and CloseGraph, two efficient methods for mining frequent graph patterns in graph databases. Then we introduce constraint-based graph mining methods. Further, we introduce a graph indexing method, gIndex, and a graph approximate searching method, grafil, both taking advantages of frequent graph mining to construct a compact but highly effective graph index and perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive datasets but also claim broad applications in other fields, including DB/OS systems and software engineering.
| Slides | |
| 0:00 | Mining, Indexing & Searching Graphs in Large Data Sets |
| 0:47 | Research Papers Covered in this Talk |
| 2:21 | Graph, Graph, Everywhere |
| 2:33 | Why Graph Mining and Searching? |
| 3:44 | Outline |
| 4:15 | Graph Pattern Mining |
| 5:09 | Example: Frequent Subgraphs |
| 6:11 | Frequent Subgraph Mining Approaches |
| 6:47 | Properties of Graph Mining Algorithms |
| 7:53 | Apriori-Based Approach |
| 8:40 | Pattern Growth-Based Span and Pruning |
| 9:51 | gSpan(Yan and Han ICDM’02) |
| 10:29 | DFS Code |
| 11:00 | Graph Pattern Explosion Problem |
| 12:32 | Closed Frequent Graphs |
| 13:35 | CLOSEGRAPH (Yan & Han, KDD’03) |
| 13:53 | Experimental Result |
| 14:10 | Discovered Patterns |
| 14:18 | Number of Patterns: Frequent vs. Closed |
| 15:05 | Runtime: Frequent vs. Closed |
| 15:30 | Outline |
| 15:42 | Constraint-Based Graph Pattern Mining |
| 17:07 | Pattern Pruning vs. Data Pruning |
| 18:11 | Pruning Properties Overview |
| 18:57 | Pruning Pattern Search Space |
| 21:48 | Pruning Data Space (I): Pattern-Separable D-Antimonotonicity |
| 23:48 | Pruning Data Space (II): Pattern-Inseparable D-Antimonotonicity |
| 25:03 | Graph Constraints: A General Picture |
| 25:54 | Outline |
| 26:02 | Graph Search: Querying Graph Databases |
| 27:46 | Scalability Issue |
| 29:16 | Indexing Strategy |
| 30:01 | Framework |
| 30:20 | Cost Analysis |
| 32:13 | Path-Based Approach |
| 32:44 | Problems of Path-Based Approach |
| 33:52 | gIndex: Indexing Graphs by Data Mining |
| 34:48 | IDEAS: Indexing with Two Constraints |
| 35:11 | Why Discriminative Subgraphs? |
| 35:22 | Discriminative Structures |
| 37:45 | Why Frequent Structures? |
| 39:58 | Experimental Setting |
| 40:04 | Experiments: Index Size |
| 40:42 | Experiments: Answer Set Size |
| 41:24 | Experiments: Incremental Maintenance |
| 41:41 | Outline |
| 41:44 | Structure Similarity Search |
| 42:29 | Some “Straightforward” Methods |
| 42:43 | Index: Precise vs. Approximate Search |
| 43:52 | Substructure Similarity Measure |
| 45:17 | Substructure Similarity Measure |
| 45:23 | Intuition: Feature-Based Similarity Search |
| 45:36 | Feature-Graph Matrix |
| 46:49 | Edge Relaxation – Feature Misses |
| 46:55 | Query Processing Framework |
| 47:21 | Performance Study |
| 47:37 | Comparison of the Three Algorithms |
| 48:40 | Outline |
| 48:49 | Graph Search vs. Graph Containment Search |
| 51:14 | Example: Graph Search vs. Graph Containment Search |
| 52:04 | Different Philosophies in Two Searches |
| 53:24 | Contrast Features for C-Search Pruning |
| 53:50 | The Basic Framework |
| 54:58 | Cost Analysis |
| 55:17 | Feature Selection |
| 55:29 | Feature-Graph Matrix |
| 56:18 | Contrast Graph Matrix |
| 56:58 | Training by the Query Log |
| 58:12 | Maximum Coverage with Cost |
| 58:23 | The Basic Containment Search Index |
| 58:38 | The Bottom-Up Hierarchical Index |
| 58:43 | The Top-Down Hierarchical Index |
| 58:54 | Experiment Setting |
| 59:23 | Chemical Descriptor Search |
| 60:07 | Hierarchical Indices |
| 60:25 | Object Recognition Search |
| 60:38 | Conclusions |
| 61:24 | Pictures for data mining lab |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !





At the same time VideoLectures is a great initiative,
it is frustrating...
Simply I don't have a full and pleasure experience to watch any video over there...
Although this ones seems great, interesting,
and it surely is, but
it freeze for 3 to 15 seconds all the time... from here where I am: Brazil!
And you don't give me an option to upload it.
I'm really bored because I'd like and need to watch that...
Thank you.