Web Mining

Published on 2011-08-23, 8040 views

Aris Gionis

Ricardo Baeza-Yates

The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web.

IJCAI 2011 - Barcelona

Presentation

An Introduction to Web Mining 00:00

An introduction to web mining 00:00

Contents of the tutorial 00:01

Motivation 00:55

Internet and the Web Today 00:56

Different Views on Data 01:57

Query-log mining - 1 02:03

Query-log mining - 2 02:09

Query graphs 03:38

Different ways to relate queries 03:53

The Web 04:00

The Different Facets of the Web 05:48

The click graph – implicit knowledge – webslang 05:49

The click graph 07:32

The Structure of the Web 10:07

The query-flow graph - 1 10:23

Query-flow graph - 1 11:45

Web Mining 13:38

The query-flow graph - 2 13:42

What for? 15:49

A Few Examples 16:54

Web in Spain 17:21

Building the query-flow graph 17:41

Query-flow graph - 2 19:28

Structure Macro Dynamics 20:37

Application: session segmentation 22:20

Structure Micro Dynamics 22:32

Size Evolution 25:06

Recommendations using the query-flow graph - 1 30:33

Recommendations using the query-flow graph - 2 31:05

Query recommendations 31:11

The general theme 31:23

Mirror of the Society 32:33

Exports/Imports vs. Domain Links 34:38

Example: apple 35:18

The Wisdom of Crowds - 1 35:21

flickr 38:01

Flickr: Geo-tagged pictures 38:59

The Wisdom of Crowds - 2 40:31

Example: jeep 42:07

Example: banana → apple 42:07

The Long Tail 42:12

Heavy tail of user interests - 1 42:50

Example: beatles → apple 43:27

Heavy tail of user interests - 2 43:55

Why the heavy tail matters 44:35

Recommendations as shortcuts to QFG 44:36

The Wisdom of Crowds 45:33

Community structure - 1 46:49

What is in the Web? - 1 47:26

What is in the Web? - 2 47:40

Spam is an Economic Activity 48:07

Current challenges (1) 48:21

Current challenges (2) 48:31

Content match = meeting of Publishers, Advertisers, Users 48:32

Contextual ads - 1 49:01

Contextual ads - 2 49:23

Click spam 49:51

Other Possible Ad Spam 50:28

Internet UGC (User Generated Content) 50:29

Simple acts create value and opportunity 50:39

Community Dynamics 50:51

Community structure - 2 51:02

Community Geography: Live Journal bloggers in US 51:41

LJ bloggers world-wide 51:47

Who are they? 51:50

Small diameter 52:16

The Process 53:02

Data Recollection 53:42

Crawling - 1 54:33

Measurements on real graphs 54:59

Crawling Goals 55:16

Crawling - 2 56:31

Random graphs - 1 56:36

Random graphs - 2 57:12

Crawling - 3 57:29

Other properties 57:58

Properties of evolving graphs 59:01

Software Architecture - 1 01:00:05

Software Architecture - 2 01:00:08

Algorithmic tools 01:00:45

Software Architecture - 3 01:01:40

Formal Problem 01:01:42

Crawling Heuristics - 1 01:02:15

Efficiency considerations 01:02:33

Hashing and sketching 01:02:51

Locality sensitive hashing 01:05:38

Crawling Heuristics - 2 01:06:00

No Historical Information 01:06:41

Locality sensitive hashing: example 01:07:03

Historical Information 01:07:58

Validation in the Greek domain 01:08:17

Locality sensitive hashing: Hamming distance - 1 01:08:54

Data Cleaning 01:08:56

Locality sensitive hashing: Hamming distance - 2 01:09:06

Locality sensitive hashing: Hamming distance - 3 01:09:33

Locality sensitive hashing: Hamming distance - 4 01:09:34

Locality sensitive hashing: Hamming distance - 5 01:10:12

Locality sensitive hashing: Hamming distance - 6 01:10:12

Data Processing 01:11:19

Data Characteristics 01:13:29

Example: Yahoo! 01:13:41

Crawled Data 01:14:20

Produced data 01:14:57

Observed Data 01:15:28

Homework - 1 01:17:30

Quantity & Quality 01:17:54

Jaccard coefficient 01:18:02

Min-wise independent permutations 01:19:36

Homework - 2 01:21:55

Homework - 3 01:22:23

Computing statistics on data streams - 1 01:22:25

Computing statistics on data streams - 2 01:26:59

Estimating the number of distinct values (F0) - 2 01:27:32

Estimating number of distinct values (F0) 01:31:20

Estimator theorem 01:31:21

Applications of the algorithmic tools to real scenarios 01:31:23
