Web Mining

author: Aris Gionis, Yahoo! Research Barcelona
author: Ricardo Baeza-Yates, Yahoo! Research
published: Aug. 23, 2011, recorded: July 2011, views: 569

Slides

0:00 An Introduction to Web Mining
0:01 Contents of the tutorial
0:55 Motivation
0:56 Internet and the Web Today
1:57 Different Views on Data
4:00 The Web
5:48 The Different Facets of the Web
10:07 The Structure of the Web
13:38 Web Mining
15:49 What for?
16:54 A Few Examples
17:21 Web in Spain
20:37 Structure Macro Dynamics
22:32 Structure Micro Dynamics
24:43 Structure Macro Dynamics
24:56 Structure Micro Dynamics
25:06 Size Evolution
32:33 Mirror of the Society
34:38 Exports/Imports vs. Domain Links
35:21 The Wisdom of Crowds - 1
38:01 flickr
38:59 Flickr: Geo-tagged pictures
40:31 The Wisdom of Crowds - 2
41:16 Flickr: Geo-tagged pictures
41:26 The Wisdom of Crowds - 2
42:12 The Long Tail
42:50 Heavy tail of user interests - 1
43:55 Heavy tail of user interests - 2
44:35 Why the heavy tail matters
45:33 The Wisdom of Crowds
47:26 What is in the Web? - 1
47:40 What is in the Web? - 2
48:07 Spam is an Economic Activity
48:21 Current challenges (1)
48:31 Current challenges (2)
48:32 Content match = meeting of Publishers, Advertisers, Users
49:01 Contextual ads - 1
49:23 Contextual ads - 2
49:51 Click spam
50:28 Other Possible Ad Spam
50:29 Internet UGC (User Generated Content)
50:39 Simple acts create value and opportunity
50:51 Community Dynamics
51:41 Community Geography: Live Journal bloggers in US
51:47 LJ bloggers world-wide
51:50 Who are they?
53:02 The Process
53:42 Data Recollection
54:33 Crawling - 1
55:16 Crawling Goals
56:31 Crawling - 2
57:29 Crawling - 3
60:05 Software Architecture - 1
60:08 Software Architecture - 2
61:40 Software Architecture - 3
61:42 Formal Problem
62:15 Crawling Heuristics - 1
66:00 Crawling Heuristics - 2
66:41 No Historical Information
67:58 Historical Information
68:17 Validation in the Greek domain
68:56 Data Cleaning
71:19 Data Processing
73:29 Data Characteristics
73:41 Example: Yahoo!
74:20 Crawled Data
74:57 Produced data
75:28 Observed Data
77:54 Quantity & Quality

Part 1 1:20:20
Part 2 1:32:11

Description

The Web continues to grow and evolve rapidly, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people who contribute content to the Web, as well as the one billion people who use it. This ocean of hyperlinked data holds both explicit and implicit information and knowledge. Web mining is the task of analyzing this data and extracting information and knowledge for many different purposes. The data comes in three main flavors: content (text, images, etc.), structure (hyperlinks), and usage (navigation, queries, etc.), each calling for different techniques, such as text, graph, or log mining. Each flavor reflects the wisdom of some group of people, which can be used to make the Web better; user-generated tags on Web 2.0 sites are one example. In this tutorial we walk through the mining process and show several applications, ranging from Web site design to search engines. The main goal is to introduce AI researchers to the many challenges in Web mining, where other AI techniques, in addition to machine learning, might be applicable.
