NATO Advanced Study Institute on Mining Massive Data Sets for Security

Web Spam Detection

author: Carlos Castillo, Universita di Roma "La Sapienza"
author: Marcin Sydow, Polish-Japanese Institute of Information Technology

Description

Web spam can significantly degrade the quality of search engine results, so commercial search engines have a strong incentive to detect spam pages efficiently and accurately. This talk presents spam detection systems that combine link-based and content-based features and that use the topology of the Web graph by exploiting the link dependencies among Web pages.
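As a rough illustration of the idea in the description, the sketch below scores pages with one content-based feature and one link-based feature, then smooths each score over the page's out-links so that linked pages influence each other. Everything here is an assumption made for illustration and is not taken from the talk: the two features (keyword density and link reciprocity), the equal weights, the `alpha` smoothing parameter, the function names, and the toy graph are all hypothetical.

```python
from typing import Dict, List, Set


def keyword_density(text: str, keywords: Set[str]) -> float:
    """Content-based feature: fraction of words found in a watch list."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in keywords for word in words) / len(words)


def reciprocity(graph: Dict[str, List[str]], page: str) -> float:
    """Link-based feature: fraction of out-links that are linked back."""
    out_links = set(graph.get(page, [])) - {page}
    if not out_links:
        return 0.0
    returned = sum(1 for target in out_links if page in graph.get(target, []))
    return returned / len(out_links)


def base_scores(graph: Dict[str, List[str]],
                texts: Dict[str, str],
                keywords: Set[str]) -> Dict[str, float]:
    """Combine both features with equal (hypothetical) weights."""
    scores = {}
    for page in graph:
        content = keyword_density(texts.get(page, ""), keywords)
        link = reciprocity(graph, page)
        scores[page] = 0.5 * content + 0.5 * link
    return scores


def smooth_over_links(graph: Dict[str, List[str]],
                      scores: Dict[str, float],
                      alpha: float = 0.5) -> Dict[str, float]:
    """Blend each page's score with the mean score of the pages it links to."""
    smoothed = {}
    for page, targets in graph.items():
        neighbours = [t for t in set(targets) if t in scores and t != page]
        if neighbours:
            mean = sum(scores[t] for t in neighbours) / len(neighbours)
            smoothed[page] = (1 - alpha) * scores[page] + alpha * mean
        else:
            smoothed[page] = scores[page]
    return smoothed


# Toy data (hypothetical): two pages that only link to each other,
# plus two ordinary pages.
graph = {
    "farm1": ["farm2"],
    "farm2": ["farm1"],
    "blog": ["news"],
    "news": [],
}
texts = {
    "farm1": "cheap pills cheap pills buy now",
    "farm2": "cheap cheap cheap pills",
    "blog": "my notes on a conference talk",
    "news": "election results announced today",
}
keywords = {"cheap", "pills"}

scores = base_scores(graph, texts, keywords)
smoothed = smooth_over_links(graph, scores)
```

On this toy graph the two mutually linked, keyword-heavy pages end up with higher scores than the ordinary pages. A real system would learn the feature weights from labeled data with a classifier rather than fix them by hand.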

Slides
0:00 Fighting Web Spam
0:29 Collaborators
0:56 Contents
1:43 Part I - Introduction
1:45 Search Engines
1:46 The Web today (1)
1:56 The Web today (2)
2:02 The Web today (3)
2:26 The Web today (4)
2:31 The Web today (5)
2:42 The Web today (6)
2:43 The Web today (7)
2:56 Searching information - among the top Web activities (1)
3:09 Searching information - among the top Web activities (2)
3:18 Searching information - among the top Web activities (3)
3:29 Searching information - among the top Web activities (4)
3:38 Why search engines? (1)
3:54 Why search engines? (2)
3:58 Why search engines? (3)
4:13 Some available statistics (1)
4:35 Some available statistics (2)
4:49 Some available statistics (3)
5:21 Some available statistics (4)
5:45 Search Engine Architecture
7:15 Search Engines - seemingly simple task (1)
7:17 Search Engines - seemingly simple task (2)
8:25 Crawler architecture
9:12 Ranking
11:35 Ranking System
13:03 Text-based Ranking - classic IR approach
15:27 WWW-specific issues concerning text analysis
16:43 A Remedy - Link Analysis
17:57 Example: PageRank - Basic Idea of Authority Flow
19:04 PageRank Equations
20:15 PageRank - summary
22:03 Web Spam
22:08 A bit of Web Economics... (1)
22:21 A bit of Web Economics... (2)
23:22 Advertising Market Shares (USA, 2006)
23:59 Internet Advertising (USA, 2006)
24:58 The Central Role of Search Engines in WWW
25:52 What is Spam?
27:00 Spam is destructive (1)
27:08 Spam is destructive (2)
27:15 Spam is destructive (3)
27:26 Spam is destructive (4)
27:34 Spam vs SEO
28:38 Spam taxonomy (1)
28:47 Spam taxonomy (2)
28:49 Spam taxonomy (3)
29:08 Spam taxonomy (4)
29:21 Spam taxonomy (5)
29:49 Spam techniques (1)
31:02 Spam techniques (2)
32:34 Spam techniques (3)
32:51 Naïve Web Spam
33:07 Hidden text
33:26 Made for Advertising
33:58 Search engine?
34:14 Fake search engine
34:27 Normal content in link farms
34:47 Cloaking
35:27 Redirection
36:41 Redirects using Javascript
37:11 Problem: obfuscated code
37:43 Problem: really obfuscated code
38:06 Fighting Spam (1)
38:31 Fighting Spam (2)
38:34 Fighting Spam (3)
38:35 Fighting Spam (4)
38:41 Fighting Spam (5)
38:59 Fighting Spam (6)
39:10 Fighting Spam (7)
39:30 Fighting Spam (8)
39:42 ML can help greatly (1)
40:10 ML can help greatly (2)
40:15 ML can help greatly (3)
40:19 ML can help greatly (4)
40:31 ML can help greatly (5)
40:43 ML can help greatly (6)
41:06 Part II - Reference Corpus & State of the Art

Video parts
Part 1 0:45:25
Part 2 0:43:03
Part 3 0:18:39
