Web Spam Detection
author: Marcin Sydow, Department of Intelligent Systems, Web Mining Lab, Polish-Japanese Institute of Information Technology
author: Carlos Castillo, Sapienza University of Rome
Description
Web spam can significantly degrade the quality of search
engine results, so commercial search engines have a strong incentive
to detect spam pages efficiently and accurately. This talk presents
spam detection systems that combine link-based and content-based features
and exploit the topology of the Web graph through the link
dependencies among Web pages.
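The idea in the description can be illustrated with a minimal sketch. It is not the system presented in the talk: the two feature values per page, the equal-weight combination in `base_score`, the neighbour-averaging rule in `smooth_scores`, and the `weight` parameter are all assumptions made for illustration. The sketch only shows the general pattern of scoring each page from content and link features, then adjusting that score using the scores of pages it is linked with.

```python
from typing import Dict, List, Set


def base_score(content_feature: float, link_feature: float) -> float:
    """Toy per-page spam score: equal-weight average of two features in [0, 1].

    Hypothetical stand-in for a trained classifier; not the talk's model.
    """
    return 0.5 * content_feature + 0.5 * link_feature


def smooth_scores(
    base: Dict[str, float],
    links: Dict[str, List[str]],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Blend each page's own score with the mean score of its linked neighbours.

    `weight` (assumed, in [0, 1]) controls how much the neighbours count.
    Links are treated as undirected; pages without neighbours keep their score.
    """
    neighbours: Dict[str, Set[str]] = {page: set() for page in base}
    for src, targets in links.items():
        for dst in targets:
            # Ignore self-links and links to pages we have no score for.
            if src in base and dst in base and src != dst:
                neighbours[src].add(dst)
                neighbours[dst].add(src)

    smoothed: Dict[str, float] = {}
    for page, score in base.items():
        adjacent = neighbours[page]
        if adjacent:
            mean_adjacent = sum(base[n] for n in adjacent) / len(adjacent)
            smoothed[page] = (1 - weight) * score + weight * mean_adjacent
        else:
            smoothed[page] = score
    return smoothed


if __name__ == "__main__":
    # Made-up feature values: (content_feature, link_feature) per page.
    features = {"a": (0.9, 0.7), "b": (0.8, 0.8), "c": (0.2, 0.6), "d": (0.1, 0.1)}
    base = {page: base_score(c, l) for page, (c, l) in features.items()}
    links = {"a": ["b", "c"], "b": ["c"]}
    for page, score in sorted(smooth_scores(base, links).items()):
        print(page, round(score, 3))
```

In this toy data, page `c` looks fairly benign on its own features but is linked with two high-scoring pages, so its smoothed score rises; the isolated page `d` is unchanged.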
| Time | Slide |
| 0:00 | Fighting Web Spam |
| 0:29 | Collaborators |
| 0:56 | Contents |
| 1:43 | Part I - Introduction |
| 1:45 | Search Engines |
| 1:46 | The Web today (1) |
| 1:56 | The Web today (2) |
| 2:02 | The Web today (3) |
| 2:26 | The Web today (4) |
| 2:31 | The Web today (5) |
| 2:42 | The Web today (6) |
| 2:43 | The Web today (7) |
| 2:56 | Searching information - among the top Web activities (1) |
| 3:09 | Searching information - among the top Web activities (2) |
| 3:18 | Searching information - among the top Web activities (3) |
| 3:29 | Searching information - among the top Web activities (4) |
| 3:38 | Why search engines? (1) |
| 3:54 | Why search engines? (2) |
| 3:58 | Why search engines? (3) |
| 4:13 | Some available statistics (1) |
| 4:35 | Some available statistics (2) |
| 4:49 | Some available statistics (3) |
| 5:21 | Some available statistics (4) |
| 5:45 | Search Engine Architecture |
| 7:15 | Search Engines - seemingly simple task (1) |
| 7:17 | Search Engines - seemingly simple task (2) |
| 8:25 | Crawler architecture |
| 9:12 | Ranking |
| 11:35 | Ranking System |
| 13:03 | Text-based Ranking - classic IR approach |
| 15:27 | WWW-specific issues concerning text analysis |
| 16:43 | A Remedy - Link Analysis |
| 17:57 | Example: PageRank - Basic Idea of Authority Flow |
| 19:04 | PageRank Equations |
| 20:15 | PageRank - summary |
| 22:03 | Web Spam |
| 22:08 | A bit of Web Economics... (1) |
| 22:21 | A bit of Web Economics... (2) |
| 23:22 | Advertising Market Shares (USA, 2006) |
| 23:59 | Internet Advertising (USA, 2006) |
| 24:58 | The Central Role of Search Engines in WWW |
| 25:52 | What is Spam? |
| 27:00 | Spam is destructive (1) |
| 27:08 | Spam is destructive (2) |
| 27:15 | Spam is destructive (3) |
| 27:26 | Spam is destructive (4) |
| 27:34 | Spam vs SEO |
| 28:38 | Spam taxonomy (1) |
| 28:47 | Spam taxonomy (2) |
| 28:49 | Spam taxonomy (3) |
| 29:08 | Spam taxonomy (4) |
| 29:21 | Spam taxonomy (5) |
| 29:49 | Spam techniques (1) |
| 31:02 | Spam techniques (2) |
| 32:34 | Spam techniques (3) |
| 32:51 | Naïve Web Spam |
| 33:07 | Hidden text |
| 33:26 | Made for Advertising |
| 33:58 | Search engine? |
| 34:14 | Fake search engine |
| 34:27 | Normal content in link farms |
| 34:47 | Cloaking |
| 35:27 | Redirection |
| 36:41 | Redirects using Javascript |
| 37:11 | Problem: obfuscated code |
| 37:43 | Problem: really obfuscated code |
| 38:06 | Fighting Spam (1) |
| 38:31 | Fighting Spam (2) |
| 38:34 | Fighting Spam (3) |
| 38:35 | Fighting Spam (4) |
| 38:41 | Fighting Spam (5) |
| 38:59 | Fighting Spam (6) |
| 39:10 | Fighting Spam (7) |
| 39:30 | Fighting Spam (8) |
| 39:42 | ML can help greatly (1) |
| 40:10 | ML can help greatly (2) |
| 40:15 | ML can help greatly (3) |
| 40:19 | ML can help greatly (4) |
| 40:31 | ML can help greatly (5) |
| 40:43 | ML can help greatly (6) |
| 41:06 | Part II - Reference Corpus & State of the Art |



