NATO Advanced Study Institute on Mining Massive Data Sets for Security

The "Real World" Web Search Problem

author: Eric Glover, SearchMe.com

Description

There are numerous papers that present methods to address web-search-related challenges such as relevance and ranking, query processing, and classification. Unfortunately, many of these methods are ineffective in a large-scale commercial setting, despite statistically significant experimental results. To help bridge this gap between academic and commercial settings, this lecture examines the components of large-scale commercial search engines, then proposes five classes of problems encountered by researchers in this area: biases; bad or different assumptions about statistics, users, queries, or web contents; insufficient or missing data; inconsistencies related to evaluations and objectives; and policies or external factors, including resource limitations. Using real stories and personal experiences, the lecture illustrates examples of these problems, along with a few proposed approaches to deal with them or reduce their consequences.

In addition to the classes of problems, there are several fundamental properties of the web that are often not considered sufficiently when performing experiments or defining problems, resulting in unrealistic experiments or objectives. Even within a search engine, overlooking key properties such as the non-stationarity of the users and the web can result in ineffective evaluations, and may even lead to failed subsystems.

Fortunately, very simple approaches can often be highly effective. This lecture helps put into context how commercial search engines work, what problems they face, what effective solutions require, and how evaluations and problem definitions could be changed to more effectively predict success in a commercial setting, while still retaining the interest of researchers.

Slides
0:00 The Real World Web Search Problem: Bridging The Gap Between Academic and Commercial Understanding of Issues and Methods
0:44 Overview and Objectives
1:51 Commercial Plug
2:36 About the Speaker: Dr. Eric Glover
3:37 A True Story
4:32 Talk Flow - Part 1 - theoretical search
4:51 What is a Web Search Engine? (1)
5:30 What is a Web Search Engine? (2)
5:46 What is a Web Search Engine? (3)
6:33 What is a Web Search Engine? (4)
7:16 Search Engine Theory - Crawler
8:16 Search Engine Theory - Indexer
9:25 Search Engine Theory - Relevant Set
12:25 Search Engine Theory - Ranking (Theory)
14:09 What is missing (theory)?
15:55 What is a Web Search Engine?
18:20 Good References
19:28 Part II - Theory Gets Disconnected
21:21 Commercial Search Engine != 10 Blue links
22:40 Important Properties Of Commercial Web Search
28:33 Separating Commercial Web Search from Theory
30:28 Simple Theory vs Cold Reality - Crawling (1)
35:15 Simple Theory vs Cold Reality - Crawling (2)
40:02 Simple Theory vs Cold Reality - Indexing (1)
46:24 Simple Theory vs Cold Reality - Indexing (2)
47:32 Simple Theory vs Cold Reality - Query Processing
53:13 Relevance - Theory
54:20 Relevance - Theory Problem 1: Duplicates
57:40 Relevance - Theory Problem 2: Marginal Value
59:33 Relevance - Theory Problem 3: UI
64:14 Relevance - Theory != Reality
66:37 Relevance - Considerations
67:51 Relevance - Current Approximations
69:09 Relevance: Academic Measures (1)
69:38 Relevance: Academic Measures (2)
70:05 Relevance - How to Evaluate (data)
71:06 “Relevance” - How to Evaluate
71:38 “Relevance” - Concerns
74:26 Relevance - What are the goals?
74:34 Relevance - Challenges
74:56 And the lecture moves on...
75:16 Problems Facing Researchers (an example)
76:13 Problems Facing Researchers (example cont)
77:27 STATISTICS STATISTICS STATISTICS
79:14 Five Important Classes of Problems Faced by Many Researchers
81:05 Biases (1)
85:06 Biases (2)
86:04 Problem: Assumptions (Statistics)
87:42 Problem: Assumptions
90:27 Data (insufficient or missing)
91:44 (Inconsistent) Evaluations and Objectives
92:33 Evaluation Example
94:27 Policies and External Factors
94:35 Dealing With Problems/Challenges
96:46 Dealing With Problems an Example
98:43 How To Improve Things
99:07 Evaluation Example
99:20 How To Improve Things
103:27 How To Improve Things (cont)
104:49 How To Improve Things - cont...(1)
105:50 How To Improve Things - cont... (2)
107:49 General Advice
108:23 Commercial Plug
111:17 - Questions
