The "Real World" Web Search Problem
published: Dec. 3, 2007, recorded: September 2007, views: 482
There are numerous papers which present methods to address web-search related challenges such as relevance and ranking, query processing, and classification. Unfortunately, many of these methods are ineffective in a large-scale commercial setting, despite statistically significant experimental results. To help bridge this gap between academic and commercial settings, this lecture examines the components of large-scale commercial search engines, then proposes five classes of problems encountered by researchers in this area: biases; bad or different assumptions about statistics, users, queries or web contents; insufficient or missing data; inconsistencies related to evaluations and objectives; and policies or external factors, including resource limitations. Using real stories and personal experiences, the lecture illustrates examples of these problems, along with a few proposed approaches to deal with or reduce their consequences or effects.

In addition to the classes of problems, there are several fundamental properties of the web that are often not considered sufficiently when performing experiments or defining problems, resulting in unrealistic experiments or objectives. Even within a search engine, overlooking key properties such as the non-stationarity of the users and the web can result in ineffective evaluations, and may even lead to failed subsystems.

Fortunately, very simple approaches can often be highly effective. This lecture helps put context on how commercial search engines work, what problems they face, what effective solutions require, and how evaluations and problem definitions could be changed to more effectively predict success in a commercial setting, while still retaining the interest of researchers.