PSkip: Estimating relevance ranking quality from web search clickthrough data

author: Kuansan Wang, Microsoft Research
published: Sept. 14, 2009,   recorded: June 2009,   views: 4750

In this article, we report our efforts to mine the information encoded as clickthrough data in server logs in order to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that aims to quantify ranking quality by estimating the probability that users encounter non-relevant results that cost them the effort to read and skip. A search engine with a lower pSkip is regarded as having better ranking quality. A key design goal of pSkip is to integrate the findings from two sets of user studies: those that use eye-tracking devices to track users' browsing patterns on search result pages, and those that use specially instrumented browsers to actively solicit users' explicit judgments on their search activities. We present the derivation of the maximum likelihood estimate of pSkip and demonstrate its efficacy in describing the user study data. The mathematical properties of pSkip are further analyzed and compared with several objective metrics as well as with the cumulated gain method, which uses subjective judgments. Experimental data show that pSkip can measure aspects of search quality that these existing metrics are not designed to address or fail to address, such as identifying the real search intents expressed in ambiguous queries. Although pSkip is effective and superior in many ways, we also report a series of experiments showing that it may be influenced by system issues not directly related to relevance ranking, suggesting that measurements complementary to pSkip are still needed to form a holistic and accurate characterization of ranking quality.
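
To make the idea of a skip probability concrete, here is a minimal Python sketch. It is not the derivation from the paper, which the abstract does not spell out. It assumes a simplified browsing model in which a user reads results top-down and stops at the last clicked position, so every unclicked result above the last click counts as read and skipped. Under that assumption, each examined result is a Bernoulli trial, and the maximum likelihood estimate of the skip probability is the fraction of examined results that were skipped. The function names and the input format (a list of 1-based clicked positions per result page) are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def skip_and_click_counts(clicked_positions: Sequence[int]) -> Tuple[int, int]:
    """Return (skips, clicks) for one result page.

    Positions are 1-based ranks. ASSUMPTION (not from the source): the user
    examined every result down to the last click and nothing below it.
    """
    if not clicked_positions:
        # With no click we cannot tell what was examined, so this sketch
        # simply ignores the page.
        return 0, 0
    clicks = len(set(clicked_positions))
    last_clicked = max(clicked_positions)
    skips = last_clicked - clicks  # examined but unclicked results
    return skips, clicks


def estimate_pskip(pages: Iterable[Sequence[int]]) -> float:
    """Bernoulli maximum likelihood estimate: skips / examined results."""
    total_skips = 0
    total_clicks = 0
    for clicked_positions in pages:
        skips, clicks = skip_and_click_counts(clicked_positions)
        total_skips += skips
        total_clicks += clicks
    examined = total_skips + total_clicks
    if examined == 0:
        raise ValueError("no examined results; the estimate is undefined")
    return total_skips / examined


if __name__ == "__main__":
    # Four hypothetical result pages: clicks at rank 1; rank 3; ranks 2 and 4; none.
    print(estimate_pskip([[1], [3], [2, 4], []]))
```

Under this simplified model a lower value means fewer results were read and passed over before the user found something worth clicking, which matches the abstract's reading that a lower pSkip indicates better ranking quality.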
