Improving Quality of Training Data for Learning to Rank Using Click-Through Data
published: Oct. 12, 2010, recorded: February 2010, views: 3430
In information retrieval, the relevance of documents with respect to queries is usually judged by humans and used in the evaluation and/or learning of ranking functions. Previous work has shown that a certain level of noise in relevance judgments has little effect on evaluation, especially for comparison purposes. Recently, learning to rank has become one of the major means of creating ranking models, in which the models are automatically learned from data derived from a large number of relevance judgments. As far as we know, there was no previous work on the quality of training data for learning to rank, and this paper studies the issue. Specifically, we address three problems. First, we show that the quality of training data labeled by humans has a critical impact on the performance of learning to rank algorithms. Second, we propose detecting relevance judgment errors using click-through data accumulated at a search engine. Two discriminative models, referred to as the sequential dependency model and the full dependency model, are proposed to perform the detection. Both models consider the conditional dependency of relevance labels and are thus more powerful than the conditionally independent model previously proposed for other tasks. Finally, we verify that by using training data in which the errors are detected and corrected by our method, we can improve the performance of learning to rank algorithms.
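To make the idea concrete, here is a toy sketch of error detection that respects a sequential dependency between neighboring labels. This is not the paper's actual discriminative model; the function name, the label/CTR encoding, and the `ctr_margin` threshold are all illustrative assumptions. It merely flags adjacent result pairs whose human labels conflict with the click evidence.

```python
# Toy sketch (assumed, not the paper's model): flag potential relevance-
# judgment errors by comparing human labels of adjacent result pairs
# against their observed click-through rates.

def detect_label_errors(results, ctr_margin=0.2):
    """results: list of (doc_id, human_label, ctr) tuples ordered by rank.
    Returns the set of doc_ids whose label conflicts with a neighbor's
    click evidence (higher label but much lower click-through rate)."""
    suspects = set()
    for (d1, l1, c1), (d2, l2, c2) in zip(results, results[1:]):
        # Sequential dependency: only neighboring labels are compared.
        if l1 > l2 and c2 - c1 > ctr_margin:
            # The lower-labeled doc attracts far more clicks:
            # one of the two judgments is suspect.
            suspects.update([d1, d2])
        elif l2 > l1 and c1 - c2 > ctr_margin:
            suspects.update([d1, d2])
    return suspects


# Example: doc "a" is labeled more relevant than "b", yet "b" receives
# far more clicks, so both judgments are flagged for review.
print(detect_label_errors([("a", 2, 0.1), ("b", 1, 0.5), ("c", 1, 0.4)]))
```

A full dependency model would instead score the entire label sequence jointly rather than only adjacent pairs; in practice, flagged judgments would be re-examined or relabeled before training the ranker.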