Revisiting Globally Sorted Indexes for Efficient Document Retrieval
published: Oct. 21, 2010, recorded: February 2010, views: 2990
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
There has been a large amount of research on efficient document retrieval in both IR and web search areas. One important technique to improve retrieval efficiency is early termination, which speeds up query processing by avoiding scanning the entire inverted lists. Most early termination techniques first build new inverted indexes by sorting the inverted lists in the order of either the term-dependent information, e.g., term frequencies or term IR scores, or the term-independent information, e.g., static rank of the document; and then apply appropriate retrieval strategies on the resulting indexes. Although the methods based only on the static rank have been shown to be ineffective for the early termination, there are still many advantages of using the methods based on term-independent information. In this paper, we propose new techniques to organize inverted indexes based on the term- independent information beyond static rank and study the new retrieval strategies on the resulting indexes. We perform a detailed experimental evaluation on our new techniques and compare them with the existing approaches. Our results on the TREC GOV and GOV2 data sets show that our techniques can improve query efficiency significantly.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !