Challenges in Building Large-Scale Information Retrieval Systems

author: Jeffrey Dean, Google
published: March 12, 2009; recorded: February 2009

Slides
0:00 Challenges in Building Large-Scale Information Retrieval Systems
1:13 Why Work on Retrieval Systems?
2:41 Retrieval System Dimensions
4:05 1999 vs. 2009 (1)
4:23 1999 vs. 2009 (2)
4:28 1999 vs. 2009 (3)
4:32 1999 vs. 2009 (4)
4:47 1999 vs. 2009 (5)
5:07 1999 vs. 2009 (6)
5:10 1999 vs. 2009 (7)
5:30 Constant Change
7:11 Rest of Talk
8:06 “Google” Circa 1997 (google.stanford.edu)
9:07 Research Project, circa 1997
9:44 Ways of Index Partitioning (1)
11:18 Ways of Index Partitioning (2)
12:11 Ways of Index Partitioning (3)
12:20 Basic Principles
14:29 “Corkboards” (1999)
15:37 Serving System, circa 1999
16:13 Caching
18:34 Crawling (circa 1998-1999)
19:38 Indexing (circa 1998-1999)
21:02 Index Updates (circa 1998-1999) (1)
21:33 Index Updates (circa 1998-1999) (2)
21:47 Index Updates (circa 1998-1999) (3)
21:55 Index Updates (circa 1998-1999) (4)
21:57 Index Updates (circa 1998-1999) (5)
22:03 Index Updates (circa 1998-1999) (6)
22:06 Index Updates (circa 1998-1999) (7)
22:39 Google Data Center (2000) (1)
23:24 Google Data Center (2000) (2)
23:30 Google (new data center 2001)
23:42 Google Data Center (3 days later)
23:56 Increasing Index Size and Query Capacity
25:00 Dealing with Growth (1)
25:16 Dealing with Growth (2)
25:21 Dealing with Growth (3)
25:25 Dealing with Growth (4)
25:28 Dealing with Growth (5)
25:30 Dealing with Growth (6)
25:32 Dealing with Growth (7)
25:52 Implications
26:39 Index Encoding circa 1997-1999
27:55 Encoding Techniques
28:56 Block-Based Index Format
31:40 Implications of Ever-Wider Sharding
32:49 Early 2001: In-Memory Index
34:01 In-Memory Indexing Systems
37:36 Larger-Scale Computing
37:58 Current Machines
38:28 Serving Design, 2004 edition
40:05 New Index Format
41:49 Byte-Aligned Variable-length Encodings (1)
43:07 Byte-Aligned Variable-length Encodings (2)
44:02 Group Varint Encoding (1)
44:19 Group Varint Encoding (2)
44:25 Group Varint Encoding (3)
44:31 Group Varint Encoding (4)
44:54 Group Varint Encoding (5)
45:08 Group Varint Encoding (6)
45:58 Group Varint Encoding (7)
46:10 2007: Universal Search
49:16 Index that? Just a minute!
51:23 Flexibility & Experimentation in IR Systems
52:42 Infrastructure for Search Systems
54:01 Experimental Cycle, Part 1
54:54 Experimental Cycle, Part 2
55:09 Experiment Looks Good: Now What?
56:54 Future Directions & Challenges
56:58 Cross-Language Information Retrieval
58:39 ACLs in Information Retrieval Systems
59:37 Automatic Construction of Efficient IR Systems
60:54 Information Extraction from Semi-structured Data
61:48 In Conclusion...
62:31 Thanks! Questions...?
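Several slides in the index above cover compact posting-list encodings, specifically byte-aligned variable-length encodings and group varint encoding. The slides themselves are not reproduced on this page. The Python sketch below is therefore only a rough illustration of the two formats, not the encoding from the talk. It assumes non-negative integers. It assumes group varint always works on groups of exactly four 32-bit values. The single tag byte stores each value's byte length minus one in two bits, with the first value in the high-order bits, and the value bytes are little-endian. All function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Byte-aligned varint: 7 data bits per byte, least-significant group
    first; the high bit is set on every byte except the last."""
    if value < 0:
        raise ValueError("varint encodes non-negative integers only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # more bytes follow
        value >>= 7
    out.append(value)  # final byte, continuation bit clear
    return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            return result, pos
        shift += 7


def encode_group_varint(values: list[int]) -> bytes:
    """Group varint (assumed layout): one tag byte with four 2-bit
    (length - 1) fields, then each value in 1..4 little-endian bytes."""
    if len(values) != 4:
        raise ValueError("group varint encodes groups of exactly four values")
    tag = 0
    payload = bytearray()
    for v in values:
        if not 0 <= v < 1 << 32:
            raise ValueError("values must fit in 32 unsigned bits")
        length = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes
        tag = (tag << 2) | (length - 1)  # first value ends up in bits 7-6
        payload += v.to_bytes(length, "little")
    return bytes([tag]) + bytes(payload)


def decode_group_varint(buf: bytes, pos: int = 0) -> tuple[list[int], int]:
    """Decode one group of four; return (values, next_pos)."""
    tag = buf[pos]
    pos += 1
    values = []
    for i in range(4):
        length = ((tag >> (6 - 2 * i)) & 0b11) + 1  # read lengths high bits first
        values.append(int.from_bytes(buf[pos:pos + length], "little"))
        pos += length
    return values, pos


if __name__ == "__main__":
    group = [1, 15, 511, 131071]
    encoded = encode_group_varint(group)
    print(bin(encoded[0]))  # tag byte: lengths 1, 1, 2, 3 bytes
    print(decode_group_varint(encoded))
    print(encode_varint(300))
```

For the group `[1, 15, 511, 131071]`, the tag byte is `0b110` (lengths 1, 1, 2, and 3 bytes), and the whole group takes 8 bytes. Plain varint needs 7 bytes for the same four values, so the gain is not size. One tag byte gives all four lengths at once, so a decoder avoids testing a continuation bit on every byte. That is the usual motivation for the grouped layout.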

Description

Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval.

In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems.

Finally, I'll describe some future challenges and open research problems in this area.

Reviews and comments:

Comment 1 by samiul jahan, October 7, 2009, 5:27 p.m.:

This lecture has upgraded my knowledge about search engines :)


Comment 2 by azam, November 8, 2009, 9:58 a.m.:

I am interested in working in the field of artificial intelligence.