Challenges in Building Large-Scale Information Retrieval Systems
Description
Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world presents a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I'll describe some future challenges and open research problems in this area.
Categories
Top: Computer Science: Information Retrieval
Top: Computer Science: Text Mining
Top: Computer Science: Search Engines
Top: Computer Science
| Time | Slide |
|---|---|
| 0:00 | Challenges in Building Large-Scale Information Retrieval Systems |
| 1:13 | Why Work on Retrieval Systems? |
| 2:41 | Retrieval System Dimensions |
| 4:05 | 1999 vs. 2009 (1) |
| 4:23 | 1999 vs. 2009 (2) |
| 4:28 | 1999 vs. 2009 (3) |
| 4:32 | 1999 vs. 2009 (4) |
| 4:47 | 1999 vs. 2009 (5) |
| 5:07 | 1999 vs. 2009 (6) |
| 5:10 | 1999 vs. 2009 (7) |
| 5:30 | Constant Change |
| 7:11 | Rest of Talk |
| 8:06 | “Google” Circa 1997 (google.stanford.edu) |
| 9:07 | Research Project, circa 1997 |
| 9:44 | Ways of Index Partitioning (1) |
| 11:18 | Ways of Index Partitioning (2) |
| 12:11 | Ways of Index Partitioning (3) |
| 12:20 | Basic Principles |
| 14:29 | “Corkboards” (1999) |
| 15:37 | Serving System, circa 1999 |
| 16:13 | Caching |
| 18:34 | Crawling (circa 1998-1999) |
| 19:38 | Indexing (circa 1998-1999) |
| 21:02 | Index Updates (circa 1998-1999) (1) |
| 21:33 | Index Updates (circa 1998-1999) (2) |
| 21:47 | Index Updates (circa 1998-1999) (3) |
| 21:55 | Index Updates (circa 1998-1999) (4) |
| 21:57 | Index Updates (circa 1998-1999) (5) |
| 22:03 | Index Updates (circa 1998-1999) (6) |
| 22:06 | Index Updates (circa 1998-1999) (7) |
| 22:39 | Google Data Center (2000) (1) |
| 23:24 | Google Data Center (2000) (2) |
| 23:30 | Google (new data center 2001) |
| 23:42 | Google Data Center (3 days later) |
| 23:56 | Increasing Index Size and Query Capacity |
| 25:00 | Dealing with Growth (1) |
| 25:16 | Dealing with Growth (2) |
| 25:21 | Dealing with Growth (3) |
| 25:25 | Dealing with Growth (4) |
| 25:28 | Dealing with Growth (5) |
| 25:30 | Dealing with Growth (6) |
| 25:32 | Dealing with Growth (7) |
| 25:52 | Implications |
| 26:39 | Index Encoding circa 1997-1999 |
| 27:55 | Encoding Techniques |
| 28:56 | Block-Based Index Format |
| 31:40 | Implications of Ever-Wider Sharding |
| 32:49 | Early 2001: In-Memory Index |
| 34:01 | In-Memory Indexing Systems |
| 37:36 | Larger-Scale Computing |
| 37:58 | Current Machines |
| 38:28 | Serving Design, 2004 edition |
| 40:05 | New Index Format |
| 41:49 | Byte-Aligned Variable-length Encodings (1) |
| 43:07 | Byte-Aligned Variable-length Encodings (2) |
| 44:02 | Group Varint Encoding (1) |
| 44:19 | Group Varint Encoding (2) |
| 44:25 | Group Varint Encoding (3) |
| 44:31 | Group Varint Encoding (4) |
| 44:54 | Group Varint Encoding (5) |
| 45:08 | Group Varint Encoding (6) |
| 45:58 | Group Varint Encoding (7) |
| 46:10 | 2007: Universal Search |
| 49:16 | Index that? Just a minute! |
| 51:23 | Flexibility & Experimentation in IR Systems |
| 52:42 | Infrastructure for Search Systems |
| 54:01 | Experimental Cycle, Part 1 |
| 54:54 | Experimental Cycle, Part 2 |
| 55:09 | Experiment Looks Good: Now What? |
| 56:54 | Future Directions & Challenges |
| 56:58 | Cross-Language Information Retrieval |
| 58:39 | ACLs in Information Retrieval Systems |
| 59:37 | Automatic Construction of Efficient IR Systems |
| 60:54 | Information Extraction from Semi-structured Data |
| 61:48 | In Conclusion... |
| 62:31 | Thanks! Questions...? |
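Several slides (the index encoding, block-based index format, byte-aligned variable-length encoding, and group varint slides) concern compact posting-list encodings. The Python sketch below illustrates the general techniques under stated assumptions: sorted document IDs are delta-encoded as gaps; classic varint stores 7 data bits per byte plus a continuation bit; and group varint packs four 2-bit length fields into one tag byte, followed by the 1 to 4 data bytes of each of four 32-bit values. The function names, the little-endian byte order, the tag-bit order (first value in the high-order bits), and the restriction to whole groups of four values are assumptions for illustration, not a description of Google's production format.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Delta-encode a sorted posting list: first ID, then successive differences."""
    if any(b <= a for a, b in zip(doc_ids, doc_ids[1:])):
        raise ValueError("doc IDs must be strictly increasing")
    return [b - a for a, b in zip([0] + doc_ids, doc_ids)]


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    return list(accumulate(gaps))


def varint_encode(values: List[int]) -> bytes:
    """Byte-aligned varint: 7 data bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("varint encodes non-negative integers only")
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            v >>= 7
        out.append(v)  # final byte, continuation flag clear
    return bytes(out)


def varint_decode(data: bytes) -> List[int]:
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:  # more bytes follow for this value
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def group_varint_encode(values: List[int]) -> bytes:
    """Assumed layout: per group of four, one tag byte holding four 2-bit (length - 1)
    fields, then each value's 1-4 little-endian bytes."""
    if len(values) % 4:
        raise ValueError("this sketch requires a multiple of 4 values")
    out = bytearray()
    for i in range(0, len(values), 4):
        tag, body = 0, bytearray()
        for v in values[i:i + 4]:
            if not 0 <= v < 1 << 32:
                raise ValueError("value out of 32-bit range")
            n = max(1, (v.bit_length() + 7) // 8)  # bytes needed: 1..4
            tag = (tag << 2) | (n - 1)  # first value ends up in the high-order bits
            body += v.to_bytes(n, "little")
        out.append(tag)
        out += body
    return bytes(out)


def group_varint_decode(data: bytes) -> List[int]:
    values, pos = [], 0
    while pos < len(data):
        tag = data[pos]
        pos += 1
        for k in range(4):
            n = ((tag >> (6 - 2 * k)) & 0x3) + 1  # all four lengths known from one byte
            values.append(int.from_bytes(data[pos:pos + n], "little"))
            pos += n
    return values


if __name__ == "__main__":
    doc_ids = [3, 7, 21, 150, 151, 40000, 40001, 1000000]  # hypothetical posting list
    gaps = to_gaps(doc_ids)
    print("varint bytes:", len(varint_encode(gaps)))
    print("group varint bytes:", len(group_varint_encode(gaps)))
```

Both formats produce a similar number of bytes; the usual motivation for group varint lies on the decode side, where a single tag byte yields all four lengths at once instead of testing a continuation bit on every byte. This pure-Python version only shows the layout, not that speed advantage.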
Comments
This lecture has upgraded my knowledge about search engines :)
I am interested in working in the field of artificial intelligence.