Experiences with the Nutch search engine
Author: Doug Cutting, Yahoo! Research
Description
Nutch is open-source software that implements a web search engine. It has been used in a variety of applications: vertical search engines, archival web search, search engines that incorporate novel metadata, etc. Nutch is itself implemented using Hadoop, an open-source platform for scalable computing. Hadoop facilitates the development and management of applications that run on large numbers of computers and on very large datasets. Hadoop has been demonstrated on clusters with hundreds of computers and is designed to scale to thousands of computers. This talk will present the architecture, capabilities and current status of these two projects.
Slides
| Time | Slide title |
| 0:00 | Open Source Platforms for Search |
| 1:33 | What am I? |
| 2:56 | What Distinguishes Open Source? |
| 7:37 | Lucene pre-history: Xerox PARC |
| 8:54 | Lucene pre-history: Apple ATG |
| 10:51 | Lucene pre-history: Excite |
| 14:41 | Digression: Seek versus Transfer pt 1 |
| 16:54 | Digression: Seek versus Transfer pt 2 |
| 20:26 | Lucene History |
| 23:18 | Original Lucene Goals |
| 24:38 | Lucene Architecture |
| 25:16 | Lucene Indexing Algorithm |
| 28:18 | Lucene Indexing Algorithm: notes |
| 28:27 | Lucene Search Algorithms |
| 29:47 | Lucene Status |
| 32:09 | Rapid Adoption Facilitators |
| 35:18 | Lucene Future |
| 37:28 | Nutch |
| 39:25 | Nutch Documents |
| 41:20 | Nutch Queries |
| 44:33 | Query Parsing |
| 47:47 | Nutch Search Performance Tricks |
| 50:16 | Nutch Scalability Goals |
| 51:16 | Scalability |
| 52:50 | Initial Scalability |
| 53:04 | ... but not to billions of pages |
| 53:42 | Hadoop |
| 54:14 | Hadoop's DFS |
| 56:16 | MapReduce |
| 59:37 | MapReduce job processing |
| 60:15 | Hadoop Status |
| 61:37 | Nutch on Hadoop |
| 62:33 | Nutch Status |
| 63:52 | Nutch Future |
| 64:26 | Apache is Community |
| 66:09 | Thanks! |






I use Nutch. I installed it on SUSE 2.6, but I still can't manage to compile the Nutch source in JBuilder or ..
Lots of great content here. Go Doug Go!
This video is corrupted. After a few minutes of playing the Windows Media version, it stops. I've tried RealPlayer and Windows Media Player.
Sorry, but he could have skipped his "ah" "ah" "um" "um"; it was annoying as hell.