International Workshop on Intelligent Information Access

Experiences with the Nutch search engine

author: Doug Cutting, Yahoo! Research

Description

Nutch is open-source software that implements a web search engine. It has been used in a variety of applications: vertical search engines, archival web search, search engines that incorporate novel metadata, etc. Nutch is itself implemented using Hadoop, an open-source platform for scalable computing. Hadoop facilitates the development and management of applications that run on large numbers of computers and on very large datasets. Hadoop has been demonstrated on clusters with hundreds of computers and is designed to scale to thousands of computers. This talk will present the architecture, capabilities and current status of these two projects.

Slides
0:00 Open Source Platforms for Search
1:33 What am I?
2:56 What Distinguishes Open Source?
7:37 Lucene pre-history: Xerox PARC
8:54 Lucene pre-history: Apple ATG
10:51 Lucene pre-history: Excite
14:41 Digression: Seek versus Transfer pt 1
16:54 Digression: Seek versus Transfer pt 2
20:26 Lucene History
23:18 Original Lucene Goals
24:38 Lucene Architecture
25:16 Lucene Indexing Algorithm
28:18 Lucene Indexing Algorithm: notes
28:27 Lucene Search Algorithms
29:47 Lucene Status
32:09 Rapid Adoption Facilitators
35:18 Lucene Future
37:28 Nutch
39:25 Nutch Documents
41:20 Nutch Queries
44:33 Query Parsing
47:47 Nutch Search Performance Tricks
50:16 Nutch Scalability Goals
51:16 Scalability
52:50 Initial Scalability
53:04 ... but not to billions of pages
53:42 Hadoop
54:14 Hadoop's DFS
56:16 MapReduce
59:37 MapReduce job processing
60:15 Hadoop Status
61:37 Nutch on Hadoop
62:33 Nutch Status
63:52 Nutch Future
64:26 Apache is Community
66:09 Thanks!

Reviews and comments:

Comment1 lassaad, August 27, 2007 at 10:40 a.m.:

I use Nutch. I installed it on SUSE 2.6, but I still have not managed to compile the Nutch source in JBuilder or ..


Comment2 Chris Lunt, June 12, 2008 at 1:35 a.m.:

Lots of great content here. Go Doug Go!


Comment3 Mike, August 30, 2008 at 6:30 p.m.:

This video is corrupted. After a few minutes of playing the Windows Media version, it stops. I've tried RealPlayer and Windows Media Player.


Comment4 kb, November 27, 2008 at 6:56 a.m.:

Sorry, but he could have skipped his "ah" "ah" "um" "um"; it was annoying as hell.
