Experiences with the Nutch search engine
published: Feb. 25, 2007, recorded: July 2006, views: 4136
Nutch is open-source software that implements a web search engine. It has been used in a variety of applications: vertical search engines, archival web search, search engines that incorporate novel metadata, etc. Nutch is itself implemented using Hadoop, an open-source platform for scalable computing. Hadoop facilitates the development and management of applications that run on large numbers of computers and on very large datasets. Hadoop has been demonstrated on clusters with hundreds of computers and is designed to scale to thousands of computers. This talk will present the architecture, capabilities and current status of these two projects.