Hadoop-ML: An Infrastructure for the Rapid Implementation of Parallel Reusable Analytics

author: Edwin Pednault, IBM Thomas J. Watson Research Center
published: Jan. 19, 2010, recorded: December 2009, views: 8573

Hadoop is an open-source implementation of Google's MapReduce programming model. Over the past few years, it has evolved into a popular platform for parallelization in both industry and academia, and trends suggest that it will likely be the analytics platform of choice on forthcoming cloud-based systems. Unfortunately, implementing parallel machine learning/data mining (ML/DM) algorithms on Hadoop is complex and time-consuming. To address this challenge, we present Hadoop-ML, an infrastructure that facilitates the implementation of parallel ML/DM algorithms on Hadoop. Hadoop-ML is designed to allow the specification of both task-parallel and data-parallel ML/DM algorithms. It also supports composing parallel ML/DM algorithms from both serial and parallel building blocks, which allows one to write reusable parallel code. The proposed abstraction eases implementation by requiring the user to specify only the computations and their dependencies, without worrying about scheduling, data management, or communication. As a consequence, the code is portable: the user never needs to write Hadoop-specific code, which potentially allows one to leverage future parallelization platforms without rewriting it.
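
The idea of specifying only computations and their dependencies can be illustrated with a small sketch. This is not Hadoop-ML's actual interface, which the abstract does not show; the names `data_parallel`, `run_graph`, `tasks`, and `deps` are hypothetical, and local threads from Python's standard library stand in for a Hadoop cluster. Under those assumptions, a data-parallel block maps a function over partitions and combines the partial results, while a task-parallel driver runs each block once its dependencies have finished.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from graphlib import TopologicalSorter  # Python 3.9+
from operator import add


def data_parallel(map_fn, combine_fn, partitions):
    """Hypothetical data-parallel block: map over partitions, then combine."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_fn, partitions))
    return reduce(combine_fn, partials)


def run_graph(tasks, deps):
    """Hypothetical task-parallel driver.

    tasks: name -> function taking a dict with the results of its dependencies
    deps:  name -> set of task names that must finish first
    The caller never says when or where a task runs; the driver decides.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # All tasks returned here are mutually independent, so they
            # can be submitted together and run concurrently.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(tasks[name], {d: results[d] for d in deps[name]})
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# Toy data split into partitions, standing in for distributed input.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# "count" and "total" are independent data-parallel blocks;
# "mean" is a serial block that depends on both.
tasks = {
    "count": lambda _: data_parallel(len, add, partitions),
    "total": lambda _: data_parallel(sum, add, partitions),
    "mean": lambda r: r["total"] / r["count"],
}
deps = {"count": set(), "total": set(), "mean": {"count", "total"}}

results = run_graph(tasks, deps)
print(results["mean"])
```

Because the task definitions mention neither threads nor scheduling, the same `tasks` and `deps` could in principle be handed to a different driver, which is the portability argument the abstract makes.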
