Distributed Data Mining
published: Feb. 25, 2007, recorded: July 2005, views: 1900
Data mining is the automated analysis of large volumes of data in search of relationships and knowledge that are implicit in the data. Data mining and knowledge discovery in large amounts of data can benefit from parallel and distributed computational environments, which improve both performance and the quality of data selection. The goal of this tutorial is to provide researchers and practitioners with an introduction to mining large data sets by exploiting techniques from high-performance parallel and distributed computing.
This tutorial is organized in two parts. The first part introduces high-performance parallel and distributed computing and analyzes the different forms of parallelism that can be exploited in data mining techniques and algorithms. The second part reviews distributed data mining approaches. For each data mining technique, different approaches to parallel implementation are presented and discussed, together with parallel and distributed data mining systems and algorithms. Finally, current research issues and perspectives in high-performance data mining are outlined.