Architecture Conscious Data Analysis: Progress and Future Outlook

author: Srinivasan Parthasarathy, Ohio State University
published: Dec. 29, 2007, recorded: December 2007


Over the past several years, architectural innovation in processor design has led to new capabilities in single-chip commodity processing and high-end compute clusters. Examples include hardware prefetching, simultaneous multithreading (SMT), and, more recently, true chip multiprocessing. At the very high end, system area networking technologies like InfiniBand have spurred the development of affordable cluster-based supercomputers capable of storing and managing petabytes of data. We contend that data mining and machine learning algorithms, which often require significant computational, I/O, and communication resources, stand to benefit from such innovations if they are appropriately leveraged. The challenges in doing so are daunting.
First, a large number of state-of-the-art data mining algorithms grossly under-utilize modern processors, the building blocks of current-generation commodity clusters. This is due to the widening gap between processor and memory performance and to the memory- and I/O-intensive nature of these applications. Second, the emergence of multi-core architectures in the commodity market brings further complications. Key challenges brought to the fore include the need to enhance available fine-grained parallelism and to alleviate memory bandwidth pressure. Third, parallelizing data mining algorithms on a multi-level cluster environment is a challenge, given the need to share and communicate large sets of data and to balance the workload in the presence of data skew.
In this talk I will discuss progress made in the context of these challenges and attempt to demonstrate that "architecture conscious" solutions are both viable and necessary. I will attempt to separate general methodologies and techniques from specific instantiations whenever it makes sense. I will conclude with a discussion of the future outlook, both in the context of systems support for next-generation algorithms and in terms of the educational objectives brought to the fore in this context.
This is joint work with my graduate students Gregory Buehrer, Amol Ghoting and Shirish Tatikonda.
