Machine Learning Software in Practice: Quo Vadis?

author: Szilard Pafka, Epoch
published: Oct. 9, 2017,   recorded: August 2017,   views: 2297

Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Due to the hype in our industry in the last couple of years, there is a growing mismatch between software tools machine learning practitioners wish for, what they would truly need for their work, what's available (either commercially or open source) and what tool developers and researchers focus on. In this talk we will give a couple of examples of this mismatch. Several surveys and anecdotal evidence show that most practitioners work most of the time (at least in the modeling phase) with datasets that t in the RAM of a single server, therefore distributed computing tools are very of- ten overkill. Our benchmarks (available on github [1]) of the most widely used open source tools for binary classification (various implementations of algorithms such as linear methods, random forests, gradient boosted trees and neural networks) on such data show over 10x speed and over 10x RAM usage difference between various tools, with "big data" tools being the most inefficient. Significant performance gains have been obtained by those tools that incorporate various low-level (close to CPU and memory architecture) optimizations. Nevertheless, we will show that even the best tools show degrading performance on the multi-socket servers featuring a high number of cores, systems that have become widely accessible more recently. Finally, while most of this talk is about performance, we will also argue that machine learning tools that feature high-level easy-to-use APIs provide increasing productivity for practitioners and therefore are preferable.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: