Fast and Accurate k-means For Large Datasets

author: Michael Shindler, Oregon State University
published: Jan. 25, 2012,   recorded: December 2011,   views: 8430


Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Clustering is a popular problem with many applications. We consider the $k$-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute $k$-means in $o(nk)$ (where $n$ is the number of data points; note that computing the cost, given a solution, takes $\Theta(nk)$ time). We show that our algorithm compares favorably to existing algorithms - both theoretically and experimentally, thus providing state-of-the-art performance in both theory and practice.

See Also:

Download slides icon Download slides: nips2011_shindler_largedatasets_01.pdf (949.4┬áKB)

Help icon Streaming Video Help

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Reviews and comments:

Comment1 Marko, September 2, 2014 at 11:34 a.m.:

Can Streaming k-means be used for on-line clustering, such as that the first step is done in the same manner for each point that gets into the system, and then the ball k-means is performed on the new centroid set?

Comment2 Thanos, November 25, 2018 at 9:29 a.m.:

Thanks for the share this post here it is amazing you need to getting here free desktop clock windows 10 this is the batter to all windows users working so got the hurry.

Write your own review or comment:

make sure you have javascript enabled or clear this field: