Fast and Accurate k-means For Large Datasets

Published on 2012-01-25, 8490 views

Michael Shindler

Clustering is a popular problem with many applications. We consider the $k$-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, su…
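The abstract frames the problem as k-means over data that can only be read sequentially. The k-means objective (the "Cost (Summed Squared)" slide in the outline) is the sum, over all points, of the squared distance to the nearest center. As a minimal sketch, assuming points are tuples of floats, the Python below evaluates that cost in a single pass and runs MacQueen-style online k-means as a simple one-pass baseline. This is a swapped-in textbook technique, not the algorithm presented in the talk, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Iterable[Sequence[float]],
                centers: Sequence[Sequence[float]]) -> float:
    """k-means objective: summed squared distance from each point to its
    nearest center. Reads `points` once, in order, so a stream works too."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def online_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Hypothetical MacQueen-style sequential k-means (not the talk's method).

    The first k points seed the centers; every later point pulls its nearest
    center toward itself with step 1/count, so each center is the running mean
    of the points assigned to it. Memory is O(k * d), independent of stream length.
    """
    centers: List[List[float]] = []
    counts: List[int] = []
    for p in stream:
        if len(centers) < k:
            centers.append(list(p))
            counts.append(1)
            continue
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = [c + eta * (x - c) for c, x in zip(centers[j], p)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
    centers = online_kmeans(iter(data), k=2)
    print(centers)                     # [[0.0, 0.5], [10.0, 10.5]]
    print(kmeans_cost(data, centers))  # 1.0
```

The single pass keeps memory bounded, but the result depends on arrival order and carries no approximation guarantee, which is one reason more careful streaming k-means algorithms are studied.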

NIPS Conference 2011 - Granada

Presentation

00:00 Fast and Accurate k-means for Large Data Sets
00:17 K-means Clustering
01:08 Algorithms for solving k-means
01:55 K-means for Large Datasets
02:08 Streaming k-means
04:05 Improvements/Contributions
05:17 More relevant algorithms for streaming k-means
05:34 Experimental Setup
06:11 Time to Compute Solution
06:23 Cost (Summed Squared)
06:41 Bottleneck in Algorithm Runtime
08:03 Compute Actual Distance to Those Neighbors
08:15 Substantially Faster
08:42 Cost change is (usually) minor
08:55 Conclusion
09:37 Acknowledgments
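The outline places the runtime bottleneck in finding nearby centers and then computing actual distances only to "those neighbors"; the page does not say how the neighbors are chosen. A minimal sketch, assuming a random one-dimensional projection as the neighbor filter (a common approximate nearest-neighbor heuristic, not confirmed here as the speaker's method): centers are kept sorted by their projection onto a random unit direction, and exact distances are computed only for a small `window` of centers adjacent to the query in that order. `ProjectedIndex`, `window`, and `seed` are hypothetical names.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ProjectedIndex:
    """Hypothetical approximate nearest-center index.

    Centers are stored sorted by their projection onto one random unit
    direction. A query computes exact distances only to the `window`
    centers on each side of its own projected position.
    """

    def __init__(self, dim: int, window: int = 2, seed: int = 0) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        rng = random.Random(seed)
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        self.direction = [x / norm for x in v]  # random unit vector
        self.window = window
        self.keys: List[float] = []             # sorted projections
        self.centers: List[List[float]] = []    # centers, same order as keys

    def _project(self, p: Sequence[float]) -> float:
        return sum(x * d for x, d in zip(p, self.direction))

    def add(self, c: Sequence[float]) -> None:
        """Insert a center, keeping both lists sorted by projection."""
        key = self._project(c)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.centers.insert(i, list(c))

    def nearest(self, p: Sequence[float]) -> Tuple[List[float], float]:
        """Return (approximate nearest center, its squared distance to p)."""
        if not self.centers:
            raise ValueError("index is empty")
        i = bisect.bisect_left(self.keys, self._project(p))
        lo = max(0, i - self.window)
        hi = min(len(self.centers), i + self.window)
        best = min(self.centers[lo:hi], key=lambda c: sq_dist(p, c))
        return best, sq_dist(p, best)


if __name__ == "__main__":
    idx = ProjectedIndex(dim=1, window=1)
    for c in ([0.0], [10.0], [20.0]):
        idx.add(c)
    print(idx.nearest([11.0]))  # ([10.0], 1.0)
```

With `window` at least the number of stored centers the lookup is exact; smaller windows trade accuracy for speed, because points that are close after projection need not be close in the original space.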