Web-Scale Image Clustering Revisited

author: Yannis Kalantidis, Yahoo! Research Silicon Valley
published: Feb. 10, 2016, recorded: December 2015, views: 1773

Large-scale duplicate detection, clustering and mining of documents or images have conventionally been treated with seed detection via hashing, followed by seed-growing heuristics using fast search. Principled clustering methods, especially kernelized and spectral ones, have higher complexity and are difficult to scale beyond millions of points. Under the assumption that documents or images are embedded in Euclidean space, we revisit recent advances in approximate k-means variants and borrow their best ingredients to introduce a new one, inverted-quantized k-means (IQ-means). The key underlying concepts are quantization of data points and multi-index based inverted search from centroids to cells. Its quantization is a form of hashing and is analogous to seed detection, while its updates are analogous to seed growing, yet principled in the sense of distortion minimization. We further design a dynamic variant that is able to determine the number of clusters k in a single run at nearly zero additional cost. Combined with powerful deep-learned representations, we achieve clustering of a 100 million image collection on a single machine in less than one hour.
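
The two concepts named in the abstract, quantizing data points into cells and searching from each centroid to nearby cells, can be illustrated with a small sketch. This is not the speaker's implementation. It assumes 2-D points in the unit square, uses a uniform grid where the method would use learned sub-codebooks, and uses a square window of cells where the method would use a multi-index search. The names `cell_index`, `quantize`, `iq_means_sketch`, `grid`, `window` and `init` are hypothetical.

```python
import random

def cell_index(v, lo, step, grid):
    """Index of the grid interval containing v, clamped to [0, grid - 1]."""
    return max(0, min(int((v - lo) / step), grid - 1))

def quantize(points, grid, lo, hi):
    """Quantize 2-D points to grid cells; keep a count and coordinate sums per cell."""
    step = (hi - lo) / grid
    cells = {}
    for x, y in points:
        key = (cell_index(x, lo, step, grid), cell_index(y, lo, step, grid))
        n, sx, sy = cells.get(key, (0, 0.0, 0.0))
        cells[key] = (n + 1, sx + x, sy + y)
    return cells, step

def iq_means_sketch(points, k, init=None, grid=32, window=8, iters=10,
                    lo=0.0, hi=1.0, seed=0):
    """Simplified k-means over quantized cells (an assumption-laden sketch).

    Each centroid visits only the non-empty cells inside a square window
    around its own cell (search from centroids to cells). A visited cell is
    assigned to the closest centroid that reached it; cells that no centroid
    reaches stay unassigned in that iteration.
    """
    cells, step = quantize(points, grid, lo, hi)
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)
    assignment = {}
    for _ in range(iters):
        best = {}  # cell -> (squared distance to cell mean, centroid index)
        for c, (cx, cy) in enumerate(centroids):
            ci = cell_index(cx, lo, step, grid)
            cj = cell_index(cy, lo, step, grid)
            for i in range(max(0, ci - window), min(grid, ci + window + 1)):
                for j in range(max(0, cj - window), min(grid, cj + window + 1)):
                    if (i, j) not in cells:
                        continue
                    n, sx, sy = cells[(i, j)]
                    d = (sx / n - cx) ** 2 + (sy / n - cy) ** 2
                    if (i, j) not in best or d < best[(i, j)][0]:
                        best[(i, j)] = (d, c)
        # Update step: each centroid becomes the count-weighted mean of its cells.
        totals = [[0, 0.0, 0.0] for _ in centroids]
        for key, (_, c) in best.items():
            n, sx, sy = cells[key]
            totals[c][0] += n
            totals[c][1] += sx
            totals[c][2] += sy
        centroids = [(t[1] / t[0], t[2] / t[0]) if t[0] else old
                     for t, old in zip(totals, centroids)]
        assignment = {key: c for key, (_, c) in best.items()}
    return centroids, assignment, cells
```

In this sketch the cost of one iteration depends on the number of centroids and the window size rather than on the number of points, because points are touched only once, during quantization. Calling `iq_means_sketch(points, k=2)` on a list of `(x, y)` tuples returns the centroids, a cell-to-centroid assignment, and the per-cell statistics. With a small `window`, a poor random initialization can leave some cells unvisited, which is a limitation of this simplified version.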


