The 25th International Conference on Machine Learning (ICML 2008)

Self-taught Clustering

Author: Wenyuan Dai, Shanghai Jiao Tong University

Description

This paper introduces a new clustering task called self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning: it aims to cluster a small collection of unlabeled target data with the help of a large amount of unlabeled auxiliary data. The target and auxiliary data may differ in topic distribution. We show that even when the target data are too few to support learning a high-quality feature representation, useful features can be learned with the help of the auxiliary data, and the target data can then be clustered effectively on those features. To tackle this problem, we propose a co-clustering-based self-taught clustering algorithm that clusters the target and auxiliary data simultaneously, so that the feature representation learned from the auxiliary data influences the target data through a common set of features. Under the new data representation, clustering of the target data can be improved. Our experiments on image clustering show that the algorithm can substantially outperform several state-of-the-art clustering methods when using irrelevant unlabeled auxiliary data.
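
The core idea in the description, letting auxiliary data shape a shared feature representation on which the target data are then clustered, can be illustrated with a small sketch. The code below is not the co-clustering algorithm from the paper. It swaps in a simpler two-step stand-in: k-means over the feature columns of the stacked target and auxiliary data, followed by k-means over the target rows in the resulting feature-cluster space. The function names `kmeans` and `self_taught_sketch`, the toy matrices, and the cluster counts are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def self_taught_sketch(target: Matrix, auxiliary: Matrix,
                       n_feature_clusters: int,
                       n_clusters: int) -> Tuple[List[int], List[int]]:
    """Simplified stand-in (an assumption, not the paper's method)."""
    # 1. Cluster the features (columns) using target AND auxiliary rows, so
    #    the auxiliary data can reveal which features belong together.
    stacked = target + auxiliary
    columns = [list(col) for col in zip(*stacked)]
    feature_labels = kmeans(columns, n_feature_clusters)
    # 2. Re-represent each target row by summing its values per feature cluster.
    reduced = [[sum(v for v, f in zip(row, feature_labels) if f == c)
                for c in range(n_feature_clusters)]
               for row in target]
    # 3. Cluster the target rows in the shared feature-cluster space.
    return kmeans(reduced, n_clusters), feature_labels


# Toy data (hypothetical): six features, two latent topics (features 0-2 and
# 3-5). Each target row has a single nonzero feature, so no two target rows
# share a feature directly.
target = [[1, 0, 0, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [0, 0, 0, 1, 0, 0],
          [0, 0, 0, 0, 1, 0]]
auxiliary = [[1, 1, 1, 0, 0, 0],
             [1, 1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1, 1],
             [0, 0, 0, 1, 1, 1]]

labels, feature_labels = self_taught_sketch(target, auxiliary, 2, 2)
print(feature_labels)  # [0, 0, 0, 1, 1, 1]
print(labels)          # [0, 0, 1, 1]
```

In this toy setting the target rows are too sparse to overlap on any raw feature, but the auxiliary rows show which features co-occur, so the first two and last two target rows end up in the same clusters once they are expressed over the shared feature clusters. The algorithm described in the lecture optimizes both clusterings jointly rather than in two separate steps.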

Slides
0:00 Self-taught Clustering – an instance of Transfer Unsupervised Learning
0:16 Outline
0:40 Outline - Motivation
0:43 Clustering
1:16 When the data are sparse, …
1:31 Can sparse data be clustered well?
2:08 A good representation may help
3:02 Can Transfer Learning help?
3:38 Transfer Learning can help
3:57 Outline - Self-taught Clustering
4:07 Problem Definition
4:43 Self-taught Learning
5:44 Self-taught Clustering
6:12 Transfer Learning
6:37 Transfer Unsupervised Learning
7:03 Outline - Algorithm
7:05 Self-taught Clustering via Co-clustering
8:48 Objective Function
9:42 Optimization (1)
10:33 Optimization (2)
10:48 Outline - Experiments
10:52 Data Sets
11:27 Evaluation Criterion
11:35 Experimental Results
13:25 Outline - Conclusion
13:28 Conclusion
14:23 Question?
15:23 Questions
