The Minimum Code Length for Clustering Using the Gray Code

author: Mahito Sugiyama, Graduate School of Informatics, Kyoto University
published: Oct. 3, 2011, recorded: September 2011

Slides
0:00 The Minimum Code Length for Clustering Using the Gray Code
0:01 Contributions
1:07 Demonstration (Synthetic Dataset)
1:17 G-COOL
1:32 K-means
1:43 Results (Real datasets) - 1
2:01 Results (Real datasets) - 2
2:34 Outline (1)
2:36 Outline (2)
2:40 Clustering Focusing on Compression
3:56 Our Strategy (1)
4:20 Our Strategy (2)
5:35 Outline (3)
5:46 MCL (Minimum Code Length) - 1
6:05 MCL (Minimum Code Length) - 2
6:37 Binary Encoding
7:10 MCL with Binary Encoding (1)
7:19 MCL with Binary Encoding (2)
7:24 MCL with Binary Encoding (3)
7:34 MCL with Binary Encoding (4)
7:44 MCL with Binary Encoding (5)
8:02 MCL with Binary Encoding (6)
8:07 MCL with Binary Encoding (7)
8:14 MCL with Binary Encoding (8)
8:19 MCL with Binary Encoding (9)
8:24 MCL with Binary Encoding (10)
8:47 Definition of MCL
8:53 Minimizing MCL and Clustering
9:21 Outline (4)
9:28 Optimization by COOL
9:58 COOL with Binary Encoding (1)
10:12 COOL with Binary Encoding (2)
10:19 COOL with Binary Encoding (3)
10:36 COOL with Binary Encoding (4)
11:02 COOL with Binary Encoding (5)
11:09 COOL with Binary Encoding (6)
11:21 COOL with Binary Encoding (7)
11:24 COOL with Binary Encoding (8)
11:30 Noise Filtering by COOL (1)
11:51 Noise Filtering by COOL (2)
11:57 Algorithm of COOL
12:01 Outline (5)
12:07 Gray Code
12:53 Gray Code Embedding
13:23 COOL with Gray Code (G-COOL) - 1
13:30 COOL with Gray Code (G-COOL) - 2
13:45 COOL with Gray Code (G-COOL) - 3
14:04 COOL with Gray Code (G-COOL) - 4
14:12 COOL with Gray Code (G-COOL) - 5
14:17 COOL with Gray Code (G-COOL) - 6
14:20 COOL with Gray Code (G-COOL) - 7
14:23 COOL with Gray Code (G-COOL) - 8
14:30 COOL with Binary Encoding (9)
14:35 COOL with Gray Code (G-COOL) - 8
14:40 COOL with Binary Encoding (9)
14:42 Theoretical Analysis of G-COOL
15:21 Demonstration of G-COOL
15:37 Outline (6)
15:39 Experimental Methods
15:56 Results (Synthetic datasets) (1)
16:13 Results (Synthetic datasets) (2)
16:23 Results (Synthetic datasets) (3)
16:56 Results (Synthetic datasets) (4)
17:10 Results (Synthetic datasets) (5)
17:20 Results (Synthetic datasets) (6)
17:29 Results (Real datasets) (1)
17:33 Results (Real datasets) (2)
17:38 Results (Real datasets) (3)
17:39 Outline (7)
17:41 Conclusion

Description

We propose new approaches that exploit compression algorithms for clustering numerical data. Our first contribution is a measure that scores the quality of a given clustering result in light of a fixed encoding scheme. We call this measure the Minimum Code Length (MCL). Our second contribution is a general strategy for translating any encoding method into a clustering algorithm, which we call COOL (COding-Oriented cLustering). COOL has a low computational cost, since it scales linearly with the data set size. The clustering results of COOL are also shown to minimize MCL. To further illustrate this approach, we take the Gray code as the encoding scheme and present G-COOL. G-COOL can find clusters of arbitrary shapes and remove noise. Moreover, it is robust to changes in the input parameters: it requires only two lower bounds, one for the number of clusters and one for the size of each cluster, whereas most algorithms for finding arbitrarily shaped clusters work well only if all parameters are tuned appropriately. G-COOL is theoretically shown to achieve internal cohesion and external isolation, and is experimentally shown to work well on both synthetic and real data sets.
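
For readers unfamiliar with the encoding named above, here is a minimal Python sketch of the standard binary-reflected Gray code on non-negative integers. It is not the lecture's algorithm: how G-COOL embeds real-valued data with the Gray code is not described on this page, and the function names (to_gray, from_gray, gray_bits) are illustrative assumptions. The sketch only shows the defining property that consecutive integers receive code words differing in exactly one bit.

```python
# Illustrative sketch only: the standard binary-reflected Gray code on
# integers. It does not reproduce the Gray code embedding used by G-COOL.

def to_gray(n: int) -> int:
    """Return the binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_bits(n: int, width: int) -> str:
    """Gray code of n as a zero-padded bit string of the given width."""
    return format(to_gray(n), f"0{width}b")


# Code words for 0..7 at 3 bits.
codes = [gray_bits(i, 3) for i in range(8)]
print(codes)
```

Neighbouring entries of codes differ in a single bit position, which is not the case for the plain binary code (for example, 3 = 011 and 4 = 100 differ in all three bits).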
