Constraint-Driven Clustering
Description
Clustering methods can be either data-driven or need-driven. Data-driven methods intend to discover the true structure of the underlying data while need-driven methods aims at organizing the true structure to meet certain application requirements. Thus, need-driven (e.g. constrained) clustering is able to find more useful and actionable clusters in applications such as energy aware sensor networks, privacy preservation, and market segmentation. However, the existing methods of constrained clustering require users to provide the number of clusters, which is often unknown in advance, but has a crucial impact on the clustering result. In this paper, we argue that a more natural way to generate actionable clusters is to let the application-specific constraints decide the number of clusters. For this purpose, we introduce a novel cluster model, Constraint-Driven Clustering (CDC), which finds an a priori unspecified number of compact clusters that satisfy all user-provided constraints. Two general types of constraints are considered, i.e. minimum significance constraints and minimum variance constraints, as well as combinations of these two types. We prove the NP-hardness of the CDC problem with different constraints. We propose a novel dynamic data structure, the CD-Tree, which organizes data points in leaf nodes such that each leaf node approximately satisfies the CDC constraints and minimizes the objective function. Based on CD-Trees, we develop an efficient algorithm to solve the new clustering problem. Our experimental evaluation on synthetic and real datasets demonstrates the quality of the generated clusters and the scalability of the algorithm.
| Slides | |
| 0:00 | Constraint-Driven Clustering |
| 0:17 | Introduction |
| 1:01 | Capturing Application Needs |
| 1:59 | Constraint-Driven Clustering |
| 2:56 | Motivation - Energy Aware Sensor Networks |
| 4:28 | Motivation - Privacy Preservation |
| 5:13 | Related Work |
| 6:24 | Constraint-Driven Clustering (CDC) |
| 6:59 | Theoretical Results |
| 7:54 | Heuristic Algorithm |
| 8:48 | CD-Tree |
| 9:54 | CD-Tree vs. CF-Tree and R*-Tree |
| 10:37 | Algorithm |
| 12:29 | Experimental Results |
| 13:15 | Results on Synthetic data set |
| 13:34 | Results on Letter data set |
| 13:50 | Conclusion & Future work |
| 14:46 | Reference |
| 14:50 | Reference |
| 14:53 | Thanks |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !




