Finding low-entropy sets and trees from binary data
Description
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward- or upward-pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees. We describe algorithms for finding all subsets satisfying an entropy condition. We give an extensive empirical evaluation of the performance of the methods on both synthetic and real data. We also discuss the search for high-entropy subsets and the computation of the Vapnik-Chervonenkis dimension of the data.
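The levelwise idea mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `entropy` and `low_entropy_sets`, the row-of-tuples data layout, and the Apriori-style candidate generation are assumptions made for illustration. It relies only on the monotonicity property stated above: adding an attribute never lowers the empirical entropy of the projection, so a set can qualify only if all of its subsets qualify.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, attrs):
    """Empirical entropy (in bits) of the rows projected onto the columns in attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def low_entropy_sets(rows, max_entropy):
    """Return {attribute tuple: entropy} for every set with entropy <= max_entropy.

    Assumes rows is a non-empty list of equal-length 0/1 tuples (hypothetical layout).
    """
    n_attrs = len(rows[0])
    result = {}
    level = [(a,) for a in range(n_attrs)]  # start from single attributes
    while level:
        # Keep the candidates of this level that satisfy the entropy bound.
        kept = []
        for attrs in level:
            h = entropy(rows, attrs)
            if h <= max_entropy:
                result[attrs] = h
                kept.append(attrs)
        kept_set = set(kept)
        # Build the next level: extend each kept set by a larger attribute index,
        # and keep the candidate only if all of its one-smaller subsets were kept.
        candidates = set()
        for attrs in kept:
            for a in range(attrs[-1] + 1, n_attrs):
                cand = attrs + (a,)
                if all(sub in kept_set for sub in combinations(cand, len(attrs))):
                    candidates.add(cand)
        level = sorted(candidates)
    return result


if __name__ == "__main__":
    # Toy data: columns 0 and 1 are identical, column 2 is independent of them.
    data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    for attrs, h in sorted(low_entropy_sets(data, 1.5).items()):
        print(attrs, round(h, 3))
```

On the toy data, the pair of identical columns is kept because its joint entropy is no larger than that of a single column, while any set mixing in the independent column exceeds the threshold and is pruned together with its supersets.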
Slides
| Time | Slide |
| 0:03 | Finding low-entropy sets and trees from binary data |
| 0:14 | Summary (1) |
| 1:00 | Summary (2) |
| 1:35 | Summary (3) |
| 1:48 | Outline of the talk |
| 1:58 | Low entropy attribute sets |
| 2:31 | Example |
| 3:20 | Problem definition |
| 3:47 | D-trees |
| 5:02 | Example of a D-tree |
| 5:18 | U-trees |
| 5:32 | Problem definitions |
| 5:45 | Algorithm for finding low-entropy trees |
| 6:56 | Experimental results |
| 7:43 | Experimental results: Course data pt 1 |
| 8:19 | Experimental results: Course data pt 2 |
| 9:12 | Related work |
| 10:27 | Concluding remarks |