Exploiting Duality in Summarization with Deterministic Guarantees
Description
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. These complexity advantages offer both a spaceefficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
| Slides | |
| 0:00 | The Impact of Duality on Data Synopsis Problems |
| 0:19 | Introduction |
| 2:41 | Outline |
| 3:26 | Histograms-part01 |
| 8:00 | Histograms-part02 |
| 10:35 | Histograms-part03 |
| 12:30 | Restricted Haar Wavelet Synopses-part01 |
| 14:34 | Restricted Haar Wavelet Synopses-part02 |
| 15:42 | Unrestricted Haar and Haar+ Synopses-part01 |
| 17:05 | Unrestricted Haar and Haar+ Synopses-part02 |
| 17:48 | Experiments: Histograms, Time vs. n |
| 17:58 | Experiments: Histograms, Time vs. B |
| 18:11 | Experiments: Haar Wavelets, Time vs. n |
| 18:38 | Experiments: Haar Wavelets, Time vs. B |
| 19:08 | Conclusions |
| 19:51 | Related Work |
| 20:04 | Thank you! Questions? |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !


