Solomon Seminars (Solomonovi seminarji)

Model Compression

author: Rich Caruana, Cornell University

Description

Decision trees are intelligible, but do they perform well enough that you should use them? Have SVMs replaced neural nets, or are neural nets still best for regression and SVMs best for classification? Boosting maximizes margins much as SVMs do, but can boosting compete with SVMs? And if it does compete, is it better to boost weak models, as theory might suggest, or to boost stronger models? Bagging is simpler than boosting, so how well does bagging stack up against boosting? Breiman said Random Forests are better than bagging and as good as boosting. Was he right? And what about old friends like logistic regression, KNN, and naive Bayes? Should they be relegated to the history books, or do they still fill important niches?
In this talk we compare the performance of ten supervised learning methods on nine criteria: Accuracy, F-score, Lift, Precision/Recall Break-Even Point, Area under the ROC Curve, Average Precision, Squared Error, Cross-Entropy, and Probability Calibration. The results show that no one learning method does it all, but some methods can be "repaired" so that they do very well across all performance metrics. In particular, we show how to obtain the best probabilities from max-margin methods such as SVMs and boosting via Platt's Method and isotonic regression. We then describe a new ensemble method that combines selected models from these ten learning methods to yield much better performance. Although these ensembles perform extremely well, they are too complex for many applications. We'll describe what we're doing to try to fix that. Finally, if time permits, we'll discuss how the nine performance metrics relate to each other, and which of them you probably should (or shouldn't) use.
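As a rough illustration of the calibration step mentioned above, the following is a minimal sketch of Platt's Method: a sigmoid p = 1 / (1 + exp(A*f + B)) is fitted to raw classifier scores f, using Platt's smoothed targets. The function names, the plain gradient-descent optimizer, and the toy scores are assumptions made for this example and are not taken from the talk; isotonic regression, the other method named above, is not shown.

```python
import math


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit A, B in p(y=1 | f) = 1 / (1 + exp(A*f + B)) by minimizing log loss.

    Plain gradient descent is used here for brevity (an assumption of this
    sketch); scores should be held-out outputs of an SVM or boosted model.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets, used instead of hard 0/1 labels.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # For z = a*f + b, d(log loss)/dz = t - p.
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Map a raw score to a calibrated probability with fitted A, B."""
    return 1.0 / (1.0 + math.exp(a * score + b))


if __name__ == "__main__":
    # Hypothetical held-out margins and labels, for illustration only.
    toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
    a, b = fit_platt(toy_scores, toy_labels)
    for s in (-2.0, 0.0, 2.0):
        print(s, round(platt_probability(s, a, b), 3))
```

In practice the sigmoid should be fitted on data not used to train the underlying classifier, otherwise the calibrated probabilities tend to be overconfident.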
During this talk I'll briefly describe the learning methods and performance metrics to help make the lecture accessible to non-specialists in machine learning.
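The description does not spell out how the ensemble method picks its models. One plausible reading, sketched here under stated assumptions, is greedy forward selection with replacement: starting from an empty ensemble, repeatedly add whichever library model most improves the averaged prediction on a validation set. The squared-error metric, the fixed number of rounds, and all names below are hypothetical choices for illustration, not details confirmed by the talk.

```python
def mean_squared_error(preds, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def select_ensemble(library, y_val, rounds=10):
    """Greedy forward selection with replacement (illustrative sketch).

    library: dict mapping a model name to its list of validation predictions.
    Returns the list of chosen model names; a name may appear several times,
    which acts as a weight when the chosen predictions are averaged.
    """
    running_sum = [0.0] * len(y_val)  # sum of the chosen models' predictions
    chosen = []
    for _ in range(rounds):
        best_name, best_err = None, float("inf")
        size = len(chosen) + 1
        for name, preds in library.items():
            # Validation error if this model were added to the ensemble.
            candidate = [(s + p) / size for s, p in zip(running_sum, preds)]
            err = mean_squared_error(candidate, y_val)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, library[best_name])]
    return chosen


if __name__ == "__main__":
    # Hypothetical validation predictions from three models.
    y_val = [0.0, 1.0, 0.0, 1.0]
    library = {
        "over": [0.4, 1.4, 0.4, 1.4],    # biased high
        "under": [-0.4, 0.6, -0.4, 0.6], # biased low
        "bad": [1.0, 0.0, 1.0, 0.0],     # anti-correlated
    }
    print(select_ensemble(library, y_val, rounds=4))
```

Selecting with replacement lets the procedure weight strong models more heavily, and it also means a single ensemble can end up averaging many base models, which is consistent with the size and speed concerns raised in the slide titles.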

Slides
0:00 Model Compression
0:39 Outline
1:57 Supervised Learning
3:40 Normalized Scores for ES
23:06 Ensemble Selection Works, But Is It Worth It?
24:22 Computational Cost
25:16 Ensemble Selection
25:38 Best Ensembles are Big & Ugly!
26:38 Best Ensembles are Big & Slow!
28:25 Can’t we make the ensembles smaller, faster, and easier to use by eliminating some base-level models?
28:37 What Models are Used in Ensembles?
30:25 What Models are Used in Ensembles?
32:14 Summary of Models Used by ES
33:10 Motivation: Model Compression
34:53 Solution: Model Compression
37:01 Why Mimic with Neural Nets?
39:21 Unlabeled Data?
40:23 Synthetic Data: True Distribution
40:43 Synthetic Data: Small Sample
40:49 Synthetic Data: Random
41:04 Synthetic Data: Random
41:33 Synthetic Data: Random
42:44 Synthetic Data: NBE
44:09 These don’t work well enough. Had to develop a new, better method.
44:10 These don’t work well enough. Had to develop a new, better method. Munging [1. To imperfectly transform information. 2. To modify data in a way that cannot be described succinctly.]
44:47 Munging
44:56 Munging
49:27 Munging
49:42 Munging
49:51 Synthetic Data: Munge
50:16 Synthetic Data: Munge
51:51 Synthetic Data: Munge
54:29 Synthetic Data
54:44 Now That We Have a Method to Generate Data, Let’s Do Some Compression
55:00 Experimental Setup: Datasets
55:10 Experimental Setup
55:30 Average Results by Size
55:58 Average Results by Size
57:23 Average Results by Size
58:27 Average Results by Size
59:00 Letter.P1 Results
59:18 Hs Results
60:01 Average Results by HU
60:55 Letter.P1 Results
61:06 Letter.P2 Results
61:14 Letter Results
63:55 It Doesn’t Always Work As Well As We’d Like, Yet!
63:59 Covtype Results
64:28 Covtype Results
65:28 Covtype Results
67:36 Covtype Results
67:50 Adult Results
68:17 Adult Results
68:26 Adult Results
75:40 RMSE Results – 400K, 256 HU
76:37 We’re Retaining 97% of Accuracy of Target Model, but How Are We Doing on Compression?
76:42 Size of Models (MB)
77:20 Execution Time of Models
77:39 Summary of Compression Results
77:50 Related Work
78:21 Related Work
79:46 Related Work
81:55 Related Work
82:18 What Still Needs to Be Done?
82:20 Future Work: Other Mimic Models
82:30 Future Work: Other Target Metrics
83:43 Future Work: Model Complexity
84:16 Future Work: Munge
88:03 Future Work: Active Learning
90:00 Thank You. Questions?
