An Efficient Parameter-Free Method for Large Scale Offline Learning
Description
With the rapid growth of computer storage capacities, both the available data and the demand for scoring models are growing faster than processing power. However, the main obstacle to the widespread use of data mining solutions is that the number of skilled data analysts, who play a key role in data preparation and model selection, is not increasing. In this paper we present a parameter-free, scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with datasets that are far larger than the available central memory. Finally, we report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practicable computation time.
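The univariate conditional density estimators mentioned above rely on a Bayesian discretization criterion (the MODL method listed in the slides). As an illustration only, the sketch below scores a candidate discretization with a MODL-style cost: prior terms for the number of intervals, their bounds and the class distribution in each interval, plus a multinomial likelihood term. The exact formula is our assumption of the published criterion and the function names are hypothetical; this is not the authors' code. A lower cost means a more probable model, so a split is worth keeping only when it beats the single-interval (null) model.

```python
import math
from typing import List


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def modl_cost(intervals: List[List[int]]) -> float:
    """MODL-style cost of a discretization (assumed form; lower is better).

    intervals[i][j] is the number of training instances of class j that fall
    in interval i. Every row must list the same number of classes J.
    """
    n_intervals = len(intervals)
    n_classes = len(intervals[0])
    n_total = sum(sum(row) for row in intervals)
    # Prior: choice of the number of intervals, then of the interval bounds.
    cost = math.log(n_total) + log_binom(n_total + n_intervals - 1, n_intervals - 1)
    for row in intervals:
        n_i = sum(row)
        # Prior: class distribution inside this interval.
        cost += log_binom(n_i + n_classes - 1, n_classes - 1)
        # Likelihood: multinomial term log(n_i! / prod_j n_ij!).
        cost += math.lgamma(n_i + 1) - sum(math.lgamma(c + 1) for c in row)
    return cost


if __name__ == "__main__":
    # 20 instances, 2 classes (toy counts, not challenge data).
    print("single interval:      ", modl_cost([[10, 10]]))
    print("pure two-way split:   ", modl_cost([[10, 0], [0, 10]]))
    print("uninformative split:  ", modl_cost([[5, 5], [5, 5]]))
```

On these toy counts the pure two-interval split is cheaper than the single interval, while a split that leaves the class mix unchanged costs more than the single interval, which illustrates how such a criterion can reject uninformative splits without a user-set parameter.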
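The model averaging step can be sketched as follows, under two stated assumptions. First, we assume that the logarithmic smoothing of the posterior amounts to weighting each selective naive Bayes model M by its compression gain over the empty model M0, max(0, 1 - c(M)/c(M0)), where c is the model cost (negative log posterior), instead of by exp(-c(M)). Second, we assume that the models' log-scores, not their probabilities, are averaged. Under that second assumption the ensemble collapses exactly into a single naive Bayes in which the log-likelihood of each variable is multiplied by the total weight of the models that select it. The function names, model subsets and costs below are hypothetical.

```python
import math
from typing import List, Sequence


def compression_weights(model_costs: Sequence[float], null_cost: float) -> List[float]:
    """Normalized weights from each model's compression gain over the empty model."""
    gains = [max(0.0, 1.0 - c / null_cost) for c in model_costs]
    total = sum(gains)
    if total == 0.0:
        # No model beats the empty model: every weight is zero (prior only).
        return [0.0] * len(gains)
    return [g / total for g in gains]


def variable_weights(models: Sequence[Sequence[int]], weights: Sequence[float],
                     n_vars: int) -> List[float]:
    """Weight of variable k = total weight of the models that select k."""
    w = [0.0] * n_vars
    for selected, wm in zip(models, weights):
        for k in selected:
            w[k] += wm
    return w


def weighted_nb_predict(log_prior: Sequence[float], log_cond: Sequence[Sequence[float]],
                        var_w: Sequence[float]) -> List[float]:
    """Class posteriors of a naive Bayes with weighted log-likelihood terms.

    log_prior[y] = log P(y); log_cond[k][y] = log P(x_k | y) for one instance.
    """
    scores = [lp + sum(var_w[k] * log_cond[k][y] for k in range(len(var_w)))
              for y, lp in enumerate(log_prior)]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    models = [[0], [0, 1], [2]]  # hypothetical selected-variable subsets
    w = compression_weights([80.0, 70.0, 110.0], null_cost=100.0)
    var_w = variable_weights(models, w, n_vars=3)
    log_prior = [math.log(0.5), math.log(0.5)]
    log_cond = [[math.log(0.9), math.log(0.1)],
                [math.log(0.5), math.log(0.5)],
                [math.log(0.1), math.log(0.9)]]
    print(var_w, weighted_nb_predict(log_prior, log_cond, var_w))
```

With these toy costs, the model that is worse than the empty model receives zero weight, so variable 2 is effectively discarded, while variable 0, selected by both remaining models, keeps full weight. Because the ensemble reduces to per-variable weights, scoring an instance costs no more than scoring a single naive Bayes.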
Slides
| Time | Slide |
| 0:00 | An Efficient Parameter-Free Method for Large Scale Offline Learning |
| 0:10 | Outline |
| 0:23 | Data Mining Methodology |
| 0:48 | Data Mining in France Telecom |
| 1:11 | Data Mining under Limited Resources |
| 2:00 | Data Mining Challenges |
| 2:32 | Averaging of Selective Naive Bayes Classifiers |
| 2:37 | Naive Bayes Classifier: Principles |
| 3:16 | Naive Bayes Classification: Three Major Improvements |
| 3:43 | Evaluation of Conditional Probabilities |
| 4:13 | Discretization: Model Selection |
| 4:25 | MODL Discretization Method |
| 5:18 | Selective Naive Bayes: Objectives |
| 5:43 | Selective Naive Bayes: Our Approach |
| 6:41 | Averaging of Selective Naive Bayes - Bayesian Model Averaging |
| 7:22 | Averaging of Selective Naive Bayes - Our Approach |
| 8:23 | Method Overview |
| 9:05 | Scalability |
| 10:00 | Key Performance Indicators |
| 11:32 | Scalability: Our Strategy |
| 12:28 | Some Practical Issues |
| 13:22 | Evaluation on the Large Scale Learning Challenge |
| 13:29 | Large Scale Learning Challenge |
| 13:59 | Our Submission |
| 15:32 | Overall Challenge Ranking |
| 15:46 | Training Time Results |
| 17:01 | Training Time: What Matters? |
| 18:14 | Accuracy Results |
| 18:48 | Illustration |
| 19:44 | Importance of Data Representation |
| 20:56 | Conclusion |
| 21:00 | Summary |
| 22:48 | Conclusion and Future Work |
| 24:17 | References |
| 24:19 | Questions |
| 25:49 | Questions |
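The description states that the algorithms can cope with datasets far larger than central memory, and the slides list a dedicated scalability strategy. One possible approach, sketched here as an assumption and not as the authors' exact strategy, exploits the fact that univariate preprocessing (such as discretization) needs only one variable and the target at a time: the data are read in several sequential passes, and each pass keeps in memory only a block of columns sized to a memory budget. `open_rows` is a hypothetical callable that restarts a pass over the rows each time it is called, and `analyse` stands for any per-variable step, for instance a discretization search.

```python
import csv
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Row = Sequence[str]
RowSource = Callable[[], Iterable[Row]]


def column_blocks(n_vars: int, block_size: int) -> List[List[int]]:
    """Split variable indices 0..n_vars-1 into consecutive blocks of at most block_size."""
    return [list(range(s, min(s + block_size, n_vars)))
            for s in range(0, n_vars, block_size)]


def load_block(open_rows: RowSource, cols: List[int],
               target_col: int) -> Tuple[Dict[int, List[float]], List[str]]:
    """One sequential pass over the data, keeping only `cols` and the target in memory."""
    values: Dict[int, List[float]] = {c: [] for c in cols}
    target: List[str] = []
    for row in open_rows():
        for c in cols:
            values[c].append(float(row[c]))
        target.append(row[target_col])
    return values, target


def univariate_pass(open_rows: RowSource, n_vars: int, target_col: int, block_size: int,
                    analyse: Callable[[List[float], List[str]], object]) -> Dict[int, object]:
    """Run analyse(values, target) on every variable, block_size variables per pass."""
    results: Dict[int, object] = {}
    for block in column_blocks(n_vars, block_size):
        values, target = load_block(open_rows, block, target_col)
        for c in block:
            results[c] = analyse(values[c], target)
        del values, target  # free this block before reading the next one
    return results


def csv_row_source(path: str) -> RowSource:
    """Hypothetical helper: each call re-opens the CSV file and streams its data rows."""
    def rows() -> Iterable[Row]:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            yield from reader
    return rows
```

The number of passes is ceil(n_vars / block_size), so the block size trades I/O (more passes over the file) against memory (more columns held per pass).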