The Dynamics of AdaBoost
Description
One of the most successful and popular learning algorithms is AdaBoost, which is a classification algorithm designed to construct a "strong" classifier from a "weak" learning algorithm. Just after the development of AdaBoost nine years ago, scientists derived margin- based generalization bounds to explain AdaBoost's unexpectedly good performance. Their result predicts that AdaBoost yields the best possible performance if it always achieves a "maximum margin" solution. Yet, does AdaBoost achieve a maximum margin solution? Empirical and theoretical studies conducted within this period conjecture the answer to be "yes". In order to answer this question, we look toward AdaBoost's dynamics. We simplify AdaBoost to reveal a nonlinear iterated map. We then analyze the convergence of AdaBoost for cases where cyclic behavior is found; this cyclic behavior provides the key to answering the question of whether AdaBoost always maximizes the margin. As it turns out, the answer to this question turns out to be the opposite of what was thought to be true! In this talk, I will introduce AdaBoost, describe our analysis of AdaBoost when viewed as a dynamical system, briefly mention a new boosting algorithm which always maximizes the margin with a fast convergence rate, and if time permits, I will reveal a surprising new result about AdaBoost and the problem of bipartite ranking.
| Slides | |
| 0:02 | Dynamics of AdaBoost |
| 0:12 | A Story about AdaBoost |
| 2:08 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 2:34 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 3:00 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 3:29 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 4:17 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 4:42 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 5:19 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 6:39 | Overview of Talk |
| 7:32 | A Sample Problem |
| 7:34 | Say you have a database of news articles… |
| 8:30 | Examples of Classification Tasks: |
| 8:55 | Examples of classification algorithms: |
| 9:15 | Training Data: {(xi,yi)}i=1..m where (xi,yi) is chosen iid from an unknown probability distribution on X{-1,1}. |
| 9:31 | How do we construct a classifier? |
| 9:43 | Say we have a “weak” learning algorithm: |
| 10:03 | Boosting algorithms combine weak classifiers in a meaningful way (Schapire ‘89). |
| 11:14 | AdaBoost (Freund and Schapire ’96) |
| 12:57 | AdaBoost |
| 14:45 | AdaBoost |
| 15:06 | AdaBoost |
| 15:33 | AdaBoost |
| 17:33 | AdaBoost |
| 18:17 | Does AdaBoost choose λfinal so that the margin µ( f ) is maximized? That is, does AdaBoost maximize the margin? No! |
| 19:08 | The question remained (until recently): Does AdaBoost maximize the margin? |
| 20:11 | About the proof… |
| 20:28 | Analyzing AdaBoost using Dynamical Systems |
| 21:13 | Smallest Non-Trivial Case |
| 22:24 | TITLE |
| 23:44 | Smallest Non-Trivial Case |
| 24:05 | Smallest Non-Trivial Case |
| 24:09 | Smallest Non-Trivial Case |
| 24:11 | Smallest Non-Trivial Case |
| 24:14 | Smallest Non-Trivial Case |
| 24:24 | Smallest Non-Trivial Case |
| 24:44 | Smallest Non-Trivial Case |
| 24:57 | Two possible stable cycles! |
| 27:11 | Generalization of smallest non-trivial case |
| 28:00 | Generalization of smallest non-trivial case |
| 29:08 | Empirically Observed Cycles |
| 29:10 | Empirically Observed Cycles |
| 31:12 | Empirically Observed Cycles |
| 31:32 | Empirically Observed Cycles |
| 31:36 | Empirically Observed Cycles |
| 31:39 | Empirically Observed Cycles |
| 31:41 | Empirically Observed Cycles |
| 31:43 | If AdaBoost cycles, we can calculate the margin it will asymptotically converge to in terms of the edge values |
| 32:16 | The question remained (until recently): Does AdaBoost maximize the margin? |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !






Not bad, but I recommend Robert Schapire's lecture first, or rather: instead.