The Catch-Up Phenomenon in Bayesian Inference
Description
Standard Bayesian model selection and model averaging sometimes learn too slowly: other learning methods exist that achieve better predictions from less data. We give a novel analysis of this "catch-up" phenomenon. Based on this analysis, we propose the switching method, a modification of Bayesian model averaging that never learns more slowly than Bayes and sometimes learns much faster. The method is related to expert-tracking algorithms developed in the COLT literature, and has time complexity comparable to that of Bayes. The switching method resolves a long-standing debate in statistics, known as the AIC-BIC dilemma: model selection/averaging methods like BIC, Bayes, and MDL are consistent (they eventually infer the correct model) but, when used for prediction, the rate at which their predictions improve can be suboptimal. Methods like AIC and leave-one-out cross-validation are inconsistent but typically converge at the optimal rate. Our method is the first that provably achieves both consistency and the optimal convergence rate. Experiments with nonparametric density estimation confirm that these large-sample theoretical results also hold in practice for small samples.
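The catch-up effect described above can be illustrated with a toy calculation. The following is only a minimal sketch under invented assumptions, not the method from the lecture: the per-round log losses are synthetic, the names (`simple_loss`, `complex_loss`, `bayes_mixture_loss`, `switch_mixture_loss`) are hypothetical, and the uniform prior over a single switch time is a simplification chosen for brevity. A "simple" model predicts moderately well throughout, while a "complex" model predicts badly for the first 50 rounds and well afterwards. Bayesian averaging keeps favouring the simple model until the complex model's accumulated loss has caught up, whereas a mixture over "simple first, then complex" strategies can profit from the complex model as soon as it becomes the better predictor.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Synthetic per-round log losses (assumed for illustration only).
n = 200
simple_loss = [1.0] * n                      # steady, mediocre predictions
complex_loss = [2.0] * 50 + [0.5] * (n - 50)  # poor at first, good later


def bayes_mixture_loss(loss_a, loss_b):
    """Cumulative log loss of a 50/50 Bayesian mixture of two predictors."""
    log_liks = [-sum(loss_a), -sum(loss_b)]
    return -logsumexp([math.log(0.5) + ll for ll in log_liks])


def switch_mixture_loss(loss_a, loss_b):
    """Cumulative log loss of a mixture over strategies that follow
    predictor A for the first s rounds and predictor B afterwards,
    with a uniform prior over the switch time s (a simplifying assumption)."""
    horizon = len(loss_a)
    log_prior = -math.log(horizon + 1)
    terms = []
    for s in range(horizon + 1):
        total = sum(loss_a[:s]) + sum(loss_b[s:])
        terms.append(log_prior - total)
    return -logsumexp(terms)


def catch_up_round(loss_a, loss_b):
    """First round t >= 1 at which B's accumulated loss is no larger than A's."""
    cum_a = cum_b = 0.0
    for t, (a, b) in enumerate(zip(loss_a, loss_b), start=1):
        cum_a += a
        cum_b += b
        if cum_b <= cum_a:
            return t
    return None


bayes_loss = bayes_mixture_loss(simple_loss, complex_loss)
switch_loss = switch_mixture_loss(simple_loss, complex_loss)
catch_up = catch_up_round(simple_loss, complex_loss)

print("complex model predicts better from round 51 onwards")
print("complex model catches up in accumulated loss at round", catch_up)
print("Bayes mixture cumulative loss: ", round(bayes_loss, 2))
print("switch mixture cumulative loss:", round(switch_loss, 2))
```

In this toy setting the Bayesian mixture pays for the complex model's poor start over the whole sequence, while the switching mixture pays only a logarithmic price for not knowing the switch time in advance. The real switch distribution uses a prior that does not depend on a fixed horizon and can be computed efficiently; the brute-force loop here is for clarity only.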
Slides
| Time | Slide |
| 0:00 | The Catch-Up Phenomenon |
| 0:34 | Model Selection - 1 |
| 0:51 | Model Selection - 2 |
| 1:23 | Model Selection - 3 |
| 1:38 | Model Selection - 4 |
| 2:03 | Model Selection - 5 |
| 2:03 | Model Selection Methods - 1 |
| 3:46 | Model Selection Methods - 2 |
| 4:13 | The AIC-BIC Dilemma - 1 |
| 6:33 | The AIC-BIC Dilemma - 2 |
| 7:01 | The AIC-BIC Dilemma - 3 |
| 7:52 | The AIC-BIC Dilemma - 4 |
| 8:44 | The AIC-BIC Dilemma - 5 |
| 10:00 | The Best of Both Worlds - 1 |
| 10:41 | Example: Histograms - 1 |
| 11:39 | Example: Histograms - 2 |
| 12:36 | Example: Histograms - 3 |
| 13:46 | CV Selects more Bins than Bayes |
| 15:30 | CV Predicts better than Bayes - 1 |
| 16:40 | CV Predicts better than Bayes - 2 |
| 17:46 | ...but CV is Inconsistent! - 1 |
| 18:08 | ...but CV is Inconsistent! - 2 |
| 19:44 | The Best of Both Worlds - 2 |
| 20:06 | The Best of Both Worlds - 3 |
| 20:28 | - Questions |
| 21:17 | Menu - Bayes Factor Model Selection |
| 21:43 | Bayes Factor Model Selection - 1 |
| 23:59 | Bayes Factor Model Selection - 2 |
| 27:26 | Bayes Factor Model Selection - 3 |
| 27:28 | The Catch-Up Phenomenon |
| 27:54 | Bayes Factor Model Selection - 3 |
| 28:08 | The Catch-Up Phenomenon |
| 28:12 | Menu - Bayes Factor Model Selection: Predictive Interpretation |
| 28:13 | Bayesian Prediction |
| 29:02 | Logarithmic Loss |
| 31:17 | The Most Important Slide |
| 32:42 | Menu - The Catch-Up Phenomenon |
| 32:47 | The Catch-Up Phenomenon - 1 |
| 33:06 | The Catch-Up Phenomenon - 2 |
| 33:07 | The Catch-Up Phenomenon - 1 |
| 33:42 | The Catch-Up Phenomenon - 2 |
| 33:44 | The Catch-Up Phenomenon - 3 |
| 35:23 | The Catch-Up Phenomenon - 4 |
| 35:36 | The Catch-Up Phenomenon - 5 |
| 36:50 | The Catch-Up Phenomenon - 6 |
| 37:05 | The Switch Distribution - 1 |
| 38:04 | The Switch Distribution - 2 |
| 39:09 | The Switch Distribution - 3 |
| 39:44 | The Switch Distribution - 4 |
| 40:59 | The Switch Distribution - 5 |
| 41:39 | The Switch Distribution - 6 |
| 41:54 | Menu - Solving the AIC-BIC Dilemma: Multi-Switch Distribution |
| 42:02 | More than 2 Models - 1 |
| 42:14 | More than 2 Models - 2 |
| 42:36 | Multi-Switch Distribution - 1 |
| 42:59 | Multi-Switch Distribution - 2 |
| 43:17 | Multi-Switch Distribution - 3 |
| 43:26 | Multi-Switch Distribution - 4 |
| 43:51 | Multi-Switch Distribution - 5 |
| 43:55 | Multi-Switch Distribution - 6 |
| 44:40 | Model Selection by Switching |
| 45:27 | Switching is Consistent |
| 46:01 | Rate-of-Convergence - 1 |
| 46:23 | Rate-of-Convergence - 2 |
| 46:36 | Rate-of-Convergence - 3 |
| 46:44 | Rate-of-Convergence - 4 |
| 47:03 | Switching Achieves Minimax Rate - 1 |
| 47:47 | Switching Achieves Minimax Rate - 2 |
| 47:56 | Switch-Distribution Converges Fast |
| 48:02 | The AIC-BIC Dilemma |
| 48:09 | Computational Complexity |
| 49:11 | (Potential) Applications |
| 49:21 | “Bayesian”? |
| 49:55 | Subjective Bayesian Objections - 1 |
| 49:56 | It’s MDL, Jim, but not as We Know It! |
| 50:36 | - Questions |
| 51:18 | - Questions |