A PAC-Bayesian Analysis of Dropouts
published: Oct. 6, 2014, recorded: December 2013, views: 2080
Intuitively, a neural network that is robust to dropout perturbations should have better generalization properties: it should perform better on novel inputs. Stochastic model perturbation is the fundamental concept underlying PAC-Bayesian generalization theory. This talk will briefly summarize PAC-Bayesian generalization theory and give a regularization bound for a simple form of dropout training as a straightforward application. For a regularization bound involving an L2 penalty on the model weights, dropout reduces the regularization penalty by a factor of 1-alpha, where alpha is the dropout rate. The bound then expresses a trade-off between the dropout rate and the training loss. While this regularization bound is intriguing, it may not be the right analysis. An alternative analysis involves variance reduction, the standard motivation for bagging. There are good reasons to believe that a certain general PAC-Bayes variance bound is significantly tighter than the general PAC-Bayes regularization bound. Unfortunately, the variance bound is opaque: it does not involve explicit regularization and is difficult to compare with regularization bounds. Also, unlike regularization bounds, the variance bound offers no obvious method for designing algorithms that minimize it. A compelling variance-based PAC-Bayesian analysis of dropouts remains an open problem.
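To make the trade-off in the abstract concrete, the following is a minimal Python sketch, not the bound from the talk. It assumes that dropout zeroes each weight independently with probability alpha, and that the L2 penalty takes the form 0.5 * ||w||^2 before being scaled by 1-alpha; the abstract states only the 1-alpha factor. All function names are hypothetical and chosen for illustration.

```python
import random


def l2_penalty(weights):
    """Plain L2 penalty, 0.5 * ||w||^2 (the 0.5 constant is an assumption)."""
    return 0.5 * sum(w * w for w in weights)


def dropout_scaled_penalty(weights, alpha):
    """L2 penalty reduced by the factor (1 - alpha), as the abstract describes."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (1.0 - alpha) * l2_penalty(weights)


def apply_dropout(weights, alpha, rng):
    """Zero each weight independently with probability alpha (assumed scheme)."""
    return [0.0 if rng.random() < alpha else w for w in weights]


def expected_dropout_loss(loss_fn, weights, alpha, n_samples=1000, seed=0):
    """Monte Carlo estimate of the training loss under dropout perturbations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += loss_fn(apply_dropout(weights, alpha, rng))
    return total / n_samples
```

Under these assumptions, raising alpha shrinks the value returned by `dropout_scaled_penalty` but typically raises the estimate from `expected_dropout_loss`, which is the tension between the dropout rate and the training loss that the regularization bound expresses.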