Speeding Up Stochastic Gradient Descent
published: Dec. 29, 2007, recorded: December 2007, views: 863
Description
To tackle large-scale learning problems whose solution necessarily involves a large model with many tunable parameters, difficult non-convex optimization has to be performed efficiently. Computational complexity arguments strongly suggest that deep architectures will be necessary to represent the kind of complex functions that AI involves. Unfortunately, training such architectures poses difficult optimization problems, so efficient approximate iterative optimization, rather than the regularization techniques that have been so well studied over the last two decades, becomes the key to good generalization. Furthermore, because of the size of the data sets involved in such tasks, it is imperative that computation scale no more than linearly with the number of training examples. In many cases, the algorithm to beat is stochastic gradient descent, and comparisons have to be made by looking at the curve of test error versus computation time. Following recent interest in online versions of second-order optimization methods, we present computational tricks that yield a linear-time variant of natural gradient optimization. Another issue, particularly difficult to address when optimizing multi-layer neural networks, is how to parallelize efficiently. With SMP machines becoming cheaper and easier to use, we compare and discuss different strategies for parallelizing the training of multi-layer neural networks, showing that naive approaches fail but that those taking the communication bottleneck into account yield impressive speed-ups.
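As a rough illustration of the baseline mentioned above, the following is a minimal sketch of plain stochastic gradient descent that records test error against training time. It is not taken from the talk: the linear least-squares model, the synthetic data, and every name and parameter (`make_data`, `sgd`, `lr`, `eval_every`) are assumptions chosen only to keep the example small. It does not show the natural gradient variant or the parallelization strategies the talk covers.

```python
import random
import time


def make_data(n, true_w, noise, rng):
    """Hypothetical synthetic data: y = true_w . x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def mean_squared_error(w, data):
    """Average squared prediction error of weights w on a data set."""
    total = 0.0
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(data)


def sgd(train, test, dim, lr=0.05, epochs=5, eval_every=500, seed=0):
    """Plain SGD on squared loss; returns weights and a (time, test MSE) curve."""
    rng = random.Random(seed)
    w = [0.0] * dim
    curve = []          # pairs of (training seconds so far, test MSE)
    train_time = 0.0    # evaluation time is excluded from this clock
    step = 0
    tick = time.perf_counter()
    for _ in range(epochs):
        order = list(range(len(train)))
        rng.shuffle(order)
        for i in order:
            x, y = train[i]
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w is err * x.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            step += 1
            if step % eval_every == 0:
                train_time += time.perf_counter() - tick
                curve.append((train_time, mean_squared_error(w, test)))
                tick = time.perf_counter()
    return w, curve


if __name__ == "__main__":
    rng = random.Random(42)
    true_w = [2.0, -1.0, 0.5]
    train = make_data(2000, true_w, 0.1, rng)
    test = make_data(500, true_w, 0.1, rng)
    w, curve = sgd(train, test, dim=len(true_w))
    for seconds, mse in curve:
        print(f"{seconds:.4f}s  test MSE {mse:.4f}")
```

Each update touches a single example, so the cost of one pass grows linearly with the number of training examples. Plotting the `curve` pairs for two optimizers on the same axes gives the kind of test error versus computation time comparison the description refers to.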
Download slides:
eml07_bengio_ssg_01.pdf (1.5 MB)