More Data Less Work: Runtime As A Monotonically Decreasing Function of Data Set Size
published: July 30, 2009, recorded: June 2009, views: 3941
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
We are used to studying runtime as an increasing function of the data set size, and are happy when this increase is not so bad (e.g. when the runtime increases linearly, or even polynomiall, with the data set size). Traditional runtime analysis of learning is also viewed this way, and studies how training runtime increases as more data is available. However, considering the true objective of training, which is to obtain a good predictor, I will argue that training runtime should actually be studied as a *decreasing* function of training set size. Focusing on training Support Vector Machines (SVMs) and combining ideas from optimization, statistical learning theory, and online methods. I will then present both theoretical and empirical results demonstrating how a simple stochastic subgradient descent approach indeed displays such monotonic decreasing behavior. I will also discuss a similar phenomena in the context of Gaussian mixture clustering, where it appears that excess data turns the problem from computationally intractable to computationally tractable. Joint work with Shai Shalev-Shwartz, Karthik Sridharan, Yoram Singer, Greg Shakhnarovich and Sam Roweis.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !