Learning using Many Examples
Description
The statistical learning theory suggests to choose large capacity
models that barely avoid over-fitting the training data. In that perspective,
all datasets are small. Things become more complicated when
one considers the computational cost of processing large datasets.
Computationally challenging training sets appear when one want to emulate
intelligence: biological brains learn quite efficiently from the continuous
streams of perceptual data generated by our six senses, using
limited amounts of sugar as a source of power. Computationally challenging
training sets also appear when one want to analyze the masses
of data that describe the life of our computerized society. The more data
we understand, the more we enjoy competitive advantages.
– The first part of the tutorial clarifies the relation between the statistical
efficiency, the design of learning algorithms and their computational
cost.
– The second part makes a detailed exploration of specific learning algorithms
and of their implementation, with both simple and complex
examples.
– The third part considers algorithms that learn with a single pass
over the data. Certain algorithms have optimal properties but are
often too costly. Workarounds are discussed.
– Finally, the fourth part shows how active example selection provides
greater speed and reduces the feedback pressure that constrain parallel
implementations.
| Slides | |
| 0:00 | Learning with Large Datasets |
| 0:39 | Why Large-scale Datasets? |
| 1:25 | The Computerized Society Metaphor |
| 2:10 | Limited Computing Resources |
| 3:12 | Roadmap |
| 4:03 | Counter-Roadmap |
| 4:27 | Part I - Statistical Efficiency versus Computational Costs. |
| 4:33 | Objectives and Essential Remarks |
| 4:36 | Part I - Statistical Efficiency versus Computational Costs. |
| 5:12 | Objectives and Essential Remarks |
| 6:55 | Learning Algorithms: Standard Framework |
| 8:46 | Learning with Approximate Optimization |
| 9:44 | Decomposition of the Error (i) |
| 11:00 | Decomposition of the Error (ii) |
| 11:43 | Small-scale vs. Large-scale Learning |
| 12:10 | Small-scale Learning |
| 12:52 | Large-scale Learning |
| 14:06 | Learning versus Optimization |
| 15:09 | Asymptotics: Estimation |
| 16:11 | Asymptotics: Estimation+Optimization |
| 17:02 | . . . Approximation+Estimation+Optimization |
| 17:43 | Case Study |
| 18:11 | Quantities of Interest |
| 18:48 | Gradient Descent (GD) |
| 20:44 | Second Order Gradient Descent (2GD) |
| 21:45 | Gradient Descent (GD) |
| 21:50 | Second Order Gradient Descent (2GD) |
| 22:01 | Stochastic Gradient Descent (SGD) |
| 24:40 | Second Order Gradient Descent (2GD) |
| 24:48 | Stochastic Gradient Descent (SGD) |
| 24:53 | Second order Stochastic Descent (2SGD) |
| 25:12 | Stochastic Gradient Descent (SGD) |
| 25:21 | Second order Stochastic Descent (2SGD) |
| 28:22 | Part II - Learning with Stochastic Gradient Descent. |
| 28:29 | Benchmarking SGD in Simple Problems |
| 29:17 | Text Categorization with SVMs (1) |
| 30:11 | Text Categorization with SVMs (2) |
| 38:28 | More SVM Experiments |
| 40:33 | Text Chunking with CRFs (1) |
| 41:39 | Text Chunking with CRFs (2) |
| 43:22 | Choosing the Gain Schedule |
| 45:47 | The Secret Ingredient for a good SGD |
| 46:44 | Getting the Engineering Right |
| 48:13 | SGD for Real Life Applications |
| 49:48 | Part III - Learning with a Single Pass over the Examples |
| 50:15 | Why learning with a Single Pass? |
| 51:16 | Effect of one Additional Example (i) |
| 52:11 | Effect of one Additional Example (ii) |
| 52:59 | Yes they do! But what does it mean? |
| 54:33 | Optimal Learning in One Pass |
| 56:07 | Unfortunate Practical Issues |
| 57:30 | Summary |
| 58:43 | Stopping criteria for SGD |
| 59:55 | Part IV - Selecting Useful Training Examples. |
| 59:57 | Why learning with a Single Pass? |
| 60:07 | Effect of one Additional Example (i) |
| 60:15 | Effect of one Additional Example (ii) |
| 60:20 | Yes they do! But what does it mean? |
| 60:25 | Optimal Learning in One Pass |
| 60:35 | Unfortunate Practical Issues |
| 61:00 | Summary |
| 61:19 | Stopping criteria for SGD |
| 61:31 | Part IV - Selecting Useful Training Examples. |
| 61:52 | Incremental Training |
| 62:55 | Methods for Selecting Examples |
| 65:55 | Large-scale Support Vector Machines? |
| 66:48 | Learning in the dual |
| 68:36 | An Inefficient Dual Optimizer |
| 69:34 | Two Problems with this Algorithm |
| 70:11 | An Inefficient Dual Optimizer |
| 70:23 | Two Problems with this Algorithm |
| 71:13 | The Huller and its Derivatives |
| 72:13 | One Pass Learning with Kernels |
| 73:37 | Time and Memory |
| 74:59 | Selecting Training Examples |
| 76:17 | Selecting Examples by Sampling |
| 76:23 | Selecting Training Examples |
| 76:32 | Selecting Examples by Sampling |
| 77:03 | Active Example Selection — Small-scale |
| 78:27 | The Price of Convexity |
| 80:46 | - Questions |
| 82:47 | Active Example Selection — Small-scale |
| 82:57 | The Price of Convexity |
| 83:24 | Active Example Selection — Large-scale |
| 85:07 | Are we there yet? |
| 85:59 | Large Scale Selection of Examples (1) |
| 87:08 | Large Scale Selection of Examples (2) |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !




