event thumbnail image
Plenaries

Entropy Properties of a Decision Rule Class in Connection with machine learning abilities

author: Alexey Chervonenkis, University of London

Description

Many methods of Machine Learning are based on the idea of empirical risk minimisation. It is to find a decision rule or a model from some set which most perfectly fits the data presented in the training set. This idea is based on the large number law: empirical risk converges to real risk, if the training set is large enough. But if the class of decision rules or models is too large (in some sense) one meets the problem of oferfitting, the model perfectly corresponds to the data presented in the training set, but shows large errors on new data. It is due to the fact that only uniform convergence of empirical risk to the real risk guarantees closeness of the optimal model behaviour on the training set and on the new data. We introduce the notion of entropy of a decision rule class over a fixed sample sequence as log of the number of possible classifications of the sequence by the rules of the class. Maximum entropy over sequences of a fixed length l determines sufficient condition of the uniform convergence and corresponding estimates. But only average entropy H(l) behaviour determine necessary and sufficient condition of the uniform convergence. The condition is that H(l) / l (average entropy per symbol) should go to zero when the sequence length goes to infinity. If the condition does not hold then there exists a set of objects with non zero probability measure, such that almost all sequences of arbitrary finite length from this set may be divided in all possible ways by the rules of the class. One can easily see, that in this case overfitting is inevitable. Similar results are found for real dependencies instead of decision rules.

You might be experiencing some problems with Your Video player.
Slides
0:00 Entropy Properties of a Decision Rule Class in Connection with machine learning abilities
3:05 Learning to Pattern Recognition
4:31 Reconstruction of Numerical Dependencies
6:16 Over Fitting in Pattern Recognition Problem
6:22 Reconstruction of Numerical Dependencies (a)
6:51 Over Fitting in Pattern Recognition Problem (a)
7:57 Over Fitting in Polynomial Regression Reconstruction
10:42 Formal Definition
12:40 Dependence of the Empirical and True Risk
14:23 Uniform Convergence of Frequencies to Probabilities
16:52 Random Functions
18:32 Conditions of the Uniform Convergence of Frequencies to Probabilities pt 1
20:08 Conditions of the Uniform Convergence of Frequencies to Probabilities pt 2
21:55 Conditions of the Uniform Convergence of Frequencies to Probabilities pt 3
25:05 Conditions for the Uniform Convergence of Means to Expectations
26:51 We Get Only Sufficient Conditions pt 1
27:03 We Get Only Sufficient Conditions pt 2
28:52 We Get Only Sufficient Conditions pt 1 (a)
29:43 We Get Only Sufficient Conditions pt 3
31:01 We Get Only Sufficient Conditions pt 4
32:10 Average Distance between Them Is Non-zero
32:45 We Get Only Sufficient Conditions pt 4 (a)
33:48 When the Uniform Convergence of Frequencies to Probabilities Does not Hold
37:08 We Get Only Sufficient Conditions pt 2 (a)
38:26 When the Uniform Convergence of Frequencies to Probabilities Does not Hold (a)

Lecture rating

People found this lecture:
Worth seeing
because it is:
 Valuable and informative
Well presented
Easily understandable
Acceptably recorded
You need to login to cast your vote.

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment: