The sample complexity of agnostic learning under deterministic labels
published: July 15, 2014, recorded: June 2014, views: 3220
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from O(d/ϵ)-many samples and classes that require samples of size Ω(d/ϵ2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity O(d/ϵ), any proper (and therefore, any ERM) algorithm requires Ω(d/ϵ2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !