Machine Learning on Distributions
published: Jan. 16, 2013, recorded: December 2012, views: 4485
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. Existing methods usually assume that each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting: each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given i.i.d. samples from each of them. Our goal is to estimate the distances between these distributions and to use these distances for low-dimensional embedding, clustering, classification, or anomaly detection. We present estimation algorithms and prove that when the effective dimension is small enough (as measured by the doubling dimension), the excess prediction risk in the regression problem converges to zero at a polynomial rate. We demonstrate the power of our methods by outperforming the best published results on several computer vision benchmarks. We also show how our perspective on learning from distributions allows us to define new analyses in astronomy and fluid dynamics simulations.
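As a rough illustration of the pipeline described above (a bag of samples per instance, estimated pairwise distances between the underlying distributions, then a downstream method), the following sketch uses a k-nearest-neighbour estimator of the KL divergence. The abstract does not name a specific divergence or estimator, so this choice is an assumption, as are the function names `knn_distances`, `kl_divergence_knn`, and `pairwise_divergences` and the parameter `k`. It should be read as one possible instance of the idea, not as the method presented in the lecture.

```python
import numpy as np


def knn_distances(queries, points, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest neighbour in points.

    Set exclude_self=True when queries and points are the same array, so that
    the zero distance of a point to itself is skipped.
    """
    diffs = queries[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    column = k if exclude_self else k - 1
    return dists[:, column]


def kl_divergence_knn(x, y, k=3):
    """k-NN estimate of KL(p || q) from samples x ~ p and y ~ q.

    Assumed estimator (hypothetical choice for this sketch):
        (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1))
    where rho_k(i) is the k-NN distance of x_i within x (excluding x_i)
    and nu_k(i) is the k-NN distance from x_i to the sample y.
    """
    n, d = x.shape
    m = y.shape[0]
    rho = knn_distances(x, x, k, exclude_self=True)
    nu = knn_distances(x, y, k)
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


def pairwise_divergences(bags, k=3):
    """Symmetrised, non-negative divergence matrix between sample bags.

    Each bag is an (n_i, d) array of i.i.d. samples from one unknown
    distribution. Estimates can be slightly negative, so they are clipped at 0.
    """
    count = len(bags)
    matrix = np.zeros((count, count))
    for i in range(count):
        for j in range(i + 1, count):
            forward = kl_divergence_knn(bags[i], bags[j], k)
            backward = kl_divergence_knn(bags[j], bags[i], k)
            matrix[i, j] = matrix[j, i] = max(forward + backward, 0.0)
    return matrix


# Toy usage: two bags from the same Gaussian and one from a shifted Gaussian.
rng = np.random.default_rng(0)
bags = [
    rng.normal(0.0, 1.0, size=(300, 2)),
    rng.normal(0.0, 1.0, size=(300, 2)),
    rng.normal(0.0, 1.0, size=(300, 2)) + np.array([2.0, 0.0]),
]
divergence_matrix = pairwise_divergences(bags, k=3)
```

The resulting matrix could then be handed to any method that works from pairwise dissimilarities, for example multidimensional scaling for embedding or a nearest-neighbour rule for classification. The brute-force distance computation is quadratic in the bag size and is only meant for small examples.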