Robust PCA and Collaborative Filtering: Rejecting Outliers, Identifying Manipulators
published: Jan. 13, 2011, recorded: December 2010, views: 2395
Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction. Nevertheless, it is plagued by sensitivity to outliers; finding robust analogs, particularly for high-dimensional data, is critical. We discuss the challenges posed by the high-dimensional setting, where the dimensionality is of the same order as, or greater than, the number of samples. We detail why existing techniques fail -- indeed, no previously known algorithm provides provable bounds for any constant fraction of outliers -- and then present two very different algorithms for High Dimensional Robust PCA. Our first algorithm achieves a breakdown point of 50% -- the best possible for any algorithm, and a stark improvement over the previous best-known result of 0%. Our second algorithm is based on ideas from convex optimization and, in addition to recovering the principal components, is also able to identify the corrupted points. We extend this to the partially observed setting, significantly extending matrix completion results to the setting of corrupted rows or columns.
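The sensitivity to outliers described above can be illustrated with a small experiment. The following is a minimal sketch, not either of the lecture's algorithms: it runs ordinary PCA (via NumPy's SVD) on synthetic data with fewer samples than dimensions, then adds a single corrupted point. The data sizes, noise levels, outlier magnitude, and the helper name `top_component` are all assumptions chosen for illustration.

```python
import numpy as np

def top_component(X):
    """Leading principal direction of the rows of X (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                      # center the samples
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]                                 # first right singular vector

rng = np.random.default_rng(0)
n, d = 50, 100                                   # assumed: fewer samples than dimensions

# Assumed ground truth: the data varies mainly along the first coordinate axis.
true_dir = np.zeros(d)
true_dir[0] = 1.0

# Clean samples: strong signal along true_dir plus small isotropic noise.
clean = rng.normal(size=(n, 1)) * 5.0 * true_dir + 0.1 * rng.normal(size=(n, d))

# One corrupted sample with a very large entry along a different axis.
outlier = np.zeros(d)
outlier[1] = 1000.0
corrupted = np.vstack([clean, outlier])

# |cosine| between the estimated and the true direction (1 = perfect recovery).
align_clean = abs(top_component(clean) @ true_dir)
align_corrupted = abs(top_component(corrupted) @ true_dir)

print(f"alignment without outlier: {align_clean:.3f}")
print(f"alignment with one outlier: {align_corrupted:.3f}")
```

In this sketch a single sufficiently large outlier is enough to pull the leading component away from the true direction, which is the sense in which standard PCA has a breakdown point of 0%: no fixed positive fraction of corrupted samples can be tolerated.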