Factorizing Gigantic Matrices
Low-rank approximations of data matrices have become an important tool in machine learning and data mining. They allow for embedding high-dimensional data in lower-dimensional spaces and can therefore mitigate effects due to noise, uncover latent relations, or facilitate further processing. These properties have proven useful in many application areas, including bio-informatics, computer vision, text processing, recommender systems, and social network analysis. Present-day technologies are characterized by exponentially growing amounts of data. Recent advances in sensor technology, Internet applications, and communication networks call for methods that scale to very large and/or growing data matrices. In this tutorial, we discuss basic characteristics of matrix factorization and introduce several recent approaches that scale to modern massive data analysis problems.
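To make the idea of a low-rank approximation concrete, here is a minimal sketch using a truncated SVD in NumPy. It is an illustration only, not material from the tutorial itself: the function names `truncated_svd` and `make_noisy_low_rank`, the matrix sizes, and the noise level are all assumptions chosen for the example.

```python
import numpy as np


def truncated_svd(X, k):
    """Return the rank-k approximation of X obtained by keeping
    the k largest singular values (illustrative helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def make_noisy_low_rank(n=100, m=20, rank=3, noise=0.01, seed=0):
    """Build a hypothetical data matrix: a rank-`rank` signal plus small noise."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, m))
    return signal + noise * rng.standard_normal((n, m))


if __name__ == "__main__":
    X = make_noisy_low_rank()
    X_k = truncated_svd(X, k=3)
    rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
    print("relative Frobenius error of the rank-3 approximation:", rel_err)
```

Because the synthetic matrix is close to rank 3, keeping three singular values should reproduce it almost exactly while discarding most of the noise. A full SVD like this one does not scale to very large matrices, which is the gap the approaches discussed in the tutorial aim to address.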
The tutorial aims at a wide audience as it reviews both machine learning and data mining techniques. It is intended for PhD students, practitioners, and researchers who are interested in large scale machine learning and data analysis.
The tutorial is divided into three parts:
- Part I: Matrix Factorization — Traditional Optimization Approaches and Statistical Foundations: In this block, we will discuss foundations and multi-linear extensions of traditional methods such as SVD, PCA, K-Means, and Vector Quantization.
- Part II: Constraint Matrix Factorization: Many real-world applications of matrix factorization impose constraints on the factorization problem. For instance, matrix factors may need to be non-negative, convex combinations of existing data, or compact binary codes. Among others, we discuss techniques such as Spectral Hashing, NMF, Archetypal Analysis, CNMF, and CH-NMF.
- Part III: Data-driven Matrix Factorization Techniques: The first and second parts of the tutorial consider norm minimization problems to obtain suitable matrix factors. Recent approaches that extend matrix factorization towards massive data take a different point of view: they attempt to maximize the volume of a selection of rows and columns of a given data matrix. In this final part of the tutorial, we present and review approaches such as FastMap, CUR, CMD, and SiVM.
In each of the parts, we present practical applications from fields such as image processing, computer vision, robotics, web mining, and social media analysis.
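As an illustration of the non-negativity constraint mentioned in Part II, the following is a minimal sketch of NMF using the well-known multiplicative update rules for the Frobenius-norm objective. It is not taken from the tutorial; the function names `nmf` and `relative_error`, the iteration counts, and the small constant `eps` (added to avoid division by zero) are assumptions made for this example.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as W @ H with
    W (n x k) >= 0 and H (k x m) >= 0, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization (an assumption; other schemes exist).
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio,
        # so the factors stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Relative Frobenius reconstruction error of the factorization."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical non-negative data with an exact rank-4 structure.
    X = rng.random((30, 4)) @ rng.random((4, 20))
    W, H = nmf(X, k=4)
    print("relative reconstruction error:", relative_error(X, W, H))
```

The multiplicative form is chosen here because it keeps both factors non-negative without an explicit projection step. It solves a norm minimization problem, so it belongs to the family covered in the first two parts rather than the volume-maximizing approaches of Part III.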