Two-level infinite mixture for multi-domain data
published: Dec. 20, 2008, recorded: December 2008, views: 3021
The combined, unsupervised analysis of coupled data sources is an open problem in machine learning. A particularly important example from the biological domain is the analysis of mRNA and protein profiles derived from the same set of genes (either over time or under different conditions). Such an analysis has the potential to give a far more comprehensive picture of the mechanisms of transcription and translation than analysing the separate data sets individually. The problem is similar to the one addressed by traditional Canonical Correlation Analysis (CCA), but in many application areas the CCA assumptions are too restrictive. Probabilistic CCA and kernel CCA have both been proposed recently, but the former is still limited to linear relationships and the latter sacrifices interpretability in the original data space.

In this work, we present a nonparametric model for coupled data that provides an interpretable description of the variability shared across the data sources (as well as the variability that is not shared), whilst being free of restrictive assumptions such as those found in CCA. The hierarchical model is built from two marginal mixtures (one for each representation; the generalisation to three or more is straightforward). Each object is assigned to one component in each marginal mixture, and the contingency table describing these joint assignments is assumed to have been generated by a mixture of tables with independent margins. This top-level mixture captures the shared variability, whilst the marginal models are free to capture variation specific to the respective data sources. The number of components in all three mixtures (the two marginal mixtures and the top-level mixture) is inferred from the data using a novel Dirichlet Process (DP) formulation.
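To make the two-level structure more concrete, here is a minimal generative sketch, not the authors' model or their DP formulation. It assumes: a truncated stick-breaking approximation to a DP for the top-level weights; fixed, hypothetical truncation levels `k_a` and `k_b` for the two marginal mixtures, with symmetric Dirichlet weights per top-level table; and 1-D Gaussian likelihoods for each data source. All function names and hyperparameters (`sample_two_level`, `alpha_top`, the Dirichlet concentration 0.5, the Gaussian scales) are illustrative choices. Each top-level component is a table with independent margins, so P(a, b | t) = pi[t][a] * rho[t][b]. Mixing over t couples the two marginal assignments (shared variability), while the marginal components generate source-specific variation.

```python
import random
from collections import Counter


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights approximating a draw from DP(alpha)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last stick absorbs the leftover mass
    return weights


def dirichlet(concentration, k, rng):
    """Symmetric Dirichlet sample via normalised Gamma draws."""
    g = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]


def sample_two_level(n, alpha_top=1.0, k_top=10, k_a=8, k_b=8, seed=0):
    """Hypothetical generative sketch of a two-level mixture for two data sources."""
    rng = random.Random(seed)
    # Top level: weights over "tables" that capture shared structure.
    top_w = stick_breaking(alpha_top, k_top, rng)
    # Each table has independent margins: P(a, b | t) = pi[t][a] * rho[t][b].
    pi = [dirichlet(0.5, k_a, rng) for _ in range(k_top)]
    rho = [dirichlet(0.5, k_b, rng) for _ in range(k_top)]
    # Marginal components: one 1-D Gaussian mean per component, per source.
    mu_a = [rng.gauss(0.0, 5.0) for _ in range(k_a)]
    mu_b = [rng.gauss(0.0, 5.0) for _ in range(k_b)]

    objects = []
    for _ in range(n):
        t = rng.choices(range(k_top), weights=top_w)[0]
        a = rng.choices(range(k_a), weights=pi[t])[0]  # marginal assignment, source A
        b = rng.choices(range(k_b), weights=rho[t])[0]  # marginal assignment, source B
        x = rng.gauss(mu_a[a], 1.0)  # e.g. an mRNA-like measurement
        y = rng.gauss(mu_b[b], 1.0)  # e.g. a protein-like measurement
        objects.append((t, a, b, x, y))
    # Contingency table of joint marginal assignments (a, b).
    table = Counter((a, b) for _, a, b, _, _ in objects)
    return objects, table


if __name__ == "__main__":
    objects, table = sample_two_level(500)
    print(len(objects), sum(table.values()))  # 500 500
    print("top-level components in use:", len({t for t, *_ in objects}))
```

Inference, which is the actual subject of the work, would run in the opposite direction: given only the paired observations `(x, y)`, it recovers the marginal assignments, the top-level tables, and how many components of each kind the data support. The sketch only shows why a mixture of independent-margin tables can represent dependence between sources that no single independent-margin table could.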