Estimating the contribution of non-genetic factors to gene expression using Gaussian process latent variable models
published: May 3, 2010, recorded: March 2010, views: 3047
Thanks to the recent increase in the amount of genetic profiling data available and to the ability to characterize disease activity through gene expression, it is now possible to understand in more detail the multitude of causal factors linked with each disease. This is a challenging task because integrating different sources of biological data is not straightforward and because non-genetic factors (such as differences in the experimental setting, or individual characteristics such as gender and ethnicity) are not always controlled for. Since these non-genetic factors may cause most of the variation in gene expression, reducing the accuracy of genetic studies, there is a pressing need for models that take them explicitly into account. We present a model in which non-genetic factors are unobserved latent variables, and the gene expression levels are described as linear functions of both these latent variables and Single Nucleotide Polymorphisms (SNPs). From a generative point of view, we can write the gene expression levels Y as
Y = SV + XW + mu 1^T + epsilon
where S is the matrix containing the SNPs, X contains the latent variables, V and W are mapping matrices, epsilon is an isotropic Gaussian noise term, and mu allows the model to have a non-zero mean.
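As an illustration of the generative view, the following minimal sketch draws synthetic expression data from the linear model. It is not the authors' code: the function name, the dimensions, the 0/1/2 genotype coding, and the standard normal draws for X, V, W and mu are all assumptions made for the example. Samples are placed in rows, so the mean term is formed as a column of ones times the row vector mu.

```python
import numpy as np

def simulate_expression(n_samples=50, n_snps=20, n_latent=3, n_genes=100,
                        noise_std=0.1, seed=0):
    """Draw one synthetic data set from Y = SV + XW + mean term + noise.

    All sizes and distributions here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # S: SNP matrix, coded as minor-allele counts 0/1/2 (assumed coding)
    S = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
    # X: unobserved non-genetic (latent) factors, one row per sample
    X = rng.standard_normal((n_samples, n_latent))
    # V, W: mapping matrices from SNPs and latent factors to genes
    V = rng.standard_normal((n_snps, n_genes))
    W = rng.standard_normal((n_latent, n_genes))
    # mu: per-gene mean expression level
    mu = rng.standard_normal(n_genes)
    # epsilon: isotropic Gaussian noise
    eps = noise_std * rng.standard_normal((n_samples, n_genes))
    # Mean term as (ones column) x (mu row), giving a samples-by-genes matrix
    Y = S @ V + X @ W + np.ones((n_samples, 1)) @ mu[None, :] + eps
    return Y, S, X, V, W, mu
```

Calling `simulate_expression()` returns the expression matrix together with the quantities that generated it, which can be handy for checking whether an inference procedure recovers known structure.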
The model is inspired by the one proposed by Stegle et al., but instead of optimizing the parameters and marginalizing the latent variables (as in Probabilistic PCA), we marginalize the parameters and optimize the latent variables. For a particular choice of prior over the mapping matrices W and V, the two approaches are equivalent.
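To make the marginalization concrete, here is a hedged sketch of the objective one would optimize with respect to X. It assumes independent zero-mean Gaussian priors on the entries of W and V with variances `alpha_w` and `alpha_v` (hypothetical names), samples in rows, and a mean handled by centering each gene. Under those assumptions each gene's expression vector is Gaussian with covariance K = alpha_v S S^T + alpha_w X X^T + sigma2 I, and the log marginal likelihood has a closed form.

```python
import numpy as np

def log_marginal_likelihood(Y, X, S, sigma2=0.1, alpha_w=1.0, alpha_v=1.0):
    """Log p(Y | X, S) with the mapping matrices integrated out.

    Assumes Gaussian priors on W and V and i.i.d. genes (columns of Y).
    """
    N, D = Y.shape
    # Center each gene; this stands in for the explicit mean parameter
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Sample-by-sample covariance shared by all genes
    K = alpha_v * S @ S.T + alpha_w * X @ X.T + sigma2 * np.eye(N)
    # Cholesky factor gives a stable log-determinant and quadratic form
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, Yc)                  # A = L^{-1} Yc
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log |K|
    quad = np.sum(A ** 2)                       # tr(K^{-1} Yc Yc^T)
    return -0.5 * (D * N * np.log(2.0 * np.pi) + D * logdet + quad)
```

In practice this value would be maximized over X (and possibly the variances) with a gradient-based optimizer; the choice of optimizer is left open here.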
This kind of model is called dual Probabilistic PCA (dual PPCA), and it belongs to a wider class of models called Gaussian Process Latent Variable Models (GP-LVMs). Indeed, dual PPCA is the special case where the output dimensions are assumed to be linear, independent and identically distributed. Each of these assumptions can be relaxed, yielding new probabilistic models. Many extensions of this model are possible, but even in its simplest form it gives extremely promising eQTL study results in terms of the number of significant associations found.
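One way to picture relaxing the linearity assumption is to swap the covariance function over the latent variables. The sketch below contrasts the linear kernel that corresponds to dual PPCA with a squared-exponential (RBF) kernel, a common nonlinear choice in GP-LVMs. The abstract does not say which extensions the authors consider, so the RBF kernel and all parameter names here are assumptions for illustration only.

```python
import numpy as np

def linear_kernel(X, alpha=1.0):
    """Linear covariance alpha * X X^T, the dual PPCA case."""
    return alpha * X @ X.T

def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance, giving a nonlinear latent mapping."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between latent points
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)
```

Either function returns a sample-by-sample covariance matrix, so the nonlinear variant can take the place of the linear term in a Gaussian marginal likelihood without changing the rest of the computation.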