ProBic: identification of overlapping biclusters using Probabilistic Relational Models, applied to simulated gene expression data
published: Sept. 7, 2007, recorded: September 2007, views: 3716
Biclustering is an increasingly popular technique to identify regulatory modules that are linked to biological processes. In the context of gene expression data, a bicluster is defined as a subset of genes that have a similar expression profile across a subset of conditions. We describe a novel method, called ProBic, to simultaneously identify a series of overlapping biclusters in gene expression data within the framework of Probabilistic Relational Models (PRMs) [1;2]. PRMs are a relational extension of Bayesian networks and allow for the integration of relational data within a unified probabilistic framework. A PRM describes a joint probability distribution, as a Bayesian network does, but with additional constraints on the conditional probability functions.
We propose a novel PRM-based biclustering model in which gene expression data are treated as relational data. The classes are Gene, Condition and Expression. The classes Gene and Condition each have a vector attribute Bicluster containing a series of bicluster IDs. These vectors indicate which biclusters a gene or condition belongs to and are initially unknown. Condition has an extra attribute ID, which is a unique number for each condition. Expression has an attribute Level containing the expression value, and two reference slots that point to the gene and the condition for which the level was measured. Expression.Level is conditionally dependent on Gene.Bicluster, Condition.Bicluster and Condition.ID. The conditional dependency is modeled as a set of Gaussian distributions with conjugate priors. The ProBic model deals naturally with missing values (in fact, there are no ‘missing’ values in this model), and robust sets of biclusters are obtained due to explicit modeling of noise. The maximum likelihood solution is approximated using an Expectation-Maximization strategy.
ProBic was applied to simulated gene expression data sets and all the biclusters were successfully identified. Various noise settings and different overlap models (average, sum, product) were explored. Our results show that PRMs can be used to identify overlapping biclusters in an efficient and robust manner, naturally dealing with missing values and noise.
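As an illustration of the relational schema described above, the following is a minimal Python sketch, not the ProBic implementation. The classes Gene, Condition and Expression and their attributes follow the abstract. Everything else is an assumption made for illustration: the `name` field on Gene, the `expected_level` helper, the `bicluster_means` lookup table keyed by (bicluster ID, condition ID), the `background` value, and the reading of the average, sum and product overlap models as simple combinations of per-bicluster means.

```python
from dataclasses import dataclass, field
from math import prod
from statistics import mean


@dataclass
class Gene:
    name: str                                   # hypothetical label, not in the abstract
    bicluster: set = field(default_factory=set)  # Gene.Bicluster: bicluster IDs


@dataclass
class Condition:
    id: int                                      # Condition.ID: unique per condition
    bicluster: set = field(default_factory=set)  # Condition.Bicluster: bicluster IDs


@dataclass
class Expression:
    level: float          # Expression.Level: the measured value
    gene: Gene            # reference slot to the measured gene
    condition: Condition  # reference slot to the measured condition


# Assumed interpretation of the three overlap models named in the abstract.
OVERLAP = {"average": mean, "sum": sum, "product": prod}


def expected_level(expr, bicluster_means, overlap="average", background=0.0):
    """Combine the means of all biclusters shared by the gene and the condition.

    bicluster_means maps (bicluster ID, condition ID) to a Gaussian mean,
    reflecting the dependence of Expression.Level on Gene.Bicluster,
    Condition.Bicluster and Condition.ID. This helper is hypothetical.
    """
    shared = sorted(expr.gene.bicluster & expr.condition.bicluster)
    if not shared:
        return background  # the value belongs to no bicluster
    values = [bicluster_means[(b, expr.condition.id)] for b in shared]
    return OVERLAP[overlap](values)


if __name__ == "__main__":
    gene = Gene("g1", {1, 2})
    cond = Condition(7, {1, 2})
    expr = Expression(level=0.0, gene=gene, condition=cond)
    means = {(1, 7): 2.0, (2, 7): 3.0}
    for model in OVERLAP:
        print(model, expected_level(expr, means, model))
```

Only measured values are created as Expression objects in this sketch, so an unmeasured gene-condition pair simply has no object rather than a placeholder value.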