Probabilistic Modelling of Networks and Pathways
Pascal

Reverse engineering gene and protein regulatory networks using graphical models: A comparative evaluation study

author: Marco Grzegorczyk, Biomathematics and Statistics Scotland

Description

One of the major goals in systems biology is to infer the architecture of biochemical pathways and regulatory networks from postgenomic data, such as microarray gene expression and cytometric protein expression data. Various machine learning methods for reverse engineering have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In this talk, the learning performance of three machine learning methods based on graphical models, namely Relevance networks, Gaussian graphical models, and Bayesian networks, is cross-compared on real cytometric protein data and simulated data from the RAF signalling pathway.

Relevance networks are based on pairwise association scores and are straightforward to implement. However, the inference is not done in the context of the whole system, and there is no way to distinguish between direct and indirect associations. Both shortcomings are addressed by Gaussian graphical models, where the partial correlation between two variables, conditional on all the other domain variables, is employed as the association score. Bayesian networks are more flexible probabilistic graphical models for conditional dependence and independence relations. They are based on directed acyclic graphs and can be exploited to analyse interventional data for identifying putative causal interactions.

For Gaussian graphical models, the inverse covariance matrix was computed with the shrinkage estimator of Schaefer and Strimmer (2005). Bayesian network inference was done by sampling Bayesian networks from the posterior distribution with order Markov chain Monte Carlo (MCMC), as proposed by Friedman and Koller (2003). The experimental results were obtained by analysing data from the RAF protein signalling network reported in Sachs et al. (2005), which describes the interactions of eleven phosphorylated proteins and phospholipids in human immune system cells. The study distinguished between real cytometric protein activity measurements reported in Sachs et al. (2005) and synthetically generated data, as well as between purely observational and interventional data. Observational data are obtained by passively monitoring the system without any interference, while interventional data are obtained by actively manipulating variables, e.g. using gene knock-out experiments.

Detailed results of this empirical study have been published in Werhli et al. (2006) and Grzegorczyk (2007). The three main findings can be summarized as follows. First, only on Gaussian observational data were Bayesian networks and Gaussian graphical models found to outperform Relevance networks. Second, for observational data no significant difference between Bayesian networks and Gaussian graphical models was observed. Third, only for interventional data did Bayesian networks clearly outperform the other two approaches.
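The contrast drawn above between pairwise association scores (Relevance networks) and partial correlations (Gaussian graphical models) can be illustrated with a minimal sketch. It is not the code used in the study: the three-variable chain X -> Y -> Z, its covariance matrix, and all function names are assumptions made for illustration. The covariance is taken as known exactly, so no shrinkage estimation is involved, and a small pure-Python matrix inverse stands in for a numerical library.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] -> [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def correlations(cov):
    """Pairwise (marginal) correlations: the kind of score a Relevance network thresholds."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

def partial_correlations(cov):
    """Partial correlations given all other variables, from the inverse covariance."""
    prec = invert(cov)
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)]
            for i in range(n)]

# Hypothetical chain X -> Y -> Z with unit-variance noise at every step:
# X ~ N(0, 1), Y = X + e1, Z = Y + e2. This is its exact covariance matrix.
COV = [[1, 1, 1],
       [1, 2, 2],
       [1, 2, 3]]

corr = correlations(COV)
pcorr = partial_correlations(COV)
print("corr(X, Z)         =", round(corr[0][2], 3))
print("partial corr(X, Z) =", round(abs(pcorr[0][2]), 3))
```

In this toy chain, X and Z are marginally correlated only through Y, so a pairwise score would suggest an X-Z edge, whereas the partial correlation conditional on Y vanishes. With few samples relative to the number of variables, the empirical covariance matrix can be ill-conditioned, which is the situation the shrinkage estimator mentioned above is meant to handle.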

Slides
0:00 Reverse engineering gene and protein regulatory networks using Graphical Models. A comparative evaluation study.
0:34 Original paper
0:45 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 1
0:57 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 2
1:05 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 3
1:10 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 4
1:19 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 5
1:30 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 6
1:45 Systems biology - Learning signalling pathways and regulatory networks from postgenomic data - part 7
2:04 Reverse Engineering of Regulatory Networks - part 1
2:20 Reverse Engineering of Regulatory Networks - part 2
2:54 Reverse Engineering of Regulatory Networks - part 3
3:12 Three widely applied methodologies:
3:23 •Relevance networks
3:25 Relevance networks (Butte and Kohane, 2000) - part 1
3:36 Relevance networks (Butte and Kohane, 2000) - part 2
3:44 Relevance networks (Butte and Kohane, 2000) - part 3
4:03 Relevance networks (Butte and Kohane, 2000) - part 4
4:08 Pairwise associations without taking the context of the system into consideration - part 1
4:24 Pairwise associations without taking the context of the system into consideration - part 2
4:43 •Graphical Gaussian models
4:46 Graphical Gaussian models
5:31 Shrinkage estimation of the covariance matrix (Schäfer and Strimmer, 2005) - part 1
6:19 Shrinkage estimation of the covariance matrix (Schäfer and Strimmer, 2005) - part 2
6:22 Graphical Gaussian Models
6:36 Further drawbacks
6:59 •Bayesian networks
7:01 Bayesian networks
8:03 Bayesian networks versus causal networks - part 1
8:16 Bayesian networks versus causal networks - part 2
8:33 Bayesian networks - part 1
8:45 Bayesian networks - part 2
9:10 Bayesian networks - part 3
9:24 Learning the network structure
10:20 MCMC sampling of Bayesian networks
10:44 Order MCMC (Friedman and Koller, 2003)
11:15 Equivalence classes of BNs
12:13 CPDAG representations - part 1
12:29 CPDAG representations - part 2
12:58 Interventional data
13:38 Evaluation of Performance
14:15 Probabilistic inference -DGE
14:55 Probabilistic inference -UGE - part 1
15:08 Probabilistic inference -UGE - part 2
15:10 Probabilistic inference - part 1
15:16 Probabilistic inference - part 2
15:40 Evaluation 1: AUC scores Area under Receiver Operator Characteristic (ROC) curve
16:29 Evaluation 2: TP scores - part 1
16:39 Evaluation 2: TP scores - part 2
16:50 Evaluation 2: TP scores - part 3
17:11 Evaluation - part 1
17:44 Evaluation - part 2
18:07 Evaluation - part 3
18:09 Evaluation: Raf signalling pathway
18:32 "Gold standard RAF pathway" according to Sachs et al. (2004)
19:02 Raf pathway
19:16 Data
19:43 Expression Data
19:54 Two types of experiments - part 1
20:13 Two types of experiments - part 2
20:34 Evaluation
20:34 Raf pathway
20:58 Gaussian simulated data
21:35 Netbuilder simulated data - part 1
21:54 Netbuilder simulated data - part 2
22:05 Netbuilder simulated data - part 3
22:21 Experimental Results
22:24 Synthetic data, observations
23:08 Synthetic data, interventions
23:35 Cytometry data, observations
24:08 Cytometry data, interventions
26:40 Area under the ROC curve
27:49 Number of TPs for FP=5 fixed
28:17 How can we explain the difference between synthetic and real data?
28:24 Raf pathway
28:27 Pathway
28:31 Disputed structure of the gold-standard network
28:51 Complications with real data
29:05 Stabilisation through negative feedback loops
29:23 Conclusions 1
29:55 Conclusions 2
30:19 Additional analysis I: Raf pathway - part 1
30:45 Additional analysis I: Raf pathway - part 2
30:48 Additional analysis I: Raf pathway - part 3
30:54 CPDAGs of networks
31:15 Graphs
32:45 Some additional analysis II
34:59 Thank you
35:03 References
