Machine Learning Summer School 2005 - Chicago
Pascal

Evidence Integration in Bioinformatics

author: Phil Long, Columbia University

Description

Biologists frequently use databases; for example, when a biologist encounters some unfamiliar proteins, s/he will use databases to get a preliminary idea of what is known about them. The databases can often be interpreted as lists of assertions. An example is a protein-protein interaction database: each entry is a pair of proteins that are asserted to interact, along with the supporting evidence. Often a candidate for inclusion in such a database can be supported in a variety of fundamentally different ways. A methodological challenge is how to combine these different sources of evidence effectively to make accurate aggregate predictions. Ideas from machine learning are useful for this. I will describe some of the special properties of problems like this, and relevant methods from machine learning, including algorithms based on Bayesian networks, boosting and SVMs.
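
As a rough illustration of what combining evidence sources can look like, here is a minimal naive Bayes sketch in Python. It is not taken from the lecture: the three binary evidence sources, the toy labelled pairs, and the function names fit_naive_bayes and log_odds are all hypothetical, and the sketch assumes the sources are conditionally independent given whether a pair truly interacts.

```python
import math

def fit_naive_bayes(examples, labels, smoothing=1.0):
    """Estimate P(source fires | label) for each binary evidence source.

    examples: list of tuples of 0/1 flags, one flag per evidence source.
    labels:   list of bools (True = the pair is known to interact).
    Laplace smoothing keeps every estimate strictly between 0 and 1.
    """
    n_sources = len(examples[0])
    counts = {True: [0] * n_sources, False: [0] * n_sources}
    totals = {True: 0, False: 0}
    for features, label in zip(examples, labels):
        totals[label] += 1
        for j, fired in enumerate(features):
            if fired:
                counts[label][j] += 1
    probs = {
        label: [
            (counts[label][j] + smoothing) / (totals[label] + 2 * smoothing)
            for j in range(n_sources)
        ]
        for label in (True, False)
    }
    prior = (totals[True] + smoothing) / (len(labels) + 2 * smoothing)
    return prior, probs

def log_odds(features, prior, probs):
    """Aggregate score: prior log-odds plus one log-likelihood ratio per source."""
    score = math.log(prior / (1 - prior))
    for j, fired in enumerate(features):
        p_pos = probs[True][j] if fired else 1 - probs[True][j]
        p_neg = probs[False][j] if fired else 1 - probs[False][j]
        score += math.log(p_pos / p_neg)
    return score

# Hypothetical toy data: each tuple flags which of three made-up
# evidence sources supports a candidate protein pair.
examples = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]
labels = [True, True, True, False, False, False]

prior, probs = fit_naive_bayes(examples, labels)
score_all = log_odds((1, 1, 1), prior, probs)   # supported by every source
score_none = log_odds((0, 0, 0), prior, probs)  # supported by no source
```

Candidates can then be ranked by their score, so that pairs supported by several sources rise to the top. The independence assumption is the weak point of this sketch, since real evidence sources are often correlated.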

Slides
0:00 Evidence Integration in Bioinformatics
0:13 A little molecular biology
1:26 Problems
1:45 Evidence of related function
2:40 Evidence of protein-protein interaction
4:19 Combining using machine learning
6:03 Overfitting and inductive bias
7:07 Supervised Learning with Bayes Nets
7:19 Bayesian Networks
8:04 Bayes Net - Example
10:22 Naïve Bayes
13:49 Hierarchical Naïve Bayes
15:16 Supervised learning with SVMs (kernel fusion)
15:26 Support Vector Machines for Classification
17:34 Support Vector Machine Training
18:43 Kernel fusion
21:34 Evaluation - ROC Curve
23:01 Evaluation - ROC curve
23:38 Results – membrane protein prediction
24:45 Supervised learning with boosting (RankBoost)
25:00 RankBoost
26:30 RankBoost behavior
28:17 RankBoost
29:05 Unsupervised Learning with Bayes Nets
30:00 Regulation of Expression
31:19 Transcriptional Modules
33:06 Bayes Net for Transcription Modules
37:54 Unsupervised Evidence Integration
37:57 Problem
38:20 Examples
41:36 More generally
42:22 Notation
43:15 Isn’t this just clustering?
45:17 Related Theoretical Work [MV03] – Problem
47:22 Related Theoretical Work [MV03] – Results
49:24 In our problem(s)...
50:30 Conditional independence
52:44 Our Approach
58:56 Notes
59:14 Evaluation: Yeast protein-protein data
61:10 Evaluation: other algorithms
61:36 Evaluation
61:57 Results: Protein-protein data
62:46 Results: Protein-protein data
63:49 Evaluation: Artificial data
64:07 Results: Artificial source
64:09 Results: Artificial source
64:12 Paper and Software
