Discovering Common Sequence Variation in Arabidopsis thaliana
Description
To characterize natural sequence variation in 20 strains of the model plant Arabidopsis thaliana, whole-genome resequencing with high-density oligonucleotide arrays was performed in collaboration with Perlegen Sciences Inc. Array data were analyzed with a combination of existing model-based (MB; Hinds et al., Science, 2005) and novel machine learning (ML) methods. For the identification of single nucleotide polymorphisms (SNPs), we developed an algorithm based on support vector machines. Training and evaluation were performed on published alignments (Nordborg et al., PLoS Biology, 2005). At the same false discovery rate (FDR) as MB, the ML algorithm identifies significantly more true SNPs, especially in regions of high polymorphism density and/or low hybridization quality. The union of SNP predictions from both methods contains on average 143,572 SNPs per strain at an FDR of 2.8% (648,570 non-redundant SNPs). Furthermore, a machine learning algorithm was developed to detect polymorphic regions containing insertions, deletions and variational hotspots, where SNP detection algorithms typically fail to identify individual SNPs. It discovers the approximate location of a substantial additional proportion of polymorphisms (54% of deleted nucleotides and 33% of insertion sites). With a combination of all three methods, 74% of SNPs can either be called directly or are contained in a polymorphic region prediction (Zeller et al., in preparation). We examined the patterns of, and forces shaping, sequence variation in Arabidopsis (Clark et al., Science, 2007). For example, significant differences were observed between gene families, and genes mediating interaction with the biotic environment harbor exceptional polymorphism levels.
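The quantities quoted in the description (the FDR of a call set, the union of MB and ML predictions, and the share of SNPs that are either called directly or fall inside a predicted polymorphic region) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the inclusive-interval representation of regions, and all positions below are hypothetical toy values chosen only to show how such figures might be computed against a reference set of known SNPs.

```python
from bisect import bisect_right


def false_discovery_rate(predicted, truth):
    """FDR = false positives / all predictions (0.0 if nothing was predicted)."""
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


def in_any_region(pos, regions):
    """True if pos lies inside one of the regions.

    Assumes regions is a list of inclusive (start, end) pairs that is
    sorted by start and non-overlapping.
    """
    starts = [start for start, _ in regions]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and regions[i][0] <= pos <= regions[i][1]


def covered_fraction(truth, calls, regions):
    """Fraction of true SNPs called directly or contained in a predicted region."""
    covered = sum(1 for p in truth if p in calls or in_any_region(p, regions))
    return covered / len(truth)


# Hypothetical toy data (genome positions), not taken from the study.
truth = {10, 25, 40, 55, 70, 85, 100, 130}   # known SNP positions
mb_calls = {10, 25, 40, 200}                 # model-based predictions
ml_calls = {25, 40, 55, 70, 300}             # machine-learning predictions
regions = [(80, 90), (120, 125)]             # predicted polymorphic regions

# Union of both call sets, as in the combined prediction described above.
union_calls = mb_calls | ml_calls

print("MB FDR:   ", false_discovery_rate(mb_calls, truth))
print("ML FDR:   ", false_discovery_rate(ml_calls, truth))
print("Union FDR:", false_discovery_rate(union_calls, truth))
print("Covered:  ", covered_fraction(truth, union_calls, regions))
```

In this toy setup the union recovers more true SNPs than either method alone, while its FDR depends on how many false positives each method contributes. The region predictions add coverage for positions that neither caller reports individually.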
| Time | Slide |
| 0:00 | - Discovering Common Sequence Variations in Arabidopsis thaliana - Announcement |
| 1:34 | Discovering Common Sequence Variations in Arabidopsis thaliana |
| 1:46 | Introduction - 1 |
| 3:07 | Introduction - 2 |
| 4:02 | Introduction - 3 |
| 5:02 | Introduction - 4 |
| 6:00 | Resequencing Array Basics I |
| 7:30 | Resequencing Array Basics II - 1 |
| 8:11 | Resequencing Array Basics II - 2 |
| 8:52 | Resequencing Data - 1 |
| 10:00 | Resequencing Data - 2 |
| 11:23 | Resequencing Data - 3 |
| 12:10 | Support Vector Machines for SNP Identification - 1 |
| 13:00 | Support Vector Machines for SNP Identification - 2 |
| 13:40 | Support Vector Machines for SNP Identification - 3 |
| 13:59 | Support Vector Machines for SNP Identification - 4 |
| 14:07 | Support Vector Machines for SNP Identification - 5 |
| 14:57 | 2-Layered Architecture for Inter-Strain Integration - 1 |
| 15:31 | 2-Layered Architecture for Inter-Strain Integration - 2 |
| 16:20 | 2-Layered Architecture for Inter-Strain Integration - 3 |
| 18:15 | Application to SNP Discovery |
| 21:20 | Limitations of the Technique - 1 |
| 21:30 | Limitations of the Technique - 2 |
| 22:57 | Limitations of the Technique - 3 |
| 25:33 | For this Work We Used the Shogun Toolbox |
| 25:58 | JMLR - Machine Learning Open Source Publications |
| 26:49 | New Problems and Methods in Computational Biology |
| 27:18 | Modeling Polymorphic Regions - 1 |
| 28:01 | Modeling Polymorphic Regions - 2 |
| 28:46 | Modeling Polymorphic Regions - 3 |
| 29:28 | Example - 1 |
| 30:25 | Learning to Predict Segmentations |
| 32:35 | Example - 1 |
| 33:34 | Example - 2 |
| 33:35 | Detection Performance |
| 34:21 | Complementing SNP Calls |
| 36:28 | Polymorphism Distribution - 1 |
| 38:00 | Polymorphism Distribution at Gene Boundaries |
| 39:32 | Polymorphism Distribution - 2 |
| 41:03 | Modeling Polymorphic Regions - 3 |
| 41:57 | Predicted Effects on Gene Products - 1 |
| 43:08 | Predicted Effects on Gene Products - 2 |
| 43:47 | Effects on Genes - 1 |
| 44:37 | Effects on Genes - 2 |
| 45:17 | Effects on Genes - 3 |
| 46:29 | Ab initio Gene Finding - 1 |
| 46:59 | Ab initio Gene Finding - 2 |
| 47:42 | Predicted Effects by Gene Finding |
| 48:38 | Example of Predicted Splice Form Change |
| 49:08 | Conclusions - 1 |
| 49:45 | Conclusions - 2 |
| 50:40 | Conclusions - 3 |
| 50:56 | Conclusions - 4 |
| 51:31 | - Questions |



