Discovering Common Sequence Variation in Arabidopsis thaliana
published: Nov. 20, 2007, recorded: September 2007, views: 164
Description
To characterize natural sequence variation in 20 strains of the model plant Arabidopsis thaliana, whole-genome resequencing with high-density oligonucleotide arrays was performed in collaboration with Perlegen Sciences Inc. Array data were analyzed with a combination of existing model-based (MB; Hinds et al., Science, 2005) and novel machine learning (ML) methods. For the identification of single nucleotide polymorphisms (SNPs), we developed an algorithm based on support vector machines. Training and evaluation were done on published alignments (Nordborg et al., PLoS Biology, 2005). At the same false discovery rate (FDR) as MB, the ML algorithm identifies significantly more true SNPs, especially in regions of high polymorphism density and/or low hybridization quality. The union of SNP predictions from both methods contains on average 143,572 SNPs per strain at an FDR of 2.8% (648,570 non-redundant SNPs). Furthermore, a machine learning algorithm was developed to detect polymorphic regions containing insertions, deletions and variational hotspots, where SNP detection algorithms typically fail to identify individual SNPs. It discovers the approximate location of a substantial additional proportion of polymorphisms (54% of deleted nucleotides and 33% of insertion sites). With a combination of all three methods, 74% of SNPs can be directly called or are contained in a polymorphic region prediction (Zeller et al., in preparation). We also examined the patterns of, and forces shaping, sequence variation in Arabidopsis (Clark et al., Science, 2007). For example, significant differences were observed between gene families, and genes mediating interaction with the biotic environment harbor exceptional polymorphism levels.
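The abstract compares two SNP callers at a given false discovery rate and then reports the FDR of their combined predictions. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: it assumes SNP calls can be represented as sets of genome positions, and the function name `false_discovery_rate`, the variable names, and all positions are made up for illustration.

```python
def false_discovery_rate(predicted, known_true):
    """Fraction of predicted SNP positions that are absent from the reference set.

    FDR = false positives / all predicted positives.
    Returns 0.0 for an empty prediction set (a convention chosen here).
    """
    predicted = set(predicted)
    if not predicted:
        return 0.0
    false_positives = predicted - set(known_true)
    return len(false_positives) / len(predicted)


# Hypothetical toy data: positions of SNPs confirmed by reference alignments.
known_snps = {101, 250, 377, 512, 640, 803}

# Hypothetical calls from a model-based (MB) and a machine learning (ML) caller.
mb_calls = {101, 250, 512, 900}
ml_calls = {101, 250, 377, 640, 955}

# Union of both prediction sets, as described in the abstract.
union_calls = mb_calls | ml_calls

for name, calls in [("MB", mb_calls), ("ML", ml_calls), ("union", union_calls)]:
    true_positives = len(calls & known_snps)
    fdr = false_discovery_rate(calls, known_snps)
    print(f"{name}: {true_positives} true SNPs, FDR = {fdr:.3f}")
```

In this toy setup the union recovers more true SNPs than either caller alone, at the cost of pooling the false positives of both, which is why the FDR of the union has to be reported separately.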
Download slides:
mlsb07_ratsch_dcs.pdf (6.1 MB)
Reviews and comments:
Sound problems in the beginning at least through 4 minutes. - must be lecture.. I tested my system and it's okay.
Sound starts to be engaged at 5:00 minutes