Discovering Common Sequence Variation in Arabidopsis thaliana
published: Nov. 20, 2007, recorded: September 2007, views: 471
To characterize natural sequence variation in 20 strains of the model plant Arabidopsis thaliana, whole-genome resequencing with high-density oligonucleotide arrays was performed in collaboration with Perlegen Sciences Inc. Array data were analyzed with a combination of existing model-based (MB; Hinds et al., Science, 2005) and novel machine learning (ML) methods. For the identification of single nucleotide polymorphisms (SNPs), we developed an algorithm based on support vector machines. Training and evaluation were performed on published alignments (Nordborg et al., PLoS Biology, 2005). At the same false discovery rate (FDR) as MB, the ML algorithm identifies significantly more true SNPs, especially in regions of high polymorphism density and/or low hybridization quality. The union of SNP predictions from both methods contains on average 143,572 SNPs per strain at an FDR of 2.8% (648,570 non-redundant SNPs across all strains). Furthermore, a machine learning algorithm was developed to detect polymorphic regions containing insertions, deletions, and variational hotspots, where SNP detection algorithms typically fail to identify individual SNPs. It discovers the approximate location of a substantial additional proportion of polymorphisms (54% of deleted nucleotides and 33% of insertion sites). With a combination of all three methods, 74% of SNPs can be called directly or are contained in a polymorphic region prediction (Zeller et al., in preparation). We examined the patterns of sequence variation in Arabidopsis and the forces shaping it (Clark et al., Science, 2007): for example, significant differences were observed between gene families, and genes mediating interaction with the biotic environment harbor exceptional polymorphism levels.
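
The abstract reports an FDR for the union of MB and ML SNP calls. As a rough illustration of how such a figure can be computed against reference alignments, the sketch below takes two call sets, forms their union, and evaluates FDR and recall. The positions, set names, and helper functions are hypothetical toy values, not data or code from the study, and the sketch assumes every called position is covered by the reference.

```python
# Toy illustration only: positions and names are invented for this sketch.

def false_discovery_rate(predicted, truth):
    """Fraction of predicted SNP positions that are not in the reference set."""
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


def recall(predicted, truth):
    """Fraction of reference SNP positions recovered by the predictions."""
    if not truth:
        return 0.0
    return len(predicted & truth) / len(truth)


# Hypothetical known SNP positions from reference alignments.
truth = {101, 250, 377, 512, 640, 799}

# Hypothetical calls from the model-based (MB) and machine learning (ML) methods.
mb_calls = {101, 250, 512, 900}
ml_calls = {101, 377, 512, 640, 950}

# The union keeps every position called by at least one method.
union_calls = mb_calls | ml_calls

for name, calls in [("MB", mb_calls), ("ML", ml_calls), ("union", union_calls)]:
    print(f"{name}: FDR={false_discovery_rate(calls, truth):.3f}, "
          f"recall={recall(calls, truth):.3f}")
```

In this toy case the union recovers more reference SNPs than either method alone, at the cost of inheriting the false positives of both, which is why the FDR of the union has to be reported separately.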