Discovering Common Sequence Variation in Arabidopsis thaliana

author: Gunnar Rätsch, Max Planck Institute
published: Nov. 20, 2007, recorded: September 2007, views: 5188



To characterize natural sequence variation in 20 strains of the model plant Arabidopsis thaliana, whole-genome resequencing with high-density oligonucleotide arrays was performed in collaboration with Perlegen Sciences Inc. Array data were analyzed with a combination of existing model-based (MB; Hinds et al., Science, 2005) and novel machine learning (ML) methods. For the identification of single nucleotide polymorphisms (SNPs), we developed an algorithm based on support vector machines. Training and evaluation were done on published alignments (Nordborg et al., PLoS Biology, 2005). At the same false discovery rate (FDR) as MB, the ML algorithm identifies significantly more true SNPs, especially in regions of high polymorphism density and/or low hybridization quality. The union of SNP predictions from both methods contains on average 143,572 SNPs per strain at an FDR of 2.8% (648,570 non-redundant SNPs). Furthermore, a machine learning algorithm was developed to detect polymorphic regions containing insertions, deletions and variational hotspots, where SNP detection algorithms typically fail to identify individual SNPs. It discovers the approximate location of a substantial additional proportion of polymorphisms (54% of deleted nucleotides and 33% of insertion sites). With a combination of all three methods, 74% of SNPs can be directly called or are contained in a polymorphic region prediction (Zeller et al., in preparation). We examined the patterns of, and the forces shaping, sequence variation in Arabidopsis (Clark et al., Science, 2007): for example, significant differences were observed between gene families, and genes mediating interaction with the biotic environment harbor exceptional polymorphism levels.
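As a rough illustration of how the figures in the abstract relate to one another, the sketch below computes a false discovery rate and recall for two call sets and for their union, judged against a set of known SNPs. It is not the method used in the talk: the function names, the (chromosome, position) encoding, and all positions are hypothetical toy values, and a real evaluation would be restricted to regions covered by the reference alignments.

```python
def false_discovery_rate(predicted, known_true):
    """Fraction of predicted SNPs that are absent from the reference set."""
    if not predicted:
        return 0.0
    false_positives = len(predicted - known_true)
    return false_positives / len(predicted)


def recall(predicted, known_true):
    """Fraction of reference SNPs that were recovered by the predictions."""
    if not known_true:
        return 0.0
    return len(predicted & known_true) / len(known_true)


# Hypothetical toy data: each SNP is a (chromosome, position) pair.
reference_snps = {(1, 100), (1, 250), (1, 400), (2, 50), (2, 75)}
mb_calls = {(1, 100), (1, 250), (2, 999)}            # model-based calls (made up)
ml_calls = {(1, 100), (1, 400), (2, 50), (2, 888)}   # machine-learning calls (made up)

# The union keeps every SNP called by at least one of the two methods.
union_calls = mb_calls | ml_calls

for name, calls in [("MB", mb_calls), ("ML", ml_calls), ("union", union_calls)]:
    fdr = false_discovery_rate(calls, reference_snps)
    rec = recall(calls, reference_snps)
    print(f"{name}: {len(calls)} calls, FDR={fdr:.2f}, recall={rec:.2f}")
```

In this toy case the union recovers more reference SNPs than either method alone, but it also inherits the false positives of both, which is why the abstract reports the FDR of the combined set separately.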

See Also:

Download slides: mlsb07_ratsch_dcs.pdf (6.1 MB)

Reviews and comments:

Steven Work, November 14, 2008 at 4:34 p.m.:

There are sound problems at the beginning, at least through the first 4 minutes. It must be the lecture recording; I tested my system and it's okay.

The sound starts working at around 5:00 minutes.
