Variant prioritization by genomic data fusion
published: May 13, 2014, recorded: April 2014, views: 2267
Next-generation sequencing (NGS) has rapidly increased our ability to discover the cause of many previously unresolved rare monogenic disorders by sequencing rare exomic variation. However, even after standard filtering, which retains nonsynonymous single nucleotide variants (nSNVs) and loss-of-function mutations that are absent from healthy populations or unaffected samples, many candidate mutations often remain, and predictive methods are needed to prioritize variants for further validation. Several computational methods have been proposed that take into account biochemical, evolutionary and structural properties of mutations to assess their potential deleteriousness. However, most of these methods suffer from high false positive rates when predicting the impact of rare nSNVs. A plausible explanation for this poor performance is that many of the variants they flag are mildly deleterious but in no way specific to the disease of interest. We therefore propose a genomic data fusion methodology that integrates multiple strategies for detecting the deleteriousness of mutations and prioritizes them in a phenotype-specific manner. A key innovation is that we incorporate into our strategy a computational method for gene prioritization, which scores mutated genes according to their similarity to known disease genes, computed by fusing heterogeneous genomic information. We also integrate haploinsufficiency prediction scores, which estimate the probability that a gene's function is impaired when the gene is present in a functionally haploid state. To fuse these data sources, we develop a machine-learning model trained on disease-causing mutations from the Human Gene Mutation Database (HGMD) against three control sets: common polymorphisms and two independent sets of rare variation. Benchmarking on HGMD demonstrates that this integrative, phenotype-specific variant prioritization significantly outperforms state-of-the-art predictors such as SIFT and PolyPhen-2.
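The pipeline described above (standard filtering, then fusing deleteriousness, gene-prioritization and haploinsufficiency evidence in a classifier trained on disease-causing versus control variants) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method presented in the talk. The abstract does not name the learner, so plain logistic regression is swapped in here. The `Variant` fields, the choice of Sequence Ontology consequence terms, the [0, 1] scaling of each score and all numbers in the toy training set are hypothetical and invented purely for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    consequence: str           # Sequence Ontology term, e.g. "missense_variant"
    control_freq: float        # allele frequency in healthy controls (hypothetical field)
    deleteriousness: float     # SIFT/PolyPhen-style score, assumed rescaled to [0, 1]
    gene_prior: float          # phenotype-specific gene prioritization score, assumed in [0, 1]
    haploinsufficiency: float  # predicted haploinsufficiency probability, assumed in [0, 1]

# Assumed set of nonsynonymous / loss-of-function consequences kept by the filter.
CANDIDATE_CONSEQUENCES = {"missense_variant", "stop_gained", "frameshift_variant",
                          "splice_donor_variant", "splice_acceptor_variant"}

def standard_filter(variants):
    """Keep nonsynonymous / LoF variants that are absent from healthy controls."""
    return [v for v in variants
            if v.consequence in CANDIDATE_CONSEQUENCES and v.control_freq == 0.0]

def features(v):
    """Fuse the heterogeneous evidence into one vector (leading 1.0 is the bias term)."""
    return [1.0, v.deleteriousness, v.gene_prior, v.haploinsufficiency]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def score(w, v):
    """Predicted probability that variant v is disease-causing."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(v))))

# Toy rows: (deleteriousness, gene_prior, haploinsufficiency, label).
# label 1 = disease-causing (HGMD-like), 0 = control variant. Invented numbers.
TOY_TRAINING = [
    (0.90, 0.90, 0.80, 1), (0.80, 0.85, 0.70, 1), (0.70, 0.95, 0.90, 1), (0.95, 0.80, 0.60, 1),
    (0.90, 0.10, 0.20, 0), (0.30, 0.20, 0.10, 0), (0.80, 0.15, 0.30, 0), (0.20, 0.05, 0.20, 0),
]
X = [features(Variant("train", "missense_variant", 0.0, d, g, h)) for d, g, h, _ in TOY_TRAINING]
y = [label for *_, label in TOY_TRAINING]
w = train_logistic(X, y)

candidates = standard_filter([
    Variant("GENE_A", "missense_variant", 0.0, 0.85, 0.90, 0.80),    # deleterious, phenotype-relevant
    Variant("GENE_B", "missense_variant", 0.0, 0.90, 0.10, 0.20),    # deleterious, phenotype-unrelated
    Variant("GENE_C", "synonymous_variant", 0.0, 0.10, 0.90, 0.90),  # dropped: not nonsynonymous
    Variant("GENE_D", "stop_gained", 0.01, 0.95, 0.90, 0.90),        # dropped: seen in controls
])
ranked = sorted(candidates, key=lambda v: score(w, v), reverse=True)
for v in ranked:
    print(v.gene, round(score(w, v), 3))
```

In this toy setup both remaining candidates look similarly deleterious, so the phenotype-specific gene prioritization and haploinsufficiency features are what push the disease-relevant variant above the mildly deleterious but unrelated one, which is the failure mode of single-score predictors that the abstract describes.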
Download slides: mlpmsummerschool2013_moreau_genomic_data_fusion_01.pdf (11.9 MB)