Functional Annotation of Human Protein Coding Isoforms via Non-convex Multi-Instance Learning

author: Tingjin Luo, National University of Defense Technology, China
published: Oct. 9, 2017,   recorded: August 2017,   views: 1276


Functional annotation of human genes is fundamentally important for understanding the molecular basis of various genetic diseases. A major challenge in determining the functions of human genes lies in the functional diversity of proteins: a gene can perform different functions because it may consist of multiple protein coding isoforms (PCIs). Differentiating the functions of PCIs can therefore significantly deepen our understanding of gene function. However, because isoform-level gold standards (ground-truth annotations) are lacking, many existing functional annotation approaches operate at the gene level. In this paper, we propose a novel approach to differentiate the functions of PCIs by integrating sparse simplex projection, a nonconvex sparsity-inducing regularizer, with the framework of multi-instance learning (MIL). Specifically, treating each gene as a bag of its PCIs, we label the genes annotated to the function under consideration as positive bags and the genes without that function as negative bags. Then, using sparse projections onto the simplex, we learn a mapping that embeds the original bag space into a discriminative feature space. The framework flexibly incorporates various smooth and nonsmooth loss functions, such as the logistic loss and the hinge loss. To solve the resulting highly nontrivial nonconvex and nonsmooth optimization problem, we further develop an efficient block coordinate descent algorithm. Extensive experiments on human genome data demonstrate that the proposed approaches significantly outperform state-of-the-art methods in both the accuracy of functional annotation of human PCIs and efficiency.
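To make the idea of a sparse projection onto the simplex concrete, here is a minimal NumPy sketch. It assumes the constraint set {w : w >= 0, sum(w) = 1, ||w||_0 <= k} and uses the greedy rule described by Kyrillidis et al. (2013), which is exact for this set: keep the k largest entries, project them onto the simplex with the standard sort-based method, and zero out the rest. This is an illustration of the general operator only, not the lecture's implementation; the function names and the example "isoform scores" are hypothetical.

```python
import numpy as np


def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = z} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                        # entries in descending order
    css = np.cumsum(u) - z                      # running sums minus the target mass
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index meeting the condition
    theta = css[rho] / (rho + 1.0)              # common shift (threshold)
    return np.maximum(v - theta, 0.0)


def sparse_project_simplex(v, k, z=1.0):
    """Projection onto {w >= 0, sum(w) = z, ||w||_0 <= k}.

    Greedy rule (exact for the nonnegative simplex): select the k largest
    entries, project only those onto the simplex, set all others to zero.
    """
    v = np.asarray(v, dtype=float)
    w = np.zeros_like(v)
    support = np.argsort(v)[::-1][:k]           # indices of the k largest entries
    w[support] = project_simplex(v[support], z)
    return w


if __name__ == "__main__":
    # Hypothetical scores for the four isoforms of one gene (bag).
    scores = np.array([0.5, 0.3, 0.9, -0.2])
    weights = sparse_project_simplex(scores, k=2)
    print(weights)  # nonnegative, sums to 1, at most 2 nonzero entries
```

The sparsity constraint makes the feasible set nonconvex, which is why the projection is computed by selecting a support first rather than by a single convex projection; with the example scores, only the two highest-scoring isoforms receive weight.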
