Some Challenging Machine Learning Problems in Computational Biology: Time-Varying Networks Inference and Sparse Structured Input-Output Learning
published: Jan. 15, 2009, recorded: November 2008, views: 7917
Recent advances in high-throughput technologies such as microarrays and genome-wide sequencing have led to an avalanche of new biological data that are dynamic, noisy, heterogeneous, and high-dimensional. These data have raised unprecedented challenges in machine learning and high-dimensional statistical analysis, and their close relevance to human health and social welfare often places unique demands on performance metrics that differ from those of standard data mining or pattern recognition problems. In this talk, I will discuss two such problems. First, I will present a new statistical formalism for modeling network evolution over time, and several new algorithms, based on temporal extensions of sparse graphical logistic regression, for parsimoniously reverse-engineering latent time-varying networks. I will show some promising results on recovering the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of Drosophila melanogaster from microarray time-course data, at a time resolution limited only by the sampling frequency. Second, I will present a family of sparse structured regression models in the context of uncovering true associations between linked genetic variations (inputs) in the genome and networks of human traits (outputs) in the phenome. If time allows, I will also present another class of new models known as maximum entropy discrimination Markov networks, which address the same problem in the maximum-margin paradigm, but use an entropic regularizer that leads to a distribution of structured prediction functions that are simultaneously primal and dual sparse (i.e., with few support vectors and of low effective feature dimension).
Joint work with Amr Ahmed, Seyoung Kim, Mladen Kolar, Le Song and Jun Zhu.
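To make the idea of a temporal extension of sparse logistic regression more concrete, here is a minimal sketch of one possible variant: the neighborhood of a single node at a target time is estimated by L1-penalized logistic regression in which each sample is weighted by a Gaussian kernel on its distance in time from the target. This is an illustration under stated assumptions, not necessarily the algorithms presented in the talk. The function names (`neighborhood_at`, `kernel_weights`, `toy_series`), the bandwidth, the penalty `lam`, the proximal-gradient solver, and the three-node toy series are all hypothetical choices made for this example.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def kernel_weights(times, t_star, bandwidth):
    """Normalized Gaussian kernel weights centered on the target time."""
    raw = [math.exp(-0.5 * ((t - t_star) / bandwidth) ** 2) for t in times]
    total = sum(raw)
    return [r / total for r in raw]

def neighborhood_at(samples, times, node, t_star,
                    bandwidth=3.0, lam=0.1, n_iter=500):
    """Estimate sparse edge weights from `node` to every other node at t_star.

    samples[k] is a list of +1/-1 node states observed at times[k].
    Minimizes kernel-weighted logistic loss + lam * L1 norm by
    proximal gradient descent (an assumed solver, chosen for brevity).
    """
    others = [j for j in range(len(samples[0])) if j != node]
    weights = kernel_weights(times, t_star, bandwidth)
    # Weights sum to 1 and features are +/-1, so the gradient's Lipschitz
    # constant is at most len(others) / 4; its inverse is a safe step size.
    step = 4.0 / len(others)
    theta = [0.0] * len(others)
    for _ in range(n_iter):
        grad = [0.0] * len(others)
        for w, x in zip(weights, samples):
            margin = x[node] * sum(th * x[j] for th, j in zip(theta, others))
            # Derivative of w * log(1 + exp(-margin)) with respect to theta.
            scale = -w * x[node] / (1.0 + math.exp(margin))
            for k, j in enumerate(others):
                grad[k] += scale * x[j]
        theta = [soft_threshold(th - step * g, step * lam)
                 for th, g in zip(theta, grad)]
    return dict(zip(others, theta))

def toy_series(n_steps=40):
    """Hypothetical rewiring: node 0 copies node 1 early, node 2 late."""
    samples = []
    for t in range(n_steps):
        x1 = 1 if t % 2 == 0 else -1
        x2 = 1 if (t // 2) % 2 == 0 else -1
        x0 = x1 if t < n_steps // 2 else x2
        samples.append([x0, x1, x2])
    return list(range(n_steps)), samples

times, samples = toy_series()
early = neighborhood_at(samples, times, node=0, t_star=5)
late = neighborhood_at(samples, times, node=0, t_star=34)
print("early neighborhood of node 0:", early)
print("late neighborhood of node 0:", late)
```

In this toy series the estimated neighborhood of node 0 switches from node 1 to node 2 as the target time moves across the rewiring point, while the L1 penalty keeps the inactive edge at zero. Repeating the estimate for every node and every time point of interest would yield a sequence of sparse graphs; the kernel bandwidth controls how smoothly those graphs are allowed to change.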