Bioinformatics Challenge: Learning in Very High Dimensions with Very Few Samples
published: Feb. 25, 2007, recorded: January 2005, views: 7448
Dedicated machine learning procedures have already become an integral part of modern genomics and proteomics. However, these tasks combine very high dimensionality with very small learning samples, and they often stretch the procedures well beyond the natural boundaries of their applicability. A few such challenges are the subject of this series of lectures. We start with a brief overview of the classification of genomics (microarray) data. In particular, we discuss in some detail examples of applications to cancer genomics and proteomics. We then concentrate on the phenomenon of anti-learning: a case of supervised classification in which standard supervised learning techniques systematically produce classifiers that are perfect on the learning sample but whose error rates on independent test data are higher than that of the default (random) classification rule. Examples of natural and synthetic anti-learning data will be given and analysed from the standpoint of their implications for practical supervised and unsupervised classification. A series of practical tutorials will be organized in parallel. Participants will be exposed to the classification of microarray data, including first-hand experience with anti-learning.
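To make the definition of anti-learning concrete, here is a minimal synthetic sketch. It is not taken from the lectures: the data construction, the function names, and the choice of a nearest-centroid classifier as the "standard" learner are all assumptions made for illustration. Each of the n samples lives in n dimensions and is built so that any two distinct samples from the same class have a negative inner product (-1/n), while samples from opposite classes have a positive one (+1/n). Under that assumption, a classifier trained on a class-balanced subset separates the training points perfectly yet misclassifies every held-out point.

```python
# Toy anti-learning data (hypothetical construction, not the lecture's data).
# x_i = e_i - y_i * y / n, so <x_i, x_j> = delta_ij - y_i * y_j / n:
# same-class pairs are slightly anti-correlated, opposite-class pairs
# slightly correlated.

def make_anti_learning_data(n):
    """Return n samples in n dimensions with alternating +1/-1 labels."""
    y = [1 if i % 2 == 0 else -1 for i in range(n)]
    X = [[(1.0 if i == k else 0.0) - y[i] * y[k] / n for k in range(n)]
         for i in range(n)]
    return X, y

def centroid(rows):
    """Coordinate-wise mean of a list of vectors."""
    m = len(rows)
    return [sum(col) / m for col in zip(*rows)]

def fit_nearest_centroid(X, y):
    """Nearest-centroid rule written as a linear classifier (w, b)."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == -1])
    w = [p - q for p, q in zip(pos, neg)]
    b = (sum(q * q for q in neg) - sum(p * p for p in pos)) / 2
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

X, y = make_anti_learning_data(20)
# Balanced training split: first 12 samples (6 per class); the rest is test.
model = fit_nearest_centroid(X[:12], y[:12])
train_acc = accuracy(model, X[:12], y[:12])
test_acc = accuracy(model, X[12:], y[12:])
print(train_acc, test_acc)  # prints 1.0 0.0
```

In this toy setting the test accuracy is 0, well below the 0.5 of random guessing, so simply negating the classifier's output would give a perfect predictor on the held-out points. Real microarray data would not be this clean; the sketch only illustrates the geometry that makes such behaviour possible.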