Predicting anti-cancer molecule activity using machine learning algorithms

author: Jose Santos, Imperial College London
published: April 17, 2008, recorded: March 2008, views: 419300




In this paper we study the anti-cancer activity of approximately 4,000 unique compounds against a set of 60 cell lines (e.g. leukemia, prostate, breast). Small molecules play an important role in biology: they can serve as building blocks for more complex molecules, and they can interact with proteins, inhibiting or promoting their action. In the latter case, adding such a compound to a cell can have far-reaching consequences, because the protein may be involved in a very complex chain reaction. It is therefore possible to design small molecules that act as useful drugs. Here we concentrate only on predicting one property of a given molecule: whether it will show anti-cancer activity (defined as causing at least 50% cell growth inhibition) against a given cancer cell line.

This computational prediction matters because the number of small molecules in databases worldwide keeps growing while the capacity for proper laboratory testing is limited. For instance, the In Vitro Cell Line Screening Project at the National Cancer Institute (NCI) can currently evaluate only up to 3,000 compounds per year for potential anti-cancer activity. From a machine learning perspective, biological problems make a good application area: datasets are abundant, the data is real, the most suitable type of algorithm may vary substantially from problem to problem, and it is not unusual for a problem to expose open research needs in machine learning. Finally, helping to solve biological problems can have a large impact on the wider scientific community.

The molecule dataset we used is publicly available from the NCI website. We applied a range of data mining classification algorithms to this problem: decision trees, Inductive Logic Programming (ILP) and Support Vector Machines (SVMs). As learning features we used molecular weight, the octanol-water partition coefficient (logP) and fragment counts. A fragment is a set of connected atoms in which each atom is identified only by its type (e.g. carbon). Viewing the molecule as a graph, the fragment list consists of all connected substructures with diameter two.

The experiments show that our results with support vector machines (using an RBF kernel) match previously published state-of-the-art work, yielding an average predictive accuracy of 73% (against a 54% baseline). To our surprise, however, using only atom counts instead of fragment counts gives nearly identical results (about 1% lower accuracy, although the difference is statistically significant).

An important point is that, although numerical black-box algorithms such as SVMs tend to be slightly more accurate than logic models (on this dataset, decision trees and ILP are 3% to 4% less accurate than SVMs), the relevance of this extra predictive accuracy to important practical applications such as drug design is debatable. In a drug design setting, what is useful is a set of rules describing what a "good" compound should look like. That goal is much more easily achieved with a human-readable logic model, like the ones we also describe in the paper.
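The fragment and atom-count features described above can be illustrated with a small sketch. It assumes one particular reading of "diameter two": every path of three atoms A-B-C, i.e. a central atom B together with two of its distinct neighbours, keyed by atom type with the two end types sorted so that C-C-O and O-C-C count as the same fragment. Larger diameter-two substructures (a centre with three or more neighbours taken together) are not enumerated here. Hydrogens are assumed implicit, molecular weight and logP are assumed to be precomputed elsewhere, and the function names and feature layout are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def atom_counts(atom_types):
    """Atom-count features: how many atoms of each type the molecule has."""
    return Counter(atom_types)


def fragment_counts(atom_types, bonds):
    """Count three-atom fragments A-B-C (paths through a central atom B).

    Assumption: a 'diameter two' fragment is read here as a central atom plus
    two of its distinct neighbours; atoms are identified only by their type.
    """
    neighbours = {i: set() for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    counts = Counter()
    for centre, nbrs in neighbours.items():
        for left, right in combinations(sorted(nbrs), 2):
            # Sort the end types so C-C-O and O-C-C map to the same key.
            end1, end2 = sorted((atom_types[left], atom_types[right]))
            counts[(end1, atom_types[centre], end2)] += 1
    return counts


def feature_vector(mol_weight, logp, atom_types, bonds, vocabulary, use_fragments=True):
    """Hypothetical layout: [molecular weight, logP, one count per vocabulary key].

    With use_fragments=False the vocabulary should list atom types instead,
    which mirrors the atom-count-only comparison mentioned in the abstract.
    """
    counts = fragment_counts(atom_types, bonds) if use_fragments else atom_counts(atom_types)
    return [mol_weight, logp] + [counts.get(key, 0) for key in vocabulary]
```

For ethanol's heavy atoms (C, C, O bonded in a chain), `fragment_counts(["C", "C", "O"], [(0, 1), (1, 2)])` yields a single `("C", "C", "O")` fragment, while `atom_counts` yields two carbons and one oxygen. A fixed vocabulary is needed so that every molecule maps to a vector of the same length before it is passed to a learner.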
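Two quantities behind the reported results can be made concrete. The RBF (Gaussian) kernel used with the SVM is commonly written K(x, z) = exp(-gamma * ||x - z||^2); the abstract does not report gamma, so the default below is only a placeholder. The 54% "baseline" is assumed here to be the accuracy of always predicting the most frequent class, a common convention that the abstract does not state explicitly. These are illustrative sketches, not the authors' code.

```python
import math
from collections import Counter


def rbf_kernel(x, z, gamma=0.1):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2).

    gamma=0.1 is a placeholder; the value used in the study is not given.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class (assumed baseline)."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For a hypothetical cell line with 54 inactive and 46 active compounds, `majority_baseline` returns 0.54, so a classifier has to exceed 54% accuracy before it beats the trivial rule. Because the kernel depends on raw Euclidean distance, features on very different scales (molecular weight versus small fragment counts) would normally be standardized first; whether the study did so is not stated.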

See Also:

Download slides: licsb08_santos_pam_01.pdf (97.7 KB)



Reviews and comments:

Comment1 mohama zaefi, February 16, 2012 at 1:47 p.m.:

I am a student.
I need your full paper on Predicting anti-cancer molecule activity using machine learning algorithms.



Comment6 Noah, January 17, 2021 at 10:11 p.m.:

Thank you for this!

