Predictive Modelling in the Wild: Success Factors in Data Mining Competitions and Real-World Applications
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
In this tutorial, we give our perspective on the keys to success in application of predictive modeling to competitions like KDD Cup and real-life business intelligence projects. We argue that these two modes of applying predictive modeling share many similarities, but have also some important differences. We discuss the main success factors in predictive modeling: domain understanding, statistical acumen, and appropriate algorithmic approaches. We describe our relevant experiences in the context of three recent predictive modeling competitions where our team has had success (KDD Cup 2007 and 2008 and INFORMS DM challenge 2008) and two case studies of projects we have led at IBM Research. We also survey some of the recurring challenges and complexities in practical predictive modeling applications. One key issue is information leakage, and we discuss its definition, influence, detection and avoidance. We consider leakage to be the silent killer of many predictive modeling projects, and we demonstrate its impact on the competitions, and discuss the challenges in addressing it in the real-life projects. Other challenges include framing real-life modeling objectives into predictive modeling, and usefully applying relational learning concepts when modeling "real-life" complex, relational datasets.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !