Stumping along a summary
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
The methods we used to compete in the « Exploration & Exploitation » challenge are based on three layers. The first layer provides an online summary of the data stream for continuous and nominal data. Continuous data are handled using the Greenwald and Khanna online quantile summary which provides error guarantees for a fixed memory size. Nominal data are summarized with a hash-based counting structure. With these techniques we managed to build an accurate stream summary with a small memory footprint. The second layer uses the summary to build predictors. We explored several kinds of trees from simple decision stumps to deep multivariate ones. The stumps proved to be remarkably stable and efficient. But on the other hand, a progressive unfolding of the trees seemed to improve the model on the long run. For the last layer, we explored several combination strategies: online bagging, exponential weighting, linear ranker, etc. We observed a tradeoff between the expressiveness of the predictors and the power of the combination strategy but most strategies being difficult to tune, we went back to a simple averaging. It seems, from our experiments, that both the need for exploration and the click scarcity sharpens the need for very stable models.
Download slides: explorationexploitation2011_salperwyck_urvoy_contr_01.pdf (1.3 MB)
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !