Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

author: Himanshu Jain, Indian Institute of Technology Delhi
published: Sept. 25, 2016,   recorded: August 2016,   views: 1488

The choice of the loss function is critical in extreme multi-label learning where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set. Unfortunately, existing loss functions, such as the Hamming loss, are unsuitable for learning, model selection, hyperparameter tuning and performance evaluation. This paper addresses the issue by developing propensity scored losses which: (a) prioritize predicting the few relevant labels over the large number of irrelevant ones; (b) do not erroneously treat missing labels as irrelevant but instead provide unbiased estimates of the true loss function even when ground truth labels go missing under arbitrary probabilistic label noise models; and (c) promote the accurate prediction of infrequently occurring, hard to predict, but rewarding tail labels. Another contribution is the development of the PfastreXML algorithm (code available from [1]) which efficiently scales to large datasets with up to 9 million labels, 70 million points and 2 million dimensions and which gives significant improvements over the state-of-the-art.
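The abstract does not spell out the loss, but the core idea of propensity scoring can be sketched concretely: each correctly predicted label l contributes 1/p_l to the metric instead of 1, where p_l is the propensity, i.e. the probability that label l is observed in the ground truth when it is truly relevant. Rare tail labels have low propensity and therefore earn a larger reward, and the inverse weighting makes the estimate unbiased under the missing-label model. A minimal sketch of a propensity-scored precision@k, with hypothetical toy inputs (the function name and example values are illustrative, not from the paper):

```python
def psp_at_k(scores, relevant, propensities, k):
    """Propensity-scored precision@k.

    A correct prediction of label l counts 1/p_l rather than 1, so
    low-propensity (rare, often-missing) labels are rewarded more,
    and the expectation over the label-observation process matches
    the true precision@k.
    """
    # Rank labels by predicted score and keep the top k.
    top_k = sorted(range(len(scores)), key=lambda l: -scores[l])[:k]
    return sum(relevant[l] / propensities[l] for l in top_k) / k


# Toy example: 4 labels, labels 0 and 2 are observed as relevant;
# label 2 is a tail label observed only 25% of the time.
scores       = [0.9, 0.1, 0.8, 0.2]
relevant     = [1,   0,   1,   0]
propensities = [1.0, 0.5, 0.25, 1.0]

print(psp_at_k(scores, relevant, propensities, k=2))  # 2.5
```

Note that, unlike ordinary precision@k, this quantity can exceed 1; in practice it is typically reported after normalizing by the best achievable value on the same ground truth.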

This paper’s results also apply to tagging, recommendation and ranking, which are the motivating applications for extreme multi-label learning. They generalize previous attempts at deriving unbiased losses under the restrictive assumption that labels go missing uniformly at random from the ground truth. Furthermore, they provide a sound theoretical justification for popular label weighting heuristics used to recommend rare items. Finally, they demonstrate that the proposed contributions align with real world applications by achieving superior clickthrough rates on sponsored search advertising in Bing.
