Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling
published: Sept. 14, 2009, recorded: June 2009, views: 72
Description
Many scalable data mining tasks rely on active learning to provide the most useful, accurately labeled instances. However, what if there are multiple labeling sources ('oracles' or 'experts') with different but unknown reliabilities? With the recent advent of inexpensive and scalable online annotation tools, such as Amazon's Mechanical Turk, the labeling process has become more vulnerable to noise, and there is typically no prior knowledge of the accuracy of each individual labeler. This paper addresses exactly this challenge: how to jointly learn the accuracy of the labeling sources and obtain the most informative labels for the active learning task at hand, while minimizing total labeling effort. More specifically, we present IEThresh (Interval Estimate Threshold), a strategy for intelligently selecting the expert(s) with the highest estimated labeling accuracy. IEThresh estimates a confidence interval for the reliability of each expert and filters out those whose upper confidence bound falls below a threshold. Selecting on the upper bound jointly accounts for an expert's expected accuracy (the mean) and the need to estimate that accuracy better (the variance). Our framework is flexible enough to work across a wide range of noise levels, and it outperforms baselines such as asking all available experts and random expert selection. In particular, IEThresh achieves a given level of accuracy with less than half the queries issued by all-experts labeling and less than a third of the queries required by random expert selection on datasets such as UCI Mushroom. The results show that our method naturally balances exploration and exploitation: as it gains knowledge of which experts to rely upon, it selects them with increasing frequency.
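The selection rule described above can be illustrated with a short Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the reward for an expert is 1 when the expert agrees with the majority vote and 0 otherwise, the interval uses a normal approximation, the upper bound is clipped to 1.0, and the threshold is a fraction `epsilon` of the best upper bound. All function and parameter names (`upper_bound`, `select_experts`, `update_history`, `epsilon`) are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_bound(rewards, confidence=0.95):
    """Upper end of a normal-approximation confidence interval for the mean reward.

    Assumption: rewards lie in [0, 1], so the bound is clipped to 1.0.
    """
    n = len(rewards)
    if n < 2:
        # Too little evidence: stay optimistic so the expert keeps being explored.
        return 1.0
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return min(1.0, mean(rewards) + z * stdev(rewards) / math.sqrt(n))


def select_experts(history, epsilon=0.9, confidence=0.95):
    """Keep experts whose upper bound is at least epsilon times the best upper bound."""
    bounds = {name: upper_bound(r, confidence) for name, r in history.items()}
    cutoff = epsilon * max(bounds.values())
    return sorted(name for name, ub in bounds.items() if ub >= cutoff)


def majority_label(votes):
    """Binary majority vote over {expert: 0 or 1}; ties resolve to 1."""
    return int(2 * sum(votes.values()) >= len(votes))


def update_history(history, votes):
    """Reward each queried expert with 1 if it agreed with the majority, else 0."""
    label = majority_label(votes)
    for name, vote in votes.items():
        history[name].append(1 if vote == label else 0)
    return label


# Hypothetical reward histories for three experts.
history = {
    "a": [1, 1, 1, 0, 1, 1, 1, 1],  # mostly agrees with the majority
    "b": [0, 1, 0, 0, 1, 0, 0, 0],  # mostly disagrees
    "c": [1],                       # barely observed so far
}
chosen = select_experts(history)
```

In this example the well-performing expert `a` and the under-observed expert `c` pass the threshold while `b` is filtered out, which is the exploration and exploitation behaviour the description refers to: a wide interval keeps an uncertain expert in play until enough evidence accumulates.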
See Also:
Download slides:
kdd09_donmez_elalsss_01.ppt (3.3 MB)