Domain-Independent Quality Measures for Crowd Truth Disagreement
published: Nov. 28, 2013, recorded: October 2013
Using crowdsourcing platforms such as CrowdFlower and Amazon Mechanical Turk to gather human annotation data has now become a mainstream process. Such crowd involvement can reduce the time needed to solve an annotation task and, given the large number of annotators, can be a valuable source of annotation diversity. To harness this diversity across domains, it is critical to establish a common ground for quality assessment of the results. In this paper we report our experiences in optimizing and adapting crowdsourcing micro-tasks across domains, considering three aspects: (1) the micro-task template, (2) the quality measurements for the workers' judgments and (3) the overall annotation workflow. We performed experiments in two domains, i.e. event extraction (MRP project) and medical relation extraction (Crowd-Watson project). The results confirm our main hypothesis that some aspects of the evaluation metrics can be defined in a domain-independent way for micro-tasks that assess the parameters to harness the diversity of annotations and the useful disagreement between workers. This paper focuses specifically on the parameters relevant for the 'event extraction' ground-truth data collection and demonstrates their reusability from the medical domain.