Crowd Truth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data
published: Dec. 19, 2014, recorded: October 2014, views: 2317
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
In this paper we introduce the CrowdTruth open-source software framework for machine-human computation, that implements a novel approach to gathering human annotation data for a variety of media (e.g. text, image, video). The CrowdTruth approach embodied in the software captures human semantics through a pipeline of four processes: a) combining various machine processing of media in order to better understand the input content and optimize its suitability for micro-tasks, thus optimize the time and cost of the crowdsourcing process; b) providing reusable human-computing task templates to collect the maximum diversity in the human interpretation, thus collect richer human semantics; c) implementing ’disagreement metrics’, i.e. CrowdTruth metrics, to support deep analysis of the quality and semantics of the crowdsourcing data; and d) providing an interface to support data and results visualization. Instead of the traditional inter-annotator agreement, we use their disagreement as a useful signal to evaluate the data quality, ambiguity and vagueness. We demonstrate the applicability and robustness of this approach to a variety of problems across multiple domains. Moreover, we show the advantages of using open standards and the extensibility of the framework with new data modalities and annotation tasks.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !