Practice of Efficient Data Collection via Crowdsourcing at Large-Scale
published: March 2, 2020, recorded: August 2019, views: 5
In this tutorial, we present a portion of unique practical industrial experience in efficient data labeling via crowdsourcing, shared by leading researchers and engineers from Yandex. The majority of ML projects require training data, and often this data can only be obtained through human labeling. Moreover, as more applications of AI appear, more nontrivial tasks for collecting human-labeled data arise. Producing such data at large scale requires building a technological pipeline, which includes solving issues related to quality control and smart distribution of tasks among workers.
We will give an introduction to data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. This will be followed by a practical session, in which participants will choose one of the real label collection tasks, experiment with settings for the labeling process, and launch their label collection project on Yandex.Toloka, one of the largest crowdsourcing marketplaces. The projects will be run on real crowds during the tutorial session. Finally, participants will receive feedback on their projects and practical advice for making them more efficient. We invite beginners, advanced specialists, and researchers to learn how to collect high-quality labeled data efficiently.