CloudMatcher: A Cloud/Crowd Service for Entity Matching
published: Dec. 1, 2017, recorded: August 2017, views: 747
Entity matching (EM) finds disparate data instances that refer to the same real-world entity. EM is critical in health informatics, and will become even more so in the age of Big Data and data science. Many EM systems have been developed. In this paper, we first discuss why it is still very difficult for domain scientists to use such EM systems. We then describe CloudMatcher, a cloud/crowd service for EM that we have been building. CloudMatcher aims to be a fast, easy-to-use, scalable, and highly available EM service on the Web. We motivate CloudMatcher, then describe its design and implementation. Next, we describe its deployment in the past six months, providing a detailed analysis of its performance over four representative datasets. Finally, we discuss lessons learned.