Mining the Web to Facilitate Fast and Accurate Approximate Match
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Tasks relying on recognizing entities have recently received significant attention in the literature. Many such tasks assume the existence of reference entity tables. In this paper, we consider the problem of determining whether a candidate string approximately matches with a reference entity. This problem is important for extracting named entities such as products or locations from a reference entity table, or matching entity entries across heterogenous sources. Prior approaches have relied on string-based similarity which only compare a candidate string and an entity it matches with. In this paper, we observe that considering such evidence across multiple documents significantly improves the accuracy of matching. We develop efficient techniques which exploit web search engines to facilitate approximate matching in the context of our proposed similarity functions. In an extensive experimental evaluation, we demonstrate the accuracy and efficiency of our techniques.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !