Mining the Web to Facilitate Fast and Accurate Approximate Match

author: Surajit Chaudhuri, Microsoft Research
author: Venkatesh Ganti, Microsoft Research
author: Dong Xin, Microsoft Research
published: May 20, 2009,   recorded: April 2009,   views: 4047




Tasks that rely on recognizing entities have recently received significant attention in the literature. Many such tasks assume the existence of reference entity tables. In this paper, we consider the problem of determining whether a candidate string approximately matches a reference entity. This problem is important for extracting named entities such as products or locations from a reference entity table, and for matching entity entries across heterogeneous sources. Prior approaches have relied on string-based similarity functions, which compare only a candidate string and the entity it may match. We observe that considering evidence from multiple documents significantly improves the accuracy of matching. We develop efficient techniques that exploit web search engines to facilitate approximate matching in the context of our proposed similarity functions. In an extensive experimental evaluation, we demonstrate the accuracy and efficiency of our techniques.
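
The contrast the abstract draws, between comparing two strings in isolation and looking at evidence from many documents, can be illustrated with a small sketch. This is not the similarity function from the paper, which the abstract does not define. The scoring rule, the function names, the entity, and the three in-memory documents (standing in for web search results) are all assumptions made up for illustration.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(candidate, entity):
    """String-based baseline: token overlap between the two strings only."""
    a, b = tokens(candidate), tokens(entity)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def document_evidence(candidate, entity, documents):
    """Hypothetical document-based score (an assumption, not the paper's).

    Among the documents that mention the candidate string, return the
    fraction that also contain every token of the reference entity.
    """
    needle = candidate.lower()
    entity_tokens = tokens(entity)
    mentioning = [d for d in documents if needle in d.lower()]
    if not mentioning:
        return 0.0
    supporting = [d for d in mentioning if entity_tokens <= tokens(d)]
    return len(supporting) / len(mentioning)


# Made-up reference entity and documents, for illustration only.
ENTITY = "Lenovo ThinkPad X61 Notebook"
DOCUMENTS = [
    "Review: the X61 is light. The Lenovo ThinkPad X61 notebook ships with Vista.",
    "I bought an X61 last week; this Lenovo ThinkPad notebook is great.",
    "X61 bus timetable for the city centre.",
]

if __name__ == "__main__":
    print(jaccard("X61", ENTITY))
    print(document_evidence("X61", ENTITY, DOCUMENTS))
```

In this toy setting the short candidate "X61" shares only one of the entity's four tokens, so the string-based score is low, while most of the documents that mention it also mention the full entity. A real system would obtain the documents from a search engine rather than from a fixed list.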

See Also:

Download slides: www09_xin_mtw_01.pptx (294.6 KB)


Reviews and comments:

Comment1 trainner, May 31, 2009 at 2:34 p.m.:

Good content, but his English is sucking.
