Object Identification by Statistical Methods

author: Hans-Joachim Lenz, Free University
published: Feb. 25, 2007,   recorded: October 2004,   views: 6141

Numerical data fusion, or the merging of overlapping data files, becomes a hard problem if the corresponding data sets share no globally unique identifying keys. Typical examples are the linkage of address files supplied by different sources for commercial purposes (a money-making area), the merging of special offers in various media (cf. duplicate detection), and an administrative record census (ARC) as planned in Germany, where several autonomous, heterogeneous registers are to be merged. We present a three-step procedure: conversion of attributes, comparison of the values of a pair of objects, and classification (the 'matching problem') of pairs as either "same" ("matched") or "not same" ("not matched"). We pay special attention to the quality and efficiency of the methodology. We briefly discuss questions such as correctness and completeness, as well as pre-selection techniques such as 'blocking' that reduce the computational complexity of pairwise comparisons. The approach is illustrated on carefully composed benchmark data sets. We assume some basic knowledge of computer science and classification (supervised learning).
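
The three steps and the blocking idea can be sketched in a few lines of Python. This is a minimal illustration, not the method presented in the lecture: the record fields (`name`, `street`, `zip`), the helper names, the use of `difflib.SequenceMatcher` as the similarity measure, and the fixed threshold of 0.85 are all assumptions made for the example. In particular, the threshold rule stands in for a classifier that would normally be trained on labelled pairs.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def convert(record):
    """Step 1 (conversion): lower-case values and collapse whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}


def compare(a, b, fields):
    """Step 2 (comparison): one similarity score in [0, 1] per attribute."""
    return [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]


def classify(scores, threshold=0.85):
    """Step 3 (classification): assumed rule, mean similarity vs. a threshold."""
    mean = sum(scores) / len(scores)
    return "matched" if mean >= threshold else "not matched"


def block(records, key):
    """Blocking: group record indices by a key so only pairs within a group are compared."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[key]].append(idx)
    return blocks


def link(records, fields=("name", "street"), block_key="zip"):
    """Run conversion, blocked pairwise comparison, and classification."""
    converted = [convert(r) for r in records]
    matches = []
    for ids in block(converted, block_key).values():
        for i, j in combinations(ids, 2):
            if classify(compare(converted[i], converted[j], fields)) == "matched":
                matches.append((i, j))
    return sorted(matches)


# Hypothetical address records, invented for this example.
records = [
    {"name": "Anna Schmidt", "street": "Hauptstrasse 5", "zip": "14195"},
    {"name": "anna  schmidt", "street": "Hauptstrasse 5", "zip": "14195"},
    {"name": "Peter Mueller", "street": "Gartenweg 12", "zip": "14195"},
    {"name": "Anna Schmidt", "street": "Hauptstrasse 5", "zip": "10115"},
]

print(link(records))  # [(0, 1)]
```

Records 0 and 1 are linked after conversion removes the differences in case and spacing. Record 3 carries the same name and street but a different `zip`, so blocking never compares it with record 0. This shows the trade-off between efficiency and completeness: blocking cuts the number of pairwise comparisons, but true matches that fall into different blocks are missed.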
