Cross-lingual infobox alignment in Wikipedia using Entity-Attribute Factor Graph
published: Nov. 28, 2017, recorded: October 2017, views: 996
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Wikipedia infoboxes contain information about article entities in the form of attribute-value pairs, and are thus a very rich source of structured knowledge. However, as the different language versions of Wikipedia evolve independently, it is a promising but challenging problem to find correspondences between infobox attributes in different language editions. In this paper, we propose 8 effective features for cross lingual infobox attribute matching containing categories, templates, attribute labels and values. We propose entity-attribute factor graph to consider not only individual features but also the correlations among attribute pairs. Experiments on the two Wikipedia data sets of English-Chinese and English-French show that proposed approach can achieve high F1-measure:85.5% and 85.4% respectively on the two data sets. Our proposed approach finds 23,923 new infobox attribute mappings between English and Chinese Wikipedia, and 31,576 between English and French based on no more than six thousand existing matched infobox attributes. We conduct an infobox completion experiment on English-Chinese Wikipedia and complement 76,498 (more than 30% of EN-ZH Wikipedia existing cross-lingual links) pairs of corresponding articles with more than one attribute-value pairs.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !