Extending Tables with Data from over a Million Websites
published: Dec. 19, 2014, recorded: October 2014, views: 4433
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
This Big Data Track submission demonstrates how the BTC 2014 dataset, Microdata annotations from thousands of websites, as well as millions of HTML tables are used to extend local tables with additional columns. Table extension is a useful operation within a wide range of application scenarios: Imagine you are an analyst having a local table describing companies and you want to extend this table with the headquarter of each company. Or imagine you are a film enthusiast and want to extend a table describing films with attributes like director, genre, and release date of each film. The Mannheim Search Joins Engine automatically performs such table extension operations based on a large data corpus gathered from over a million websites that publish structured data in various formats. Given a local table, the Mannheim Search Joins Engine searches the corpus for additional data describing the entities of the input table. The discovered data are then joined with the local table and their content is consolidated using schema matching and data fusion methods. As result, the user is presented with an extended table and given the opportunity to examine the provenance of the added data. Our experiments show that the Mannheim Search Joins Engine achieves a coverage close to 100% and a precision of around 90% within different application scenarios.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !