2nd Workshop on Linked Data Quality (LDQ2015)

About

Since the start of the Linked Open Data (LOD) Cloud, we have seen an unprecedented volume of structured data published on the web, in most cases as RDF and Linked (Open) Data. Integration across the LOD Cloud, however, is hampered by the ‘publish first, refine later’ philosophy, which leaves the published data with various quality problems such as incompleteness, inconsistency, and incomprehensibility. These problems affect every application domain, be it scientific (e.g., life science, environment), governmental, or industrial. Linked datasets originate both from crowdsourced content such as Wikipedia and OpenStreetMap (e.g., DBpedia and LinkedGeoData) and from highly curated sources, e.g., in the library domain. Quality is defined as “fitness for use”: DBpedia, for example, can currently be appropriate for a simple end-user application but could never be used in the medical domain for treatment decisions. Quality is thus key to the success of the data web, and the lack of it is a major barrier to further industry adoption.

Although quality is an essential concept in Linked Data, few efforts currently exist to standardize how data quality tracking and assurance should be implemented. Ensuring data quality is particularly challenging in Linked Data because it involves a set of autonomously evolving data sources. Assessing the quality of available datasets and making that information explicit is a further challenge, which includes the (semi-)automatic identification of problems. Moreover, none of the current approaches uses the assessment to ultimately improve the quality of the underlying dataset.

Jul 15, 2015

Lecture Series