Semantic Data Mining
author: Anže Vavpetič, Department of Knowledge Technologies, Jožef Stefan Institute
author: Agnieszka Ławrynowicz, Institute of Computing Science, Poznan University of Technology
author: Jędrzej Potoniec, Institute of Computing Science, Poznan University of Technology
author: Melanie Hilario, Geneva Artificial Intelligence Laboratory, University of Geneva
author: Alexandros Kalousis, University of Geneva
author: Nada Lavrač, Department of Knowledge Technologies, Jožef Stefan Institute
published: Nov. 29, 2011, recorded: September 2011, views: 910
The term semantic data mining denotes a data mining approach in which domain ontologies are used as background knowledge. The approach is motivated by the large amounts of data that are becoming openly available and are described using real-life ontologies represented in Semantic Web languages, arguably most extensively in the domain of biology. This has recently opened up the possibility of interesting large-scale, real-world semantic applications.
The availability of semantically annotated data calls for new data mining approaches that can deal with the complexity and expressivity of semantic representation languages, leverage the availability of ontologies and the explicit semantics of the described resources, and account for novel assumptions (e.g., open world) that underlie reasoning services exploiting ontologies.
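To make the open-world assumption mentioned above concrete, here is a minimal Python sketch contrasting it with the closed-world reading familiar from databases. The individuals, class names, and the disjointness axiom are hypothetical toy data, and the lookup is a drastic simplification of what a real ontology reasoner does; it only illustrates that, under an open world, a missing fact is "unknown" rather than "false".

```python
# Hypothetical toy knowledge base: class assertions and one disjointness axiom.
ASSERTED = {("rex", "Dog"), ("tom", "Cat")}
DISJOINT = {frozenset({"Dog", "Cat"})}


def closed_world(individual, cls):
    """Closed world: anything not asserted is taken to be false."""
    return (individual, cls) in ASSERTED


def open_world(individual, cls):
    """Open world: True if asserted, False only if provably excluded
    (here: via a disjointness axiom), otherwise None for 'unknown'."""
    if (individual, cls) in ASSERTED:
        return True
    for ind, other in ASSERTED:
        if ind == individual and frozenset({other, cls}) in DISJOINT:
            return False
    return None


if __name__ == "__main__":
    print(closed_world("rex", "Pet"))  # absence is treated as a negative answer
    print(open_world("rex", "Pet"))    # absence is treated as unknown
    print(open_world("rex", "Cat"))    # excluded by the disjointness axiom
```

A learner that treats unknown facts as negative examples may therefore draw conclusions that the ontology does not support, which is one reason the assumption matters for mining methods.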
The tutorial addresses these issues, focusing on three questions: how machine learning techniques can work directly on richly structured Semantic Web data and exploit ontologies and Semantic Web technologies; what value machine learning methods gain from exploiting ontologies; and what challenges face developers of semantic data mining methods. It also includes demonstrations of tools supporting semantic data mining.
The tutorial presents the topic of semantic data mining from three complementary perspectives.
First, it presents a general framework for semantic data mining, following [NVTL09]. The first part of the tutorial also discusses a new method for semantic subgroup discovery, g-SEGS, and is accompanied by a presentation of the tool developed for it, which is part of the Orange4WS environment.
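The general idea of using an ontology as background knowledge for subgroup discovery can be sketched in a few lines of Python. This is not the g-SEGS algorithm or the Orange4WS API; it is a simplified illustration under stated assumptions. The is-a hierarchy, the annotated examples, and all function names are hypothetical. Each example's annotations are generalized to their ancestor terms, and each ontology term is then scored as a one-condition subgroup description using weighted relative accuracy (WRAcc), a standard subgroup discovery quality measure.

```python
from fractions import Fraction

# Hypothetical toy is-a hierarchy: term -> list of parent terms.
PARENTS = {
    "kinase_activity": ["catalytic_activity"],
    "phosphatase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
    "dna_binding": ["molecular_function"],
    "molecular_function": [],
}

# Hypothetical examples: (set of ontology annotations, belongs to target class).
EXAMPLES = [
    ({"kinase_activity"}, True),
    ({"kinase_activity"}, True),
    ({"phosphatase_activity"}, True),
    ({"phosphatase_activity"}, True),
    ({"dna_binding"}, False),
    ({"dna_binding"}, False),
]


def ancestors(term):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def generalize(annotations):
    """Expand a set of annotations with every more general ontology term."""
    expanded = set()
    for term in annotations:
        expanded |= ancestors(term)
    return expanded


def wracc(term, examples):
    """WRAcc = p(cond) * (p(class | cond) - p(class)) for the rule 'term -> target'."""
    n = len(examples)
    covered = [is_target for ann, is_target in examples if term in generalize(ann)]
    if not covered:
        return Fraction(0)
    p_cond = Fraction(len(covered), n)
    p_class = Fraction(sum(is_target for _, is_target in examples), n)
    p_class_given_cond = Fraction(sum(covered), len(covered))
    return p_cond * (p_class_given_cond - p_class)


def rank_terms(examples):
    """Rank all ontology terms by WRAcc, best first."""
    return sorted(((wracc(t, examples), t) for t in PARENTS), reverse=True)


if __name__ == "__main__":
    for score, term in rank_terms(EXAMPLES):
        print(f"{term}: {score}")
```

In this toy data the general term catalytic_activity scores higher than either of its more specific children, because it covers all the target examples at once. A description at that level is only available to the learner because the ontology supplies the hierarchy.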
The second part of the tutorial covers learning from description logics (DL-learning), motivated by the fact that the standard Web ontology language, OWL, is theoretically based on description logics. It includes a demo of a tool supporting DL-learning (a plugin for the RapidMiner system).
Finally, the third part of the tutorial covers semantic meta-mining. This approach has three features that distinguish it from its predecessors. First, more than previous work, it adopts a process-oriented approach in which meta-learning is applied to support design choices at different stages of the complete data mining process or workflow. Second, it complements dataset descriptions with an in-depth analysis and characterization of algorithms: their underlying assumptions, optimization goals and strategies, and the models and patterns they generate. Finally, it relies on a data mining ontology that distills extensive background knowledge about knowledge discovery itself.