SOFIE: Self-Organizing Flexible Information Extraction
author: Mauro Sozio, Max Planck Institute for Computer Science, Max Planck Institute
author: Gerhard Weikum, Max Planck Institute for Informatics, Max Planck Institute
published: May 20, 2009, recorded: April 2009, views: 5199
This paper presents SOFIE, a system that can extend an existing ontology with new facts. SOFIE provides an integrative framework in which information extraction, word disambiguation, and semantic reasoning all become part of one unifying model. SOFIE processes text or Web sources and finds meaningful patterns. It maps the words in each pattern to entities in the ontology, forms a hypothesis about the meaning of the pattern, and checks the semantic plausibility of that hypothesis against the existing ontology. The new fact is then added to the ontology, avoiding inconsistency with the existing facts. The logical model that connects existing facts, new hypotheses, extraction patterns, and consistency constraints is represented as a set of propositional clauses. We use an approximation algorithm for the Weighted MAX SAT problem to compute the most plausible subset of hypotheses. The SOFIE framework thereby integrates the paradigms of pattern matching, entity disambiguation, and ontological reasoning into one unified model, and enables the automated growth of large ontologies. Experiments using the YAGO ontology as existing knowledge and various text and Web corpora as input sources show that our method yields a precision of around 90 percent or higher.
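To make the Weighted MAX SAT formulation concrete, here is a minimal sketch in Python. It is an illustration only, not SOFIE's implementation: it uses exhaustive search instead of the approximation algorithm the abstract mentions, and the variable names, clause weights, and the "at most one birthplace" constraint are all hypothetical. Each clause carries a weight, and the goal is the truth assignment that maximizes the total weight of satisfied clauses.

```python
from itertools import product

# A clause is (weight, literals); a literal is (variable, polarity).
# A clause is satisfied when at least one literal matches the assignment.
def satisfied_weight(clauses, assignment):
    """Sum the weights of all clauses satisfied by the assignment."""
    return sum(
        weight
        for weight, literals in clauses
        if any(assignment[var] == polarity for var, polarity in literals)
    )

def solve_weighted_max_sat(variables, clauses):
    """Exhaustive search over all assignments (toy sizes only).

    Assumption: brute force stands in for the approximation
    algorithm referred to in the abstract.
    """
    best_assignment, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = satisfied_weight(clauses, assignment)
        if weight > best_weight:
            best_assignment, best_weight = assignment, weight
    return best_assignment, best_weight

# Hypothetical toy instance: two competing hypotheses about one person.
VARIABLES = ["h_born_in_ulm", "h_born_in_paris"]
CLAUSES = [
    # Evidence from extraction patterns (weights are made up).
    (3.0, [("h_born_in_ulm", True)]),
    (1.0, [("h_born_in_paris", True)]),
    # Consistency constraint: at most one birthplace, weighted heavily.
    (100.0, [("h_born_in_ulm", False), ("h_born_in_paris", False)]),
]

best, weight = solve_weighted_max_sat(VARIABLES, CLAUSES)
print(best, weight)
```

In this toy instance the heavily weighted constraint prevents both hypotheses from being accepted together, so the better-supported one is kept and the other is rejected. Exhaustive search is exponential in the number of variables, which is why a large-scale system would rely on an approximation algorithm instead.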
Download slides: www09_suchanek_sofie_01.ppt (1.2 MB)