2nd European Semantic Technology Conference

Getting at the Semantics of Texts

author: Hans Uszkoreit, Saarland University

Description

As semantic technologies keep evolving and maturing, there is growing concern about the gigantic wealth of knowledge encoded in so-called unstructured data. In fact, the bulk of human knowledge on the web (and in books) is represented in texts. Not even the most optimistic proponents of semantic representation standards expect that this information will be rewritten or extensively complemented by semantic metadata through intellectual labour. On the other hand, there is a discipline of science and technology called computational linguistics that has been concerned for several decades with the automatic analysis of human language. One of the original goals of this field was the automatic understanding of texts by translating them into a knowledge representation language that machines could use for reasoning. However, after sobering experience with the complexity of this task, most applied computational linguists turned to easier challenges. There is now a wide variety of human language technologies, many of which have enabled new types of products. Among these applications are text classification, email response systems, text-to-speech software, grammar checking and statistical machine translation.

In this presentation, however, the state of the art and recent achievements in two strands of language technology will be explained and illustrated by examples. One of them is the automatic extraction of semantic relations, or more precisely of relation instances, from large volumes of texts. Such relation instances could be events, properties of objects, or opinions on products. Using results from our own research, I will demonstrate how machine learning techniques were combined with existing advanced language analysis methods to improve such analysis beyond the best results achievable by either of these approaches alone. I will also show how semantic domain models can be used to improve the performance of relation extraction.

The second strand of research to be presented is the deep syntactic and semantic analysis of human language. While most computational linguists had turned away from this fundamental challenge in favour of lower-hanging fruit, a few groups continued the quest for text understanding. Because of the size of the problem and the desire to develop techniques that would work for more than one language, several of them teamed up in international collaborations. I will briefly describe the two largest international collaborations in this area, the DELPH-IN initiative dedicated to deep language processing with HPSG and the PARGRAM initiative pursuing the same goal with LFG. HPSG and LFG are two advanced formal models of linguistic description developed in the seventies and eighties of the last century. The PARGRAM initiative was led by PARC, and its results are among the central assets of the search technology company Powerset, which was recently acquired by Microsoft. The results of the DELPH-IN initiative are collected in a growing open-source repository of research resources. I will explain the significance of recent advances by these two consortia and related research activities.

In the conclusion of the talk I will argue that a combination of the machine-learning approach to relation extraction with the advances of the deep linguistic processing research will open the way to an exploitation of large volumes of unstructured textual data by semantic technologies.

Slides
0:00 Getting at the Semantics of Texts
2:09 Outline
2:50 Do We Have Artificial Intelligence? - 1
3:33 Do We Have Artificial Intelligence? - 2
4:06 Do We Have Artificial Intelligence? - 3
5:07 Our Success May not Be Sweeping...
5:19 Types of Information Extraction in LT - 1
5:51 Types of Information Extraction in LT - 2
5:56 The Problem of Grammar Acquisition
6:55 Main Methods - 1
8:20 Main Methods - 2
8:58 Relevant Related Work and Inspirations
11:09 Two Approaches to Seed Construction by Bootstrapping
12:39 Our Approach: DARE - 1
13:32 Our Approach: DARE - 2
13:57 Our Approach: DARE - 3
14:06 Two Domains
16:08 Nobel Prize Awards
16:56 Rules Are Learned from the Linguistic Structure - 1
17:37 Rules Are Learned from the Linguistic Structure - 2
18:05 Rules Are Learned from the Linguistic Structure - 3
18:37 Rule Components
19:09 Pattern Extraction Step 1
20:19 Pattern Extraction Step 2
20:36 Seed Complexity and Sentence Extent
21:14 Experiments
22:25 Evaluation of Nobel Prize Domain
22:32 Evaluation Against Ideal Tables
23:41 Iteration Behavior (Seed vs. Rule)
23:59 Management Succession Domain
24:29 Comparison
25:21 Reusability of Rules
25:57 The Dream
26:13 Research Questions
26:26 Start of Bootstrapping (simplified)
27:05 Abstraction
27:51 Questions
27:51 Two Distributions - 1
27:58 Two Distributions - 2
28:13 Distribution of Mentionings to Events
28:15 Scale-Free Networks
28:29 Example of Scale-Free Nets
28:41 Small-World Property
28:42 Airline Route Networks
28:56 Motorway Route Networks
29:03 Airline Route Networks
29:04 Small-World Property
29:09 Airline Route Networks
29:10 Motorway Route Networks
29:11 Small Worlds for Bootstrapping
29:12 Instance to Pattern
29:51 Rules to Instances
30:06 If We Find a Large World with Continents and Islands...
30:23 Approaches to Solve the Problem
31:06 Other Discovered Award Events
32:30 Further Approaches
33:21 Next Steps
33:32 Experiment with other Domains
35:19 Improving Recall
36:59 Improving Precision
37:31 Problems with Knowledge Representation
38:35 Problems with Formal Grammars
38:57 Sour Grapes
40:06 A Contradiction
41:23 What Has Changed for Knowledge Processing?
41:53 What Has Changed for Deep Processing?
42:10 Three Traditions
43:02 Grammar
43:07 A Big Difference
43:15 The Dream is Living On
43:46 The DELPH-IN Initiative
44:38 HPSG
44:40 Start of the Cooperation
44:47 Efficiency Problem Solved
45:01 Still not Robust...
45:48 Hybrid NLP
46:51 Coverage Extension
48:15 Conclusion and Outlook
48:49 - Questions
