Populating the Semantic Web by Macro-Reading Internet Text

author: Tom Mitchell, School of Computer Science, Carnegie Mellon University
published: Nov. 24, 2009, recorded: October 2009, views: 386



Description

A key question for the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of the large amount of structured information already present in various databases, and to develop common ontologies, publishing standards, and reward systems that make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text on the web.

This talk will survey attempts to extract structured knowledge from unstructured text, and will focus on an approach with three characteristics that we hypothesize make it viable. First, in contrast to the very difficult problem of reading information from a single document, we consider the much easier problem of reading hundreds of millions of documents simultaneously, so that our system can extract facts that are stated many times by combining evidence from many documents. Second, our system begins with a given ontology that defines the types of information to be extracted, enabling it to focus its effort and to ignore most of the text, which is irrelevant to the target ontology. Third, the system uses a new class of semi-supervised learning algorithms to learn how to extract information from web pages; these algorithms are designed to achieve greater accuracy when given more complex ontologies. Our experiments show that this approach can produce knowledge bases containing tens of thousands of facts to populate given ontologies with approximately 90% accuracy, starting with only a handful of labeled training examples and 200 million unlabeled web pages.
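
The first two characteristics (combining evidence across many documents, and restricting attention to a given ontology) can be illustrated with a toy sketch. This is not the system described in the talk: the patterns are hand-written rather than learned by semi-supervised algorithms, and the names `ONTOLOGY_PATTERNS`, `macro_read`, and `min_support`, the example sentences, and the support threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: each category maps to hand-written text patterns.
# (The talk's system learns its extractors; fixed patterns are an assumption here.)
ONTOLOGY_PATTERNS = {
    "city": [r"cities such as (\w+)", r"(\w+) is a city"],
    "company": [r"companies such as (\w+)", r"(\w+) is a company"],
}


def macro_read(documents, min_support=2):
    """Return {(category, instance): n_docs} for candidate facts that are
    matched in at least `min_support` distinct documents."""
    support = defaultdict(set)  # candidate fact -> ids of documents stating it
    for doc_id, text in enumerate(documents):
        for category, patterns in ONTOLOGY_PATTERNS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, text):
                    support[(category, match.group(1))].add(doc_id)
    # Keep only facts backed by evidence from several documents.
    return {
        fact: len(doc_ids)
        for fact, doc_ids in support.items()
        if len(doc_ids) >= min_support
    }


documents = [
    "We visited cities such as Pittsburgh last year.",
    "Pittsburgh is a city in Pennsylvania.",
    "Some say Boston is a city worth seeing.",
    "They invest in companies such as Google and others.",
    "Google is a company based in California.",
    "Nothing relevant to the ontology here.",
]

facts = macro_read(documents)
print(facts)
```

In this sketch a candidate stated in only one document (here, Boston) is dropped, while candidates supported by two documents are kept. Text that matches no ontology pattern is simply ignored, which mirrors, in a very reduced form, the idea of letting the ontology focus the reading effort.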
