NATO Advanced Study Institute on Mining Massive Data Sets for Security

Combining Information Retrieval and Information Extraction for Medical Intelligence

author: Roman Yangarber, New York University

Description

Global epidemic and medical surveillance is an essential function of Public Health agencies, whose primary aim is to protect the public from major health threats. Performing this function effectively requires timely and accurate medical information from a wide range of sources. In this work we present a system designed to monitor disease epidemics by analyzing textual reports, mostly in the form of news, available on the Web. The system rests on two major components: MedISys, based on Information Retrieval (IR) technology, and PULS, an Information Extraction (IE) system.

The Medical Information System, MedISys, is an automatic tool that gathers reports concerning Public Health from thousands of Internet sources world-wide in 32 languages, classifies them according to hundreds of categories, detects trends across categories and languages, and notifies users. MedISys compiles quantitative summaries of the latest reports on a variety of diseases, bioterrorism, toxins, bacteria, hemorrhagic fevers, viruses, medicines, water contamination, animal diseases, Public Health organisations, etc. The system categorises all documents according to about 200 classes of health threats, using pre-defined weighted boolean queries, or alerts. It uses statistical procedures to detect a sudden increase in the volume of articles in any of the classes. MedISys is part of the Europe Media Monitor (EMM) product family [2], developed at the EC’s Joint Research Centre (JRC), which also includes NewsBrief, a live news aggregation system, and NewsExplorer, a news summary and analysis system [1]. MedISys has already proved to be a useful and effective tool, which attracts thousands of users daily.

IE technology is a natural direction for further enhancing the functionality that MedISys offers. One reason is that IE can deliver information about specific incidents of the diseases, whereas IR returns entire matched documents (with an indication of which alerts fired). Another reason is that IE could boost precision: keyword-based queries may trigger on documents that are off-topic but happen to mention the alert keywords in unrelated contexts, whereas pattern matching in IE ensures that the keywords appear in relevant contexts only.
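As a rough illustration of the two IR-side mechanisms described above (weighted keyword alerts and detection of a sudden rise in article volume), here is a minimal Python sketch. It is not the MedISys implementation: the alert definitions, the weights, the firing threshold, and the z-score test are all assumptions made for the example, and a summed keyword weight stands in for a full weighted boolean query.

```python
import re
from statistics import mean, stdev

# Hypothetical alert definitions: each class maps keywords to weights.
# The actual MedISys alert definitions are not given in the abstract.
ALERTS = {
    "cholera": {"cholera": 3.0, "outbreak": 1.0, "diarrhoea": 1.0},
    "avian_flu": {"h5n1": 3.0, "avian": 2.0, "influenza": 1.0},
}
THRESHOLD = 3.0  # assumed minimum score for an alert to fire


def fired_alerts(text, alerts=ALERTS, threshold=THRESHOLD):
    """Return the alert classes whose summed keyword weights reach the threshold."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    fired = []
    for name, weights in alerts.items():
        score = sum(w for kw, w in weights.items() if kw in tokens)
        if score >= threshold:
            fired.append(name)
    return fired


def is_spike(history, today, z_threshold=3.0):
    """Flag today's article count if it lies z_threshold sample standard
    deviations or more above the historical mean (an assumed, simple test)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma >= z_threshold


if __name__ == "__main__":
    print(fired_alerts("Cholera outbreak reported in the region"))
    # An off-topic text still fires the alert, because only keywords are checked.
    print(fired_alerts("The band Cholera released a new album"))
    print(is_spike([4, 5, 6, 5, 4, 6, 5], today=20))
```

The second call shows the precision problem the abstract attributes to keyword-based queries: the alert fires on a text that merely mentions the keyword, which is the gap that context-sensitive IE patterns are meant to close.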

Slides
0:00 Combining Information Retrieval and Information Extraction for Medical Intelligence
2:13 Outline
3:12 Users and motivation
5:36 Information vs. Intelligence
7:52 Combination of Technologies
8:21 Outline - MedISys: Information Retrieval
8:41 Medical Information System - MedISys
10:29 Public vs. restricted MedISys
12:40 MedISys - Objective
13:37 Current Subscribers to MedISys Alerts and Reports include
14:01 MedISys categories and category types
15:07 Filtering of Public Health-related news
16:08 Filtering news by language and sources
16:52 Aggregation of the multilingual ‘alert’ statistics (1)
17:42 Aggregation of the multilingual ‘alert’ statistics (2)
18:25 Aggregation of the multilingual ‘alert’ statistics (3)
18:42 Alerting functions
19:57 Alerting functions (2)
20:22 Medical events in MedISys
22:02 Outline - PULS: Information Extraction
22:20 MedISys - Beyond IR
23:58 Event Extraction review
24:11 Core IE Engine and Knowledge bases (KBs)
27:10 Example: Event extraction
30:36 IE and Semantics: Reference resolution
31:41 IE and Semantics: Elided attributes
32:57 PULS
34:26 PULS
42:29 Outline - MedISys/PULS Integration
42:35 MedISys/PULS Integration
44:08 MedISys + PULS
45:15 Outline - Information Aggregation
45:28 Toward Cross-Document Aggregation
47:51 Distribution of attribute values
51:15 Confidence
52:52 Utilizing Confidence
53:50 Aggregation into Outbreaks
57:02 Outline - Performance
57:59 Performance: some preliminary numbers
59:38 Evaluation of Confidence
61:22 Evaluation of Outbreak Aggregation
62:01 Evaluation of Confidence
62:03 Evaluation of Outbreak Aggregation
62:57 Outline - Current work
63:07 Improvements
67:46 - Questions
68:12 - Questions
