Combining Information Retrieval and Information Extraction for Medical Intelligence

author: Roman Yangarber, New York University (NYU)
published: Dec. 3, 2007, recorded: September 2007


Global epidemic and medical surveillance is an essential function of Public Health agencies, whose primary aim is to protect the public from major health threats. To perform this function effectively, one requires timely and accurate medical information from a wide range of sources. In this work we present a system designed to monitor disease epidemics by analyzing textual reports, mostly news, available on the Web. The system rests on two major components: MedISys, based on Information Retrieval (IR) technology, and PULS, an Information Extraction (IE) system.

The Medical Information System, MedISys, is an automatic tool that gathers reports concerning Public Health from thousands of Internet sources worldwide in 32 languages, classifies them according to hundreds of categories, detects trends across categories and languages, and notifies users. MedISys compiles quantitative summaries of the latest reports on a variety of diseases, bioterrorism, toxins, bacteria, hemorrhagic fevers, viruses, medicines, water contamination, animal diseases, Public Health organisations, etc. The system categorises all documents according to about 200 classes of health threats, using pre-defined weighted boolean queries, or alerts. It uses statistical procedures to detect a sudden increase in the volume of articles in any of the classes. MedISys is part of the EuropeMediaMonitor (EMM) product family [2], developed at the EC's Joint Research Centre (JRC), which also includes NewsBrief, a live news aggregation system, and NewsExplorer, a news summary and analysis system [1]. MedISys has already proved to be a useful and effective tool, attracting thousands of users daily.

IE technology is a natural direction for further enhancing the functionality that MedISys offers. One reason is that IE can deliver information about specific incidents of a disease, whereas IR returns entire matched documents (with an indication of which alerts fired).
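The alert-based categorisation and volume-spike detection described above can be illustrated with a minimal Python sketch. The abstract gives neither the MedISys alert syntax nor its statistical procedure, so everything here is an assumption: the `Alert` structure (weighted terms summed against a threshold, plus AND/NOT term sets) is hypothetical, and the spike test is a simple z-score against a trailing window of daily article counts, swapped in for whatever procedure MedISys actually uses.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical weighted boolean alert (not the real MedISys syntax)."""
    name: str
    weights: dict[str, float]                         # term -> weight
    threshold: float                                  # minimum summed weight to fire
    required: set[str] = field(default_factory=set)   # AND: all must occur
    excluded: set[str] = field(default_factory=set)   # NOT: none may occur


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def fires(alert: Alert, text: str) -> bool:
    words = tokens(text)
    if not (alert.required <= words) or (alert.excluded & words):
        return False
    score = sum(w for term, w in alert.weights.items() if term in words)
    return score >= alert.threshold


def categorise(alerts: list[Alert], text: str) -> list[str]:
    """Names of all alerts that fire on a document."""
    return [a.name for a in alerts if fires(a, text)]


def is_spike(daily_counts: list[int], window: int = 7, z: float = 3.0) -> bool:
    """Flag the latest day if its article count exceeds mean + z * std of the
    preceding `window` days (a simple z-score test, assumed for illustration)."""
    if len(daily_counts) < window + 1:
        return False
    history = daily_counts[-window - 1:-1]
    today = daily_counts[-1]
    mean = sum(history) / window
    std = math.sqrt(sum((c - mean) ** 2 for c in history) / window)
    # Floor the spread at 1 article so a perfectly flat history does not
    # turn a single extra article into a "spike".
    return today > mean + z * max(std, 1.0)


cholera = Alert(
    name="Cholera",
    weights={"cholera": 2.0, "outbreak": 1.0, "cases": 0.5},
    threshold=2.5,
    excluded={"vaccine"},
)
print(categorise([cholera], "Cholera outbreak reported, 40 cases confirmed"))  # ['Cholera']
print(categorise([cholera], "New cholera vaccine trial begins"))               # []
print(is_spike([3, 4, 2, 5, 3, 4, 3, 19]))                                     # True
```

The second document mentions cholera but is blocked by the NOT term, which is exactly the kind of off-topic hit a hand-tuned alert tries to suppress; the weights, threshold, window, and `z` are illustrative values, not MedISys settings.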
Another reason is that IE could boost precision: keyword-based queries may trigger on off-topic documents that happen to mention alert keywords in unrelated contexts, whereas pattern matching in IE ensures that the keywords are matched only in relevant contexts.
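The precision argument can be made concrete with a toy contrast between keyword matching and pattern-based extraction. This is a hypothetical sketch, not PULS: the disease lexicon, the `Incident` record, and the single regular-expression pattern are illustrative assumptions, and a production IE system would use far richer patterns than one regex.

```python
import re
from dataclasses import dataclass

# Hypothetical disease lexicon; a real system would use a large curated list.
DISEASES = ("cholera", "measles", "dengue")


@dataclass
class Incident:
    disease: str
    location: str
    cases: int


# Hypothetical surface pattern:
#   "<N> [new] cases of <disease> [up to 3 words] in <Capitalised Location>"
PATTERN = re.compile(
    r"(?P<cases>\d+)\s+(?:new\s+)?cases\s+of\s+"
    r"(?P<disease>" + "|".join(DISEASES) + r")"
    r"(?:\s+\w+){0,3}?\s+in\s+"
    r"(?P<location>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def keyword_match(text: str) -> bool:
    """IR-style baseline: fires whenever any disease name is mentioned."""
    lowered = text.lower()
    return any(d in lowered for d in DISEASES)


def extract(text: str) -> list[Incident]:
    """IE-style: returns structured incidents only where the pattern matches."""
    return [
        Incident(m["disease"], m["location"], int(m["cases"]))
        for m in PATTERN.finditer(text)
    ]


on_topic = "Officials said 12 new cases of cholera were reported in Harare."
off_topic = "The novel Love in the Time of Cholera was reissued this week."
print(keyword_match(on_topic), extract(on_topic))
# True [Incident(disease='cholera', location='Harare', cases=12)]
print(keyword_match(off_topic), extract(off_topic))
# True []
```

Both sentences trigger the keyword baseline, but only the first yields an incident, and that incident carries the disease, location, and case count as separate fields rather than a whole matched document.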
