NATO Advanced Study Institute on Mining Massive Data Sets for Security

Open Source Intelligence

author: Clive Best, Joint Research Centre

Description

Open Source Intelligence can be defined as the retrieval, extraction and analysis of information from publicly available sources. Each of these three processes is the subject of ongoing research that has produced specialised techniques. Today the largest source of open source information is the Internet. Most newspapers and news agencies have web sites with live updates on unfolding events, and publish opinions and perspectives on world events. Most governments monitor news reports to gauge public opinion, and for early warning and current awareness of emerging crises. The phenomenal growth in knowledge, data and opinions published on the Internet requires advanced software tools that allow analysts to cope with the overflow of information.

Malicious use of the Internet has also grown rapidly, particularly on-line fraud, illegal content, virtual stalking and various scams. These all create major challenges for security and law enforcement agencies. Use of the Internet by extremist and terrorist groups has also increased alarmingly: the number of terrorist-linked websites has grown from about 15 in 1998 to some 4500 today. These sites use slick multimedia to disseminate propaganda whose main purposes are 1) to enthuse and stir up rebellion in embedded communities and 2) to instil fear in the “enemy” and wage psychological warfare. Anonymous communication between terrorist cells via bulletin boards, chat rooms and email is also prevalent.

The Joint Research Centre (JRC) has developed significant experience in Internet content monitoring through its work on media monitoring (EMM) for the European Commission. EMM forms the core of the Commission's daily press monitoring service, and has also been adopted by the European Council Situation Centre for its ODIN system. A new research topic at the JRC is Web mining and open source intelligence, which applies EMM technology to the wider Internet and not just to news sites. Advanced multilingual search techniques identify potential web resources, whose textual content is then extracted and downloaded. This is followed by automatic change detection, the recognition of places, names and relationships, and further analysis of the resulting large bodies of text. These tools help analysts to process large numbers of documents and derive structured data that is easier to analyse.
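
The automatic change detection step mentioned above could be approximated as follows. This is only a minimal sketch and not the JRC's implementation: it assumes page text has already been downloaded, and the function names (normalise, fingerprint, detect_changes) and the fingerprint store are hypothetical, introduced purely for illustration.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase so trivial edits do not count as changes."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the normalised text."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


def detect_changes(previous: dict, pages: dict) -> tuple:
    """Compare freshly downloaded pages against stored fingerprints.

    previous maps URL -> digest from the last crawl (hypothetical store).
    pages maps URL -> text downloaded in this crawl.
    Returns (list of new or changed URLs, fingerprints for this crawl).
    """
    current = {url: fingerprint(text) for url, text in pages.items()}
    changed = [url for url, digest in current.items() if previous.get(url) != digest]
    return changed, current
```

Calling detect_changes on each crawl and keeping the returned fingerprints for the next one flags only pages that are new or whose text has changed, so later extraction steps need not reprocess unchanged content.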

This talk will review 4 main topics:

• Internet trends and the rapid rise of Web 2.0 user generated content
• Information retrieval: Live content monitoring of multilingual news reports. Web scraping & RSS feed generation, Web Mining and content monitoring
• Information Extraction: Topic filtering, Topic Clustering, multilingual named entity extraction, geocoding and geolocating text, event extraction, opinion mining
• Information Analysis: Social Network derivation, geospatial indexing and analysis, incident tracking databases, statistical trend analysis, threat monitoring and assessment
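
As an illustration of the live content monitoring topic, the sketch below checks an RSS 2.0 feed for articles not seen before. It is an assumption-laden example rather than a description of EMM itself: the sample feed, the example.org links and the new_articles helper are all hypothetical, and a real monitor would fetch the feed over HTTP instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a feed fetched from a news site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news site</title>
    <item><title>First headline</title><link>http://example.org/1</link></item>
    <item><title>Second headline</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def new_articles(rss_xml, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links.

    seen_links is a set that is updated in place, so repeated polling of the
    same feed only reports each article once.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh
```

Polling the feed periodically with the same seen_links set yields only the headlines that appeared since the previous poll.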

Slides
0:00 Open Source Intelligence
0:46 Acknowledgements
0:57 Programme
3:03 Open-Source Intelligence
4:51 Growth of Open Source Intelligence
7:23 OSINT - definitions
8:36 Intelligence Sources
8:56 Who uses OSINT
11:10 What is intelligence anyway ?
12:34 Typical Agency, Company, UNHQ, Council SitCen etc.
14:48 Who is involved ?
15:58 Information Overload
16:47 OSINT Cycle
17:01 OSINT Processes
19:27 Open Source Information Sources
21:08 Features of OSINF
22:53 OSINF is NOT JUST TEXT - image acquisition & analysis And Data Fusion
23:51 Image intelligence
24:41 Monitoring Iran Nuclear Programme (1)
25:34 Monitoring Iran Nuclear Programme (2)
25:57 Complementary search in Farsi
26:12 Monitoring Iran Nuclear Programme (3)
26:39 Damage Assessment with Satellite Imagery
26:55 Beirut - Damage to buildings (1)
27:11 Beirut - Damage to buildings (2)
27:44 Beirut - Damage to infrastructure
27:59 Web Mining & Intelligence: First Diversion : Understanding Web Trends
28:14 Evolving communications
31:23 Web 2.0 – convergence of 3 things
32:09 So-called Web 2 enabled by widespread deployment of broadband
33:56 However - still only 10 users per web server !
34:51 Explosion in User publishing (1)
35:50 Explosion in User publishing (2)
36:37 Explosion in User publishing (3)
37:24 Cultural Variations
38:28 Future Social Change
38:55 Second Diversion: Dark Side of the Web
39:47 Second Diversion: Dark Side of the Web
40:06 New Challenges….
40:45 New Responses …
40:57 The Internet as a Tool for Command & Control (1)
41:27 The Internet as a Tool for Command & Control (2)
41:59 The Internet as a Tool for Command & Control (3)
42:51 Secure Communication
43:57 Simpler Internet Tricks
45:17 More Tricks
45:36 Use of Internet Cafes
46:01 Information Terrorism (1)
46:47 Information Terrorism (2)
47:56 From Forum to Jihad
49:04 The Internet as a Tool of Psychological Warfare
49:21 The Internet as a Tool for Psychological Warfare (1)
50:39 The Internet as a Tool for Psychological Warfare (2)
51:09 The Internet as a Tool for Psychological Warfare (3)
51:24 Counter-measures
52:37 Implications
53:27 1. Web Mining and Intelligence - RealTime News Monitoring
54:31 Live Web Site Monitoring
55:00 News Monitoring - The Problem
56:47 News-Mining Landscape
58:18 Live News Monitoring and Aggregation
59:00 News Fluxes have some special characteristics
60:40 News Site Monitoring
61:41 Update Frequencies
62:01 Headline check for each site – detect new articles.
63:59 RSS-Really Simple Syndication
64:44 RSS Feeds
65:36 EMM Overview


