NATO Advanced Study Institute on Mining Massive Data Sets for Security

Filtering Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools

author: Mark Last, Ben-Gurion University of the Negev

Description

Terrorist organizations increasingly use the web, so the ability to detect multi-lingual terrorist-related content automatically is extremely important. In this talk, we present an efficient detection methodology based on recently developed graph-based models of web document representation. The methodology is evaluated on English and Arabic corpora.

Slides
0:00 Filtering Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools
0:34 Outline
1:32 Important Preliminaries (1)
3:03 Important Preliminaries (2)
3:36 Propaganda in Arabic. Organization: Palestinian Islamic Jihad (1)
5:00 Propaganda in Arabic. Organization: Palestinian Islamic Jihad (2)
5:52 Propaganda in Russian. Organization: Hamas
6:31 Propaganda in English and Hebrew. Organization: Hezbollah
8:14 Training Materials: How to build a rocket engine
9:06 Training Materials (cont.): How to prepare a highly explosive acetone peroxide (the stuff is very serious)
10:39 Tactical Orders (?)
12:07 Challenges in Filtering Terrorist Content
15:29 Web Document Representation and Categorization
15:40 Text Categorization (TC): Basic Definition
16:15 Text Categorization (TC): Tasks
17:02 The Vector-Space Model (Salton et al., 1975)
18:03 The “Bag of Words” Approach: A Practical Example (1)
18:21 The “Bag of Words” Approach: A Practical Example (2)
18:53 The “Bag of Words” Approach: A Practical Example (3)
21:58 Advantages of the Vector-Space Model (based on Joachims, 2002)
22:41 Limitations of the Vector-Space Model
24:05 DIVIDE ET IMPERA (“Divide and Rule”): The Word Separation in the Ancient Latin
24:48 Alternative Representation of Multilingual Web Documents
25:13 Relevant Definitions (Based on Bunke and Kandel, 2000)
26:28 The Graph-Based Model of Web Documents
27:27 The Standard Representation
28:40 The Simple Representation
29:00 The n-distance Representation
29:27 The n-simple Representation
29:49 The Absolute Frequency Representation
30:02 The Relative Frequency Representation
30:10 Graph Based Document Representation – Detailed Example (Source: www.cnn.com, May 24, 2005)
30:38 Graph Based Document Representation - Parsing
30:58 Graph Based Document Representation - Preprocessing (1)
31:07 Graph Based Document Representation - Preprocessing (2)
31:13 Standard Graph Based Document Representation
32:00 Simple Graph Based Document Representation
33:29 “Lazy” Categorization with Graph-Based Models
35:05 Distance between two Graphs
35:36 Relevant Definitions (Based on Bunke and Kandel, PRL, 2000)
36:07 More Graph-Theoretic Definitions (1)
36:14 More Graph-Theoretic Definitions (2)
36:24 More Graph-Theoretic Definitions (3)
36:49 More Graph-Theoretic Definitions (cont.) (1)
36:59 More Graph-Theoretic Definitions (cont.) (2)
37:16 More Graph-Theoretic Definitions (cont.) (3)
37:23 MMCSN Distance Measure between two Graphs
38:38 k-Nearest Neighbors with Graphs
40:16 The Hybrid Approach to Document Categorization (Markov et al., 2006)
41:59 Predictive Model Induction with Hybrid Representation
42:44 Frequent Subgraph Extraction Example
44:21 Frequent Subgraph Extraction: Complexity
48:10 Case Study 1
48:16 Document Collection
49:00 Preprocessing of Documents in Arabic
50:10 Accuracy Results
51:27 Resulting Decision Tree
53:07 Does the word الصهيوني (“Zionist”) indicate a terrorist document?
54:17 Case Study 2 - Categorization of Terrorist Web Documents in English
54:24 Document Collection
54:53 Results for the Hybrid Smart Approach (Maximum Graph Size: 100 Nodes)
55:33 Resulting Decision Tree (Subgraph Frequency Threshold: 0.55)
56:59 Conclusions
57:43 Future Work
58:23 References (1)
58:53 References (2)
59:10 References (3)
59:30 Thank you!
