Filtering Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools
Author: Mark Last, Ben-Gurion University of the Negev
Description
Because terrorist organizations increasingly use the web, the ability to
automatically detect multi-lingual terrorist-related content is extremely
important. In this talk, we present an efficient detection methodology
based on recently developed graph-based models of web document
representation. The methodology is evaluated on English and Arabic
corpora.
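As a rough illustration of what a graph-based document representation can look like, the sketch below builds a graph whose nodes are the unique terms of a document and whose directed edges link words that appear next to each other within a sentence. It then compares two such graphs with a distance based on their maximum common subgraph and classifies a document by its nearest neighbors. Everything here is an assumption made for illustration: the function names are hypothetical, graph size is taken as nodes plus edges, and the distance is the common `1 - |mcs| / max(|G1|, |G2|)` form rather than necessarily the measure used in the talk.

```python
import re
from collections import Counter


def build_simple_graph(text, max_nodes=None):
    """Build (nodes, edges): nodes are unique terms, edges are adjacent word pairs.

    Assumption: sentences end at '.', '!' or '?', and no edge crosses a sentence
    boundary. If max_nodes is given, only the most frequent terms are kept.
    """
    sentences = [re.findall(r"\w+", s.lower()) for s in re.split(r"[.!?]+", text)]
    counts = Counter(word for sentence in sentences for word in sentence)
    nodes = {word for word, _ in counts.most_common(max_nodes)}
    edges = {
        (a, b)
        for sentence in sentences
        for a, b in zip(sentence, sentence[1:])
        if a in nodes and b in nodes
    }
    return nodes, edges


def graph_size(graph):
    """Size of a graph, taken here as number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """Distance in [0, 1] based on the maximum common subgraph (mcs).

    Because node labels are unique, the mcs is simply the intersection of the
    node sets and of the edge sets, so no subgraph-isomorphism search is needed.
    """
    common = (g1[0] & g2[0], g1[1] & g2[1])
    denominator = max(graph_size(g1), graph_size(g2))
    if denominator == 0:
        return 0.0
    return 1.0 - graph_size(common) / denominator


def knn_label(query, labeled_graphs, k=3):
    """Return the majority label among the k graphs closest to the query."""
    nearest = sorted(labeled_graphs, key=lambda item: mcs_distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (build_simple_graph("The cat sat. The cat ran."), "pets"),
        (build_simple_graph("Stocks fell sharply. Markets fell again."), "finance"),
    ]
    query = build_simple_graph("The cat sat.")
    print(knn_label(query, training, k=1))
```

Since `\w` matches Unicode word characters in Python 3, the same tokenization applies to Arabic text, although real Arabic preprocessing (normalization, stemming, stop-word removal) would need more than this sketch provides.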
| Time | Slide |
| 0:00 | Filtering Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools |
| 0:34 | Outline |
| 1:32 | Important Preliminaries (1) |
| 3:03 | Important Preliminaries (2) |
| 3:36 | Propaganda in Arabic. Organization: Palestinian Islamic Jihad (1) |
| 5:00 | Propaganda in Arabic. Organization: Palestinian Islamic Jihad (2) |
| 5:52 | Propaganda in Russian. Organization: Hamas |
| 6:31 | Propaganda in English and Hebrew. Organization: Hezbollah |
| 8:14 | Training Materials: How to build a rocket engine |
| 9:06 | Training Materials (cont.): How to prepare a highly explosive acetone peroxide (the stuff is very serious) |
| 10:39 | Tactical Orders (?) |
| 12:07 | Challenges in Filtering Terrorist Content |
| 15:29 | Web Document Representation and Categorization |
| 15:40 | Text Categorization (TC): Basic Definition |
| 16:15 | Text Categorization (TC): Tasks |
| 17:02 | The Vector-Space Model (Salton et al., 1975) |
| 18:03 | The “Bag of Words” Approach: A Practical Example (1) |
| 18:21 | The “Bag of Words” Approach: A Practical Example (2) |
| 18:53 | The “Bag of Words” Approach: A Practical Example (3) |
| 21:58 | Advantages of the Vector-Space Model (based on Joachims, 2002) |
| 22:41 | Limitations of the Vector-Space Model |
| 24:05 | DIVIDE ET IMPERA (“Divide and Rule”): The Word Separation in the Ancient Latin |
| 24:48 | Alternative Representation of Multilingual Web Documents: |
| 25:13 | Relevant Definitions (Based on Bunke and Kandel, 2000) |
| 26:28 | The Graph-Based Model of Web Documents |
| 27:27 | The Standard Representation |
| 28:40 | The Simple Representation |
| 29:00 | The n-distance Representation |
| 29:27 | The n-simple Representation |
| 29:49 | The Absolute Frequency Representation |
| 30:02 | The Relative Frequency Representation |
| 30:10 | Graph Based Document Representation – Detailed Example (Source: www.cnn.com, May 24, 2005) |
| 30:38 | Graph Based Document Representation - Parsing |
| 30:58 | Graph Based Document Representation - Preprocessing (1) |
| 31:07 | Graph Based Document Representation - Preprocessing (2) |
| 31:13 | Standard Graph Based Document Representation |
| 32:00 | Simple Graph Based Document Representation |
| 33:29 | “Lazy” Categorization with Graph-Based Models |
| 35:05 | Distance between two Graphs |
| 35:36 | Relevant Definitions (Based on Bunke and Kandel, PRL, 2000) |
| 36:07 | More Graph-Theoretic Definitions (1) |
| 36:14 | More Graph-Theoretic Definitions (2) |
| 36:24 | More Graph-Theoretic Definitions (3) |
| 36:49 | More Graph-Theoretic Definitions (cont.) (1) |
| 36:59 | More Graph-Theoretic Definitions (cont.) (2) |
| 37:16 | More Graph-Theoretic Definitions (cont.) (3) |
| 37:23 | MMCSN Distance Measure between two Graphs |
| 38:38 | k-Nearest Neighbors with Graphs |
| 40:16 | The Hybrid Approach to Document Categorization (Markov et al., 2006) |
| 41:59 | Predictive Model Induction with Hybrid Representation |
| 42:44 | Frequent Subgraph Extraction Example |
| 44:21 | Frequent Subgraph Extraction: Complexity |
| 48:10 | Case Study 1 |
| 48:16 | Document Collection |
| 49:00 | Preprocessing of Documents in Arabic |
| 50:10 | Accuracy Results |
| 51:27 | Resulting Decision Tree |
| 53:07 | Does the word الصهيوني (“Zionist”) indicate a terrorist document? |
| 54:17 | Case Study 2 - Categorization of Terrorist Web Documents in English |
| 54:24 | Document Collection |
| 54:53 | Results for the Hybrid Smart Approach (Maximum Graph Size: 100 Nodes) |
| 55:33 | Resulting Decision Tree (Subgraph Frequency Threshold: 0.55) |
| 56:59 | Conclusions |
| 57:43 | Future Work |
| 58:23 | References (1) |
| 58:53 | References (2) |
| 59:10 | References (3) |
| 59:30 | Thank you! |