Bayesian Clustering for Email Campaign Detection
Description
We discuss the problem of clustering elements according to the sources that generated them. For elements characterized by independent binary attributes, a closed-form Bayesian solution exists. For the case of dependent attributes, we derive a solution based on a transformation of the instances into a space of independent feature functions: we formulate an optimization problem whose solution maps the input into a space of independent binary feature vectors, where the features can reflect arbitrary dependencies in the input space. This setting is motivated by spam filtering for email service providers. Spam traps deliver a real-time stream of messages known to be spam; if messages belonging to the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.
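The closed-form case for independent binary attributes can be illustrated with a minimal sketch. It assumes a Beta(`alpha`, `beta`) prior on each attribute's Bernoulli parameter within a cluster, a Chinese-restaurant-process prior with concentration `gamma` over cluster sizes, and greedy assignment of each incoming element to its maximum-a-posteriori cluster. These modeling choices and all names (`Cluster`, `log_predictive`, `sequential_cluster`) are our own illustrative assumptions, not necessarily the exact model used in the lecture.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """Sufficient statistics of one cluster: member count and per-attribute counts of ones."""
    n: int
    ones: List[int]


def log_predictive(x: Sequence[int], c: Cluster, alpha: float, beta: float) -> float:
    """Log p(x | members of c) for independent Beta(alpha, beta)-Bernoulli attributes.

    Integrating out each attribute's Bernoulli parameter gives the closed form
    (alpha + k_j) / (alpha + beta + n) if x_j = 1, else (beta + n - k_j) / (alpha + beta + n).
    """
    denom = alpha + beta + c.n
    total = 0.0
    for xj, kj in zip(x, c.ones):
        num = alpha + kj if xj else beta + c.n - kj
        total += math.log(num / denom)
    return total


def sequential_cluster(stream, alpha=1.0, beta=1.0, gamma=1.0):
    """Assign each binary vector to its MAP cluster, one element at a time (assumed greedy scheme).

    gamma is an assumed Chinese-restaurant-process concentration parameter that
    controls how readily a new cluster is opened. Returns one cluster index per element.
    """
    clusters: List[Cluster] = []
    labels: List[int] = []
    seen = 0
    for x in stream:
        d = len(x)
        # Score for opening a new, empty cluster.
        best_idx = len(clusters)
        best_score = math.log(gamma / (seen + gamma)) + log_predictive(
            x, Cluster(0, [0] * d), alpha, beta)
        # Score for joining each existing cluster: prior n_c / (seen + gamma) times predictive.
        for i, c in enumerate(clusters):
            score = math.log(c.n / (seen + gamma)) + log_predictive(x, c, alpha, beta)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx == len(clusters):
            clusters.append(Cluster(0, [0] * d))
        # Update the chosen cluster's sufficient statistics.
        c = clusters[best_idx]
        c.n += 1
        c.ones = [k + xj for k, xj in zip(c.ones, x)]
        labels.append(best_idx)
        seen += 1
    return labels
```

For example, `sequential_cluster([[1,1,1,0,0,0], [1,1,1,0,0,0], [0,0,0,1,1,1], [0,0,0,1,1,1], [1,1,1,0,0,0]])` returns `[0, 0, 1, 1, 0]`. Keeping only counts per cluster makes each assignment cost linear in the number of clusters and attributes, which suits a real-time stream; the sketch relies on the attributes being independent, which is why dependent input features must first be transformed.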
| Time | Slide |
| 0:00 | Bayesian Clustering for Email Campaign Detection |
| 0:08 | Email Campaign Detection (1) |
| 1:37 | Email Campaign Detection (2) |
| 2:33 | Spam Filtering |
| 3:42 | Problem Setting |
| 4:17 | Outline |
| 5:38 | Bayesian Clustering with Independent Binary Features |
| 6:57 | Feature Transformation |
| 7:49 | How Do We Get ? |
| 9:10 | What Does Look Like? |
| 10:55 | How To Find The Optimal ? |
| 11:52 | Feature Transformation: Algorithm |
| 12:32 | Feature Transformation: Example |
| 13:40 | Sequential Bayesian Clustering |
| 14:27 | Case Study: Email Campaign Detection |
| 16:25 | Setting 1: Non-Spams from Test Distribution Available |
| 17:39 | Results (1) |
| 18:31 | Setting 2: No Non-Spams from Test Distribution Available |
| 19:03 | Results (2) |
| 19:35 | Conclusions |
| 20:38 | Questions |