Stream Data Mining: A Big Data Perspective
published: Oct. 12, 2016, recorded: August 2016, views: 1383
Data streams are continuous flows of data. Examples include network traffic, sensor data, and call center records. Data streams exhibit several unique properties that together match the characteristics of big data (i.e., volume, velocity, variety and veracity) and make data stream mining more challenging. In this talk we will present an organized picture of how to apply various data mining techniques to data streams. Most existing data stream classification techniques ignore one important aspect of stream data: the arrival of a novel class. We address this issue and propose a data stream classification technique that integrates a novel class detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of concept drift, when the underlying data distributions evolve over the stream. In this talk we will show how to make fast and correct classification decisions under this constraint with limited labeled training data, and apply these techniques to real benchmark data. In addition, we will present a number of stream classification applications such as adaptive malicious code detection, website fingerprinting, evolving insider threat detection and textual stream classification. This research was funded in part by NSF, NASA, the Air Force Office of Scientific Research (AFOSR) and Raytheon.
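To make the idea of novel class detection concrete, here is a minimal Python sketch. It is not the method presented in the talk; it is a simplified illustration under stated assumptions: each known class is summarized by a centroid and a radius, an instance that falls outside every known radius is buffered as an outlier, and a novel class is flagged once enough buffered outliers sit closer to each other than to any known class. The class name `NovelClassSketch`, the parameter `min_outliers`, and the cohesion rule are all hypothetical, and concept drift is not handled.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


class NovelClassSketch:
    """Illustrative nearest-centroid classifier with an outlier buffer."""

    def __init__(self, min_outliers=3):
        self.classes = {}   # label -> (centroid, radius)
        self.buffer = []    # instances outside every known class boundary
        self.min_outliers = min_outliers  # hypothetical threshold

    def fit(self, labeled):
        """Summarize each known class from (point, label) pairs."""
        by_label = {}
        for x, y in labeled:
            by_label.setdefault(y, []).append(x)
        for y, pts in by_label.items():
            c = centroid(pts)
            r = max(math.dist(c, p) for p in pts)
            self.classes[y] = (c, r)

    def predict(self, x):
        """Return a known label, 'outlier', or 'novel' without true labels."""
        label, (c, r) = min(
            self.classes.items(), key=lambda kv: math.dist(kv[1][0], x)
        )
        if math.dist(c, x) <= r:
            return label
        self.buffer.append(x)
        if len(self.buffer) >= self.min_outliers and self._cohesive():
            return "novel"
        return "outlier"

    def _cohesive(self):
        """True if buffered outliers are tighter than their gap to known classes."""
        c = centroid(self.buffer)
        spread = max(math.dist(c, p) for p in self.buffer)
        gap = min(math.dist(c, kc) - kr for kc, kr in self.classes.values())
        return spread < gap


model = NovelClassSketch(min_outliers=3)
model.fit([
    ((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
    ((10, 10), "b"), ((11, 10), "b"), ((10, 11), "b"),
])
stream = [(0.5, 0.5), (10.2, 10.4), (5, -5), (5.2, -5.1), (4.9, -4.8)]
results = [model.predict(x) for x in stream]
print(results)
```

In this toy stream the first two instances fall inside known class boundaries, while the last three form a tight group far from both classes, so the third of them is flagged as a novel class without any true label having arrived.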