TCS: Efficient Topic Discovery over Crowd-oriented Service Data
published: Oct. 7, 2014, recorded: August 2014, views: 2429
In recent years, with the widespread use of Web 2.0 techniques, crowdsourcing has come to play an important role in offering human intelligence on various service websites, such as Yahoo! Answers and Quora. With the increasing amount of crowd-oriented service data, an important task is to analyze the latest hot topics and track topic evolution over time. However, existing text mining techniques cannot work effectively because of the unique structure of crowd-oriented service data: task-response pairs, each of which consists of a task and its corresponding responses. In particular, existing approaches become ineffective as crowd-oriented service data keeps accumulating over time. In this paper, we first study the problem of discovering topics over crowd-oriented service data. Then we propose a new probabilistic topic model, the Topic Crowd Service Model (TCS model), to effectively discover latent topics from massive crowd-oriented service data. In order to train TCS efficiently, we design a novel parameter inference algorithm, Bucket Parameter Estimation (BPE), which utilizes belief propagation and a new sketching technique called Pairwise Sketch (pSketch). Finally, we conduct extensive experiments to verify the effectiveness and efficiency of the TCS model and the BPE algorithm.