Discriminative Topic Modeling based on Manifold Learning

author: Seungil Huh, Robotics Institute, School of Computer Science, Carnegie Mellon University
published: Oct. 1, 2010,   recorded: July 2010,   views: 5042
Categories

Slides

Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.
  Bibliography

Description

Topic modeling has been popularly used for data analysis in various domains including text documents. Previous topic models, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These models, however, do not take into account the manifold structure of data, which is generally informative for the non-linear dimensionality reduction mapping. More recent models, namely Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown the resulting benefits. But these approaches fall short of the full discriminating power of manifold learning as they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this paper, we propose Discriminative Topic Model (DTM) that separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving the local consistency. We also present a novel model fitting algorithm based on the generalized EM and the concept of Pareto improvement. As a result, DTM achieves higher classification performance in a semi-supervised setting by effectively exposing the manifold structure of data. We provide empirical evidence on text corpora to demonstrate the success of DTM in terms of classification accuracy and robustness to parameters compared to state-of-the-art techniques.

See Also:

Download slides icon Download slides: kdd2010_huh_dtnbml_01.ppt (1.3┬áMB)


Help icon Streaming Video Help

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: