Reading Tea Leaves: How Humans Interpret Topic Models
published: Jan. 19, 2010, recorded: December 2009, views: 6700
Probabilistic topic models are a commonly used tool for analyzing text data, where the latent topic representation is used to perform qualitative evaluation of models and guide corpus exploration. Practitioners typically assume that the latent space is semantically meaningful, but this important property has lacked a quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by measures of model quality based on held-out likelihood. Surprisingly, topic models which perform better on held-out likelihood may actually infer less semantically meaningful topics.
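One way to turn "semantic meaning" into a number is a word-intrusion test. Show a person the top words of a topic plus one "intruder" word that ranks highly in a different topic, and count how often people pick out the intruder. The sketch below assumes that setup. The helper names `build_intrusion_item` and `model_precision` are hypothetical and are not taken from this page. The intruder rule used here ("highly ranked in another topic, absent from this topic's ranked list") is a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def build_intrusion_item(
    topic_words: Sequence[str],
    other_topic_words: Sequence[str],
    n: int = 5,
) -> Tuple[List[str], str]:
    """Hypothetical helper: take the top-n words of one topic, then append the
    highest-ranked word from another topic that does not appear in this
    topic's ranked list (an assumed stand-in for 'low probability here').
    A real study would also shuffle the word order before showing it."""
    top = list(topic_words[:n])
    intruder = next(w for w in other_topic_words if w not in topic_words)
    return top + [intruder], intruder


def model_precision(intruder: str, answers: Sequence[str]) -> float:
    """Fraction of subjects whose chosen word matches the true intruder.
    Values near 1 suggest a coherent topic; values near chance suggest
    the topic's words do not hang together for human readers."""
    if not answers:
        raise ValueError("need at least one subject answer")
    return sum(a == intruder for a in answers) / len(answers)
```

For example, for a topic ranked `["dog", "cat", "horse", "pig", "cow", "sheep"]` and another topic starting with `"apple"`, the item is the five animals plus `"apple"`. If three of four subjects choose `"apple"`, the precision for that topic is 0.75. A score like this is computed from human judgments rather than from held-out likelihood, which is why the two can disagree.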