The 13th International Conference on Knowledge Discovery and Data Mining

Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior

author: Issei Sato, University of Tokyo

Description

Documents such as those on Wikipedia and in folksonomies are increasingly assigned multiple topics as metadata. It is therefore increasingly important to analyze the relationship between a document and the topics assigned to it. In this paper, we propose a novel probabilistic generative model of documents with multiple topics as metadata. By modeling the generation process of a document with multiple topics, we can extract properties specific to such documents. The proposed model extends an existing probabilistic generative model, the Parametric Mixture Model (PMM). PMM models documents with multiple topics by mixing the model parameters of each single topic. However, because PMM assigns the same mixture ratio to every topic, it cannot account for the bias of each topic within a document. To address this problem, we propose a model that places a Dirichlet prior on the mixture ratio. We adopt the variational Bayes method to infer the bias of each topic within a document. We evaluate the proposed model and PMM on the MEDLINE corpus. The F-measure, precision, and recall results show that the proposed model is more effective than PMM for multiple-topic classification. Moreover, we show the potential of the proposed model to extract topics and document-specific keywords using information about the assigned topics.
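
The difference between equal mixing and a Dirichlet-distributed mixture ratio can be illustrated with a small forward-sampling sketch. This is not the implementation from the paper: the function names, the toy topic-word matrix, and the symmetric concentration value `alpha` are all assumptions made for illustration, and the variational Bayes inference step described above is not shown.

```python
import numpy as np


def uniform_mixture(topic_word, labels):
    """Mix the word distributions of the assigned topics with equal weights."""
    labels = np.asarray(labels, dtype=float)
    ratio = labels / labels.sum()          # same ratio for every assigned topic
    return ratio, ratio @ topic_word


def dirichlet_mixture(topic_word, labels, alpha, rng):
    """Mix the assigned topics with a ratio drawn from a symmetric Dirichlet.

    `alpha` is a hypothetical concentration parameter chosen for this sketch.
    """
    assigned = np.asarray(labels, dtype=bool)
    ratio = np.zeros(assigned.size)
    # Unassigned topics keep weight 0; assigned topics share a Dirichlet draw.
    ratio[assigned] = rng.dirichlet(np.full(int(assigned.sum()), alpha))
    return ratio, ratio @ topic_word


# Toy setup (assumed): 3 topics over a 3-word vocabulary, rows sum to 1.
topic_word = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
labels = [1, 0, 1]                         # document tagged with topics 0 and 2

rng = np.random.default_rng(0)
uniform_ratio, uniform_words = uniform_mixture(topic_word, labels)
biased_ratio, biased_words = dirichlet_mixture(topic_word, labels, alpha=1.0, rng=rng)

print("uniform ratio:  ", uniform_ratio)
print("Dirichlet ratio:", biased_ratio)
```

With equal mixing, both assigned topics contribute half of the document's word distribution. With the Dirichlet draw, one assigned topic can dominate, which is the per-document topic bias the abstract refers to.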

Slides
0:00 Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior
0:18 Contents 1
0:26 What kind of relationship between a document and topics exists?
1:38 Contents 2
1:42 Probabilistic Generative Model
3:14 PMM-part01
4:14 PMM-part02
4:45 Contents 3
4:48 Dirichlet distribution on mixture ratio (π1,π2,π3)
6:09 Estimate of Mixture Ratio
9:03 Graphical Model
9:42 Contents 4
9:45 Multiple-topic classification
10:11 Evaluation by F-measure
11:39 F-measure: PDMM vs PMM 1/2
12:35 Precision: PDMM vs PMM
12:47 Recall: PDMM vs PMM
12:54 F-measure: PDMM vs PMM 2/2
13:48 Contents 5
13:54 Word Ranking
14:57 [Female(0.499)], [Male(0.460)], [Biological Markers(0.041)]
16:08 [Rats(0.411)], [Child(0.352)], [Incidence(0.237)]
16:23 [Female(0.442)], [Animals(0.437)], [Pregnancy(0.066)], [Glucose(0.055)]
17:11 [Pregnancy(0.502)], [Glucose(0.498)]
17:59 Summary
18:27 Thank you for listening!
