Named Entity Mining from Click-Through Data Using Weakly Supervised Latent Dirichlet Allocation
author: Shuang-Hong Yang, Georgia Institute of Technology
published: Sept. 14, 2009, recorded: July 2009, views: 38
Description
This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially useful in many applications, including web search, online advertising, and recommender systems. The task poses three challenges: finding a suitable data source, coping with the ambiguity of named entity classes, and incorporating the necessary human supervision into the mining process. This paper proposes conducting NEM by using click-through data collected at a web search engine, employing a topic model that generates the click-through data, and learning the topic model with weak supervision from humans. Specifically, it characterizes each named entity by its associated queries and URLs in the click-through data. It uses the topic model to resolve ambiguities of named entity classes by representing the classes as topics. It employs a method, referred to as Weakly Supervised Latent Dirichlet Allocation (WS-LDA), to accurately learn the topic model from partially labeled named entities. Experiments on a large-scale click-through data set containing over 1.5 billion query-URL pairs show that the proposed approach can conduct very accurate NEM and significantly outperforms the baseline.
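The description says each named entity is characterized by its associated queries and URLs. The following is a minimal sketch of what that data preparation step might look like. It is not the authors' implementation. The toy records, the `SEEDS` labels, the `#` placeholder for the entity, and the helper name `build_entity_documents` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Toy click-through records: (query, clicked URL, click count).
# All values are invented for illustration.
CLICKS = [
    ("harry potter trailer", "http://movies.example.com/harry-potter", 5),
    ("harry potter book", "http://books.example.com/hp", 3),
    ("halo cheats", "http://games.example.com/halo", 4),
    ("halo trailer", "http://movies.example.com/halo", 1),
]

# Hypothetical seed entities with partial class labels (the weak supervision).
SEEDS = {"harry potter": {"movie", "book"}, "halo": {"game"}}


def build_entity_documents(clicks, entities):
    """Group query contexts and clicked hosts by the entity found in each query.

    Each entity becomes one pseudo-document: a bag of features in which the
    entity itself is replaced by '#' (an assumed placeholder convention).
    """
    docs = defaultdict(Counter)
    for query, url, count in clicks:
        padded = f" {query.lower()} "  # pad so matches respect word boundaries
        for entity in entities:
            needle = f" {entity} "
            if needle in padded:
                context = padded.replace(needle, " # ", 1).strip()
                host = urlparse(url).netloc
                docs[entity][("context", context)] += count
                docs[entity][("host", host)] += count
    return dict(docs)


docs = build_entity_documents(CLICKS, SEEDS)
for entity, features in docs.items():
    print(entity, sorted(SEEDS[entity]), features.most_common(2))
```

In this sketch a topic model would treat each entity's bag of features as a document and each named entity class as a topic. The class labels in `SEEDS` are not used by the helper itself. They only show what partially labeled entities might look like as input to a weakly supervised learner such as the WS-LDA method the description names.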