Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases
published: Aug. 13, 2007, recorded: August 2007, views: 60
Description
We now have incrementally grown databases of text documents reaching back over a decade, in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one such global analysis task, namely the problem of automatically uncovering how ideas spread through the collection over time. We refer to this problem as Information Genealogy. In contrast to bibliometric methods, which are limited to collections with explicit citation structure, we investigate content-based methods that require only the text and timestamps of the documents. In particular, we propose a language-modeling approach and a likelihood ratio test to detect influence between documents in a statistically well-founded way. Furthermore, we show how this method can be used to infer citation graphs and to identify the most influential documents in the collection. Experiments on the NIPS conference proceedings and the Physics ArXiv show that our method is more effective than methods based on document similarity.
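The description names a language-modeling approach combined with a likelihood ratio test but gives no formulas, so the following is only a rough sketch of how such a test could look, not the method from the paper. It assumes smoothed unigram language models, a two-component mixture of a background model and a candidate source document, and a coarse grid search over the mixing weight; the function names and parameters (`unigram_lm`, `influence_lrt`, `alpha`, `steps`) are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(tokens, lm):
    """Log-likelihood of a token sequence under a unigram model."""
    return sum(math.log(lm[w]) for w in tokens)


def influence_lrt(new_doc, candidate, background, steps=20):
    """Likelihood ratio statistic for 'candidate influenced new_doc'.

    Null hypothesis: new_doc is drawn from the background model alone.
    Alternative: new_doc is drawn from a mixture
        (1 - lam) * background + lam * candidate, with lam in (0, 1].
    Returns (statistic, best_lam); lam is fitted by grid search.
    """
    vocab = set(new_doc) | set(candidate) | set(background)
    p_bg = unigram_lm(background, vocab)
    p_cand = unigram_lm(candidate, vocab)

    ll_null = log_likelihood(new_doc, p_bg)
    best_ll, best_lam = ll_null, 0.0
    for i in range(1, steps + 1):
        lam = i / steps
        mix = {w: (1 - lam) * p_bg[w] + lam * p_cand[w] for w in vocab}
        ll = log_likelihood(new_doc, mix)
        if ll > best_ll:
            best_ll, best_lam = ll, lam

    # The statistic is zero when no mixture beats the background model.
    return 2.0 * (best_ll - ll_null), best_lam
```

Under these assumptions, a later document that reuses a candidate's distinctive vocabulary yields a larger statistic than one that only uses common background words. Flagging influence would then mean comparing the statistic against a chosen threshold, and only pairs where the candidate's timestamp precedes the new document's would be tested.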