Analyzing Word Frequencies in Large Text Corpora using Inter-arrival Times and Bootstrapping
Published on Nov 30, 2011
2737 Views
Comparing frequency counts over texts or corpora is an important task in many applications and scientific disciplines. Given a text corpus, we want to test a hypothesis, such as "word X is frequent",
Chapter list
00:00 Motivation
00:19 Data
01:17 Problem setting
01:54 Binomial test (bag-of-words model) - 1
02:56 Binomial test (bag-of-words model) - 2
03:14 Binomial test (bag-of-words model) - 3
04:03 Many words are bursty
05:32 Proposed method 1: Inter-arrival times - 1
07:11 Proposed method 1: Inter-arrival times - 2
08:24 Proposed method 2: Bootstrapping
09:09 Comparison for sergeant
09:46 Example: frequency thresholds
11:43 Finding significant news events
13:23 Conclusion