Analyzing Word Frequencies in Large Text Corpora using Inter-arrival Times and Bootstrapping

Published on Nov 30, 2011 | 2737 Views

Comparing frequency counts over texts or corpora is an important task in many applications and scientific disciplines. Given a text corpus, we want to test a hypothesis, such as "word X is frequent",

Chapter list

Motivation (00:00)
Data (00:19)
Problem setting (01:17)
Binomial test (bag-of-words model) - 1 (01:54)
Binomial test (bag-of-words model) - 2 (02:56)
Binomial test (bag-of-words model) - 3 (03:14)
Many words are bursty (04:03)
Proposed method 1: Inter-arrival times - 1 (05:32)
Proposed method 1: Inter-arrival times - 2 (07:11)
Proposed method 2: Bootstrapping (08:24)
Comparison for sergeant (09:09)
Example: frequency thresholds (09:46)
Finding significant news events (11:43)
Conclusion (13:23)