Looking for a Needle in a Haystack: Semi-automatic Creation of a Latvian Multi-word Dictionary from Small Monolingual Corpora
Published on Jul 27, 2018 · 494 views
Multiword expressions (MWEs) are an indispensable part of almost any dictionary. However, the identification of missing MWEs that have recently appeared in a language is not a simple task. In this p
Chapter list
00:00 Looking for the Needle in a Haystack: Semi-automatic Creation of Latvian Multi-word Dictionary from Small Monolingual Corpora
01:03 Multi-word expressions
02:37 Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian
05:11 Tezaurs.lv - the largest open lexical database for Latvian - 1
06:54 Tezaurs.lv - the largest open lexical database for Latvian - 2
07:32 The aim of this study
08:16 Strategies for MWE identification and extraction
09:20 Limitation: rather small amount of data
10:04 Application of statistical measures - 1
10:35 Application of statistical measures - 2
11:39 Application of Statistical Measures
14:20 Lemmatization
16:31 Filtering MWE Candidates
16:37 Linguistic filters
17:18 Results: Balanced Corpus of the Modern Latvian Language
17:55 Limitation: 2-3 tokens
18:17 t-score as measure for term extraction
18:40 Extraction of verbal phrases
18:56 Latvian-Lithuanian Corpus LiLa
19:22 Latvian-Lithuanian Corpus
19:23 Open Subtitles Corpus
19:35 Open Subtitles Corpus
19:41 Conclusion
20:07 Thank you!
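The chapter list mentions applying statistical measures, and the t-score in particular, to rank MWE candidates. As an illustration only, here is a minimal sketch of the textbook t-score for two-token candidates, t = (O - E) / sqrt(O), where O is the observed bigram count and E = f(w1) * f(w2) / N is the count expected if the two words were independent. It assumes the input is an already lemmatized token list, which matters for a highly inflected language like Latvian. The function name `bigram_t_scores` and the `min_freq` cutoff are hypothetical and do not describe the talk's actual pipeline or its linguistic filters.

```python
from collections import Counter
from math import sqrt


def bigram_t_scores(tokens, min_freq=2):
    """Rank adjacent token pairs by t-score (a common collocation measure).

    t = (O - E) / sqrt(O), where O is the observed bigram count and
    E = f(w1) * f(w2) / N is the count expected under independence.
    `min_freq` is a hypothetical cutoff that drops rare, unreliable pairs.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_freq:
            continue  # too rare to score reliably in a small corpus
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (observed - expected) / sqrt(observed)
    # Highest t-score first: these are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `bigram_t_scores("new york is big and new york is old".split())` keeps only the pairs seen at least twice. Because the t-score rewards raw frequency, it tends to favor frequent candidates, which is one reason linguistic filters are usually applied on top of it.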