
Looking for a Needle in a Haystack: Semi-automatic Creation of a Latvian Multi-word Dictionary from Small Monolingual Corpora

Published on 2018-07-27, 498 views

Multiword expressions (MWEs) are an indispensable part of almost any dictionary. However, the identification of missing MWEs that have recently appeared in a language is not a simple task. In this p


Presentation

00:00 Looking for the Needle in a Haystack: Semi-automatic Creation of Latvian Multi-word Dictionary from Small Monolingual Corpora
01:03 Multi-word expressions
02:37 Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian
05:11 Tezaurs.lv - the largest open lexical database for Latvian - 1
06:54 Tezaurs.lv - the largest open lexical database for Latvian - 2
07:32 The aim of this study
08:16 Strategies for MWE identification and extraction
09:20 Limitation: rather small amount of data
10:04 Application of statistical measures - 1
10:35 Application of statistical measures - 2
11:39 Application of Statistical Measures
14:20 Lemmatization
16:31 Filtering MWE Candidates
16:37 Linguistic filters
17:18 Results: Balanced Corpus of the Modern Latvian Language
17:55 Limitation: 2-3 tokens
18:17 t-score as measure for term extraction
18:40 Extraction of verbal phrases
18:56 Latvian-Lithuanian Corpus LiLa
19:22 Latvian-Lithuanian Corpus
19:35 Open Subtitles Corpus
19:41 Conclusion
20:07 Thank you!