Néoveille - An automatic System for Lexical Units Life-Cycle Tracking

author: Emmanuel Cartier, University of Paris-Nord 13
published: July 27, 2018,   recorded: July 2018


This paper presents methods, experiments on French, and a software prototype designed to track the life cycle of lexical units through newspaper monitor corpora. The Néoveille platform combines state-of-the-art processes for detecting and tracking linguistic change with a web platform where linguists can create and manage their corpora, accept or reject automatically detected neologisms, describe the validated neologisms linguistically, and follow their life cycle in monitor corpora (Cartier, 2016). In this presentation, we focus on the module dedicated to life-cycle tracking. This task is challenging because it involves not the creation of a new lexical item but a new usage of an existing one. We propose to tackle this kind of change through four main parameters:

• change in the relative frequency of lexical units over time: time series analysis has a long tradition in business analytics, and mathematical models have been proposed to detect change points and trends in frequency data; corpus linguistics has also proposed several measures to capture diachronic change from frequency data (Hilpert and Gries, 2016); we will present the results of several measures and analyses on a contemporary French newspaper monitor corpus;
• change in the combinatorial profile of lexical units: previous work on the “word sketch” (Kilgarriff, 2004) and the “behavioral profile” (Gries, 2012) has paved the way for studying the semantic signature of lexical units through collocations and collostructions. We generalize these approaches to track combinatorial change at the lexical, lexico-syntactic and syntactic levels by applying productivity measures to language models. We also propose to ground this approach theoretically in diachronic construction grammar, operationalizing so-called constructional change and constructionalization (Traugott and Trousdale, 2013);
• change in the distributional profile of lexical units: the distributional semantic approach (Pantel et al., 2010; Baroni and Lenci, 2010) groups lexical units semantically by the similarity of their contexts. It makes it possible to detect semantic change by revealing that the lexical units most similar to a given unit differ from one period to the next (Hamilton, 2016). We will present some results for French and current limitations;
• diastratic and diatopic change: the last parameter tracks change by recording the textual genres, domains and geographical metadata attached to the documents in which the lexical units occur, and in turn the changes in these parameters.

For the above parameters, we will present experiments on a contemporary French corpus spanning 30 years, showing that each parameter captures specific changes and that combining parameters enables a more fine-grained characterization of lexical change. Automatic detection offers lexicographers a set of tools to track the life cycles of lexical units, taking into account linguistic and sociolinguistic parameters. All results will be available on the project website.
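
As an illustration of the first parameter only, the following minimal Python sketch computes the relative frequency of a lexical unit per period and summarizes its trend with Kendall's tau. It is not the Néoveille implementation: the function names, the toy corpus, and the choice of Kendall's tau as trend measure are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def relative_frequencies(docs, target, per=1_000_000):
    """Relative frequency of `target` in each period.

    `docs` is an iterable of (period, tokens) pairs (a hypothetical input
    format). Returns {period: occurrences per `per` tokens}, sorted by period.
    """
    hits, totals = Counter(), Counter()
    for period, tokens in docs:
        totals[period] += len(tokens)
        hits[period] += sum(1 for tok in tokens if tok == target)
    return {p: per * hits[p] / totals[p] for p in sorted(totals) if totals[p]}


def kendall_tau(values):
    """Kendall's tau-a between chronological order and `values`.

    +1 means the series rises at every step, -1 that it falls at every step.
    Ties count as neither concordant nor discordant.
    """
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    concordant = sum(1 for earlier, later in pairs if later > earlier)
    discordant = sum(1 for earlier, later in pairs if later < earlier)
    return (concordant - discordant) / len(pairs)


# Toy corpus (invented data): the unit "x" gains ground year after year.
toy_docs = [
    (2014, ["a", "b", "c", "d"]),
    (2015, ["x", "a", "b", "c"]),
    (2016, ["x", "x", "a", "b"]),
    (2017, ["x", "x", "x", "a"]),
]

freqs = relative_frequencies(toy_docs, "x", per=100)
tau = kendall_tau(list(freqs.values()))
print(freqs)  # {2014: 0.0, 2015: 25.0, 2016: 50.0, 2017: 75.0}
print(tau)    # 1.0
```

Normalizing by the number of tokens in each period keeps the series comparable when the corpus size varies from year to year. A rank-based measure such as Kendall's tau only captures monotonic trends, so a unit that rises and then falls would need a change-point method instead.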

