Cross-Lingual Classification of Crisis Data
published: Nov. 22, 2018, recorded: October 2018, views: 247
Many citizens now turn to social media during crises to share or acquire the latest information about the event. Because of the sheer volume of data typically circulated during such events, it is necessary to filter out irrelevant posts efficiently, and thus focus attention on the posts that are truly relevant to the crisis. Recent research has experimented with various statistical and semantic methods to automatically classify posts as relevant or irrelevant to a given crisis or set of crises. However, it is unclear how such approaches perform when the posts about a crisis are written in different languages. The typical approach is to train a model for each language, but this is costly, time consuming, and not a viable option for rapidly evolving crisis situations. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that adding semantic features extracted from external knowledge bases increases accuracy over the statistical model.
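The first scenario in the abstract (train on one language, test on another) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or features, so a small bag-of-words Naive Bayes model in plain Python stands in for the "statistical model". The functions `tokenize`, `train`, `predict`, and `accuracy`, and all of the posts below, are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train(posts):
    """Fit word counts per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in posts:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            # Tokens never seen in training carry no evidence and are skipped.
            if tok in vocab:
                # Add-one (Laplace) smoothing.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, posts):
    correct = sum(predict(model, text) == label for text, label in posts)
    return correct / len(posts)


# Hypothetical toy data: train on English posts, test on Spanish posts.
train_en = [
    ("earthquake hits chile buildings collapsed", "relevant"),
    ("flood warning issued for the river area", "relevant"),
    ("i love this new song", "irrelevant"),
    ("great football match tonight", "irrelevant"),
]
test_es = [
    ("terremoto en chile edificios destruidos", "relevant"),
    ("me encanta esta nueva cancion", "irrelevant"),
]

model = train(train_en)
print("cross-lingual accuracy:", accuracy(model, test_es))
```

In this sketch the only evidence that carries over from English to Spanish is the shared token "chile"; every other Spanish word is unseen and ignored. This shows why a purely word-based model can struggle across languages, and why the abstract's alternatives (translating the data to a single language, or adding language-independent semantic features) are worth testing.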
Download slides: iswc2018_khare_cross_lingua_classification_01.pdf (2.9 MB)