Creation of Standards for Social Media Corpora: a Digital Humanities Topic Par Excellence
Even though empirical research on computer-mediated communication (CMC) has a tradition of almost two decades, very few annotated CMC/social media corpora are available to the scientific community and the public. The major reason for this situation is the lack of standards and tools for collecting, representing, annotating and providing resources of that type. One crucial issue is the unclear legal situation surrounding CMC/social media data. Using the example of a legal opinion sought for the integration of an existing German chat corpus into CLARIN-D, the talk will highlight this issue (under German law) and describe how it has been handled in the project. Another crucial issue arises from the fact that, due to the distinct communicative characteristics of CMC/social media discourse, standards and tools for the representation and annotation of text corpora cannot be adopted for CMC/social media corpora without modification. The creation of standards and the adaptation of NLP tools for this new type of language resource is a digital humanities topic par excellence, since (1) it focuses on data that are born digital and, at the same time, (2) it requires a combination of expertise from the humanities and the computational sciences.
Download slides: clarinplusworkshop2017_beisswenger_social_media_01.pdf (1.1 MB)