CLARIN Workshop: Digital Youth in East Asia: Theoretical, Methodological and Technical Issues, Brussels 2017
This day-long workshop taught basic natural language processing techniques for analysing large volumes of online text (from websites, blogs, forums, social media, etc.) to researchers in the social sciences and humanities. It was the final day of a three-day conference entitled ‘Digital Youth in East Asia: Theoretical, Methodological, and Technical Issues’, organised by the East Asian Studies research unit at the Université Libre de Bruxelles. English was the predominant language analysed, so that the tools and methods could be demonstrated in a lingua franca, although resources for Chinese, Korean, and Japanese were also introduced. A dataset of YouTube comments on popular Korean pop music videos was collected for the analysis.
Online text is qualitatively different from offline text, and many traditional corpus methods do not translate directly to online material. Moreover, the concept of 'big data' is closely connected with the internet, and the sheer scale of online text makes quantitative approaches all the more necessary. How can we find patterns in online text? What are the opportunities, and what are the main challenges and constraints? These patterns can relate to sentiment, topic, or simple word frequencies, all of which the workshop covered using the Python programming language. The workshop also introduced participants to CLARIN resources and tools relevant to (a) the analysis of large volumes of digital discourse and (b) East Asian languages.
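As a minimal sketch of the simplest kind of pattern mentioned above, word frequencies, the following Python snippet counts tokens across a handful of comments. The comments are invented toy examples, not items from the workshop's YouTube dataset, and the regular-expression tokenizer and the helper names `tokenize` and `word_frequencies` are hypothetical choices for illustration rather than the workshop's own materials.

```python
import re
from collections import Counter

# Hypothetical toy comments, invented for illustration only;
# they are NOT drawn from the workshop's YouTube dataset.
comments = [
    "This song is amazing!!",
    "The dance in this video is amazing",
    "Who is still watching this in 2017?",
]

def tokenize(text):
    """Lowercase the text and extract simple word tokens (Latin letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

freqs = word_frequencies(comments)
print(freqs.most_common(3))
```

This regex tokenizer is only an assumption for English-like text: it keeps lowercase Latin letters, digits and apostrophes, so Korean, Chinese or Japanese characters would be dropped entirely. Comments in those languages would need a language-appropriate tokenizer or morphological analyser instead.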
The conference took place at the Université Libre de Bruxelles from Wednesday, 11 October to Friday, 13 October 2017, with the workshop held on its final day, Friday, 13 October 2017.