Big Natural Language Data Processing
The automated processing of large volumes of text data has become a mission-critical capability in a wide range of industries. Current tools enable data scientists to produce ever more impactful NLP systems. Getting these systems started, however, can still be challenging.
This tutorial will review best-practice, data-driven methods, tools, and resources for many common applications that require processing high-volume, high-velocity, and/or mixed-veracity textual data. We will demonstrate PDF content extraction, distributed ETL patterns, and NLP tooling with working code samples (mostly in Python) on the Spark and AWS Lambda distributed computing platforms.
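As an illustration of what a "distributed ETL pattern" for text can look like, here is a minimal sketch in plain, standard-library Python. It is not taken from the tutorial. It assumes a hypothetical AWS Lambda-style entry point, `handler(event, context)`, whose event carries a batch of records shaped like `{"documents": [{"id": ..., "text": ...}]}`. That event shape, and the helper names `extract_tokens`, `map_document`, and `merge_counts`, are all illustrative assumptions. The work is split into a per-document map step and an associative merge step, which is the shape that lets a job be spread across workers.

```python
import re
from collections import Counter
from functools import reduce

# Simple tokenizer (an assumption for illustration): runs of lowercase letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_tokens(text):
    """Transform step: lowercase raw text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def map_document(doc):
    """Map step: turn one record into its own token counts, independent of other records."""
    return Counter(extract_tokens(doc.get("text", "")))


def merge_counts(left, right):
    """Reduce step: combine partial counts. Addition is associative, so partial
    results from different workers can be merged in any grouping."""
    return left + right


def handler(event, context=None):
    """Hypothetical Lambda-style entry point. The {"documents": [...]} event
    shape is an assumption, not a fixed AWS format."""
    docs = event.get("documents", [])
    totals = reduce(merge_counts, map(map_document, docs), Counter())
    return {"num_documents": len(docs), "token_counts": dict(totals)}


if __name__ == "__main__":
    event = {"documents": [
        {"id": 1, "text": "Spark scales NLP."},
        {"id": 2, "text": "NLP on Lambda"},
    ]}
    print(handler(event))
```

Because `map_document` handles each record in isolation and `merge_counts` is associative, the same two functions could in principle be passed to a Spark RDD's `map` and `reduce`, or fanned out across many Lambda invocations, without changing the per-record logic.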
On completion, participants will understand, and be able to prototype, components of an end-to-end NLP system that achieve baseline results. They will also be shown advanced strategies that can be expanded on and explored further without re-implementing the underlying baseline infrastructure. We assume that audience members have a general understanding of textual data, NLP applications, and data science methods.