Big Natural Language Data Processing

author: Gabor Melli, OpenGov, Inc.
author: Matt Seal, OpenGov, Inc.
published: Sept. 16, 2016,   recorded: August 2016,   views: 1610

Video: Part 1 (57:31), Part 2 (1:21:16)


The automated processing of large volumes of text data has become a mission-critical capability in a wide range of industries. Current tools enable data scientists to produce ever more impactful NLP systems. Getting these systems started, however, can still be challenging.

This tutorial will review best-practice, data-driven methods, tools, and resources for many common applications that require the processing of high-volume, high-velocity, and/or mixed-veracity textual data. We will show PDF content extraction, distributed ETL patterns, and NLP tooling with working code samples (mostly in Python) using the Spark and AWS Lambda distributed computing platforms.

On completing the tutorial, participants will understand and be able to prototype components of an end-to-end NLP system that achieves baseline results. They will also be shown advanced strategies that can be expanded and further explored without re-implementing the underlying baseline infrastructure. We assume that audience members have a general understanding of textual data, NLP applications, and data science methods.
