Learnable Representations for Natural Language

author: Alexander Clark, Royal Holloway, University of London
published: Jan. 19, 2010,   recorded: December 2009,   views: 6524


Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


The Chomsky hierarchy was explicitly intended to represent the hypotheses from distributional learning algorithms; yet these standard representations are well known to be hard to learn, even under quite benign learning paradigms, because of the computationally complexity of inferring rich hidden structures like trees.

There is a lot of interest in unsupervised learning of natural language -- current approaches (e.g. Klein and Manning, Johnson's Adaptor Grammars) use modifications of existing models such as tree or dependency structures together with sophisticated statistical models in order to recover structures that are as close as possible to gold standard manual annotations.

This tutorial will cover a different approach: recent algorithms for the unsupervised learning of representations of natural language based on distributional learning (Clark & Eyraud 2007; Clark, Eyraud and Habrard, 2008; Clark 2009). This research direction involves abandoning the standard models and designing new representation classes for formal languages that are richly structured but where the structure is not hidden but based on observable structures of the language -- the syntactic monoid or a lattice derived from that monoid. These representation classes are as a result easy to learn.

We will look briefly at algorithms for learning deterministic automata, and then move on to algorithms for learning context free and context sensitive languages. These algorithms explicitly model the distribution of substrings of the language: they are efficient (polynomial update time) and provably correct for a class of languages that includes all regular languages, many context free languages and a few context sensitive languages. This class may be rich enough to represent natural language syntax.

See Also:

Download slides icon Download slides: nipsworkshops09_clark_lrnl_01.pdf (888.1┬áKB)

Help icon Streaming Video Help

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: