Modeling Multilingual Grounded Language

author: Bill Dolan, Microsoft Research
published: Jan. 11, 2013, recorded: December 2012


Language allows us endlessly creative ways to express the same basic meaning, whether through monolingual paraphrasing or through bilingual translation. That expressive power, though, poses huge challenges for computational approaches to language understanding: how can we model the relationship between strings that are superficially dissimilar and yet “mean the same thing”?

This problem is becoming particularly acute as software interfaces move toward simpler, more natural interactions that often crucially rely on linguistic input. An intelligent interface needs to be able to decide, for example, that the following utterances all describe approximately the same user intent:

Show me some formal dress shoes
I need some shoes for a job interview
Ik wil kleedschoenen
Търся официални обувки
我需要一些正式场合穿的鞋子

In addition to the challenge of modeling such mono- and multilingual alternations on a broad scale, we also need to learn to ground language in the real world: in this case, to map the request to an inventory of shoe styles and show a set of relevant images or records to the user. How can we reliably map natural language utterances, no matter how they are expressed, to appropriate changes in machine state? This talk will describe a modeling program aimed at capturing how different expressions of the same meaning, whether within one language or across multiple languages, are grounded in the real world. In addition, we will describe and demonstrate the results of crowdsourcing experiments aimed at building multilingual datasets that are grounded in video segments, database objects, programming functions, and human movement.
