Reasoning, Attention and Memory

author: Sumit Chopra, Facebook
published: Aug. 23, 2016,   recorded: August 2016,   views: 13340




Over the last few decades, the machine learning community has had great success at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. A key step towards deeper reasoning is the ability to use long-term dependencies as well as short-term context during inference. Until recently, most machine learning models lacked an easy way to read from and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access that memory, i.e. to perform *attention* over it.

Within the last year or so, however, there has been notable progress in this area. Models that develop notions of attention have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation that explore differing forms of explicit storage. In this talk, I’ll shed some light on a set of models that fall into this category. In particular, I’ll discuss Memory Networks and their application to a wide variety of tasks, such as question answering based on simulated stories, cloze-style question answering, and dialog modeling. I’ll also talk about their subsequently proposed variants, including End2End Memory Networks and Key Value Memory Networks, as well as Neural Turing Machines and Stack Augmented Recurrent Neural Networks. Throughout the talk I’ll discuss the advantages and disadvantages of each of these models and their variants. I will conclude with a discussion of what is still lacking in these models and of potential open problems.
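
As a rough illustration of what performing *attention* over a memory can look like, here is a minimal sketch of a soft memory read. It is not taken from the talk or from any of the models named above: the names `softmax`, `attend`, `memory` and `query` are hypothetical, and dot-product scoring followed by a softmax is an assumed (though common) choice.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Soft read from memory (illustrative assumption, not a specific model).

    Each memory slot is scored by its dot product with the query; the
    softmax of those scores gives attention weights, and the value read
    back is the weighted sum of the slots.
    """
    scores = [sum(q * x for q, x in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]
    return weights, read

# Toy memory with three 2-dimensional slots and one query vector.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, read = attend(query, memory)
```

Because every step is differentiable, a read of this kind can in principle be trained together with the rest of a model, which is one way memory access can be combined with inference.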
