Incorporating Structure in Deep Learning

author: Raquel Urtasun, Department of Computer Science, University of Toronto
published: May 27, 2016, recorded: May 2016, views: 1764

Description

Deep learning algorithms attempt to model high-level abstractions of the data using architectures composed of multiple non-linear transformations. Many variants have been proposed and shown to be extremely successful in a wide variety of applications, including computer vision, speech recognition, and natural language processing. In this talk I’ll show how to make these representations more powerful by exploiting structure in the outputs, in the loss function, and in the learned embeddings.

Many problems in real-world applications involve predicting several random variables that are statistically related. Graphical models have typically been employed to represent and exploit these output dependencies. However, most current learning algorithms assume that the models are log-linear in the parameters. In the first part of the talk I’ll show a variety of algorithms that can learn arbitrary functions while exploiting the output dependencies, unifying deep learning and graphical models.
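As a rough illustration of the idea (not the algorithms presented in the talk), the sketch below scores a joint labelling as the sum of non-linear per-variable scores plus a pairwise term that rewards neighbouring labels that agree. The names `unary_scores`, `joint_score`, `map_labels`, the `tanh` stand-in for a deep network, the chain structure, and the coupling value are all assumptions made for this example.

```python
import itertools
import math

def unary_scores(features):
    """Non-linear per-variable scores for labels 0 and 1 (a stand-in for a deep net)."""
    return [[math.tanh(f), math.tanh(-f)] for f in features]

def joint_score(labels, unaries, coupling):
    """Sum of unary scores plus a reward for each pair of neighbours that agree."""
    score = sum(unaries[i][y] for i, y in enumerate(labels))
    score += sum(coupling for a, b in zip(labels, labels[1:]) if a == b)
    return score

def map_labels(unaries, coupling):
    """Exhaustive search for the best joint labelling; only viable for tiny chains."""
    n = len(unaries)
    return max(itertools.product([0, 1], repeat=n),
               key=lambda ys: joint_score(ys, unaries, coupling))

# Toy input: the middle variable has weak evidence for label 1.
features = [1.0, -0.1, 1.0]
unaries = unary_scores(features)

# Predicting each variable on its own follows the weak evidence.
independent = tuple(max((0, 1), key=lambda y: u[y]) for u in unaries)

# Predicting jointly lets the neighbours override it.
joint = map_labels(unaries, coupling=0.5)
```

In this toy case the independent prediction and the joint prediction disagree on the middle variable, which is the kind of effect that modelling output dependencies is meant to capture.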

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application domain. In the second part of the talk I’ll show a direct loss minimization approach to train deep neural networks, which provably minimizes the task loss. This is often non-trivial, since these loss functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. I’ll demonstrate the applicability of this general framework in the context of maximizing average precision, a structured loss commonly used to evaluate ranking problems.
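To make the difficulty concrete, here is a minimal sketch of how average precision is computed for a single ranking. It is not the direct loss minimization method itself. The function name `average_precision` and the toy scores and labels are assumptions made for this example.

```python
def average_precision(scores, labels):
    """Average precision of the ranking obtained by sorting items by descending score.

    labels are 1 (relevant) or 0 (not relevant).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0

labels = [1, 0, 1, 0]
ap_before = average_precision([0.9, 0.8, 0.7, 0.1], labels)
# A small change in one score that leaves the order intact changes nothing.
ap_nudged = average_precision([0.9, 0.8, 0.75, 0.1], labels)
# Swapping two neighbouring items makes the metric jump.
ap_swapped = average_precision([0.9, 0.7, 0.8, 0.1], labels)
```

The metric depends on the scores only through the ranking they induce, so it is piecewise constant in the scores, and the contribution of each relevant item depends on every item ranked above it. This is why it is neither smooth nor decomposable over individual examples.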

Deep learning has become a very popular approach to learning word, sentence and/or image embeddings. Neural embeddings have shown great performance in tasks such as image captioning, machine translation and paraphrasing. In the last part of my talk I’ll show how to exploit the partial order structure of the visual-semantic hierarchy over words, sentences and images to learn order embeddings. I’ll demonstrate the utility of these new representations for hypernym prediction and image-caption retrieval.
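The abstract does not spell out how the partial order is encoded. One formulation from the order-embeddings literature, assumed here, places more general items closer to the origin and penalizes a pair by the squared amount by which the coordinate-wise ordering is violated. The function name `order_violation` and the two toy vectors are hypothetical.

```python
def order_violation(x, y):
    """Penalty ||max(0, y - x)||^2, which is zero exactly when x >= y in every coordinate.

    Under the assumed convention, x is the more specific item and y the more general one.
    """
    return sum(max(0.0, yi - xi) ** 2 for xi, yi in zip(x, y))

# Toy non-negative embeddings: the general concept sits closer to the origin.
general = [1.0, 0.5]
specific = [2.0, 1.5]

penalty_correct = order_violation(specific, general)   # ordering respected
penalty_reversed = order_violation(general, specific)  # ordering violated
```

Unlike a symmetric distance, this penalty is asymmetric in its two arguments, which is what lets it represent a hierarchy such as hypernymy or the relation between a caption and an image.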
