A Dirty Model for Multi-task Learning

author: Ali Jalali, University of Texas at Austin
published: Jan. 12, 2011,   recorded: December 2010,   views: 6323




We consider the multiple linear regression problem, in a setting where some of the relevant features may be shared across tasks. A lot of recent research has studied the use of L1/Lq norm block-regularizations with q > 1 for such (possibly) block-structured problems, establishing strong guarantees on recovery even under high-dimensional scaling, where the number of features scales with the number of observations. However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks. Indeed, they show that if the extent of overlap is less than a threshold, or even if parameter values in the shared features are highly uneven, then block L1/Lq regularization could actually perform worse than simple separate elementwise L1 regularization. We are far away from a realistic multi-task setting: not only does the set of relevant features have to be exactly the same across tasks, but their values have to be as well. Here, we ask the question: can we leverage support and parameter overlap when it exists, but not pay a penalty when it does not? Indeed, this falls under a more general question of whether we can model such dirty data, which may not fall into a single neat structural bracket (all block-sparse, or all low-rank, and so on). Here, we take a first step, focusing on developing a dirty model for the multiple regression problem. Our method uses a very simple idea: we decompose the parameters into two components and regularize these differently. We show, both theoretically and empirically, that our method strictly and noticeably outperforms both L1 and L1/Lq methods over the entire range of possible overlaps. We also provide theoretical guarantees that the method performs well under high-dimensional scaling.
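
The decomposition idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which norms or which solver are used, so everything below is an assumption made for illustration: squared loss, a row-wise L1/L2 block penalty on one component (B), an elementwise L1 penalty on the other (S), and a plain proximal gradient loop. The function names and the parameters `lam_b` and `lam_s` are hypothetical and do not come from the talk.

```python
import numpy as np


def soft_threshold(A, t):
    """Elementwise soft-thresholding: proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def row_group_threshold(A, t):
    """Row-wise group soft-thresholding: proximal operator of t * sum_j ||A[j, :]||_2."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return A * scale


def dirty_objective(Xs, ys, B, S, lam_b, lam_s):
    """Squared loss on Theta = B + S, plus a block penalty on B and an L1 penalty on S."""
    Theta = B + S
    loss = sum(
        np.sum((X @ Theta[:, k] - y) ** 2) / (2.0 * X.shape[0])
        for k, (X, y) in enumerate(zip(Xs, ys))
    )
    block_penalty = lam_b * np.sum(np.linalg.norm(B, axis=1))
    sparse_penalty = lam_s * np.sum(np.abs(S))
    return loss + block_penalty + sparse_penalty


def dirty_model_fit(Xs, ys, lam_b, lam_s, n_iter=500):
    """Proximal gradient on (B, S). Xs[k] is the (n_k, p) design of task k, ys[k] its response.

    Assumed setup (not taken from the abstract): rows of B are features shared
    across tasks, S holds task-specific entries, and the estimate is B + S.
    """
    p, K = Xs[0].shape[1], len(Xs)
    B = np.zeros((p, K))
    S = np.zeros((p, K))
    # Lipschitz constant of the loss gradient in Theta; the factor 2 below
    # accounts for Theta = B + S when stepping in B and S jointly.
    L = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    step = 1.0 / (2.0 * L)
    for _ in range(n_iter):
        Theta = B + S
        # The loss depends on B and S only through their sum, so both share one gradient.
        G = np.column_stack([
            X.T @ (X @ Theta[:, k] - y) / X.shape[0]
            for k, (X, y) in enumerate(zip(Xs, ys))
        ])
        B = row_group_threshold(B - step * G, step * lam_b)
        S = soft_threshold(S - step * G, step * lam_s)
    return B, S
```

With this particular choice of penalties and K tasks, both components can be nonzero only when `lam_s < lam_b < sqrt(K) * lam_s`. Outside that range one penalty is never more expensive than the other on any row, so the sketch reduces to purely elementwise or purely block regularization.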

See Also:

Download slides: nips2010_jalali_dmm_01.pdf (952.0 KB)


Reviews and comments:

Comment1 old friend, May 13, 2011 at 9:56 p.m.:

What a mire you have been dragged into.
Look at what sins you have committed that God has brought you to this day.
Repent.

Comment2 Another Old Friend, November 15, 2011 at 3 a.m.:

Great Talk Ali jan.

To Old friend: go to sleep, man!
