Latent Space Domain Transfer between High Dimensional Overlapping Distributions

author: Sihong Xie, Department of Computer Science, University of Illinois at Chicago
author: Wei Fan, Baidu, Inc.
author: Jing Peng, Montclair State University
author: Olivier Verscheure, IBM Thomas J. Watson Research Center
author: Jiangtao Ren, Sun Yat-Sen University
published: May 20, 2009,   recorded: April 2009,   views: 4708


Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Transferring knowledge from one domain to another is challenging due to a number of reasons. Since both conditional and marginal distribution of the training data and test data are non-identical, model trained in one domain, when directly applied to a diffeerent domain, is usually low in accuracy. For many applications with large feature sets, such as text document, sequence data, medical data, image data of different resolutions, etc. two domains usually do not contain exactly the same features, thus introducing large numbers of "missing values" when considered over the union of features from both domains. In other words, its marginal distributions are at most overlapping. In the same time, these problems are usually high dimensional, such as, several thousands of features. Thus, the combination of high dimensionality and missing values make the relationship in conditional probabilities between two domains hard to measure and model. To address these challenges, we propose a framework that first brings the marginal distributions of two domains closer by "filling up" those missing values of disjoint features. Afterwards, it looks for those comparable sub-structures in the "latent-space" as mapped from the expanded feature vector, where both marginal and conditional distribution are similar. With these sub-structures in latent space, the proposed approach then find common concepts that are transferable across domains with high probability. During prediction, unlabeled instances are treated as "queries", the mostly related labeled instances from out-domain are retrieved, and the classifcation is made by weighted voting using retrievd out-domain examples. We formally show that importing feature values across domains and latent semantic index can jointly make the distributions of two related domains easier to measure than in original feature space, the nearest neighbor method employed to retrieve related out domain examples is bounded in error when predicting in-domain examples.

See Also:

Download slides icon Download slides: www09_xie_lsdt_01.ppt (5.3┬áMB)

Help icon Streaming Video Help

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Reviews and comments:

Comment1 Huda Yousuf, May 29, 2009 at 1 p.m.:

It is very very nice this lecture,can you presentation the reference range?

Write your own review or comment:

make sure you have javascript enabled or clear this field: