Cross-Lingual Document Retrieval through Hub Languages

author: Jan Rupnik, Artificial Intelligence Laboratory, Jožef Stefan Institute
published: Jan. 11, 2013, recorded: December 2012, views: 3745



We address the problem of learning similarities between documents written in different languages, for language pairs where little or no direct supervision (in the form of a comparable or parallel corpus) is available. To make up for the lack of direct supervision, our approach takes advantage of the fact that such languages may be linked indirectly through a hub language: correspondences exist between each of the two languages and a third, hub language. The main goal of our paper is to explore the viability of cross-lingual learning under such conditions. We propose a method that extracts a set of multilingual topics that provide a common representation for documents in different languages. The method is suited to a comparable multilingual corpus with missing documents, that is, one in which not every document has a counterpart in every language. We evaluate the approach in a truly multilingual setting, performing document retrieval across eight Wikipedia languages.
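To show why a common topic representation makes cross-lingual retrieval possible, here is a minimal sketch of the retrieval step only. It assumes that a linear projection from each language's bag-of-words space into a shared k-dimensional topic space has already been learned. The abstract says this learning exploits hub-language correspondences, but that step is not shown, and this is not the authors' implementation. The names `project`, `cosine` and `retrieve`, the toy projection matrices, and the choice of cosine similarity for ranking are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = shared topics, columns = one language's vocabulary


def project(doc: Vector, proj: Matrix) -> Vector:
    """Map a bag-of-words vector into the shared topic space (assumed linear)."""
    return [sum(w * x for w, x in zip(row, doc)) for row in proj]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, query_proj: Matrix,
             corpus: List[Vector], corpus_proj: Matrix) -> List[int]:
    """Rank target-language documents by similarity to a source-language query.

    Both sides are first projected into the same topic space, so documents
    in different languages (with different vocabularies) become comparable.
    """
    q = project(query, query_proj)
    scores = [cosine(q, project(d, corpus_proj)) for d in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy projections: language A has 3 terms, language B has 2,
# and both map into the same 2 topics.
proj_a = [[1, 0, 0], [0, 1, 1]]
proj_b = [[0, 1], [1, 0]]
print(retrieve([2, 0, 0], proj_a, [[1, 0], [0, 3]], proj_b))
```

In the usage example the language-A query loads only on topic 0, and only the second language-B document does as well, so that document is ranked first.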
