Advances in Cross-Lingual Syntactic Transfer
published: Jan. 11, 2013, recorded: December 2012, views: 4795
The idea of using annotated resources from one language to learn models for another has been around for at least a decade. Typically these models have relied on access to parallel data. However, recent approaches have focused on "direct" cross-lingual transfer, and in particular on delexicalized transfer. Delexicalized parsing models are conditioned only on properties of the input that are available across languages, typically induced tags or clusters. Since these properties are universally available, a parser trained on English can be applied directly to every other language. This simple method has proven surprisingly effective and outperforms the best weakly-supervised models by a significant margin. However, the assumptions underlying these models are far too weak to obtain parsing accuracies at the level of monolingual supervised methods. In this talk I will focus on porting ideas from work on selective parameter sharing in multi-source direct transfer to highly accurate latent CRF parsing models. I will then present novel semi-supervised learning algorithms that relexicalize these models on unlabeled target-language data to give significant improvements. The final model brings us one step closer to building robust syntactic parsers for all the world's languages.
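To make the notion of delexicalization concrete, here is a minimal Python sketch. It is not the model from the talk: the tag names (DET, NOUN, VERB), the helper functions `delexicalize` and `arc_features`, and the feature templates are all assumptions chosen for illustration. It only shows that once word forms are dropped, the features of a candidate dependency arc look the same in two languages, which is what allows a parser trained on one to be applied to the other.

```python
from typing import List, Tuple

# A token is (word form, coarse POS tag). The tag inventory is assumed
# to be shared across languages; the tags below are illustrative only.
Token = Tuple[str, str]


def delexicalize(sentence: List[Token]) -> List[str]:
    """Drop the word forms and keep only the cross-lingual tags."""
    return [tag for _, tag in sentence]


def arc_features(tags: List[str], head: int, dep: int) -> List[str]:
    """Hypothetical features for a head -> dependent arc.

    Only tags, attachment direction, and (capped) distance are used,
    so no feature depends on the vocabulary of a particular language.
    """
    direction = "R" if dep > head else "L"
    distance = min(abs(head - dep), 5)
    pair = f"h={tags[head]}|d={tags[dep]}"
    return [pair, f"{pair}|dir={direction}", f"{pair}|dist={distance}"]


# Toy sentences, hand-tagged for the example.
english = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]
german = [("der", "DET"), ("Hund", "NOUN"), ("bellt", "VERB")]

# Features for the arc "verb -> noun" (head index 2, dependent index 1).
en_feats = arc_features(delexicalize(english), head=2, dep=1)
de_feats = arc_features(delexicalize(german), head=2, dep=1)
```

In this toy case `en_feats` and `de_feats` are identical, so weights learned for these features on English would score the German arc in exactly the same way. Relexicalization, as described in the abstract, would then add target-language word information back on top of such a model.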
Download slides: nipsworkshops2012_mcdonald_syntactic_transfer_01.pdf (11.5 MB)