Defence of the Doctoral Dissertation: Machine Learning of Semantics for Text Understanding
published: June 12, 2017, recorded: May 2017, views: 66
Text Understanding is a long-term goal of Artificial Intelligence. Its mission is to create algorithms that understand human-composed text fully automatically. In this dissertation we propose, implement and evaluate machine learning approaches and meaning representations to make progress towards Text Understanding. Meaning representations encode the information contained in a sentence, and they can be used in applications that require a detailed understanding of that sentence. The focus of our work is on the intrinsic meaning (semantics) of sentences rather than on their context or syntactic characteristics.

An ontology is a conceptualization that can be used to express the meaning of text. Predicate logic, which connects concepts from the ontology, is one of the meaning representations that we use. We propose two rule-based approaches for mapping text into predicate logic: one is based on textual patterns, while the other is based on a syntax parser. The rules are manually constructed, and both approaches assume a complete ontology. These limitations led us to develop a grammar induction approach for semantic parsing and ontology learning. The induced context-free grammar parses a sentence into a semantic tree, a meaning representation in which each node has its own semantic category, e.g. person, location or profession. Furthermore, some of the nodes can be aligned with ontology concepts. The trees are used as a source for relation extraction and instance discovery, while a taxonomy is extracted from the grammar. Thus, both the grammar and the semantic trees are used to extend the ontology, so that it also represents the meaning of the input sentences.

Text understanding can be both learned and evaluated through the task of Natural Language Inference (NLI), where the goal is to determine whether one sentence (the hypothesis) entails, contradicts or is neutral with respect to another sentence (the premise).
Instead of focusing on this classification task, we propose several generative neural network models that generate hypotheses, which allows the construction of new NLI datasets. To evaluate the models, we propose a new metric: the accuracy of a classifier trained on the generated dataset and tested on the original, manually constructed dataset. The accuracy obtained by our best generative model is only 2.7% lower than the accuracy of the classifier trained on the original dataset. This model learns a mapping embedding for each training example. Furthermore, combining the best generated dataset with the original dataset results in the highest accuracy. Our metric rewards diverse, accurate, non-trivial and comprehensible examples. By comparing various metrics, we show that datasets that obtain higher ROUGE or METEOR scores do not necessarily yield higher classification accuracies. We also analyse the characteristics of a good dataset, including the indistinguishability of the generated dataset from the original one.
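The evaluation metric described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the bag-of-words classifier below is only a toy stand-in for a real NLI classifier, and the names `train_classifier`, `accuracy`, `generated_dataset_accuracy` and the example sentences are all hypothetical. The sketch only shows the train-on-generated, test-on-original protocol.

```python
from collections import Counter, defaultdict

# An NLI example is a (premise, hypothesis, label) triple.

def train_classifier(examples):
    """Toy stand-in for an NLI classifier (assumption, not the thesis model):
    counts how often each hypothesis token co-occurs with each label."""
    token_label_counts = defaultdict(Counter)
    label_totals = Counter()
    for _premise, hypothesis, label in examples:
        label_totals[label] += 1
        for token in hypothesis.lower().split():
            token_label_counts[token][label] += 1
    fallback = label_totals.most_common(1)[0][0]

    def predict(_premise, hypothesis):
        votes = Counter()
        for token in hypothesis.lower().split():
            # Unknown tokens contribute no votes.
            votes.update(token_label_counts.get(token, {}))
        return votes.most_common(1)[0][0] if votes else fallback

    return predict

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(p, h) == label for p, h, label in examples)
    return correct / len(examples)

def generated_dataset_accuracy(generated_train, original_test):
    """The metric: train on the generated data, test on the original data."""
    return accuracy(train_classifier(generated_train), original_test)

# Hypothetical data: generated hypotheses for training, original ones for testing.
generated_train = [
    ("A man plays guitar", "A person makes music", "entailment"),
    ("A man plays guitar", "Nobody makes music", "contradiction"),
    ("A dog runs", "An animal moves", "entailment"),
    ("A dog runs", "Nobody moves", "contradiction"),
]
original_test = [
    ("A woman sings", "A person makes music", "entailment"),
    ("A woman sings", "Nobody sings", "contradiction"),
]

score = generated_dataset_accuracy(generated_train, original_test)
print(f"accuracy of classifier trained on generated data: {score:.2f}")
```

In practice the same function would be called once with the generated training set and once with the original training set, and the two accuracies compared on the same original test set.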