Potential and limitations of minimally supervised bootstrapping
published: Nov. 12, 2007, recorded: October 2007, views: 3825
The detection of relation instances is a central functionality for extracting structured information from unstructured textual data and for gradually turning texts into semi-structured information. With respect to how the classifiers or detection grammars are acquired, existing approaches fall into three broad categories:
* detection by classifiers/grammars acquired through intellectual human labor
* detection by classifiers/grammars acquired through supervised learning
* detection by classifiers/grammars acquired through unsupervised or minimally supervised learning.
In the talk we will provide examples of each class of approaches and summarize their respective advantages and disadvantages. We will argue that different relation detection tasks require different methods or even different combinations of methods. One empirically promising and theoretically attractive line of research is the learning of extraction rules from seeds. Several minimally supervised approaches have been investigated that achieved respectable results with a minimum of effort. The learning algorithms are not domain dependent. The seed-based bootstrapping approaches are theoretically pleasing because the learned patterns and rules are modular and transparent. They can be reused in new applications, and they can be a valuable resource for (computational) linguistic investigation. We will explain several bootstrapping methods, most of them starting with patterns as seeds and some with event seeds. We will also describe our own bootstrapping approach (Xu et al. 2007), a radical extension of Xu et al. (2006). In this approach, learning starts from a small set of n-ary relation instances as "seeds" in order to automatically learn pattern rules from parsed data, which can then extract new instances of the n-ary relation and its projections.
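The seed-based loop described above can be illustrated with a minimal sketch. It is not the method of Xu et al.: it assumes plain sentence strings instead of parsed data, uses whole-sentence surface patterns, and does no pattern scoring or filtering, which a realistic system would need in order to keep precision under control. The function names, the slot syntax `<ARGi>`, and the toy corpus are all hypothetical choices made for this illustration.

```python
import re


def learn_patterns(sentences, instances):
    """Turn each sentence that mentions every argument of a known
    instance into a surface pattern with numbered slots."""
    patterns = set()
    for sentence in sentences:
        for instance in instances:
            if all(arg in sentence for arg in instance):
                pattern = sentence
                for i, arg in enumerate(instance):
                    pattern = pattern.replace(arg, f"<ARG{i}>")
                patterns.add(pattern)
    return patterns


def apply_patterns(sentences, patterns, arity):
    """Match every pattern against every sentence and collect the
    argument tuples captured by the slots."""
    found = set()
    for pattern in patterns:
        regex = re.escape(pattern)
        for i in range(arity):
            # Replace the (escaped) slot marker with a lazy named group.
            regex = regex.replace(re.escape(f"<ARG{i}>"), f"(?P<a{i}>.+?)")
        compiled = re.compile(regex)
        for sentence in sentences:
            match = compiled.fullmatch(sentence)
            if match:
                found.add(tuple(match.group(f"a{i}") for i in range(arity)))
    return found


def bootstrap(sentences, seeds, max_iterations=5):
    """Alternate between learning patterns from known instances and
    extracting new instances, until nothing new is found."""
    instances = set(seeds)
    arity = len(next(iter(instances)))
    patterns = set()
    for _ in range(max_iterations):
        patterns |= learn_patterns(sentences, instances)
        new_instances = apply_patterns(sentences, patterns, arity) - instances
        if not new_instances:
            break
        instances |= new_instances
    return instances, patterns


# Toy corpus for a ternary relation (recipient, prize, year).
corpus = [
    "Marie Curie won the Nobel Prize in 1911.",
    "Albert Einstein won the Nobel Prize in 1921.",
    "The Nobel Prize was awarded to Albert Einstein in 1921.",
    "The Fields Medal was awarded to Maryam Mirzakhani in 2014.",
]
seeds = {("Marie Curie", "Nobel Prize", "1911")}

instances, patterns = bootstrap(corpus, seeds)
for instance in sorted(instances):
    print(instance)
```

Starting from a single seed, the first round learns the "won the" pattern and finds a second instance; that instance also occurs in a differently worded sentence, which yields a second pattern and, in turn, a third instance. This chaining through shared instances is what lets a small seed set reach patterns it never directly exemplified.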
After a fruitful period of skillful trial and error, the time now seems right for a more systematic investigation of the alternative approaches to relation detection. In addition to tables of recall and precision values for competing methods, we urgently need explanations, i.e. causal theories that account for the virtues and shortcomings of alternative techniques with respect to properties of domains and text data. We describe one theory of this kind, based on experimental evidence and explanatory insight. The advocated scientific methodology will enable optimal choices for specific tasks, effectively reduce the number of promising combinations of methods for future investigation, and guide the search for completely new approaches.