Sunita Sarawagi


My research interests span several fields, including databases, data mining, machine learning, and statistics. My publications give a good picture of these interests. Some specific problems and projects on which I have worked are listed below.

  • World Wide Tables: The goal of this project is to answer table queries by tapping partially structured sources like tables and lists on the web.
  • Information Extraction and data integration: Recently, I have been interested in graphical models and their use for various extraction and integration problems. As part of this effort, I have developed a package for Conditional Random Fields (CRF) that can be downloaded from SourceForge.
  • ALIAS: This is a prototype of an interesting and fairly compelling application of machine learning techniques like Active Learning to ease the duplicate elimination task that arises in data cleaning.
  • DATAMOLD: This is a tool for Information Extraction (more like text segmentation) using learning based on Hidden Markov Models. This software has been licensed by a data cleaning consulting company to solve real-life address cleaning tasks.
  • ICube: This is a project on which I worked actively between 1999 and 2001. It is about enhanced mining of multidimensional OLAP products. A web demo of ICube is available.
  • New data mining operations: I have worked on temporal data mining. I am currently interested in various multi-class, multi-label and multi-taxonomy learning problems.
  • Database mining integration: I have worked on two different aspects of this problem: first, on algorithmic and architectural issues related to expressing association rule mining algorithms in a relational engine; second, on deploying learnt models within a relational engine so as to allow close integration with SQL querying and optimization.
  • Some past projects (pre-1996): In the past I have worked on various problems related to multidimensional OLAP indexing and aggregation computation. My PhD thesis was on query optimization and scheduling for tertiary memory databases.
  • Ancient projects (pre-1991): I got my first glimpse of research in computer science theory through search problems arising in rectangle cutting and packing.


Open-domain Quantity Queries on Web Tables: Annotation, Response, and Consensus Models
as author at Research Sessions, invited talk
WWT: A system for query-driven relation extraction from the semi-structured web
as author at 1st Workshop on Automated Knowledge Based Construction (AKBC), Grenoble 2010
Accurate Max-margin Training for Structured Output Spaces
as author at 25th International Conference on Machine Learning (ICML), Helsinki 2008