The 7th International Symposium on Intelligent Data Analysis

Robust Tree-Based Incremental Imputation Method for Data Fusion

author: Antonio D'Ambrosio, University of Naples Federico II

Description

Data Fusion and Data Grafting are concerned with combining files and information coming from different sources. The problem is not to extract data from a single database, but to merge information collected from different sample surveys. The typical data fusion situation involves two data samples: the former is a complete data matrix X from a first survey, and the latter, Y, contains a certain number of missing variables. The aim is to complete the matrix Y using the knowledge acquired from X. Thus, the goal is to define the correlation structure that links the two data matrices to be merged. In this paper, we provide an innovative methodology for Data Fusion based on an incremental imputation algorithm in tree-based models. In addition, we consider robust tree validation by boosting iterations. A relevant advantage of the proposed method is that it works for a mixed data structure including both numerical and categorical variables. As benchmarking methods we consider explicit methods such as standard trees and multiple regression, as well as an implicit method based on principal component analysis. An extensive simulation study shows that the proposed method is more accurate than the other methods.
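To make the setting concrete, the following is a minimal sketch of one plausible reading of "incremental" tree-based imputation, not the authors' algorithm. It assumes that the donor matrix X is complete, that the recipient matrix Y holds only the first `n_common` columns, and that missing variables are imputed one at a time, with each imputed variable joining the predictors for the next. The names `fit_stump` and `incremental_impute` are hypothetical. A depth-1 regression tree stands in for a full tree, only numerical variables are handled, and the boosting-based validation mentioned in the abstract is omitted.

```python
from statistics import mean


def fit_stump(rows, targets):
    """Fit a depth-1 regression tree (one split) by minimising squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[j] <= thr]
            right = [t for r, t in zip(rows, targets) if r[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every predictor is constant: fall back to the overall mean
        overall = mean(targets)
        return lambda row: overall
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm


def incremental_impute(donor, recipient, n_common):
    """Complete the recipient rows one missing column at a time.

    donor:     complete rows (common columns first, then survey-specific ones)
    recipient: rows holding only the first n_common columns
    """
    completed = [list(r) for r in recipient]
    for target in range(n_common, len(donor[0])):
        # Predictors are all columns to the left of the target, so columns
        # imputed in earlier steps are used when imputing later ones.
        predictors = [r[:target] for r in donor]
        targets = [r[target] for r in donor]
        predict = fit_stump(predictors, targets)
        for row in completed:
            row.append(predict(row))
    return completed


# Toy data (illustrative only): one common column, two columns missing in Y.
donor = [[1.0, 10.0, 100.0], [2.0, 10.0, 100.0],
         [8.0, 20.0, 200.0], [9.0, 20.0, 200.0]]
recipient = [[1.5], [8.5]]
fused = incremental_impute(donor, recipient, n_common=1)
```

In this sketch the imputation order is simply the column order; a real implementation would need a rule for choosing that order and a splitting criterion for categorical targets.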

Slides
0:00 Robust Tree-Based Incremental Imputation Method for Data Fusion
0:30 Outline
0:55 What is Data Fusion? pt 1
1:53 What is Data Fusion? pt 2
2:59 What is Data Fusion? pt 3
4:15 Data Fusion Approaches
5:26 Our Proposal
6:18 Why Incremental?
7:02 General Idea of Boosting
7:41 Recall: Ensemble Methods
8:19 R.I.I. Algorithm
9:48 Robust Tree-Based Incremental Imputation Algorithm
10:07 Simulation Setting: Numerical Missing Values
11:37 Simulation Study pt 1
11:57 Simulation Study pt 2
12:14 Simulation 1: Test Error Progress through AdaBoost Iterations
12:34 Simulation Study – Sim 1: Main Results
14:43 Simulation 2: Test Error Progress through AdaBoost Iterations
14:46 Simulation Study – Sim 2: Main Results
14:52 Simulation 3: Test Error Progress through AdaBoost Iterations
14:54 Simulation Study – Sim 3: Main Results
15:01 A Real Dataset: Boston Housing
15:37 Boston housing: Main Results
16:26 Concluding Remarks and Future Perspectives

Reviews and comments:

Comment1 Frank Tranton, October 11, 2007 at 7:23 p.m.:

Good talk, good idea, but Data Fusion seems to be not much feasible
