The Use of Randomization and Statistical Significance in Data Mining

Published on 2013-01-16 (3166 views)

Kai Puolamäki

The concept and theory of statistical significance testing are well established in a traditional setup, but not in the problem settings related to data mining. In this talk I discuss the formulation as

PTDM 2012 - Brussels

Related categories

Statistical & Consensus Methods

Presentation

The Use of Randomization and Statistical Significance in Data Mining 00:00

Contents 00:01

Helsinki DM group 2012 00:36

Recent alumni 01:49

Other approaches to the learning problem 02:44

“Clarifying” analogue 02:51

Learning problem 03:52

Bayesian learning 05:41

Collaborative filtering 07:20

Algorithmic approach (1) 07:50

Algorithmic approach (2) 08:27

Other approaches 09:07

Traditional statistical significance testing (1) 09:20

Traditional statistical significance testing (2) 09:50

Statistical significance testing 11:55

Multiple hypothesis testing (1) 12:33

Multiple hypothesis testing (2) 13:08

Data mining formulation (1) 14:22

Data mining formulation (2) 14:44

Data mining formulation (3) 15:48

Correlations and co-occurrences (1) 16:49

Correlations and co-occurrences (2) 17:00

Correlations and co-occurrences (3) 17:11

Correlations and co-occurrences (4) 18:15

Correlations and co-occurrences (5) 19:04

Correlations and co-occurrences (6) 19:35

Correlations and co-occurrences (7) 20:48

Correlations and co-occurrences (8) 22:17

Tell me what I do not know 23:45

Tell me something I don’t know (1) 23:55

Tell me something I don’t know (2) 24:28

Randomize = sample from Pr(ω) 25:33

Constrain RG 25:44

Randomize with constraint RG 26:11

Tell me something I don’t know (3) 26:24

Tell me something I don’t know (4) 27:12

Tell me something I don’t know (5) 28:16

Tell me something I don’t know (6) 29:47

Most informative set of patterns (1) 30:42

Most informative set of patterns (2) 30:53

Most informative set of patterns (3) 32:33

Most informative set of patterns (4) 32:59

Most informative set of patterns (5) 33:37

Most informative set of patterns (6) 34:09

Most informative set of patterns (7) 35:44

Most informative set of patterns (8) 36:02

Most informative set of patterns (9) 36:55

Most informative set of patterns (10) 39:02

Most informative set of patterns (11) 40:37

Time series segmentation (1) 41:54

Time series segmentation (2) 42:39

Time series segmentation (3) 43:47

Time series segmentation (4) 44:30

Time series segmentation (5) 44:45

Time series segmentation (6) 45:18

Time series segmentation (7) 47:00

Time series segmentation (8) 47:21

Agglomerative hierarchical clustering 47:48

Most informative set of patterns (12) 48:43

Untitled slide (page 60) 49:39

Relation to Bayesian learning (1) 49:49

Bayesian learning (1) 50:20

Bayesian learning (2) 50:55

Bayesian learning (3) 51:26

Bayesian learning (4) 51:49

Bayesian learning (5) 52:03

Bayesian learning (6) 52:20

Bayesian learning (7) 52:37

Bayesian learning (8) 53:17

Summary 54:35