Unlocking the potential of large prospective biobank cohorts for -omics data analysis: aspects of study design, prediction and causality

author: Krista Fischer, University of Tartu
published: July 18, 2016, recorded: May 2016




The past decade has seen a tremendous increase in the availability of data from large population-based biobank cohorts. Such datasets include various types of -omics data (genomics, transcriptomics, metabolomics, etc.) as well as extensive data on participants' health, lifestyle, and demographics at recruitment, and often also detailed follow-up data from electronic health registries and other databases. This talk discusses aspects of study design and statistical analysis for such datasets.

First, options for the analysis of follow-up data are discussed, with the aim of evaluating potential -omics-based predictive biomarkers. One important issue is the choice of timescale. Unlike in traditional survival analysis projects, the time of recruitment does not mark any important event (such as the diagnosis of a serious disease) in a participant's life course, and therefore time since recruitment may not be the optimal timescale to use. The best choice also depends on the type of biomarker considered: whether it reflects the participant's current health status (as metabolomics data do, for instance) or is determined at birth (DNA-based markers). We illustrate the concepts using both simulated data and the Estonian Biobank cohort, in order to understand the optimal analysis strategy in each of the situations considered. Another issue is study design, especially when only a subset of a large cohort can be selected for genotyping or other sample processing to obtain the relevant -omics data. Here, the potential of the nested case-control study design is discussed.
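As a minimal sketch (not taken from the talk) of the nested case-control idea: for each case, a fixed number of controls is drawn from the risk set, i.e. cohort members still under follow-up and event-free at the case's event time, and only this sampled subset then needs expensive -omics profiling. The data layout and field names below are illustrative assumptions.

```python
import random

def nested_case_control(cohort, m, seed=0):
    """Sample m controls per case from the case's risk set.

    cohort: list of dicts with 'id', 'time' (exit time) and 'event' (0/1).
    Returns {case_id: [control_ids]}.
    """
    rng = random.Random(seed)
    sampled = {}
    for case in (p for p in cohort if p["event"] == 1):
        # Risk set: everyone still at risk at the case's event time.
        risk_set = [p["id"] for p in cohort
                    if p["time"] >= case["time"] and p["id"] != case["id"]]
        sampled[case["id"]] = rng.sample(risk_set, min(m, len(risk_set)))
    return sampled

# Toy cohort: participants 1 and 3 experience the event.
cohort = [
    {"id": 1, "time": 2.0, "event": 1},
    {"id": 2, "time": 5.0, "event": 0},
    {"id": 3, "time": 3.5, "event": 1},
    {"id": 4, "time": 6.0, "event": 0},
    {"id": 5, "time": 4.0, "event": 0},
]
print(nested_case_control(cohort, m=2))
```

If age rather than time-on-study were used as the timescale, the risk-set condition would compare ages at event, with entry age handled as left truncation.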

The second topic is the use of genetic data in personalized risk prediction. Large biobank cohorts provide data to compare and validate such predictors. For common complex diseases, the polygenic nature of the disease has to be taken into account, and multimarker scores therefore have considerably better predictive ability than any single SNP. It is important to reach an optimal decision on the choice of genetic markers for the score as well as on the weights used to combine them. The concept is illustrated with the example of Type 2 Diabetes risk prediction in the Estonian Biobank data.
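A multimarker score of this kind is, at its core, a weighted sum of allele dosages; the weights are typically taken from published GWAS effect sizes. The sketch below uses made-up numbers purely for illustration and is not the scoring method used in the talk.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages, one score per individual.

    dosages: (n_individuals, n_snps) matrix of allele counts (0, 1 or 2).
    weights: per-SNP effect sizes, e.g. log odds ratios from a GWAS.
    """
    return np.asarray(dosages) @ np.asarray(weights)

# Two individuals, three SNPs; effect sizes are invented for the example.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
weights = np.array([0.10, -0.05, 0.20])
print(polygenic_score(dosages, weights))  # scores 0.35 and 0.40
```

The two open choices the talk highlights, which markers enter the score and how they are weighted, correspond here to which columns of `dosages` are kept and what goes into `weights`.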

Finally, some aspects of causal modeling are discussed. The availability of large cohorts has encouraged many researchers to use Mendelian randomization methodology to estimate causal effects of various lifestyle and clinical parameters on outcomes. However, causal inference techniques always rely on untestable assumptions, and these are often forgotten. We discuss whether it is possible to distinguish between alternative causal scenarios (such as mediation and pleiotropy) in the case of one genetic and two non-genetic variables.
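To make the Mendelian randomization idea concrete, here is a hedged simulation sketch of the Wald ratio estimator: the causal effect of an exposure X on an outcome Y is estimated as the ratio of the genotype-outcome and genotype-exposure regression slopes. The simulation parameters are invented, and the data are generated so that the core (in practice untestable) assumption holds, namely that the genetic variant G affects Y only through X, with no pleiotropy.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

g = rng.binomial(2, 0.3, n)            # SNP dosage, minor allele freq. 0.3
x = 0.5 * g + rng.normal(size=n)       # exposure, instrumented by g
y = 0.8 * x + rng.normal(size=n)       # outcome; true causal effect is 0.8

# Slopes of the two instrument regressions, then their ratio.
beta_gx = np.cov(g, x)[0, 1] / np.var(g, ddof=1)
beta_gy = np.cov(g, y)[0, 1] / np.var(g, ddof=1)
print(beta_gy / beta_gx)               # close to the true effect 0.8
```

Under mediation the ratio recovers the causal effect, but if G also affected Y directly (pleiotropy), the same observed associations could arise with a different causal effect, which is exactly the identifiability question the talk raises.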

Download slides: ESHGsymposium2016_fischer_biobank_01.pdf (4.9 MB)
