From Micro to Macro: Data Driven Phenotyping by Densification of Longitudinal Electronic Medical Records

author: Jiayu Zhou, Samsung Research America
published: Oct. 7, 2014,   recorded: August 2014,   views: 2246

Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: