Dual Beta Process Priors for Latent Cluster Discovery in Chronic Obstructive Pulmonary Disease

author: James C. Ross, Harvard Medical School
published: Oct. 7, 2014,   recorded: August 2014,   views: 1558



Chronic obstructive pulmonary disease (COPD) is a lung disease characterized by airflow limitation, usually associated with an inflammatory response to noxious particles such as cigarette smoke. COPD is currently the third leading cause of death in the United States and is the only leading cause of death that is increasing in prevalence. It also represents an enormous financial burden to society, costing tens of billions of dollars annually in the U.S. It is widely accepted by the medical community that COPD is a heterogeneous disease, with substantial evidence indicating that genetic variation contributes to varying levels of disease susceptibility. This heterogeneity makes it difficult to predict health decline and to develop targeted treatments for better patient care. Although researchers have made several attempts to discover disease subtypes, results have been inconclusive, in part because standard clustering methods have not properly dealt with disease manifestations that may worsen with increased exposure. In this paper we introduce a new way of framing the COPD subtyping task. Specifically, we model the relationship between risk factors (such as age and smoke exposure) and manifestations of disease severity using Gaussian processes, which allow us to represent so-called "disease trajectories." We also posit that individuals can be associated with multiple disease types (latent clusters), which we assume are influenced by genetics. Furthermore, we posit that only a subset of the numerous disease-related quantitative features is useful for describing each latent subtype. We model these two kinds of association (individuals to clusters, and features to clusters) using two separate beta process priors, and we describe a variational inference approach to discover the most probable latent cluster assignments. Results are validated with associations to genetic markers.
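To make the role of the two beta process priors more concrete, the following is a minimal sketch of how two binary association matrices could be drawn from a standard finite beta-Bernoulli approximation to the beta process. It is an illustration only and is not taken from the lecture: the function name `sample_beta_bernoulli`, the hyperparameters `a` and `b`, the truncation level, and the matrix sizes are all assumptions, and the sketch covers only prior sampling, not the Gaussian process trajectories or the variational inference described above.

```python
import random


def sample_beta_bernoulli(n_rows, n_clusters, a=2.0, b=1.0, rng=None):
    """Draw a binary n_rows x n_clusters matrix from a finite
    beta-Bernoulli approximation to the beta process (hypothetical helper).

    pi_k    ~ Beta(a / K, b * (K - 1) / K)   for each latent cluster k
    z[i][k] ~ Bernoulli(pi_k)                for each row i
    """
    if n_clusters < 2:
        raise ValueError("this approximation needs at least 2 clusters")
    rng = rng or random.Random(0)
    K = n_clusters
    # One usage probability per latent cluster.
    pis = [rng.betavariate(a / K, b * (K - 1) / K) for _ in range(K)]
    # Each row may switch on several clusters, or none.
    return [[1 if rng.random() < pis[k] else 0 for k in range(K)]
            for _ in range(n_rows)]


# Assumed sizes, for illustration only.
N_INDIVIDUALS, N_FEATURES, K_TRUNCATION = 6, 5, 4

rng = random.Random(42)
# First prior: which latent clusters each individual is associated with.
Z = sample_beta_bernoulli(N_INDIVIDUALS, K_TRUNCATION, rng=rng)
# Second prior: which latent clusters each quantitative feature helps describe.
W = sample_beta_bernoulli(N_FEATURES, K_TRUNCATION, rng=rng)

for i, row in enumerate(Z):
    print(f"individual {i}: clusters {[k for k, v in enumerate(row) if v]}")
for d, row in enumerate(W):
    print(f"feature {d}: clusters {[k for k, v in enumerate(row) if v]}")
```

Because each entry is an independent Bernoulli draw given the cluster probabilities, a row can contain several ones. This is what lets an individual belong to multiple latent clusters, and lets a feature be relevant to some subtypes but not others.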
