Introduction to causal discovery: A Bayesian Networks approach
The tutorial presents an introduction to the basic assumptions and techniques for causal discovery from observational data, using graphs that represent conditional independence models. It first presents the basic theory of causal discovery, such as the Causal Markov Condition, the Faithfulness Condition, and the d-separation criterion, as well as graphical models for representing causality, such as Causal Bayesian Networks, Maximal Ancestral Graphs, and Partial Ancestral Graphs. It then presents prototypical and state-of-the-art algorithms, such as PC, FCI, and HITON, for learning such models (global learning) or parts of them (local learning) from data. The tutorial also discusses the connections between causality and feature selection and presents causality-based feature selection techniques. Finally, case studies of applications of causal discovery algorithms are presented, with a focus on biomedical data.
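The d-separation criterion mentioned above links graph structure to testable conditional independencies. As a minimal sketch (not part of the tutorial itself), consider the causal chain X → Y → Z: d-separation implies that X and Z are marginally dependent but become independent once we condition on Y. In the linear-Gaussian case this can be checked with partial correlations; the coefficients and sample size below are illustrative assumptions.

```python
import numpy as np

# Simulate data from the assumed chain X -> Y -> Z
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # Y is caused by X
z = -1.5 * y + rng.normal(size=n)  # Z is caused by Y (not directly by X)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    res_a = a - np.polyval(np.polyfit(c, a, 1), c)
    res_b = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(res_a, res_b)[0, 1]

marginal = np.corrcoef(x, z)[0, 1]  # strongly nonzero: X and Z are dependent
given_y = partial_corr(x, z, y)     # near zero: X is independent of Z given Y
```

Constraint-based algorithms such as PC and FCI use exactly this kind of conditional independence test, applied systematically, to recover the graph structure from data.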
The tutorial is designed for a wide audience with a general background in Machine Learning, Data Mining, and Statistics.
The tutorial aims to:
- Familiarize the audience with the field and increase comprehension of the problem of causal induction as it pertains to everyday data analysis tasks; introduce formalisms that represent causal relations among variables and provide a language for thinking about causality and causal discovery
- Increase understanding of the basic principles of causal induction and familiarity with prototypical and state-of-the-art algorithms in the field; enable the correct interpretation of the output of such algorithms
- Enable the correct application of causal-discovery algorithms in practical data mining, machine learning, or statistical analysis tasks
More specifically, it aims to clarify the following issues that are important to every researcher and practitioner of data analysis:
- While most machine learning techniques assume independently and identically distributed (i.i.d.) data, in many fields the data do not follow this assumption. The data may be experimental (e.g., obtained after knocking out a gene) or subject to selection bias (e.g., in case-control studies). The tutorial helps the audience understand these differences and how they arise from the causal structure of the domain.
- It is often the case that the purpose of the analysis is to identify important variables (a.k.a. feature selection), called biomarkers in biology, risk factors in medicine, etc. The tutorial helps the audience understand the connection between the selected variables and the causal structure.
- It is often the case that prediction models are not the final goal; instead, the goal is to control a system, e.g., treat a patient or design a drug with desired properties. Causal modeling and induction are necessary to build machine learning models that can predict the outcome in a system that is being manipulated (e.g., under different experimental conditions).
- The tutorial provides a deeper understanding of standard (non-causal) Bayesian Networks, which have proven important in Machine Learning, reasoning under uncertainty in Artificial Intelligence, and Decision Support Systems for over two decades.
- Causal discovery has already led to important discoveries, thus knowledge of these methods and their potential is important for the data analysts of the future.
The tutorial outline is shown below:
1. Representing Causality
2. Inducing Causal Models from Data
3. Case Studies and Practical Issues