Causality Program Visitor Speaker Series: Testing Causal Diagrams with Mixed Data

Tuesday, April 5th, 2022, 2:00 pm–3:00 pm

Parent Program:

Causality

Speaker:

Johannes Textor (Radboud University)

Location:

Calvin Lab Room 116

Testing Causal Diagrams with Mixed Data

Since the publication of the landmark paper "Causal diagrams for epidemiologic research", epidemiologists have increasingly adopted DAGs, mainly to derive adjustment sets for addressing confounding bias. In the first part of this talk, I will present results from a systematic review of DAG use in biomedical research. Unfortunately, we find the use of DAGs in practice to be problematic in several ways. For instance, almost no researchers appear willing to conduct any model testing to probe the consistency of their hand-drawn DAG with the dataset it is meant to represent. While there are several reasons for the lack of model testing, we hypothesize that one issue might be the difficulty of testing conditional independence (CI) statements, both conceptually and in practice.

Motivated by these findings, the second part of my talk will present our work on making CI testing more accessible and usable for applied researchers, especially for mixtures of categorical and ordinal data (which are common in epidemiology). Most existing CI tests for categorical and ordinal data stratify the sample by the conditioning variables, perform simple independence tests in each stratum, and combine the results. Unfortunately, the statistical power of this approach degrades rapidly as the number of conditioning variables increases. We propose a simple unified CI testing approach for ordinal and categorical data that maintains reasonable calibration and power in high-dimensional data. Our test appears to outperform existing baselines in model testing and structure learning for dense directed graphical models while being comparable for sparse models. The characteristics of our test make it attractive for manual testing of causal models: it is easy to implement, can be used with non-parametric or parametric probability models, has the symmetry property, and has reasonable computational cost.
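
For readers unfamiliar with the stratified baseline mentioned above, the following is a minimal sketch of that general idea only, not of the test proposed in the talk. It splits the sample by the conditioning variables, computes a Pearson chi-square statistic in each stratum, and sums the statistics and degrees of freedom. The function name `stratified_chi2` and the convention of skipping strata in which X or Y takes a single value are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def stratified_chi2(x, y, z):
    """Stratified Pearson chi-square statistic for X independent of Y given Z.

    x, y: equal-length sequences of categorical values.
    z:    sequence of hashable stratum labels, e.g. tuples holding the
          values of all conditioning variables for each observation.
    Returns (statistic, degrees_of_freedom), summed over strata.
    """
    # Step 1: stratify the sample by the conditioning variables.
    strata = defaultdict(list)
    for xi, yi, zi in zip(x, y, z):
        strata[zi].append((xi, yi))

    stat, dof = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        x_counts = Counter(p[0] for p in pairs)
        y_counts = Counter(p[1] for p in pairs)

        # Assumption: a stratum in which X or Y is constant is skipped,
        # since it contributes no degrees of freedom.
        if len(x_counts) < 2 or len(y_counts) < 2:
            continue

        # Step 2: simple independence test within the stratum.
        for xv, nx in x_counts.items():
            for yv, ny in y_counts.items():
                expected = nx * ny / n
                stat += (joint[(xv, yv)] - expected) ** 2 / expected

        # Step 3: combine results by summing statistics and dof.
        dof += (len(x_counts) - 1) * (len(y_counts) - 1)

    return stat, dof
```

A p-value can then be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(stat, dof)` if SciPy is available. Because every additional conditioning variable multiplies the number of strata, each stratum holds fewer observations, which is one way to see why the power of this approach drops as the conditioning set grows.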