Simons Fellows Seminar Series: Minimax-Optimal Policy Learning and Evaluation under Unobserved Confounding
Angela Zhou (Simons Institute)
Room 116 and Zoom
Abstract: We study the problem of learning (single-timestep) personalized decision policies from observational data while accounting for possible unobserved confounding. This problem is called "batch learning with bandit feedback" in computer science, "optimal decision/treatment rules" in biostatistics, and "optimal policy learning" in econometrics. Previous approaches assume unconfoundedness, that is, that no unobserved confounders affect both the treatment assignment and the outcome. When some unobserved confounding is present, as is generally the case with observational data, these approaches can lead to policies that introduce harm rather than benefit. Instead, because policy value and regret may not be point-identifiable, we study a robust method that minimizes the worst-case estimated regret of a candidate policy against a baseline policy over an uncertainty set for propensity weights that controls the extent of unobserved confounding. Our uncertainty sets are superpopulation versions of those used in sensitivity analysis in causal inference. We prove generalization guarantees that ensure our policy is safe when applied in practice and in fact obtains the best possible uniform control on the range of all possible population regrets that agree with the possible extent of confounding. We develop efficient algorithmic solutions to compute this minimax-optimal policy. Finally, we assess and compare our methods on synthetic and semisynthetic data, including a case study on personalizing hormone replacement therapy based on observational data, in which we illustrate our results on a randomized experiment. Based on [1].
Bio: Angela is a Research Fellow in the Simons Institute's Program on Causality. In Summer 2022 she will start as an Assistant Professor in Data Sciences and Operations at the University of Southern California Marshall School of Business. Her research interests are in statistical machine learning for data-driven (sequential) decision making under uncertainty, causal inference, and the interplay of statistics and optimization. She is particularly interested in applications-motivated methodology with guarantees in order to bridge method and practice.