Given i.i.d. samples from an unknown distribution, estimating its symmetric properties is a classical problem in information theory, statistics, and computer science. Symmetric properties are those that are invariant to label permutations and include popular functionals such as entropy and support size. Early work on this question dates back to the 1940s, when R. A. Fisher and A. S. Corbet studied it to estimate the number of distinct butterfly species in Malaysia. Over the past decade, this question has received great attention, leading to computationally efficient and sample-optimal estimators for various symmetric properties. All these estimators were property-specific, and the design of a single estimator that is sample-optimal for any symmetric property remained a central open problem in the area. In a recent breakthrough, Acharya et al. showed that computing an approximate profile maximum likelihood (PML), a distribution that maximizes the likelihood of the observed multiset of frequencies, allows statistically optimal estimation of any symmetric property. However, since its introduction by Orlitsky et al. in 2004, efficient computation of an approximate PML remained a well-known open problem. In our work, we resolved this question by designing the first efficient algorithm for computing an approximate PML distribution. In addition, our investigations have led to a deeper understanding of various computational and statistical aspects of PML and universal estimators.
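
To make the "observed multiset of frequencies" concrete, here is a minimal Python sketch that computes the profile of a sample and checks that it, along with a symmetric functional, is unchanged when the labels are permuted. The function names and the toy samples are assumptions chosen for illustration. The plug-in entropy is only a stand-in for "a symmetric property"; it is not the PML estimator, and this sketch does not attempt to compute a PML distribution.

```python
from collections import Counter
import math


def profile(sample):
    """Map each frequency j to the number of distinct symbols seen exactly j times."""
    symbol_counts = Counter(sample)                # symbol -> frequency
    return dict(Counter(symbol_counts.values()))   # frequency -> number of symbols


def plugin_entropy(sample):
    """Empirical (plug-in) entropy in nats; it depends on the sample only via its profile."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())


# Two samples that differ only by a relabeling of the symbols
# (a->x, b->y, r->z, c->w, d->v).
sample_a = list("abracadabra")
sample_b = list("xyzxwxvxyzx")

profile_a = profile(sample_a)
profile_b = profile(sample_b)
entropy_a = plugin_entropy(sample_a)
entropy_b = plugin_entropy(sample_b)
```

Both samples have one symbol seen five times, two seen twice, and two seen once, so any estimator that is a function of the profile returns the same value on both.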

### Friday, February 11th, 2022

Machine learning is increasingly being used for mechanism design, with applications such as price optimization on online marketplaces and ad auction design. In this talk, I will give an overview of my research on mechanism design via machine learning, touching on statistical problems such as overfitting, incentive problems, and privacy preservation.

How do we make machine learning (ML) algorithms fair and reliable? This is particularly important today as ML enters high-stakes applications such as hiring and education, often adversely affecting people's lives with respect to gender, race, etc., and also violating anti-discrimination laws. When it comes to resolving legal disputes or even informing policies and interventions, only identifying bias/disparity in a model's decision is insufficient. We really need to dig deeper into how it arose. E.g., disparities in hiring that can be explained by an occupational necessity (code-writing skills for software engineering) may be exempt by law, but the disparity arising due to an aptitude test may not be (Ref: Griggs v. Duke Power '71). This leads us to a question that bridges the fields of fairness, explainability, and law: How can we identify and explain the sources of disparity in ML models, e.g., did the disparity entirely arise due to the critical occupational necessities? In this talk, I propose a systematic measure of "non-exempt disparity," i.e., the bias which cannot be explained by the occupational necessities. To arrive at a measure for the non-exempt disparity, I adopt a rigorous axiomatic approach that brings together concepts in information theory (in particular, an emerging body of work called Partial Information Decomposition) with causality.

Today's data-rich platforms are reshaping the operations of societal networks by providing information, recommendations, and matching services to a large number of users. How can we model the behavior of human agents in response to services provided by these platforms, and develop tools to improve the aggregate outcomes in a socially desirable manner? In this talk, I will briefly summarize our works that tackle this question from three aspects: 1) Game-theoretic analysis of the impact of information platforms (navigation apps) on the strategic behavior and learning processes of travelers in uncertain networks; 2) Market mechanism design for efficient carpooling and toll pricing in the presence of autonomous driving technology; 3) Security analysis and resource allocation for robustness under random or adversarial disruptions.

I will provide a brief overview of previous work on credible inference in the context of causal inference and statistical machine learning, and discuss ongoing directions on interfaces of causal inference embedded in complex operational systems.

The Q-learning algorithm is a simple, fundamental, and practically very effective reinforcement learning algorithm. However, the basic protocol can exhibit unstable behavior when implemented even with simple linear function approximation. While tools like target networks and experience replay are often implemented to stabilize the learning process, the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed.
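
For readers unfamiliar with the ingredients named above, the following is a minimal sketch of textbook Q-learning with linear function approximation, a periodically synced copy of the weights (playing the role of a target network), and a replay buffer. It is not the algorithm from the talk: it uses a plain first-order update and a uniformly random behavior policy rather than the second-order rule and exploration scheme described in the abstract. The toy chain MDP, the one-hot features (which make the linear model coincide with the tabular one), and all hyperparameters are assumptions made for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy chain; the last state is terminal


def features(state, action):
    """One-hot feature vector phi(s, a) (an illustrative choice of features)."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi


def q_value(weights, state, action):
    """Linear action-value estimate Q(s, a) = <w, phi(s, a)>."""
    return sum(w * x for w, x in zip(weights, features(state, action)))


def step(state, action):
    """Action 1 moves right, action 0 moves left; reward 1 on reaching the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, gamma=0.9, lr=0.5, sync_every=20, batch=8, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * (N_STATES * N_ACTIONS)
    target = list(weights)          # frozen copy used for bootstrapping
    replay, updates = [], 0
    for _ in range(episodes):
        state = 0
        for _ in range(50):         # cap on episode length
            action = rng.randrange(N_ACTIONS)  # Q-learning is off-policy
            nxt, reward, done = step(state, action)
            replay.append((state, action, reward, nxt, done))
            # Replay a small batch of stored transitions.
            for s, a, r, s2, d in rng.sample(replay, min(batch, len(replay))):
                bootstrap = 0.0 if d else max(
                    q_value(target, s2, b) for b in range(N_ACTIONS))
                td_error = r + gamma * bootstrap - q_value(weights, s, a)
                for i, x in enumerate(features(s, a)):
                    weights[i] += lr * td_error * x
                updates += 1
                if updates % sync_every == 0:
                    target = list(weights)     # sync the target weights
            state = nxt
            if done:
                break
    return weights


weights = train()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_value(weights, s, a))
                 for s in range(N_STATES - 1)]
```

Note that the replay buffer in this sketch grows with the number of steps, which is precisely the kind of memory cost that the abstract's "space complexity independent of the number of steps elapsed" rules out for the proposed method.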

Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for global parameters such as average treatment effect, and it degrades gracefully for local parameters such as heterogeneous treatment effect for a specific demographic. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations calibrated to resemble differential privacy as implemented in the 2020 US Census.

Current methods for causal discovery typically report a single directed acyclic graph (DAG). Through an example, I hope to convince you that this might not be the best practice. In fact, depending on how two DAGs intersect and the local geometry at the intersection, the hardness of this problem can vary dramatically.

What is the best we can do with the amount of data at our disposal with a given learning task? Modern learning problems---with a modest amount of data or subject to data processing constraints---frequently raise the need to understand the fundamental limits and make judicious use of the available small or imperfect data. This talk will cover several examples of learning where exploiting the key structure, as well as optimally trading between real-world resources, are vital to achieve statistical efficiency.

In this talk, I will survey some of my dissertation work on algorithmic problems arising in the design and analysis of randomized experiments. I hope to give a sense of the style of problems and technical work that I enjoy. During my dissertation work, I was asking: How can we design sampling algorithms to achieve desired levels of covariate balance in a randomized experiment? How can we estimate the variance of a treatment effect estimator in the presence of general interference? How should we analyze and design so-called "bipartite" experiments where units which receive treatment are distinct from units on which outcomes are measured?

Reinforcement learning (RL) has recently achieved tremendous successes in several artificial intelligence applications. Many of the forefront applications of RL involve "multiple agents", e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting: infinite-horizon zero-sum stochastic games (Shapley 1953), using three common RL approaches: model-based, value-based, and policy-based ones. We first show that for the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game, even if the agents run it independently, in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which equivalently solves a robust and risk-sensitive control problem. With this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system during iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.

The ability to learn from data and make decisions in real-time has led to the rapid deployment of machine learning algorithms across many aspects of everyday life. While this has enabled new services and technologies, the fact that algorithms are increasingly interacting with people and other algorithms marks a distinct shift away from the traditional machine learning paradigm. Indeed, little is known about how these algorithms, which were designed to operate in isolation, behave when confronted with strategic behaviors on the part of people, and the extent to which strategic agents can game the algorithms to achieve better outcomes. In this talk, I will give an overview of my work on learning in games and in the presence of strategic agents, and on multi-agent reinforcement learning.

What will happen to Y if we do A? A variety of meaningful socio-economic and engineering questions can be formulated this way. To name a few: What will happen to a patient's health if they are given a new therapy? What will happen to a country's economy if policy-makers legislate a new tax? What will happen to a data center's latency if a new congestion control protocol is used? In this talk, we will explore how to answer such counterfactual questions using observational data (which is increasingly available due to digitization and pervasive sensors) and/or very limited experimental data. The two key challenges in doing so are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. Towards this goal, the key framework we introduce connects causal inference with tensor completion, a very active area of research across a variety of fields. In particular, we show how to represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) formal identification results establishing under what missingness patterns, latent confounding, and structure on the tensor recovery of unobserved potential outcomes is possible; (ii) novel estimators that recover these unobserved potential outcomes, with proofs that they are finite-sample consistent and asymptotically normal. The efficacy of the proposed estimators is shown on high-impact real-world applications. These include working with: (i) TauRx Therapeutics to propose novel clinical trial designs to reduce the number of patients recruited for a trial and to correct for bias from patient dropouts; (ii) Uber Technologies on evaluating the impact of certain driver engagement policies without having to run an A/B test.

Motivated by recent advances in both theoretical and applied aspects of multiplayer games, spanning from e-sports to multi-agent generative adversarial networks, we focus on min-max optimization in team zero-sum games. In this class of games, players are split into two teams, with payoffs that are equal within a team and of opposite sign across teams. Unlike in the textbook two-player zero-sum games, finding a Nash equilibrium in our class can be shown to be CLS-hard, i.e., a polynomial-time algorithm for computing Nash equilibria is unlikely to exist. Moreover, in this generalized framework, we establish that even asymptotic last-iterate or time-average convergence to a Nash equilibrium is not possible using Gradient Descent Ascent (GDA), its optimistic variant, or extra gradient. Specifically, we present a family of team games whose induced utility is *non* multi-linear with *non* attractive *per se* mixed Nash equilibria, which are strict saddle points of the underlying optimization landscape. Leveraging techniques from control theory, we complement these negative results by designing a modified GDA that converges locally to Nash equilibria. Finally, we discuss connections of our framework with AI architectures with team competition structures like multi-agent generative adversarial networks.

Since their introduction by Goodfellow et al. (2014) as sampling algorithms, Generative Adversarial Networks (GANs) have evolved to produce remarkable results in several tasks, e.g., image generation and text-to-image translation. Statistically, a GAN may be viewed as a density estimate constructed by optimizing over an Integral Probability Metric (IPM) encoded by its discriminator. I will present our work on estimating a nonparametric density under IPMs defined by Besov spaces. Such IPMs are a rich class of losses and include, e.g., $L^p$ distances, the total variation distance, and generalizations of both the Wasserstein and the Kolmogorov-Smirnov distances. Our results generalize, unify, or improve several results, both recent and classical. As a consequence, we obtain bounds on the statistical error of a GAN, showing that GANs are minimax optimal and, in some cases, strictly outperform the best linear estimator (e.g., the empirical estimator or the kernel density estimator).

Further, we study the above framework of nonparametric density estimation under the Huber contamination model, in which a proportion of the data comes from an unknown outlier distribution. We provide a minimax optimal estimator that adapts to both an unknown contamination proportion and the unknown smoothness of the true density. We use this to show that certain GAN architectures are robustly minimax optimal.
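
For reference, the two objects named in this abstract have standard definitions, stated here in generic notation; the symbols $\mathcal{F}$, $P$, $Q$, $G$, and $\epsilon$ are assumptions of this note rather than notation taken from the talk. The IPM indexed by a function class $\mathcal{F}$ (the discriminator class) is

$$
d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right|.
$$

Taking $\mathcal{F}$ to be the 1-Lipschitz functions gives the Wasserstein-1 distance, and taking $\mathcal{F}$ to be the indicators of half-lines $(-\infty, t]$ gives the Kolmogorov-Smirnov distance; in the abstract, $\mathcal{F}$ is a ball in a Besov space. Under the Huber contamination model, the samples are drawn i.i.d. from the mixture

$$
(1 - \epsilon)\, P + \epsilon\, G, \qquad 0 \le \epsilon < 1,
$$

where $P$ is the distribution whose density is to be estimated, $G$ is an arbitrary unknown outlier distribution, and $\epsilon$ is the contamination proportion.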