Meet the Fellows Welcome Event: Wednesday Schedule
Wednesday, September 8th, 2021, 10:00 am–5:30 pm
Location: Calvin Lab Auditorium
Wednesday, September 8, 2021
Session 1 | Machine Learning I |
10 a.m. – 10:10 a.m. |
Synthetic Interventions
Dennis Shen (MIT) | Abstract
Consider a setting where there are N heterogeneous units (e.g., individuals, sub-populations) and D interventions (e.g., incentive programs, medications). Our goal is to learn the causal effect of every intervention on every unit (i.e., N x D causal parameters) based on potentially confounded data, where the number of observations does not scale with N x D. Towards this, we present a causal framework, synthetic interventions (SI), to infer these N x D causal parameters while only having access to observations associated with each of the N units under two interventions, independent of D. Empirically, we validate our framework through both experimental and observational case studies; namely, a large-scale A/B test performed on an e-commerce platform, phase 3 clinical trial data from a pharmaceutical company, and an evaluation of COVID-19 related policies.
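As a rough illustration of the kind of estimator the SI framework builds on, here is a minimal synthetic-control-style sketch with made-up data; the principal component regression below is a generic stand-in, not the paper's exact procedure. It learns weights expressing a target unit as a combination of donor units under a common (control) intervention, then reuses those weights on the donors' outcomes under a different intervention.

import numpy as np

rng = np.random.default_rng(0)
T = 50                                      # measurements per unit (made up)
donors_ctrl = rng.normal(size=(T, 10))      # donor outcomes under intervention 0
target_ctrl = donors_ctrl @ rng.normal(size=10) + 0.1 * rng.normal(size=T)
donors_trt = donors_ctrl + 1.0              # donor outcomes under intervention d

# Learn weights for the target unit from outcomes under the common intervention,
# via a simple principal component regression on the donor matrix.
U, s, Vt = np.linalg.svd(donors_ctrl, full_matrices=False)
k = 5                                       # keep the top-k singular directions
w = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T @ target_ctrl

# Transfer the learned weights to the donors' outcomes under intervention d to
# predict the target unit's counterfactual outcomes under that intervention.
target_counterfactual_d = donors_trt @ w
print(target_counterfactual_d[:5])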
10:10 a.m. – 10:20 a.m. |
Linear Predictors in Reinforcement Learning: From Exponential Lower Bounds to Polynomial Upper Bounds
Andrea Zanette (UC Berkeley) | Abstract
Linear predictors are among the simplest forms of function approximation: they are easy to implement, effective, and widely used in statistics and machine learning. However, when linear predictors are implemented in mainstream reinforcement learning algorithms, divergence may occur, suggesting that reinforcement learning presents unique challenges compared to supervised learning. In this talk, we briefly present recent results on linear methods, ranging from exponential information-theoretic lower bounds for offline RL to state-of-the-art upper bounds under additional assumptions on the MDP.
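For readers less familiar with the setup, the sketch below shows what a linear predictor looks like in RL: semi-gradient TD(0) with linear value-function approximation on a toy random-walk chain. The chain, features, and constants are made up for illustration; the divergence phenomena mentioned above arise in off-policy variants of updates like this one.

import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
features = np.eye(n_states)            # one-hot features; any phi(s) would do
theta = np.zeros(n_states)             # linear predictor: V(s) ~ phi(s) @ theta
rng = np.random.default_rng(1)

s = 2
for _ in range(5000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    td_error = reward + gamma * features[s_next] @ theta - features[s] @ theta
    theta += alpha * td_error * features[s]       # semi-gradient TD(0) update
    s = 2 if s_next == n_states - 1 else s_next   # restart after the reward state
print(features @ theta)                # estimated state values under the policy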
10:20 a.m. – 10:30 a.m. |
Towards Credible and Effective Data-Driven Decision-Making
Angela Zhou (UC Berkeley) | Abstract
Learning to make decisions from datasets in realistic environments is subject to practical challenges such as unobserved confounders, missingness, and bias, which may undermine the otherwise beneficial impacts of data-driven decisions. In this talk, we introduce a methodological framework for learning causal-effect-maximizing personalized decision policies in the presence of unobserved confounders, and for obtaining bounds in more complicated settings such as sequential decision-making. Personalized decisions improve outcomes if treatment effects are heterogeneous, as in domains such as e-commerce, healthcare, and public policy. However, recent work unilaterally assumes unconfoundedness: that there are no unobserved confounders affecting treatment and outcome, which is often untrue for widely available observational data. We develop a methodological framework that accounts for possible unobserved confounding by minimizing the worst-case estimated regret over an ambiguity set for propensity weights. We prove generalization guarantees that ensure the learned policy will be safe when applied in practice and will maintain the same rate of convergence as the non-robust approach. Finally, we provide a semi-synthetic case study on personalizing hormone replacement therapy based on the parallel WHI observational study and clinical trial. Hidden confounding can hinder existing policy learning approaches and lead to unwarranted harm, while the novel robust approach guarantees safety and focuses on well-evidenced improvement. On a technical level, the work relates to developing methods with guarantees for data-driven decision-making under ambiguity: using optimization to expand the design space of statistical estimands, and statistics to establish convergence rates thereafter.
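As a rough sketch of how such an ambiguity set is often formalized in this line of work (notation and the exact form of the set are illustrative, not necessarily the talk's): with outcomes Y_i coded as losses, actions A_i, covariates X_i, nominal inverse-propensity weights 1/\hat e(A_i | X_i), and a confounding level \Gamma \ge 1, one learns

\[
\hat\pi \in \arg\min_{\pi \in \Pi}\ \max_{W \in \mathcal{U}_\Gamma}\ \frac{\sum_{i=1}^n W_i\,\mathbf{1}\{\pi(X_i)=A_i\}\,Y_i}{\sum_{i=1}^n W_i\,\mathbf{1}\{\pi(X_i)=A_i\}},
\qquad
\mathcal{U}_\Gamma=\prod_{i=1}^n\big[a_i(\Gamma),\,b_i(\Gamma)\big],
\]

where each interval [a_i(\Gamma), b_i(\Gamma)] contains the nominal inverse-propensity weight and widens as \Gamma grows, so that \Gamma = 1 recovers the usual unconfounded inverse-propensity-weighted objective.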
10:30 a.m. – 10:40 a.m. |
Primal–Dual Optimization and Application to Sampling
Adil Salim (Simons Institute) | Abstract
Optimization and sampling are fundamental tasks of machine learning procedures. Although these two tasks seem unrelated, they can be unified through the lens of optimal transport theory. Indeed, sampling from some target distribution can be reformulated as minimizing a functional defined over a space of probability distributions called the Wasserstein space. In this talk, I will illustrate this connection in the setting of primal–dual optimization. Primal–dual optimization is a natural framework for minimizing composite functions over the Euclidean space. I will show how primal–dual optimization techniques can be carried over to the Wasserstein space to sample from distributions whose log density is composite. In particular, by viewing it as a primal–dual algorithm, we will derive new results for the proximal Langevin algorithm in the nonsmooth setting.
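As a minimal sketch of the connection in standard notation (the algorithmic form below is one common variant, not necessarily the talk's exact scheme): sampling from a target \pi \propto e^{-(f+g)}, with f smooth and g nonsmooth, can be cast as the Wasserstein-space minimization

\[
\min_{\mu \in \mathcal{P}_2(\mathbb{R}^d)} \ \mathrm{KL}(\mu \,\|\, \pi),
\]

and a proximal Langevin iteration then mirrors the proximal gradient step of composite optimization, with injected Gaussian noise:

\[
x_{k+1} = \operatorname{prox}_{\gamma g}\!\big(x_k - \gamma \nabla f(x_k) + \sqrt{2\gamma}\,\xi_k\big), \qquad \xi_k \sim \mathcal{N}(0, I_d).
\]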
10:40 a.m. – 10:50 a.m. |
Data Efficient Methods for Reinforcement Learning
Xian Carrie Wu (Simons Institute) | Abstract
I will give a brief introduction to some of my current research directions related to data-efficient methods for reinforcement learning.
10:50 a.m. – 11 a.m. |
Theoretical Foundations of Data-Driven Algorithm Design
Ellen Vitercik (Carnegie Mellon) | Abstract
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. In this talk, I will cover recent research that provides statistical guarantees for data-driven algorithm design.
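A toy sketch of the statistical setup described above; the cost function, parameter grid, and instance distribution are hypothetical placeholders.

import random

def run_algorithm(instance, param):
    # Placeholder performance measure (e.g., runtime or solution quality).
    return (instance - param) ** 2

def sample_instance():
    # Stand-in for the unknown, application-specific distribution.
    return random.gauss(3.0, 1.0)

training_set = [sample_instance() for _ in range(200)]
param_grid = [p / 10 for p in range(0, 61)]

# Empirical risk minimization over the parameter grid: return the parameter
# with the best average performance on the training sample.
best_param = min(
    param_grid,
    key=lambda p: sum(run_algorithm(x, p) for x in training_set) / len(training_set),
)
print(best_param)

Guarantees in this line of work bound how far the expected performance of such an empirically chosen parameter can be from that of the best parameter for the underlying distribution.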
11 a.m. – 11:30 a.m. | Break |
Session 2 | Algorithms and Complexity |
11:30 a.m. – 11:40 a.m. |
Distribution-Free Robustness
Sitan Chen (UC Berkeley) | Abstract
A resurgence of interest in robust statistics in recent years has led to a number of powerful algorithmic frameworks for learning from adversarially contaminated data. One limitation of these approaches is that they crucially exploit the assumption that the uncorrupted data points come from a fairly structured distribution, e.g. satisfying certain moment bounds. In this talk, I'll survey some recent and ongoing work on developing algorithms for these problems which can handle datasets sampled from arbitrary distributions or even generated in an online fashion. I'll also discuss some of my other work on learning neural networks and mixture models, as well as learning from quantum data.
11:40 a.m. – 11:50 a.m. |
Pseudorandom Generators and Small-Space Derandomization
William Hoza (Simons Institute) | Abstract
Compared to other big open questions in complexity theory, there is a lot of optimism about proving L = BPL in our lifetimes. Over the past few decades, there has been steady progress on the problem from several different angles. Arguably the most promising approach is to design sufficiently powerful pseudorandom generators (PRGs). PRGs have applications beyond derandomization, and they provide deep insights into the powers and limitations of the models of computation that they fool. In this talk, I will give a brief overview of these topics and my research in the area.
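For context, the standard definition being referenced (a sketch in the usual notation): a generator G : \{0,1\}^s \to \{0,1\}^n \varepsilon-fools a class \mathcal{C} of tests if

\[
\Big|\ \Pr_{x \sim \{0,1\}^s}\big[C(G(x)) = 1\big] \;-\; \Pr_{y \sim \{0,1\}^n}\big[C(y) = 1\big]\ \Big| \;\le\; \varepsilon \qquad \text{for every } C \in \mathcal{C}.
\]

Derandomization then follows by enumerating all 2^s seeds; for L = BPL, one seeks such a G, computable in space O(\log n) with seed length s = O(\log n), that fools logspace algorithms.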
11:50 a.m. – 12 p.m. |
Preconditioning and Locality in Algorithm Design
Jason Li (Simons and Berkeley) | Abstract
We highlight recent advances in graph algorithms for problems such as deterministic mincut, Steiner mincut, parallel shortest path, and more. The common theme behind these improvements is the use of preconditioning and locality, two simple ideas that have revolutionized algorithm design over the past two decades.
12 p.m. – 12:10 p.m. |
Query, Communication and Discrepancy
Makrand Sinha (Simons and Berkeley) | Abstract
Query and Communication Complexity have turned out to be powerful models for understanding unconditional limitations on algorithms. I will give a brief overview of my work related to separations in quantum query and communication complexity, and lower bounds on the size of linear programs for combinatorial optimization problems. I will also touch upon some of my recent efforts in discrepancy theory, which has also inspired some new lower bound techniques in complexity, such as applications of stochastic calculus and high-dimensional probabilistic ideas.
12:10 p.m. – 12:20 p.m. |
Dynamic Linear Algebra
Jan van den Brand (Simons and Berkeley) | Abstract
Dynamic linear algebra (algorithmic techniques for matrices that change over time) lies at the core of many applications, from continuous optimization to efficient graph algorithms to machine learning. Yet, until recently, the full power of dynamic linear algebra was neither fully understood nor exploited in most applications. In this talk, I will give an overview of progress on longstanding problems in dynamic shortest-path data structures, regression algorithms, optimal transport, and other problems using dynamic linear algebra.
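To give a flavor of the primitive (a textbook example, not necessarily the techniques behind the results above): maintaining a matrix inverse under rank-one changes via the Sherman-Morrison formula costs O(n^2) per update instead of the O(n^3) needed to recompute the inverse from scratch.

import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, assuming 1 + v^T A^{-1} u != 0."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned test matrix
A_inv = np.linalg.inv(A)
u, v = rng.normal(size=n), rng.normal(size=n)

A_inv = sherman_morrison_update(A_inv, u, v)  # dynamic update of the inverse
print(np.allclose(A_inv, np.linalg.inv(A + np.outer(u, v))))  # True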
12:20 p.m. – 12:30 p.m. |
Let's Sample Perfectly
Siddharth Bhandari (Simons Institute) | Abstract
We will go over some fascinating questions aimed at bridging the gap between approximate and perfect sampling. In particular, we will explore how the Coupling From The Past (CFTP) framework can be implemented efficiently for certain underlying Markov chains.
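As a concrete toy illustration of CFTP, here is a minimal monotone CFTP sketch for a clipped random walk on {0, ..., N}. The chain is made up for illustration, but the structure is the standard one: reuse past randomness, double how far back to start, and certify coalescence of all trajectories by running only the top and bottom states; the common value at time 0 is an exact draw from the chain's stationary distribution.

import random

N = 10

def update(x, u):
    # Monotone update rule: move up if u < 0.5, down otherwise, clipped to [0, N].
    return min(x + 1, N) if u < 0.5 else max(x - 1, 0)

def monotone_cftp(seed=0):
    rng = random.Random(seed)
    randomness = {}                 # u_t for t = -1, -2, ...; reused across restarts
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in randomness:
                randomness[t] = rng.random()
        low, high = 0, N            # run the extreme states from time -T to time 0
        for t in range(-T, 0):
            low, high = update(low, randomness[t]), update(high, randomness[t])
        if low == high:
            return low              # coalesced: exact sample from stationarity
        T *= 2                      # not yet coalesced: start further in the past

print([monotone_cftp(seed=s) for s in range(5)])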
12:30 p.m. – 12:40 p.m. |
A Complexity-Theoretic Perspective on Fairness
Michael P. Kim (UC Berkeley) | Abstract
Algorithms make predictions about people constantly. The spread of such prediction systems has raised concerns that algorithms may exhibit unfair discrimination, especially against individuals from minority groups. While it's easy to speculate on the risks of unfair prediction, devising an effective definition of algorithmic fairness is challenging. We highlight a notion called multi-calibration that formalizes the goals of fair prediction through the lens of complexity theory. Multi-calibration requires that algorithmic predictions be well-calibrated (unbiased), not simply overall, but simultaneously over a rich collection of subpopulations. This "multi-group" notion strengthens the guarantees of group fairness definitions, without incurring the cost (statistical and computational) associated with individual protections.
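Roughly (a sketch of the definition, omitting the discretization of prediction values and the lower bounds on subpopulation mass that the formal version requires): a predictor p is \alpha-multicalibrated with respect to a collection \mathcal{C} of subpopulations if, for every S \in \mathcal{C} and every value v taken by p,

\[
\Big|\ \mathbb{E}\big[\, y - p(x) \ \big|\ x \in S,\ p(x) = v \,\big]\ \Big| \;\le\; \alpha .
\]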
12:40 p.m. – 2 p.m. | Lunch |
Session 3 | Quantum Computing I |
2 p.m. – 2:10 p.m. |
Quantum Simulation of Relativistic Electronic Structure Hamiltonians
Torin Stetina (Simons Institute) | Abstract
Special relativity affects not only high-speed travel but also the macroscopic properties of molecules and materials that contain heavy elements. For example, standard non-relativistic electronic structure theory would predict gold to appear silver in color. These effects arise from the microscopic description of relativistic electrons using the Dirac equation, and can consequently change the quantitative and qualitative predictions of the properties of matter. In recent years, quantum simulation algorithms have primarily focused on the non-relativistic ground-state electronic structure problem; in this work, we extend the Hamiltonian to include second-order effective quantum electrodynamics and analyze its computational cost on a fault-tolerant quantum computer.
2:10 p.m. – 2:20 p.m. |
Post-Quantum Succinct Arguments: Breaking the Quantum Rewinding Barrier
Fermi Ma (Simons and Berkeley) | Abstract
We prove that Kilian's four-message succinct argument system is post-quantum secure in the standard model when instantiated with any probabilistically checkable proof and any collapsing hash function (which in turn exist based on the post-quantum hardness of Learning with Errors). At the heart of our proof is a general-purpose quantum rewinding procedure that enables a reduction to repeatedly query a quantum adversary for accepting transcripts as many times as desired. Based on joint work with Alessandro Chiesa, Nicholas Spooner, and Mark Zhandry. https://eprint.iacr.org/2021/334.pdf
2:20 p.m. – 2:30 p.m. |
Quantum Necromancy and the Hardness of Observing Schrödinger's Cat
Yosi Atia (UC Berkeley) | Abstract
When applied to macroscopic scales, the unitarity of quantum mechanics leads to paradoxes such as Schrödinger's cat and Wigner's friend. The question of when and how superpositions of macroscopic states become "effectively measured" is fundamental in quantum mechanics. Here, we examine this question using the tools of quantum computational complexity. We prove that if one had a quantum circuit to determine whether a system was in an equal superposition of two orthogonal states (for example, the |Alive> and |Dead> states of Schrödinger's cat), then with only a slightly larger circuit, one could also swap the two states (e.g., bring a dead cat back to life). In other words, observing interference between the |Alive> and |Dead> states is a "necromancy-hard" problem: technologically infeasible in any world where death is permanent. We also show that this statement is robust -- i.e., even a partial ability to observe interference implies partial swapping ability and vice versa. Finally, without relying on any unproved complexity conjectures, we show that all of these results are quantitatively tight. Our results have possible implications for the state dependence of observables in quantum gravity, the subject that originally motivated this study. Joint work with Scott Aaronson and Leonard Susskind (https://arxiv.org/abs/2009.07450)
2:30 p.m. – 2:40 p.m. |
Short Proof of a Spectral Chernoff Bound for Local Hamiltonians
Nilin Abrahamsen (Simons Institute) | Abstract
We give a simple proof of a Chernoff bound for the spectrum of a k-local Hamiltonian based on Weyl's inequalities. The complexity of estimating the spectrum's ϵ(n)-th quantile up to constant relative error thus exhibits the following dichotomy: for ϵ(n) = d^(-n) the problem is NP-hard, and maybe even QMA-hard, yet there exists a constant a > 1 such that the problem is trivial for ϵ(n) = a^(-n). We note that a related Chernoff bound due to Kuwahara and Saito (Ann. Phys. '20) for a generalized problem is also sufficient to establish such a dichotomy, its proof relying on a careful analysis of the cluster expansion.
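For reference, the inequalities invoked here are the standard Weyl inequalities: for n x n Hermitian matrices A and B with eigenvalues listed in nonincreasing order,

\[
\lambda_{i+j-1}(A + B) \;\le\; \lambda_i(A) + \lambda_j(B), \qquad 1 \le i,\, j,\quad i + j - 1 \le n .
\]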
2:40 p.m. – 3 p.m. | Break |
Session 4 | Computational Complexity of Statistical Inference |
3 p.m. – 3:10 p.m. |
Fundamental Limits and Optimal Uses of Data in Modern Learning Problems
Yanjun Han (Simons Institute) | Abstract
What is the best we can do with the amount of data at our disposal for a given learning task? Modern learning problems—with a modest amount of data or subject to data-processing constraints—frequently raise the need to understand the fundamental limits and to make judicious use of the available small or imperfect data. Given a learning problem, the aim of this talk is to exploit its key structure, as well as to trade off real-world resources optimally, in order to achieve statistical efficiency. This talk will cover several examples spanning statistics, information theory, theoretical computer science, operations management, and economics.
3:10 p.m. – 3:20 p.m. |
Convex Programs, Computational Phase Transitions, and Robust Algorithms
Sam Hopkins (UC Berkeley) | Abstract
I will highlight a few of my current interests at the intersection of algorithms, computational complexity, and high-dimensional statistics, including: the relationship between the predictions about computational tractability made using sophisticated but non-rigorous tools from statistical physics and the (provable) success or failure of provable and robust convex-programming-based algorithms; convex programs for Bayesian statistics with provable guarantees; lower bounds against sum-of-squares-based algorithms; and more.
3:20 p.m. – 3:30 p.m. |
Information Theory and Statistics
Cynthia Rush (Columbia University) | Abstract
I will use the time to discuss a few of my current research interests and directions related to information theory and statistics. In particular, I will discuss my efforts toward a better understanding of the fundamental limits of statistical recovery in complex problems, toward designing computationally efficient methods that perform at those limits, and toward establishing rigorous theoretical guarantees for the performance of high-dimensional statistical estimation and inference procedures.
3:30 p.m. – 4 p.m. | Break |
Session 5 | Computational Complexity of Statistical Inference |
4 p.m. – 4:10 p.m. |
Understanding Statistical-to-Computational Gaps via Low-Degree Polynomials
Alex Wein (Simons Institute) | Abstract
Low-degree polynomials are a restricted model of computation that captures the best known polynomial-time algorithms for many problems in high-dimensional statistics (including planted clique, sparse PCA, and more). As such, lower bounds against low-degree polynomials are a form of concrete evidence for average-case computational hardness. I will give an overview of some of the techniques used to prove lower bounds against low-degree polynomials for three different types of problems: detection, recovery, and optimization.
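As a brief sketch of the underlying quantity (standard in this literature, not specific to the talk): for a detection problem with planted distribution \mathbb{P} and null \mathbb{Q}, the degree-D advantage is

\[
\mathrm{Adv}_{\le D} \;=\; \sup_{f:\ \deg f \le D}\ \frac{\mathbb{E}_{\mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{\mathbb{Q}}[f(Y)^2]}} \;=\; \big\| L^{\le D} \big\|_{\mathbb{Q}},
\]

where L = d\mathbb{P}/d\mathbb{Q} and L^{\le D} is its projection onto polynomials of degree at most D in L^2(\mathbb{Q}); showing that this quantity stays bounded as the problem size grows is the kind of lower bound against low-degree polynomials referred to above.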
4:10 p.m. – 4:20 p.m. |
Sharp Threshold for the Planted Matching Problem
Dana Yang (Cornell University) | Abstract
Inference on random graphs is an exciting field, both for its broad practical interest and for its variety of elegant proof techniques. Here we look at the inference of a planted perfect matching on a random bipartite graph, where the edge weights are independent and their distributions differ for edges in or outside the planted matching. This model is a planted version of the classical minimum-weight perfect matching problem on bipartite graphs. Under the planted matching model, we show that the optimal reconstruction error (average fraction of mismatched edges) undergoes a phase transition from zero to strictly positive at a critical threshold. We characterize this threshold, and show that it is attained by the MLE, i.e., the minimum-weight perfect matching on the log-likelihood-ratio graph. This is joint work with Jian Ding (Penn), Yihong Wu (Yale), and Jiaming Xu (Duke).
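Concretely, in the notation suggested by the description above: if planted edge weights are drawn from P and non-planted ones from Q, the MLE is

\[
\hat M \;=\; \arg\max_{M \in \mathcal{M}} \ \sum_{e \in M} \log \frac{dP}{dQ}(w_e),
\]

i.e., a minimum-weight perfect matching after each weight w_e is replaced by -\log\frac{dP}{dQ}(w_e), and the reconstruction error is the expected fraction of edges on which \hat M and the planted matching disagree.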
4:20 p.m. – 4:30 p.m. |
On the Power of Preconditioning in Sparse Linear Regression
Frederic Koehler (UC Berkeley) | Abstract
Sparse linear regression with ill-conditioned design matrices is notoriously difficult to solve with computationally efficient algorithms. We discuss some recent algorithmic results showing how to efficiently solve sparse linear regression in certain (random design) settings using preconditioning, as well as new lower bound constructions in the same average case setting, where the optimally preconditioned Lasso provably fails. We also discuss some related work and open problems involving graphical models and other areas.
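One common way to make "preconditioning" concrete here (a sketch; the specific constructions in the work above may differ): pick an invertible matrix S, typically built from the known or estimated covariance of the random design, reparametrize \beta = S\theta, and run the Lasso in the new coordinates,

\[
\hat\beta \;=\; S\,\hat\theta, \qquad \hat\theta \in \arg\min_{\theta}\ \frac{1}{2n}\,\big\| y - X S \theta \big\|_2^2 \;+\; \lambda \,\|\theta\|_1 .
\]

The question is then for which covariance structures a preconditioner S exists that both improves the conditioning of XS and keeps S^{-1}\beta^* sparse; the lower bounds mentioned above exhibit settings where even the optimal choice of S provably fails.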
4:30 p.m. – 4:40 p.m. |
Correlation Adjusted Debiasing (CAD): Debiasing the Lasso with Inaccurate Covariate Model
Michael Celentano (Stanford University) | Abstract
We consider the problem of estimating a low-dimensional parameter in high-dimensional linear regression. Constructing an approximately unbiased estimate of the parameter of interest is a crucial step towards performing statistical inference. Several authors suggest orthogonalizing both the variable of interest and the outcome with respect to the nuisance variables, and then regressing the residual outcome on the residual variable. This is possible if the covariance structure of the regressors is perfectly known, or is sufficiently structured that it can be estimated accurately from data (e.g., the precision matrix is sufficiently sparse). Here we consider a regime in which the covariate model can only be estimated inaccurately, and hence existing debiasing approaches are not guaranteed to work. When errors in estimating the covariate model are correlated with errors in estimating the linear model parameter, an incomplete elimination of the bias occurs. We propose the Correlation Adjusted Debiased Lasso (CAD), which nearly eliminates this bias in some cases, including cases in which the estimation errors are neither negligible nor orthogonal. We consider a setting in which some unlabeled samples might be available to the statistician alongside labeled ones (semi-supervised learning), and our guarantees hold under the assumption of jointly Gaussian covariates. The new debiased estimator is guaranteed to cancel the bias in two cases: (1) when the total number of samples (labeled and unlabeled) is larger than the number of parameters, or (2) when the covariance of the nuisance (but not the effect of the nuisance on the variable of interest) is known. Neither of these cases is treated by state-of-the-art methods.
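The orthogonalization step described above, in one standard form (notation illustrative; this is the classical debiasing step, not the CAD correction itself): write x_1 for the variable of interest and Z for the nuisance covariates, form residuals \hat r_x = x_1 - Z\hat\gamma and \hat r_y = y - Z\hat\delta from (regularized) regressions of x_1 and y on Z, and estimate

\[
\hat\theta_1 \;=\; \frac{\hat r_x^{\top} \hat r_y}{\hat r_x^{\top} \hat r_x}.
\]

When the covariate model is misestimated, errors in \hat\gamma can correlate with errors in the fitted linear model, leaving a residual bias; CAD is designed to adjust for precisely that correlation.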
4:40 p.m. – 5:30 p.m. | Reception |