Theory at the Institute and Beyond, January 2021
by Prasad Raghavendra (Simons Institute)
Estimating the mean
Every so often as a researcher one has the feeling of being born too late. This is the sentiment that the most basic questions have all been already answered by those who came before. Soon, however, one encounters a result that shakes this feeling by demonstrating how many basic questions remain unanswered. The result I describe here is of this kind.
The question is probably the simplest in statistics: Given samples \(x_{1},x_{2},\ldots,\ x_{n}\) from an unknown distribution D over real numbers, how would we estimate the mean of the distribution?
The most natural algorithm would be to add the samples and divide by n, i.e., \(\frac{1}{n}\sum_{i}^{}x_{i}\
The quality of an algorithm is measured by the number of samples it needs. More precisely, we are concerned about the number of samples needed for the estimate to be within an error \(\epsilon\) of the true mean with probability at least 1-\(\text{δ.}\) If the distribution D is Gaussian, then indeed the empirical mean is the optimal estimate; one needs \(K\left( \epsilon,\delta \right) = (2\log{1/\delta)/\epsilon\hat{
Empirical mean is far from reaching this gold standard, even on natural distributions like power law distributions that are ubiquitous in real data. To overcome this limitation of empirical mean, the median of means algorithm was proposed in the 1980s. The idea is simple but clever: partition the samples into buckets, compute the empirical mean in each bucket, and output the median of these means. This estimator is a fundamental subroutine that is widely used in streaming algorithms.
The median of means estimator comes within a constant factor of the gold standard, on arbitrary distribution with bounded variance. Over the past decade, estimators have been proposed that exactly match the gold standard, but with additional assumptions on the distribution, such as knowing the variance or bounded fourth moments.
Finally, in a recent work, Jasper C.H. Lee and Paul Valiant have devised an estimator that exactly matches the gold standard number of samples on arbitrary distribution with bounded variance.
Tempted as one might be to believe that this marks the end of the story for mean estimation over reals, it probably doesn’t. Estimating the mean in higher dimensions or in the presence of an adversary are entirely different stories that have also seen major developments in recent years.
Quantum LDPC codes beating \(N^{1/2}\) barrier
Quantum error correcting codes have seen a major advance in the past few months, breaking a barrier that stood for more than two decades.
Error correcting codes cope with errors by introducing redundancy in the form of parity checks between the bits of the codeword. For example, say the first three bits of the codeword satisfy \(x_{1} \oplus x_{2} \oplus x_{3} = 0\). A low-density parity check code (LDPC code) is one that is specified by parity checks, all of which are on a constant number of bits (say four bits). LDPC codes are an important family of error correcting codes fundamental to both the theory and practice, with the first construction dating back to Gallager in 1960.
A quantum error correcting code encodes quantum information, making it resilient to noise. Concretely, suppose we have a single qubit of the form \(\alpha\left| 0 \right\rangle + \beta|1\rangle\). Then a quantum error correcting code encodes the qubit as the joint state of, say, \(N\) qubits. Here each qubit is manifested by a physical system such as the spin of a particle. The goal is to be able to recover the original qubit \(\alpha\left| 0 \right\rangle + \beta|1\rangle\) even if some number \(D\) of these \(N\) qubits are corrupted. The distance \(D\) of the code is the maximum number of qubit corruptions from which the codeword can still be recovered.
In the context of quantum error correcting codes, a "parity check” corresponds to a physical interaction (technically, a term in the Hamiltonian) between the particles corresponding to those qubits. From the standpoint of a physical implementation, it is important that the parity checks are local, in that each interaction is on a small number of qubits and hopefully between qubits that are physically close to each other. Being an LDPC code is thus especially desirable in the quantum setting. Even from a theoretical perspective, constructing quantum LDPC codes is likely an important step toward the quantum PCP theorem.
The celebrated toric code proposed by Kitaev in 1997 achieves a distance of \(O(N^{\frac{1}{2}})\) using \(N\) qubits. There have been many subsequent constructions of quantum LDPCs, but the distance of the code remained at \(O(N^{\frac{1}{2}}poly(\log{
As I write the rest of this segment, I find that it is akin to a travelogue through a scenic terrain like Yosemite. It is impossible for the traveler to faithfully reproduce much of what he/she saw in a few short passages. Yet it would be remiss not to paint impressionistic glimpses of the trip with the hopes that a captivated reader might make the trip through these parts, when they find the time.
Our journey begins with quantum error correction. In classical error correction, one is encoding a bit that is \(|0\rangle\) or \(|1\rangle\), using a set of bits. A quantum error correcting code encodes a qubit \(\alpha\left| 0 \right\rangle + \beta|1\rangle\), or equivalently a two-dimensional complex valued vector \((\alpha,\beta)\), as a joint state of a number of \(k\) qubits (a complex valued vector with \(2^{k}\) coordinates). In classical codes, the only form of corruption is bit flips that interchange \(\left| 0 \right\rangle \leftrightarrow \left| 1 \right\rangle\), while in quantum codes, the corruption of a qubit is an arbitrary rotation (complex unitary) applied to the two-dimensional space corresponding to the qubit.
At this point in our journey, quantum error correcting codes should seem like an altogether different beast from classical codes; they encode using continuous real numbers and deal with continuous errors.
Here is the first twist in our journey: one can construct quantum error correcting codes (over complex numbers) using two related classical error correcting codes (over {0,1})! This construction, known as the Calderbank-Shor-Steane (CSS) code, will serve as a blueprint for the quantum LDPC constructions we will see. The crucial underlying insight behind CSS codes is that being resilient to every possible rotation of qubits can be reduced to being resilient to two very specific kinds of errors.
To construct a quantum code via the CSS construction, one needs two classical error correcting codes with a specific constraint between the two codes. It is at this point in the journey, by what seems like a very fortunate coincidence, that we unexpectedly find ourselves in the realm of topology! Specifically, the two related classical codes over {0,1} needed for CSS construction turn out to be precisely the same as objects in topology, namely the homology of chain complexes.
By virtue of this mysterious coincidence, surfaces and manifolds automatically yield corresponding quantum error correcting codes! In particular, starting with the surface of a torus, one arrives at Kitaev’s celebrated toric code construction from the late 1990s.
With the toric code, one arrives at a junction in our journey where topology, quantum computation, quantum error correction, and condensed matter physics all seem to meet. After briefly marveling at all the mountain ranges that lie beyond to explore in every direction from here, we continue our journey toward our chosen destination, a quantum LDPC code.
The toric code achieves a distance of \(\Theta(N^{\frac{1}{2}})\). By using a carefully chosen manifold in place of the torus, one obtains the construction by Freedman-Meyer and Luo that beats \(N^{1/2}\) by a polylogarithmic factor for the first time. Almost two decades later, in 2020, Evra, Kaufman, and Zemor improved the rate by polylogarithmic factors via a code based on algebraic objects called the Bruhat-Tits buildings connected to constructions of high-dimensional expanders.
Finally, we arrive at the work of Hastings, Haah, and O’Donnell, which beats the \(N^{1/2}\) barrier. The key ingredient in their construction is a simple geometric notion called a twisted bundle, which we will elucidate a little here. Through the connection between topology and quantum codes, operations on manifolds translate into corresponding operations on quantum codes. For example, product of manifolds translates into a homological product of the corresponding quantum codes.
The canonical product of a line segment and a circle yields a circular band. Notice that a circular band is obtained by associating a line segment with each point on a circle. The twisted bundle is a notion of product of manifolds that yields a Mobius strip instead of the circular band. Specifically, in a Mobius strip, one associates a line segment with each point on a circular loop, but there is a "twist” that drastically changes the topology. A twisted product with random twists lies at the heart of the new quantum LDPC construction, a fitting addition to our journey so far.
(The motivated reader might find interesting the mini crash course in quantum error correction at the Simons Institute and this lecture by Ryan O’Donnell on their latest work.)
Convex sets are pretty good expanders
In the epic poem The Aeneid, Queen Dido was promised as much land as could be enclosed in the hide of a bull. What she faced was the classic problem of isoperimetry: Among all curves of length one, which curve encloses the maximum area? The ancient Greeks already knew that the answer is the circle. More generally, in every dimension, the ball has the largest volume for a given boundary.
Consider the following variant of the problem: Given a unit sphere in three or more dimensions, what is the optimal partition into two equal parts by volume that minimizes the separating boundary? As one might expect, the answer is to cut the sphere with a hyperplane through its center. More generally, one can ask, what is the optimal way to cut a region of space into two parts?
Before we consider the question, we need to resolve how to compare partitions with different set sizes.
To this end, define the notion of isoperimetric ratio as:
The isoperimetric ratio of a subset \(S\) is the ratio of its "surface area” to its volume or its complement, whichever is smaller. Formally, it is given by \({\ ϕ}\left( S \right) = vol(\partial S)/\min(vol\left( S \right),vol\left( S^{C} \right))\), where \(\partial S\ \)denotes the boundary of \(S\) while \(\text{vol}\) denotes volume.
Minimizing the isoperimetric ratio amounts to finding a partition that incurs the least boundary per unit of volume separated. Borrowing terminology from the setting of discrete graphs, the partition minimizing isoperimetric ratio is the “sparsest cut.”
Classic results show that the sparsest cuts in a Euclidean sphere or in the Gaussian distribution are given by hyperplanes. Of course, hyperplanes are not the optimal sparsest cuts in general.
The Kannan-Lovasz-Simonovits (KLS) conjecture asserts that hyperplanes are nearly optimal sparsest cuts in all convex sets, up to a constant factor. The KLS conjecture is one of the most important open problems in convex geometry.
Using the technique of stochastic localization introduced by Ronen Eldan, Yin Tat Lee and Santosh Vempala made a fundamental advance on the KLS conjecture in 2019, showing that the conjecture is true up to a \(O(n^{\frac{1}{4}})\) factor for n-dimensional convex sets. In a recent work, Yuansi Chen showed that the approach of Lee and Vempala can be used to prove the conjecture up to a subpolynomial (\(n^{o(1)})\) factor. The announcement of Chen’s result was timed perfectly so that it was the topic for the final lecture of the reading group on stochastic localization at the Simons Institute program on Probability, Geometry, and Computation in High Dimensions in December.
Not only is the KLS conjecture an aesthetically appealing claim about convex sets, but it has several important consequences. The conjecture was originally motivated by its application to MCMC algorithms for estimating volume of convex sets. However, as it turns out, resolving the KLS conjecture will considerably expand our understanding of convex sets by implying well-known conjectures about convex sets. Specifically, an affirmation of the KLS conjecture would confirm the following conjectures in convex geometry:
-
Slicing conjecture: Given a convex body of unit volume in n-dimensions, there exists some hyperplane section, aka slice, of the convex body with (n-1)-dimensional volume that's at least a constant (independent of dimension).
-
Thin-shell conjecture: Suppose we sample a vector \(x\) with standard Gaussian coordinates. Then its length \(\left\| x \right\|\) is about \(\sqrt{n}\), with a variance of only \(O(1)\). In other words, even though technically the Gaussian distribution is supported over all of Euclidean space, almost all of the mass of the Gaussian distribution is supported on a thin shell close to radius \(\sqrt{n}\). The thin-shell conjecture asserts that the same holds for every convex set in isotropic position — i.e., most of the volume of the convex set is close to a thin shell.
-
Central limit theorem: Fix a direction \(v\). Pick a random point \(x\) from the convex body, and consider the projection \(\left\langle v,x \right\rangle\). It is conjectured that for every convex body, for most directions \(v\), the distribution \(\langle v,x\rangle\) is close to Gaussian. If the convex body is the hypercube, then this is the standard central limit theorem.
The KLS conjecture has many more implications to our understanding of convex sets in high-dimensional space. We refer the reader to the survey by Lee and Vempala.