Monday, August 26th, 2013

9:00 am9:45 am

We show optimal (up to a constant factor) NP-hardness for maximum constraint satisfaction problem with k variables per constraint (Max-k-CSP), whenever k is larger than the domain size. This follows from our main result concerning CSPs given by a predicate: a CSP is approximation resistant if its predicate contains a subgroup that is balanced pairwise independent. Our main result is analogous to Austrin and Mossel’s, bypassing their Unique-Games Conjecture assumption whenever the predicate is an abelian subgroup.

Our main ingredient is a new gap-amplification technique inspired by XOR-lemmas. Using this technique, we also improve the NP-hardness of approximating Independent-Set on bounded-degree graphs, Almost-Coloring, Two-Prover-One-Round-Game, and various other problems.

11:30 am11:55 am
This talk shall focus on the problem of finding large independent sets in hypergraphs with certain useful structural properties. In particular, we shall describe tight inapproximability for independent set in: (i) almost 2-colorable 3-uniform hypergraphs and, (ii) k-uniform hypergraphs with a blocked-partiteness property–a generalization of the k-partiteness property studied in the work of Bansal and Khot [BK10]. Both the above results were known earlier [GS11, BK10] only assuming the Unique Games Conjecture, and we shall describe the Fourier analytic techniques used to obtain them unconditionally. The talk shall also sketch the applications of the second result which yield tight inapproximability for two well known scheduling problems–Concurrent Open Shop and the Assembly Line Problem.
1:45 pm2:30 pm

The projection games problem is the following problem in combinatorial optimization:

Given a bipartite graph G=(A,B,E), sets of labels Sigma_A, Sigma_B to the vertices, and functions pi_e:Sigma_A->Sigma_B on the edges ("projections"), the goal is to find a labeling of the vertices f_A:A-->Sigma_A, f_B:B-->Sigma_B that satisfies as many of the projections as possible, i.e., for as many e=(a,b) in E as possible pi_e(f_A(a))=f_B(b).

The projection games problem is known to be very hard to approximate: even if there exists a labeling that satisfies all edges, it is NP-hard to find a labeling that satisfies epsilon=o(1) fraction of the edges.
This theorem ("the strong PCP theorem") is the starting point of most optimal NP-hardness of approximation results. In this talk we'll discuss new approximation algorithms for projection games.

This is joint work with Pasin Manurangsi.

3:00 pm3:25 pm

A boolean predicate $f:\{0,1\}^k\to\{0,1\}$ is said to be {\em somewhat approximation resistant} if for some constant $\tau > \frac{|f^{-1}(1)|}{2^k}$, given a $\tau$-satisfiable instance of the $\maxkcsp(f)$ problem, it is NP-hard to find an assignment that {\it strictly beats} the naive algorithm that outputs a uniformly random assignment. Let $\tau(f)$ denote the supremum over all $\tau$ for which this holds. It is known that a predicate is somewhat approximation resistant precisely when its Fourier degree is at least $3$. For such predicates, we give a characterization of the {\it hardness gap} $(\tau(f) - \frac{|f^{-1}(1)|}{2^k})$ up to a factor of $O(k^5)$. We relate the hardness gap to the Fourier mass on coefficients of degree 3 or higher, giving hardness results when the Fourier mass is large and an SDP-based approximation algorithm for the case when the Fourier mass is small. We also give a similar characterization of the {\it integrality gap} for the natural SDP relaxation of $\maxkcsp(f)$ after $\Omega(n)$ rounds of the Lasserre hierarchy.

3:40 pm4:05 pm

For a predicate $f:{-1,1}^k \mapsto {0,1}$ with $\rho(f) = \frac{|f^{-1}(1)|}{2^k}$, we call the predicate strongly approximation resistant if given a near-satisfiable instance of CSP$(f)$, it is computationally hard to find an assignment such that the fraction of constraints satisfied is outside the range $[\rho(f)-\Omega(1), \rho(f)+\Omega(1)]$.

We present a characterization of strongly approximation resistant predicates under the Unique Games Conjecture. We also present characterizations in the mixed linear and semi-definite programming hierarchy and the Sherali-Adams linear programming hierarchy. In the former case, the characterization coincides with the one based on UGC. Each of the two characterizations is in terms of existence of a probability measure on a natural convex polytope associated with the predicate.

The predicate is called approximation resistant if given a near-satisfiable instance of CSP$(f)$, it is computationally hard to find an assignment such that the fraction of constraints satisfied is at least $\rho(f)+\Omega(1)$. When the predicate is odd, i.e. $f(-z)=1-f(z),\forall z\in {-1,1}^k$, it is easily observed that the notion of approximation resistance coincides with that of strong approximation resistance. Hence for odd predicates, in all the above settings, our characterization of strong approximation resistance is also a characterization of approximation resistance.

Joint work with Subhash Khot and Pratik Worah.

4:20 pm4:45 pm

We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(dn^{ ho}) query time and O(n^{1 + ho} + nd) space, where ho <= 7/(8c^2) + O(1 / c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ITCS 2011). By a standard reduction we obtain a data structure for the Hamming space and ell_1 norm with ho <= 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).

Joint work with Piotr Indyk, Huy L. Nguyen, Ilya Razenshteyn. arxiv.org/abs/1306.1547

Tuesday, August 27th, 2013

9:00 am9:45 am

Given two Gaussian vectors that are positively correlated, what is the probability that they both land in some fixed set A? Borell proved that this probability is maximized (over sets A with a given volume) when A is a half-space. We will give a new and simple proof of this fact, which also gives some stronger results. In particular, we can show that half-spaces uniquely maximize the probability above, and that sets which almost maximize this probability must be close to half-spaces. We will also mention some applications to testing, and to the analysis of the Goemans-Williamson algorithm.

11:30 am11:55 am

We will survey various generalizations of the results of Kahn, Kalai, and Linial, and that of Friedgut (Friedgut's Junta theorem) to graphs other than the hypercube. Of particular interest will be their generalization to the setting of Cartesian product of arbitrary undirected graphs (or equivalently, reversible Markov chains) because of their applications to hardness of approximation. We will sketch a proof in this setting, extending the arguments of Rossignol, and Falik-Samorodnitsky.

This is joint work with Madhur Tulsiani.

1:45 pm2:30 pm

We give an O(1/epsilon)-query property testing algorithm which distinguishes whether an unknown set has surface area at most A or (is epsilon-far from) surface area at least (4/pi) A. Our result works under n-dimensional Lebesgue measure or n-dimensional Gaussian measure. Previous work only treated the 1-dimensional case.

3:00 pm3:25 pm

A partially symmetric function is a boolean function that is invariant to the reordering of all but a constant number of its variables. The set of partially symmetric functions includes juntas, symmetric functions, and a number of other functions as well. They were first studied by Shannon (1949) in the context of circuit complexity and have since been studied (under various names) in many other areas of theoretical computer science as well. Most recently, partially symmetric functions appeared in the context of property testing, where they are conjectured to essentially characterize the set of functions for which isomorphism testing can be done with a constant number of queries.

In this talk, we will examine how tools from the analysis of boolean functions might be used to understand partially symmetric functions and resolve some fundamental open problems related to the se functions.

Wednesday, August 28th, 2013

9:00 am9:45 am

We show that no polynomial-sized linear programming relaxation can achieve better than a 1/2-approximation for MAX-CUT, a 7/8-approximation for MAX-3SAT, or a 3/4-approximation for MAX-2SAT.

This is accomplished by bringing together two formerly disparate lines of research.  On the one hand, there has been a recent sequence of exciting lower bounds on the size of extended formulations for various polytopes that arise in combinatorial optimization.  At the same time, researchers have extensively studied the power of specific LP hierarchies for approximating NP-hard problems. We show that for max-CSP problems, general polynomial-sized LPs are exactly as power as LPs arising from a constant number of rounds of the Sherali-Adams hierarchy.

Joint work with Siu On Chan, Prasad Raghavendra, and David Steurer.

10:15 am11:00 am

We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs do not rely on the dominant Fourier-based methods from computational learning theory. Instead, we derive our approximations using techniques related to the classical method of moments.

The main new application is a polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces (i.e., depth-2 neural networks) with respect to any log-concave distribution on R^n (for any constant accuracy parameter). This result was not known even for the case of PAC learning the intersection of two halfspaces.

Additionally, we show that in the smoothed-analysis setting, the above results hold with respect to any distribution that obeys a sub-exponential tail bound (i.e., sub-exponential or sub-gaussian densities).

Joint work with Raghu Meka.

11:30 am11:55 am

We give a deterministic algorithm for approximately computing the fraction of Boolean assignments that satisfy a degree-2 polynomial threshold function. Given a degree-2 input polynomial p(x_1,...,x_n) and a parameter  eps > 0, the algorithm approximates Pr[p(x)>=0] to within an additive \pm \eps in time poly(n,2^{\poly(1/\eps)}). (Note that it is NP-hard to determine whether the above probability is nonzero, so any sort of multiplicative approximation is almost certainly impossible even for efficient randomized algorithms.) This is the first deterministic algorithm for this counting problem in which the running time is a fixed polynomial in n for any constant eps > 0. For "regular" polynomials p (those in which no individual variable's influence is large compared to the sum of all n variable influences) our algorithm runs in poly(n,1/eps) time.

1:45 pm2:30 pm

We present a deterministic algorithm to epsilon-approximately count the number of satisfying assignments of a k-junta of degree-2 PTFs with a running time of poly(n). exp(k,epsilon). In particular, the algorithm has a poly(n) running time for slightly superconstant k and subconstant eps. It is a significant extension of a recent result on deterministic counting for degree-2 PTFs and is based on using multidimensional CLTs for gaussian polynomials derived from a combination of Stein's method and Malliavin calculus. Also, it provides the first instance of a deterministic poly-time counting algorithm for a non-trivial class of depth-3 circuits.

3:00 pm3:25 pm

A common theme in many areas of theoretical computer science is that it is sometimes much easier to compute approximations to a function than to compute the function exactly, while sometimes approximation does not make things any easier. In this talk, we explore the hardness of approximating functions in the circuit complexity model. Specifically, we restrict our attention to depth-2 circuits (i.e., DNFs and CNFs) and we ask how large such a circuit must be to agree with a given target function on 99% of all possible inputs. As we will see during the talk, the study of this question leads to surprising results and to interesting connections between complexity theory and information theory, the analysis of Boolean functions, and extremal combinatorics.

This talk is based on joint works with Eric Blais.

Thursday, August 29th, 2013

9:00 am9:45 am

Let F be a prime field. An affine-invariant property is a property of functions on F^n that is closed under taking affine transformations of the domain. We prove that every affine-invariant property with a local characterization is testable. In fact, we show that for any such property, there is a test that, given an input function, makes a constant number of queries, always accepts if it satisfies the property, and otherwise rejects with a positive probability depending only on the distance of the function from the property.

10:15 am11:00 am

In this talk we show several results regarding Boolean functions with small spectral norm (the spectral norm of f is $\|\hat{f}\|_1=\sum_{\alpha}|\hat{f}(\alpha)|$). Specifically, we prove the following results for functions $f:\{0,1\}^n \to \{0,1\}$ with $\|\hat{f}\|_1=A$.

1. There is a subspace $V$ of co-dimension at most $A^2$ such that $f|_V$ is constant.

2. f can be computed by a parity decision tree of size $2^{A^2}n^{2A}$. (a parity decision tree is a decision tree whose nodes are labeled with arbitrary linear functions.)

3. If in addition f has at most s nonzero Fourier coefficients, then f can be computed by a parity decision tree of depth $A^2 \log s$.

4. For every $0<\epsilon$ there is a parity decision tree of depth $O(A^2 + \log(1/\epsilon))$ and size $2^{O(A^2)} \cdot \min\{1/\epsilon^2,O(\log(1/\epsilon))^{2A}\}$ that \epsilon-approximates f. Furthermore, this tree can be learned, with probability $1-\delta$, using $\poly(n,\exp(A^2),1/\epsilon,\log(1/\delta))$ membership queries.

All the results above also hold (with a slight change in parameters) to functions $f:Z_p^n\to \{0,1\}$.

11:30 am11:55 am

A locally testable code (LTC) is an error correcting code which admits a very efficient procedure for testing membership in the code: a local tester queries few locations in the received word but still distinguishes codewords from words that are far from the code. The main open questions about LTCs are about the tradeoffs possible between rate, distance and query complexity.

We show that the local testability of a binary linear code is related (and in fact equivalent) to the L_1 embeddability of a related Cayley graph. Thus one can reformulate existential questions about LTCs as questions about the existance of certain Cayley graphs that admit constant distortion embeddings into L_1. We discuss what this tells us about existing results on LTCs and what it might tell us about new results.

Joint work with Salil Vadhan (Harvard) and Yuan Zhou (CMU).

1:45 pm2:30 pm

Let H, G be graphs on k (small) and n (large) vertices respectively. We denote by p(H,G) the probability that a randomly chosen set of k vertices in G spans a subgraph isomorphic to H. For fixed k, the vector (p(H,G) | H is a k-vertex graph) is called the k-profile of G.

Q1: What do the k-profiles of large graphs look like? ("Local graph theory")

Q2: Given information about G's k-profile, what conclusions can we draw? (Local-global graph theory).

Similar questions can be asked concerning numerous other combinatorial objects such as tournaments, permutations etc. In this talk I will present some of what is already known in this area as well as the many intriguing questions that are still open.

The talk covers joint work with: Chaim Even Zohar, Hao Huang, Avraham Morgenstern, Humberto Naves, Yuval Peled, Benny Sudakov.

3:00 pm3:25 pm

We generalize algorithms from computational learning theory that are successful under the uniform distribution on the Boolean hypercube ${0,1}^n$ to algorithms successful on permutation invariant distributions, distributions where the probability mass remains constant upon permutations in the instances. While the tools in our generalization mimic those used for the Boolean hypercube, the fact that permutation invariant distributions are not product distributions presents a significant obstacle.

Under the uniform distribution, halfspaces can be agnostically learned in polynomial time for constant $eps$. The main tools used are a theorem of Peres~cite{Peres:04} bounding the emph{noise sensitivity} of a halfspace, a result of~cite{KOS:04} that this theorem implies Fourier concentration, and a modification of the Low-Degree algorithm of Linial, Mansour, and Nisan~cite{ LMN:93} made by Kalai et. al.~cite{Kalai2008a}. These results are extended to arbitrary product distributions in~cite{BOW2010}.

We prove analogous results for permutation invariant distributions; more generally, we work in the domain of the symmetric group. We define noise sensitivity in this setting, and show that noise sensitivity has a nice combinatorial interpretation in terms of Young tableaux. The main technical innovations involve techniques from the representation theory of the symmetric group, especially the combinatorics of Young tableaux. We show that low noise sensitivity implies concentration on "simple" components of the Fourier spectrum, and that this fact will allow us to agnostically learn halfspaces under permutation invariant distributions to constant accuracy in roughly the same time as in the uniform distribution over the Boolean hypercube case.

Friday, August 30th, 2013

9:00 am9:45 am

Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n,1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k ~ sqrt(n). Here we show that beating sqrt(n) would require substantially new algorithmic ideas, by proving an integrality gap for the problem in the Lasserre/Sum of Squares hierarchy, the most powerful class of semi-definite programming algorithms we know of. Our (average case) lower bound uses tools from the classical theory of association schemes and some new large deviation bounds for matrix-valued polynomials which could be of independent interest.

This is joint work with Avi Wigderson.

10:15 am11:00 am

Fourier PCA is Principal Component Analysis of the covariance matrix obtained after reweighting a distribution with a random Fourier weighting. It can also be viewed as PCA applied to the Hessian matrix of functions of the characteristic function of the underlying distribution. Extending this technique to higher derivative tensors and developing a general tensor decomposition method, we derive the following results: (1) a polynomial-time algorithm for general independent component analysis (ICA), not requiring the component distributions to be discrete or distinguishable from Gaussian in their fourth moment (unlike in the previous work); (2) the first polynomial-time algorithm for underdetermined ICA, where the number of components can be arbitrarily higher than the dimension; (3) an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise.

11:30 am11:55 am

The theory of learning under the uniform distribution is rich and deep. It is connected to cryptography, computational complexity, and analysis of boolean functions to name a few areas. This theory however is very limited in applications due to the fact that the uniform distribution and the corresponding Fourier basis is rarely encountered as a statistical model.

A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high dimensional data in all areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution.

Based on joint work with Elchanan Mossel.