Monday, September 22nd, 2014

9:30 am – 10:30 am

I will present a new method to analyze and design iterative optimization algorithms built on the framework of Integral Quadratic Constraints (IQC) from robust control theory. IQCs provide sufficient conditions for the stability of complicated interconnected systems, and these conditions can be checked by semidefinite programming. I will discuss how to adapt IQC theory to study optimization algorithms, proving new inequalities about convex functions. Using these inequalities, I will derive upper bounds on convergence rates for the gradient method, the heavy-ball method, Nesterov's accelerated method, and related variants by solving small, simple semidefinite programming problems. I will close with a discussion of how these techniques can be used to search for algorithms with desired performance characteristics, establishing a new methodology for algorithm design.
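
As a concrete illustration of the approach (not the talk's full machinery), here is a minimal sketch of such a small SDP, assuming Python with cvxpy and the SCS solver: it bisects for the smallest contraction rate of gradient descent on an m-strongly convex, L-smooth function that a Lyapunov-plus-quadratic-constraint LMI can certify. The step size, constants, and the particular pointwise constraint are illustrative assumptions.

```python
# Sketch: certify a contraction rate rho for gradient descent
# x_{k+1} = x_k - alpha * grad f(x_k), with f m-strongly convex and L-smooth,
# via a small SDP feasibility problem. The 2x2 LMI combines a Lyapunov
# condition with a pointwise quadratic constraint satisfied by gradients
# of such functions; all quantities reduce to scalars in this analysis.
import cvxpy as cp
import numpy as np

def rate_certified(rho, alpha, m, L):
    P = cp.Variable()                  # scalar Lyapunov weight
    lam = cp.Variable(nonneg=True)     # multiplier for the quadratic constraint
    A, B = 1.0, -alpha                 # gradient descent as a linear system
    # [x - x*; grad f(x)]^T M [x - x*; grad f(x)] >= 0 for all x:
    M = np.array([[-2.0 * m * L, m + L],
                  [m + L, -2.0]])
    Z = cp.Variable((2, 2), symmetric=True)
    constraints = [Z == cp.bmat([[A * P * A - rho ** 2 * P, A * P * B],
                                 [B * P * A, B * P * B]]) + lam * M,
                   Z << 0,             # the LMI: Z negative semidefinite
                   P >= 1.0]           # fixes the scaling of (P, lam)
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve(solver=cp.SCS)
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)

m, L, alpha = 1.0, 10.0, 2.0 / 11.0    # alpha = 2/(m+L), the classical step
lo, hi = 0.0, 1.0                      # bisect for the best certified rate;
for _ in range(25):                    # theory predicts (L-m)/(L+m) = 9/11
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if rate_certified(mid, alpha, m, L) else (mid, hi)
print("certified rate ~", hi)
```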

Joint work with Laurent Lessard and Andrew Packard.

1:45 pm – 2:30 pm

For many latent variable models, learning can be done by tensor decomposition of the third-order moment tensor. However, most previous works either focus on the undercomplete case (where the number of hidden components is smaller than the dimension) or require higher-order moments. In this talk I will show that the simple tensor power method still works even in the overcomplete case.
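
For readers unfamiliar with the method, here is a minimal numpy sketch of tensor power iteration on a synthetic third-order moment tensor; the sizes and the random initialization are illustrative, and in the overcomplete regime the analysis requires good initializations (e.g., many restarts), which this sketch elides.

```python
# Sketch: tensor power iteration on a symmetric third-order tensor
# T = sum_i w_i * a_i^{(x)3}, the form arising as a third-order moment.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 15                          # k > d: the overcomplete regime
A = rng.normal(size=(d, k))
A /= np.linalg.norm(A, axis=0)         # unit-norm components a_i
w = rng.uniform(1.0, 2.0, size=k)
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)   # build the moment tensor

def power_iteration(T, iters=100):
    """Repeatedly map u -> T(I, u, u), renormalizing each time."""
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)
        u /= np.linalg.norm(u)
    return u

u = power_iteration(T)
# With a successful initialization, u lands near some component a_i;
# in practice one uses many random restarts and keeps the best runs.
print("max |<u, a_i>| =", np.max(np.abs(A.T @ u)))
```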

Based on joint works with Anima Anandkumar and Majid Janzamin.

3:00 pm – 3:45 pm

Robust PCA is the well-known task of recovering a low-rank matrix from measurements corrupted with sparse noise. A popular recovery method is convex relaxation, where the rank and sparsity constraints are replaced by their convex surrogates. This provably recovers the low-rank and sparse components under natural conditions such as incoherence and sparsity. In this talk, I will describe how non-convex methods can provably solve the robust PCA problem while matching the guarantees of the convex method. At the same time, our non-convex method has significantly lower computational complexity, since it involves finding only low-rank approximations of the observed matrix. Thus, non-convex methods can achieve the best of both worlds: guaranteed recovery and low computational complexity.
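
A simplified caricature of such a non-convex method, assuming numpy: alternate hard thresholding (for the sparse part) with a truncated SVD (the low-rank approximation the abstract refers to). The actual algorithm uses carefully scheduled ranks and thresholds to obtain its guarantees; the constants below are illustrative.

```python
# Sketch: alternate a rank-r projection for the low-rank part with hard
# thresholding for the sparse part. The truncated SVD is the only
# expensive operation, as the abstract notes.
import numpy as np

def robust_pca(M, r, thresh, iters=50):
    L = np.zeros_like(M)
    for _ in range(iters):
        # Sparse step: keep only the large entries of the residual.
        R = M - L
        S = np.where(np.abs(R) > thresh, R, 0.0)
        # Low-rank step: best rank-r approximation of M - S.
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
    return L, S

rng = np.random.default_rng(1)
n, r = 50, 3
L0 = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))        # low rank
S0 = np.where(rng.random((n, n)) < 0.05,                       # sparse
              10.0 * rng.normal(size=(n, n)), 0.0)
L, S = robust_pca(L0 + S0, r=r, thresh=3.0)
print("low-rank error:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```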

This is joint work with Praneeth Netrapalli, U N Niranjan, Prateek Jain, and Sujay Sanghavi.

4:15 pm – 5:00 pm

Recovering a structured object given a small number of linear observations has been well-studied in recent years. Examples include sparse vectors, low-rank matrices, and the sum of sparse and low-rank matrices, among others. In many applications in signal processing and machine learning, the object of interest has several structures simultaneously, e.g., a matrix that is simultaneously sparse and low-rank. Often norms that promote each structure are known, and allow for recovery using an order-wise optimal number of measurements (e.g., l1 norm for sparsity, nuclear norm for matrix rank), and it is common in practice to minimize a combination of such norms.

We show that multi-objective optimization with these norms can do no better, order-wise, than exploiting only one of the present structures. This suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation, not one that combines the convex relaxations for each structure. We then specialize our result to the case of sparse and low-rank matrices, and show that a nonconvex formulation of the problem can recover the model from very few measurements (on the order of the degrees of freedom of the matrix), whereas the convex problem obtained from a combination of the l1 and nuclear norms requires many more measurements. Our framework applies to arbitrary norms as well as to a wide range of measurement ensembles. This allows us to give performance bounds for problems such as sparse phase retrieval and low-rank tensor completion.
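
For concreteness, a small cvxpy sketch of the combined-norm convex program in question, with a Gaussian measurement ensemble; the weight tau, sizes, and data are illustrative, and per the result above one should not expect recovery from measurements on the order of the degrees of freedom.

```python
# Sketch: recovering a simultaneously sparse and low-rank matrix by
# minimizing a combination of the l1 and nuclear norms, the convex
# formulation the result above shows to be order-wise suboptimal.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 15, 2, 120                   # matrix size, rank, # of measurements
U = np.where(rng.random((n, r)) < 0.3, rng.normal(size=(n, r)), 0.0)
X0 = U @ U.T                           # sparse and low-rank ground truth

meas = [rng.normal(size=(n, n)) for _ in range(m)]   # Gaussian ensemble
y = [float(np.sum(Mi * X0)) for Mi in meas]          # y_i = <M_i, X0>

X = cp.Variable((n, n))
tau = 1.0
objective = cp.Minimize(cp.sum(cp.abs(X)) + tau * cp.normNuc(X))
constraints = [cp.sum(cp.multiply(Mi, X)) == yi for Mi, yi in zip(meas, y)]
cp.Problem(objective, constraints).solve(solver=cp.SCS)
print("relative error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```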

This talk is based on joint work with Samet Oymak, Amin Jalali, Yonina Eldar, and Babak Hassibi.

Tuesday, September 23rd, 2014

1:45 pm – 2:30 pm

We give a polynomial time algorithm for the lossy population recovery problem. In this problem, the goal is to approximately learn an unknown distribution on binary strings of length n from lossy samples: for some parameter mu, each coordinate of the sample is preserved with probability mu and otherwise is replaced by a '?'. The running time and number of samples needed for our algorithm are polynomial in n and 1/eps for each fixed mu > 0. This improves on the algorithm of Wigderson and Yehudayoff, which runs in quasi-polynomial time for any mu > 0, and the polynomial time algorithm of Dvir et al., which was shown to work for mu > 0.30 by Batman et al.

The heart of the matter is that we are faced with a certain inverse problem Ax = b, where we are given a noisy approximation to b and the condition number of A is exponentially large. Yet, because we are interested in a single component of x, we can solve this problem despite the fact that it is not well conditioned. Hence, the condition number is not always a barrier to solving statistical inverse problems. Our approach is based on the notion of a robust local inverse (which was introduced in previous work) and we analyze our construction using tools from complex analysis.

This is based on joint work with Mike Saks.

3:00 pm – 3:45 pm

We describe a seriation algorithm for ranking a set of n items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from the pairwise comparisons and using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact even when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than other scoring methods. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive, and in some cases superior, performance compared to classical ranking methods.
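
A minimal numpy sketch of this pipeline in the noiseless, fully observed case; the similarity construction and Fiedler-vector reordering below are in the spirit of the algorithm described, and details of the actual method may differ.

```python
# Sketch: spectral seriation ranking from pairwise comparisons.
# C[i, j] is +1 if item i beats item j, -1 if j beats i.
import numpy as np

rng = np.random.default_rng(3)
n = 30
ranks = rng.permutation(n)             # ranks[i] = true rank of item i
C = np.sign(ranks[None, :] - ranks[:, None]).astype(float)

# Similarity: items are similar if they compare the same way with others.
S = (n + C @ C.T) / 2.0

# Seriation: reorder by the Fiedler vector of the Laplacian of S.
Lap = np.diag(S.sum(axis=1)) - S
_, eigvecs = np.linalg.eigh(Lap)
order = np.argsort(eigvecs[:, 1])      # eigenvector of 2nd-smallest eigenvalue

# The Fiedler vector is defined up to sign, so accept either direction.
truth = np.argsort(ranks)
print("exact recovery:",
      (order == truth).all() or (order[::-1] == truth).all())
```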

This is joint work with Fajwel Fogel and Milan Vojnovic.

4:15 pm – 5:00 pm

Signomial programs (SPs) are optimization problems consisting of an objective and constraints specified by signomials, which are sums of exponentials of linear functionals of a decision variable. SPs are non-convex optimization problems in general, and instances of NP-hard problems can be reduced to SPs. We describe a hierarchy of convex relaxations to obtain successively tighter lower bounds on the optimal value of an SP. This sequence of lower bounds is computed by solving increasingly large relative entropy optimization problems, which are convex programs specified in terms of linear and relative entropy functions. Our approach relies crucially on the observation that the relative entropy function – by virtue of its joint convexity with respect to both arguments – provides a convex parametrization of certain sets of globally nonnegative signomials with efficiently computable nonnegativity certificates via the arithmetic-geometric-mean (AM/GM) inequality. By appealing to representation theorems from real algebraic geometry, we demonstrate that our sequence of lower bounds converges to the global optimum for broad classes of SPs. Finally, we discuss numerical experiments that demonstrate the effectiveness of our approach.
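
To make the certificates concrete, here is a small cvxpy sketch of a relative entropy feasibility test of the AM/GM type described above, for a signomial with a single possibly negative term; the specific signomial and the exact form of the certificate condition are illustrative assumptions in this sketch.

```python
# Sketch: certify nonnegativity of
#   f(x) = sum_j c_j exp(alpha_j . x) + beta exp(alpha_0 . x),  c_j > 0,
# via a relative entropy program: f is nonnegative if there is nu >= 0 with
# sum_j nu_j (alpha_j - alpha_0) = 0 and sum_j nu_j log(nu_j/(e c_j)) <= beta.
import cvxpy as cp
import numpy as np

alphas = np.array([[2.0, 0.0], [0.0, 2.0]])   # exponents of positive terms
alpha0 = np.array([1.0, 1.0])                 # exponent of the negative term
c = np.array([1.0, 1.0])
beta = -1.9    # f = e^{2x} + e^{2y} - 1.9 e^{x+y}; AM/GM allows beta >= -2

nu = cp.Variable(2, nonneg=True)
# rel_entr(nu, c) = nu * log(nu/c); subtracting nu gives nu * log(nu/(e c)).
entropy = cp.sum(cp.rel_entr(nu, c) - nu)
prob = cp.Problem(cp.Minimize(entropy), [(alphas - alpha0).T @ nu == 0])
prob.solve(solver=cp.SCS)
print("certified nonnegative:", prob.value <= beta)   # optimum is -2 here
```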

Joint work with Parikshit Shah.

Wednesday, September 24th, 2014

9:30 am – 10:30 am

In many practical applications, optimization problems arise not in isolation, but as multiple instances of similar problems. In resource management, for example, the instances take the form of linear programs involving the same coefficient matrix, which describes a fixed network distribution structure, while the cost and “right-hand side” vectors change over time. In statistics, cross-validation may involve a set of least-squares problems whose “design matrices” are very close to each other. In such instances, it makes sense to spend processing time on the common part of the problems, in order to compress it in some way and speed up the overall computation.

In this work we propose an approach to multiple-instance conic optimization based on “robust sketching”, where the data matrix of an optimization problem is approximated by a “sketch”: a simpler matrix that preserves some property of interest and on which computations can be performed faster than on the original, while feasibility with respect to the original problem is preserved. We examine applications of the concept in machine learning and in the context of very large LPs arising in energy management.
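
The following toy sketch (numpy; not the robust sketching framework itself) illustrates only the amortization idea from the paragraphs above: factor the shared matrix once, then solve each least-squares instance cheaply.

```python
# Sketch: amortize work across many least-squares instances sharing the
# same design matrix A. Factor A once (a thin SVD standing in for the
# "sketch"), then each instance min_x ||A x - b_t||^2 costs only a couple
# of matrix-vector products instead of a fresh factorization.
import numpy as np

rng = np.random.default_rng(4)
n, d, T = 2000, 50, 100
A = rng.normal(size=(n, d))                          # common to all instances
U, s, Vt = np.linalg.svd(A, full_matrices=False)     # one-time preprocessing

solutions = []
for t in range(T):
    b = rng.normal(size=n)                           # instance-specific data
    solutions.append(Vt.T @ ((U.T @ b) / s))         # reuse the factorization
print("solved", len(solutions), "instances with one factorization")
```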

11:15 am – 12:15 pm

We present joint work with Rainer Sinn on the algebraic geometry that underlies semidefinite programming. Our focus is on spectrahedral shadows, that is, convex sets representable by linear matrix inequalities. We characterize the polynomials that vanish on the boundary of a spectrahedral shadow when the defining matrices are generic. The colorful pictures shown in this lecture can be enjoyed by everyone.

1:45 pm – 2:30 pm

In this talk I will show how the methodology of spectral bounds and SDP hierarchies, originally developed for optimization problems on finite graphs, can be applied to special classes of infinite graphs. The infinite graphs I consider possess a lot of geometric structure, so tools from harmonic analysis can be used in proofs and in explicit computations. For many classical packing problems in discrete geometry, this approach gives the best known results.

3:00 pm – 3:45 pm

The positive semidefinite (psd) rank of a nonnegative real matrix M is the smallest integer k for which it is possible to find k x k psd matrices A_i, assigned to the rows of M, and B_j, assigned to the columns of M, such that the (i,j)-entry of M is the inner product of A_i and B_j. This is an example of a cone rank of a nonnegative matrix, similar to nonnegative rank, and was introduced for studying sdp-representations of convex sets. I will present the main results and open questions we have so far on psd rank. The talk will be largely based on a recent survey written with Hamza Fawzi, Joao Gouveia, Pablo Parrilo and Richard Robinson.
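
To make the definition concrete, a small numpy sketch that builds a matrix from an (illustrative, randomly chosen) psd factorization; any matrix arising this way is automatically entrywise nonnegative.

```python
# Sketch: the definition of psd rank, numerically. Assign a k x k psd
# matrix A_i to each row and B_j to each column; then
# M[i, j] = <A_i, B_j> = trace(A_i B_j) >= 0 since both factors are psd.
import numpy as np

rng = np.random.default_rng(5)
k, rows, cols = 2, 4, 3

def random_psd(k):
    G = rng.normal(size=(k, k))
    return G @ G.T                     # G G^T is always psd

A = [random_psd(k) for _ in range(rows)]
B = [random_psd(k) for _ in range(cols)]
M = np.array([[np.trace(Ai @ Bj) for Bj in B] for Ai in A])
print(M)                               # entrywise nonnegative by construction
# The psd rank of M is the smallest k admitting such a factorization.
```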

4:15 pm – 5:00 pm

The Cheeger-Alon-Milman inequality provides an algorithmic method to verify that a graph is an edge-expander, by certifying that its expansion (the minimum ratio of the number of edges leaving a subset of vertices to the total number of edges incident to the subset) lies between $\lambda_2$ and $\sqrt{2 \lambda_2}$. In this talk, we present a Cheeger inequality for vertex expansion (the minimum ratio of the number of vertices adjacent to a subset to the size of the subset), a parameter of fundamental importance which is NP-hard to compute and approximable to within $O(\sqrt{\log n}) \cdot OPT$ in polynomial time. The analog of $\lambda_2$ is $\lambda_\infty$, a quantity defined by Bobkov, Houdre and Tetali, which provides a lower bound on the vertex expansion and can be approximated by an SDP. The resulting algorithmic inequality says that the vertex expansion is sandwiched between $\lambda_\infty$ and $\sqrt{\lambda_\infty \log d}$, where $d$ is the maximum degree of the graph. Moreover, this bound is the best possible under the Small Set Expansion Hypothesis of Raghavendra and Steurer, implying in particular that, unlike edge expansion, constant vertex expansion cannot be certified in polynomial time. We will also discuss generalizations to hypergraphs by Louis.
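
For reference, vertex expansion as defined above can be computed by brute force on small graphs (in exponential time, consistent with the hardness discussed); a short Python sketch on a 6-cycle:

```python
# Sketch: vertex expansion by exhaustive search, using the definition above:
# min over subsets S (up to half the vertices) of |N(S) \ S| / |S|, where
# N(S) is the set of vertices adjacent to S. The exponential cost of this
# search is why an SDP proxy such as lambda_infty is of interest.
from itertools import combinations

n = 6
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}   # a 6-cycle

best = float("inf")
for size in range(1, n // 2 + 1):
    for S in combinations(range(n), size):
        S = set(S)
        boundary = set().union(*(adj[v] for v in S)) - S
        best = min(best, len(boundary) / len(S))
print("vertex expansion:", best)   # 2/3 here (3 consecutive vertices)
```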

This is joint work with Anand Louis and Prasad Raghavendra.

Thursday, September 25th, 2014

9:30 am – 10:30 am

Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. In this talk, I will describe two recent results:

(a) a robust version of a classical uniqueness theorem of Kruskal for tensor decompositions, which immediately implies polynomial identifiability in many settings (a numerical check of Kruskal's condition is sketched after this list).

(b) a smoothed analysis framework for analyzing algorithms for tensor decomposition, which models realistic instances of learning problems and allows us to overcome many of the usual limitations of using tensor methods.
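
As referenced in (a), here is a small numpy sketch of the classical (exact, non-robust) Kruskal condition; the factor sizes are illustrative, and the check is exponential in general.

```python
# Sketch: checking Kruskal's uniqueness condition numerically. For a rank-R
# decomposition T = sum_r a_r (x) b_r (x) c_r, uniqueness holds when
# krank(A) + krank(B) + krank(C) >= 2R + 2, where the Kruskal rank krank
# is the largest k such that every k columns are linearly independent.
from itertools import combinations
import numpy as np

def kruskal_rank(M, tol=1e-8):
    d, R = M.shape
    for k in range(min(d, R), 0, -1):
        if all(np.linalg.matrix_rank(M[:, list(c)], tol=tol) == k
               for c in combinations(range(R), k)):
            return k
    return 0

rng = np.random.default_rng(6)
d, R = 8, 5
A, B, C = (rng.normal(size=(d, R)) for _ in range(3))
ks = [kruskal_rank(M) for M in (A, B, C)]
# Generic factors have krank = min(d, R) = 5, so 15 >= 12 holds here.
print("Kruskal condition holds:", sum(ks) >= 2 * R + 2)
```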

Based on joint works with Aditya Bhaskara, Ankur Moitra and Aravindan Vijayaraghavan.

11:15 am – 12:15 pm

It is not so surprising that quantum mechanics presents hard new problems for optimization. For example, finding the lowest energy configuration of a physical system or simulating its dynamics both become more computationally difficult when we consider quantum systems. Less obvious is that the mathematics of quantum information can yield new methods of analyzing classical hard problems in optimization. In both directions, the link involves optimization problems related to tensors and polynomials. I will survey connections in both directions and discuss some promising open problems.

The talk will not assume a background in quantum information theory. It is partly based on arXiv preprints 1205.4484, 1210.6367 and 1310.0017.

1:45 pm – 2:30 pm

Noncommutative polynomial optimization (NPO) studies the problem of minimizing polynomials of noncommuting variables subject to non-trivial polynomial constraints. Such variables must be understood as operators acting on a not-a-priori-defined Hilbert space. In this talk I will introduce a practical hierarchy of SDP relaxations to solve NPO problems with dimension constraints. In those problems, the space where the noncommuting variables act is restricted to have dimension below a user-defined threshold. Dimension-constrained NPO problems appear naturally in quantum information theory and convex optimization (one famous such problem is the computation of the SDP rank of a positive matrix). I will argue that, for any dimension threshold, the new hierarchy can be run with exponentially fewer resources than the unconstrained one. Finally, I will discuss the efficiency of the new hierarchy by applying it to solve a number of NPO problems which were previously thought to be intractable on a normal desktop. Joint work with T. Vértesi and A. Winter.

3:00 pm – 3:45 pm

Convex relaxations based on different hierarchies of linear/semidefinite programs have been used recently to devise approximation algorithms for various optimization problems. The approximation guarantee of these algorithms improves with the number of rounds r in the hierarchy, though the complexity of solving (or even writing down the solution for) the r-th level program grows as $n^{\Omega(r)}$, where n is the input size.

In this work, we observe that many of these algorithms are based on local rounding procedures that only use a small part of the SDP solution (of size $n^{O(1)}2^{O(r)}$ instead of $n^{\Omega(r)}$). We give an algorithm to find the requisite portion in time polynomial in its size. The challenge in achieving this is that the required portion of the solution is not fixed a priori but depends on other parts of the solution, sometimes in a complicated iterative manner. Our solver leads to $poly(n) 2^{O(r)}$ time algorithms to obtain the same guarantees in many cases as the earlier $n^{O(r)}$ time algorithms based on r rounds of the Lasserre/SOS hierarchy. In particular, guarantees based on $O(\log n)$ rounds can be realized in polynomial time.

Joint work with Venkatesan Guruswami.

4:15 pm – 5:00 pm

We present a general method for bounding the accuracy of semidefinite programming relaxations for polynomial optimization. As an example, we apply our method to obtain accuracy bounds for optimization over the hypersphere and the boolean cube. The method allows the explicit recovery of approximate representing measures for moment matrix relaxations. Our techniques are loosely inspired by methods used in quantum optics.

Joint work with Andrew Doherty and Pablo Parrilo.

Friday, September 26th, 2014

9:30 am – 10:30 am

Controlling the eigenvalues of a matrix is an important part of optimization, as the condition number of a matrix often dictates the effectiveness (or time complexity) of an algorithm. We will discuss various problems related to controlling eigenvalues; in particular, finding large subsets of well-conditioned vectors inside a given (not well-conditioned) set, and dividing sets of vectors into subsets each of which has condition number similar to the original. This will make use of a new probabilistic existence argument we call the "method of interlacing polynomials." Only one of the results we discuss has an algorithmic version (and the algorithm is somewhat independent of the method), so we will discuss a number of open algorithmic questions.

This is joint work with Dan Spielman and Nikhil Srivastava.

11:15 am – 12:15 pm

The study of first-order methods has largely dominated research in continuous optimization for the last decade, yet the range of problems for which optimal first-order methods have been developed remains surprisingly limited, even though much has been achieved in some high-profile areas such as compressed sensing.

We present a simple transformation of any linear (or semidefinite or hyperbolic) program into an equivalent convex optimization problem whose only constraints are linear equations. The objective function is defined on the whole space, making virtually all subgradient methods immediately applicable. We observe, moreover, that the objective function is naturally smoothed, thereby allowing most first-order methods to be applied.
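
To illustrate why this matters algorithmically (with a stand-in objective, not the transformed one from the talk): once the only constraints are linear equations, projection is a single precomputable linear map, and projected subgradient descent applies out of the box.

```python
# Sketch: projected subgradient descent for a nonsmooth convex objective
# over {x : Ax = b}. The objective f below is a generic stand-in; the
# projection is one precomputed linear map, so each iteration is cheap.
import numpy as np

rng = np.random.default_rng(7)
m, n = 5, 20
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

pinvA = np.linalg.pinv(A)              # one-time work for the projector
def project(x):
    return x - pinvA @ (A @ x - b)     # projection onto {x : Ax = b}

f = lambda x: np.abs(x).sum()          # stand-in nonsmooth objective
subgrad = lambda x: np.sign(x)         # a subgradient of f

x = project(np.zeros(n))
for k in range(1, 2001):
    x = project(x - (1.0 / np.sqrt(k)) * subgrad(x))
print("f(x) =", f(x), " ||Ax - b|| =", np.linalg.norm(A @ x - b))
```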

We develop complexity bounds in the unsmoothed case for a particular subgradient method, and in the smoothed case for Nesterov's original optimal first-order method for smooth functions, in each case achieving the desired bounds on the number of iterations.

Perhaps most surprising is that the transformation is simple and so is the basic theory, and yet the approach has been overlooked until now, a blind spot. Once the transformation is realized, the remaining effort in establishing complexity bounds is mainly straightforward, making use of various works of Nesterov.

1:45 pm – 2:30 pm

In the theory of modern interior-point algorithms, approaches that treat the primal and dual problems symmetrically have led to some of the deepest theoretical results, as well as some of the most successful software, in the area.

I will present some mathematical foundations for the design and analysis of primal-dual symmetric algorithms for convex optimization problems. These foundations make it possible to generalize such methods from the linear and semidefinite optimization settings to general convex optimization. I will conclude with a complexity analysis that applies to a wide range of primal-dual interior-point algorithms. Our bound on the number of iterations extends the current best iteration complexity bounds from the special cases of linear and semidefinite optimization to hyperbolic cone programming, as well as to the general convex optimization setting. There will be connections to quasi-Newton methods and Riemannian geometry, among others.

This talk is based on joint work with Tor Myklebust.

3:00 pm – 3:45 pm

If a polynomial is everywhere nonnegative, then it is a sum of squares of rational functions (this is the affirmative solution of Hilbert's 17th problem). Such an identity is an example of a certificate of positivity (more precisely, of nonnegativity), i.e., an algebraic identity certifying that the polynomial is nonnegative.
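
A classical concrete example is Motzkin's polynomial $M(x,y) = x^4 y^2 + x^2 y^4 - 3x^2 y^2 + 1$, which is nonnegative (by the AM/GM inequality) but is not a sum of squares of polynomials; the identity $(x^2+y^2)\,M(x,y) = y^2(1-x^2)^2 + x^2(1-y^2)^2 + x^2 y^2(x^2+y^2-2)^2$ nevertheless exhibits it as a sum of squares of rational functions with denominator $x^2+y^2$.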

But how does one construct this sum of squares from a proof of nonnegativity?

Artin's initial proof of Hilbert's 17th problem in 1927 was very indirect and did not explicitly produce the sum of squares. Effective methods have been designed since then (starting with Kreisel and his students), but the degree estimates of the corresponding constructions were primitive recursive functions.

In this talk, we recall Artin's initial proof and present a new constructive proof of Hilbert's 17th problem, showing that, following this construction, the degrees of the polynomials in the identity are bounded by an elementary recursive function of the number of variables and the degree of the polynomial (more precisely, a tower of five exponentials). We end with a discussion of what further improvements could be hoped for.

Based on joint work with Henri Lombardi and Daniel Perrucci.