Friday, December 22nd, 2017

From the Inside: Optimization (2 of 2)

by Daniel Dadush, CWI

Optimization plays an important role in many of the technologies and services we use today, where its application has improved the efficiency of complex systems, enabled powerful approaches in machine learning and statistical inference, and provided powerful tools to hedge against an uncertain future. Salient examples of optimization include methods for training neural networks to yield improved machine translation of text and automatic classification of images, algorithms for minimizing the cost of routing goods across large and complex networks, and methods for automatically selecting the best ads to display during online search.

While the field of optimization is now more than a century old, researchers in optimization to this day do not have well-defined homes, and are instead spread across a variety of disciplines including theoretical computer science, operations research, electrical engineering, economics, and applied mathematics. Because of this, the key problems and tools developed in optimization were not the product of any one discipline, but were instead developed over time within many different fields. Pushing the frontiers of optimization (whether by expanding the reach of current techniques, finding new applications, or developing new methodologies) thus naturally calls for combining ideas from these many different areas and creating spaces for fruitful interdisciplinary collaborations, a central part of the Simons Institute's mission. Despite the wide variety of optimization problems, they can generally be categorized into two main types: those that are discrete and those that are continuous.

Discrete optimization is generally concerned with problems having a very large but finite set of solutions, and began in earnest during the 1950s and 1960s with the work of pioneers such as George Dantzig and Jack Edmonds on classical combinatorial problems such as the traveling salesman problem – given n cities with prescribed distances, find the shortest tour visiting all the cities, and the matching problem – given n individuals and a set of pairwise compatibilities, find the largest disjoint set of pairings of compatible individuals. The discrete optimization problems studied today are nearly endless in variety and are derived from diverse application domains such as computational biology, game theory, logistics, network design and more. When can such problems be solved efficiently, and if so, how efficiently? If they cannot be solved efficiently, can we efficiently compute good approximate solutions for them? These questions led to the development of the fundamental areas of combinatorial optimization, computational complexity and approximation algorithms, and they remain principal motivating questions today.

Continuous optimization is the older of the two fields, and treats optimization problems for which the variables can take on real values. Deriving its roots from calculus, one is often concerned with minimizing a “well-behaved” (e.g. differentiable) function, where tools such as gradient descent, already introduced by Augustin Cauchy in 1847, can be applied. Building on this theory came the addition of structured constraints (e.g.), which allows us to set the conditions that solutions must satisfy to be meaningful (e.g. don’t exceed capacity). Constrained continuous optimization problems are ubiquitous, and include regression problems, optimal control, shape optimization, minimum cost flow, optimal pricing, and much more. Given a continuum of solutions, one must be generally be content with approximation, and thus algorithms in this field most often measure their efficiency by how many steps are needed to find an -approximate solution. Perhaps the crown jewel of continuous optimization has been the development of convex optimization, which represents a very broad set of problems where “local optimality = global optimality” and provides the largest known class of efficiently solvable problems.

While the above areas are clearly distinct, they share many fundamental connections. For example, it has long been understood that for many discrete problems, the most effective way to solve them is to cleverly approximate them by a convex problem, which can be efficiently solved at the price of adding artificial solutions “in between” the discrete ones, and then “round” the continuous solution back to a discrete one. Vice versa, many continuous problems are endowed with discrete structure, which can come from structured sparsity in the data, network or graph structure in the underlying problem, or more directly if the problem requires both discrete and continuous variables. Despite such connections, for many years the active areas of research within both continuous and discrete optimization were quite separate. Over the last decade, partially motivated by the ever-growing need to solve larger-scale problems, the search for more efficient ways to blend the discrete and continuous viewpoints has led to many recent breakthroughs, including the development of faster interior point methods, new approaches to the traveling salesman problem, nearly linear time solvers for structured linear systems, and more.

The primary goal of this semester’s Simons Institute program on Bridging Continuous and Discrete Optimization was to create an exciting space for the exchange of ideas and techniques between the two branches of optimization. A second goal was to expose the varied group of long-term participants to the sheer number of connections between optimization and wider mathematical areas, from machine learning to high dimensional geometry and probability, and real algebraic geometry. Given the ambitious scope, the program was run alone this semester as a “jumbo” program. The main events this semester consisted of five week-long workshops, each dedicated to an important trend of contemporary research. A new addition this semester was the inclusion of participant-organized day- or half-day-long minisymposia on topics of general interest. These included robot motion planning and control, fairness in automated decision making, topics in spectral graph theory, and the incredible work of Michael B. Cohen. Apart from the formal events, there were also a large number of participant-organized seminars and reading groups, on topics including spectral graph theory, hyperbolic and real stable polynomials, SDP and LP hierarchies, the flow cut gap, discrepancy theory, and interior point methods.

The program was kicked-off by a “boot camp” intended to put the conference participants on the same page by providing introductions to cutting-edge research areas in optimization. Topics included the use of duality in optimization, first order methods, interior point methods, online and robust optimization, SDP and LP hierarchies, and extended formulations. This was a great way to help bridge the differences in background among the conference participants, hailing from diverse areas including computer science, operations research and mathematics.

The second workshop was devoted to progress in using continuous relaxations to solve discrete problems, one of the main hallmarks of approximation algorithms. Some of the research lines that were addressed included: When can we avoid solving expensive semi-definite relaxations? Are there new ways to round classical relaxations for the TSP? How can we develop tighter relaxations for scheduling and routing problems? What can be achieved via generic discrepancy or dependent rounding schemes? How can we use the power of polynomials to approximately count? The problems discussed spanned a representative cross-section of approximation algorithms, and illustrated the exciting range of mathematical areas that were needed to make progress.

The third workshop, on Fast Iterative Methods in Continuous Optimization, was targeted at the need for solving optimization problems of ever-growing scale. Recent drivers of interest in this area include the impressive successes in machine learning related to training large-scale neural networks over massive data-sets, and the flurry of applications and complexity improvements enabled by fast solvers for structured linear systems, the most fundamental primitive in optimization. Much attention was given to cheap iterative methods, often first order or less, able to achieve acceptable levels of accuracy after only a few iterations. Quite a few recent breakthroughs have been made in reducing the error or structural parameter dependence in the iteration count, for methods such as stochastic gradient descent, coordinate descent, accelerated coordinate descent, and so forth. Non-convexity surfaced many times, a feature of many machine learning problems, and questions relating to when non-convexity is a barrier or not to global optimization, or how to converge quickly to local minima, were addressed. New ways of applying dimension reduction strategies to develop fast or low space solvers were explored. Lastly, examinations of how these methods really work and can be implemented in practice were provided.

The fourth workshop, on Hierarchies, Extended Formulations and Matrix-Analytic Techniques, related the recent developments in our understanding of automatic strengthening procedures for continuous relaxations, our ability to get succinct representations for useful or complex convex hulls, and techniques for leveraging the properties of polynomials in optimization. The workshop illustrated a true convergence across separate mathematical disciplines, where the complementary nature of the different lines of research came into plain focus. Strong upper and lower bounds on the size of extended formulations, and analogously, degree bounds for LP and SDP hierarchies for approximating CSPs, planted problems, classical combinatorial problems and more general problems were given. These results leveraged techniques from a variety of areas, including communication and circuit complexity, Fourier analysis, matrix concentration and real algebraic geometry. New rounding algorithms based or inspired by SDP hierarchies were presented, with applications to learning, polynomial optimization and quantum communication. Lastly, novel applications of real stable polynomials to determinantal maximization problems as well as counting problems were explored.

The final workshop on Optimization, Statistics, Machine Learning and Uncertainty explored the key challenges in these areas as well as the many connections between them. A wide variety of machine learning and related optimization problems were discussed during this workshop, from empirical risk minimization to structured bandit problems, adaptive matrix completion, online linear programming, and topic modeling. Methods for dealing with and quantifying uncertainty such as distributionally robust learning, the price of correlation, and martingale difference inequalities were presented. The properties of gradient descent such as implicit self-regularization, and convergence in non-convex settings were explored. Lastly, the avoidance of over-fitting using techniques from differential privacy, and the generic tradeoffs between stability and convergence were considered.

Related Articles