Thursday, April 21st, 2016

Research Vignette: Strengthening SETH Lower Bounds

by Amir Abboud

A central goal of complexity theory is to understand and prove lower bounds for the time complexity of fundamental problems. One of the most important computational problems is Edit Distance, the problem of computing the minimum number of edit operations (insertions, deletions, and substitutions) required to transform one sequence into another. A classical dynamic programming algorithm that is taught in basic algorithms courses solves Edit Distance on sequences of length n in O(n2) time. Generalizations of Edit Distance are of fundamental importance in computational biology and genomics, where n is typically a few billions, and this quadratic runtime is prohibitive. A faster, e.g. linear-time, algorithm would have far reaching consequences: the paper introducing BLAST, a heuristic algorithm for a generalization of Edit Distance, that runs in linear time but is not guaranteed to return an optimal solution, has been cited more than fifty thousand times. Despite decades of attempts, no upper bound below the O(n2/log2n) bound of Masek and Paterson is known for Edit Distance.

This is a situation where lower bounds are highly desirable, but unfortunately, the current state of the art in complexity is far from providing a lower bound that is close to quadratic for any natural problem in NP, let alone Edit Distance. Therefore, researchers have turned their attention to conditional lower bounds, and a recent celebrated result by Backurs and Indyk shows that Edit Distance cannot be solved in truly subquadratic O(n2−ε) time, for some ε>0, unless the Strong Exponential Time Hypothesis (SETH) is false (this is among the few STOC papers that made it to the Boston Globe news website). SETH is a pessimistic version of PNP, which essentially states that there is no ε>0 such that SAT on CNF formulas on n variables and O(n) clauses can be solved in O((2−ε)n) time.

Other interesting recent results show that under SETH, the current algorithms for many central problems in diverse areas of computer science are optimal, up to no(1) factors. These areas include pattern matching, graph algorithms, computational geometry, data mining, and economics, and the list is growing by the day. These conditional lower bounds are a part of a more general line of work in which one bases the hardness of important problems in P on well-known conjectures about the exact complexity of other famous problems. Other conjectures concern the exact complexity of 3-SUM (given n integers, do three of them sum to zero?) and All-Pairs-Shortest-Path (compute all pairwise distances in a weighted graph). This line of work was the focus of the third workshop in the Fall 2015 Simons Institute Program on Fine-Grained Complexity and Algorithm Design, entitled Computational Complexity of Low-Polynomial Time Problems. It seems that we are rapidly approaching an exciting state of affairs in which a tight lower bound for many problems of practical significance can be derived from a small set of such conjectures.

One of the major goals of this research is to strengthen the foundations on which these results are based. We do not have strong evidence supporting these conjectures, and perhaps the main reason to believe them is the fact that no one knows how to refute them. For example, SETH was introduced by Impagliazzo and Paturi as a plausible explanation for the lack of (2−ε)n algorithms for CNF-SAT, despite its being one of the most extensively studied problems in computer science. Can we find more evidence for these conjectures? Can we show surprising consequences of refuting them? Alternatively, can we replace them with more believable ones?

In the rest of this Vignette I will discuss a recent paper with Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams, in which we show that many of the SETH lower bounds can be based on an assumption that is much more believable than SETH.

A Hierarchy of SAT Problems

A weakness of SETH as a hardness hypothesis is that it is an assumption about CNF-SAT, as opposed to a more general SAT problem. Consider a problem in which you are given some representation of a function B on n input bits and are asked whether B is satisfiable. If B is treated as a black-box that we only have input/output access to, then any algorithm will need to spend Ω(2n) time in the worst case. Of course, a clever algorithm should attempt to analyze B in order to decide satisfiability in o(2n) time. Whether this is possible depends on how complex and obfuscating the representation is. There is a whole spectrum of increasing complexity of representations. For each class C we can define the corresponding C-SETH, stating that SAT on functions from C cannot be solved in (2−ε)n time. For example, NC-SETH would be the assumption that Circuit-SAT on polynomial size polylog depth circuits (NC circuits) cannot be solved in (2−ε)n time. It is believed that NC circuits are capable of implementing cryptographic primitives like One Way Functions, for which the ability to hide satisfiability is essential. In sharp contrast, the original SETH is equivalent to the assumption that even representations that are very low on this spectrum, namely CNF formulas, are enough to obfuscate satisfiability.

As our class C gets more complex and rich, the C-SETH becomes more credible and appealing as a basis for conditional lower bounds than SETH. However, all previous SETH lower bound proofs relied heavily on the simple nature of CNFs. In our paper, we prove the first lower bounds under the much more reliable C-SETH, for classes C that are far more expressive than CNFs, like NC circuits. Our main result is a new efficient reduction from SAT on powerful computational models like NC circuits to Edit Distance, and many other important problems in P (e.g. computing the longest common subsequence of two binary strings). Thus, we are able to replace SETH with NC-SETH, and derive far more remarkable consequences from truly subquadratic algorithms for Edit Distance.

The Circuit Lower Bounds Barrier for Algorithm Design

Finding functions that provably cannot be computed by efficient circuits, i.e. proving “circuit lower bounds," is widely considered a hopelessly difficult task. Designing faster algorithms, on the other hand, is traditionally thought of as a more intuitive task. Surprising results in complexity theory show that faster algorithms for problems like Circuit-SAT imply new circuit lower bounds (this connection was the focus of the first workshop in the Fine-Grained Complexity and Algorithm Design program, Connections Between Algorithm Design and Complexity Theory). Optimists interpret such results as progress towards proving circuit lower bounds: all we have to do is prove upper bounds. Pessimists look at them as a barrier: designing faster algorithms for some problems is at least as difficult as proving circuit lower bounds.

Our work takes this connection one step further. A corollary of our work is that breakthrough circuit lower bounds would follow even from algorithms for Edit Distance and many other natural problems in P that algorithm designers have been studying for years. Moreover, we show that our reduction allows us to derive consequences even from mildly subquadratic algorithms, a feature that did not exist in the previous conditional lower bounds in P.

A Polylog Shaved is a Lower Bound Made

Given the status of Edit Distance and LCS as core computer science problems, any asymptotic improvement over the longstanding O(n2/log2n) upper bound is highly desirable. Recently, an algorithmic technique called the polynomial method has allowed for superpolylogarithmic shavings for other core problems like All Pairs Shortest Path. A natural open question is whether this technique can lead to an n2/logω(1)n algorithm for Edit Distance as well. We show that such an algorithm would immediately prove a major consequence (that the class NEXP does not have non-uniform log-depth circuits). A more fine-grained examination shows that even shaving a logcn factor, for a specific constant c ≤ 103, already implies new circuit lower bounds.

One striking interpretation of these results is that when an undergraduate student learns the simple dynamic programming algorithms for Edit Distance or Longest Common Subsequence and wonders whether there is a faster algorithm, he or she is implicitly trying to resolve very difficult open questions in complexity theory.

References

  1. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, Ryan Williams. "Simulating Branching Programs with Edit Distance and Friends Or: A Polylog Shaved is a Lower Bound Made." STOC '16, to appear.

Related Articles:

Letter from the Director, Spring 2016
From the Inside: Counting Complexity and Phase Transitions
From the Inside: Algorithmic Challenges in Genomics
Research Vignette: Simplicity, Optimality and Efficiency Tradeoffs in Economics
Looking Ahead: Logic and Uncertainty