Abstracts

Succinct Data Representations and Applications

Monday, September 16th, 2013

Input-sparsity Time Algorithms for Embeddings and Regression Problems

8:30 am–9:15 am

Speaker: Michael Mahoney, Stanford University

An algorithm runs in "input-sparsity time" if the running time in the RAM model is bounded above by a constant times the size of the input, for arbitrary or worst-case input. Somewhat remarkably, recent work in Randomized Numerical Linear Algebra has shown that fundamental problems such as least-squares and low-rank approximation can be solved to epsilon-accuracy in input-sparsity time. I will describe several related results in constructing low-distortion embeddings for Lp spaces in input-sparsity time, as well as the application of these results to solving problems such as least-squares, least-absolute deviations, and quantile regression. Of particular importance in making these theoretical ideas practical is understanding the tradeoffs between the running time to construct the embedding and the distortion quality of the embedding.

Input-sparsity Time Algorithms for Embeddings and Regression Problems (slides)

Something for Almost Nothing: Advances in Sub-Linear Time Algorithms

9:15 am–10:00 am

Speaker: Ronitt Rubinfeld, Massachusetts Institute of Technology

Linear-time algorithms have long been considered the gold standard of computational efficiency. Indeed, it is hard to imagine doing better than that, since for a nontrivial problem, any algorithm must consider all of the input in order to make a decision. However, as extremely large data sets are pervasive, it is natural to wonder what one can do in {em sub-linear} time. Over the past two decades, several surprising advances have been made on designing such algorithms. We will give a non-exhaustive survey of this emerging area, highlighting recent progress and directions for further research.

Randomized Numerical Linear Algebra

10:30 am–11:15 am

Speaker: Petros Drineas, Rensselaer Polytechnic Institute

The introduction of randomization in the design and analysis of algorithms for matrix computations (such as matrix multiplication, least-squares regression, the Singular Value Decomposition (SVD), etc.) over the last decade provided a new paradigm and a complementary perspective to traditional numerical linear algebra approaches. These novel approaches were motivated by technological developments in many areas of scientific research that permit the automatic generation of large data sets, which are often modeled as matrices.

In this talk we will outline how such approaches can be used to approximate problems ranging from matrix multiplication and the Singular Value Decomposition (SVD) of matrices to approximately solving least-squares problems and systems of linear equations. Application of such methods to data analysis will also be discussed.

Randomized Numerical Linear Algebra (slides)

Simple and Deterministic Matrix Sketches

11:15 am–12:00 pm

Speaker: Edo Liberty, Yahoo! Research

A sketch of a matrix $A$ is another matrix $B$ which is significantly smaller than $A$, but still approximates it well.

Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can be processed only once and storage is severely limited.

In this paper, we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives $n$ rows of a large matrix $A in R^{n imes m}$ one after the other, in a streaming fashion. It maintains a sketch $B in R^ { ell imes m}$ containing only $ell ll n$ rows but still guarantees that $A^TA pprox B^TB$. More accurately,

$o r a l l x, | x | = 10 l e | A x |^{2} - | B x |^{2} l e 2 | A |_{f}^{2} / e l l$

This algorithm's error decays proportionally to $1/ell$ using $O(m ell)$ space. In comparison, random-projection, hashing or sampling based algorithms produce convergence bounds proportional to $1/sqrt{ell}$. Sketch updates per row in $A$ require amortized $O(mell)$ operations and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove.

Simple and Deterministic Matrix Sketches (slides)

Optimizing with Tensors

1:30 pm–2:15 pm

Speaker: Ravi Kannan, Microsoft Research India

Constraint Satisfaction Problems (CSP) can be formulated as tensor optimization problems. So also the interesting planted clique problem. Much about tensors is known to be hard or impossible, But the focus of this talk will be the possible. In particular, we show that the spectral norm of any r-dimensional array (for r fixed) can be approximated to additive error $\varepsilon$ times its Frobenius norm and this is sufficient to solve dense instances of MAX-CSP's as well as a more general class.

Joint work with Frieze, de la Vega, Karpinski, Vempala.

Optimizing with Tensors (slides)

Tuesday, September 17th, 2013

Homomorphic Sketches: Shrinking Big Data without Sacrificing Structure

9:00 am–9:45 am

Speaker: Andrew McGregor, University of Massachusetts

Fingerprinting is a widely-used technique for efficiently verifying that two large files are identical. More generally, sketching is a form of lossy compression based on random linear projections that, for example, could be used to estimate the Hamming distance between the files. Both techniques are particularly useful for monitoring data streams or minimizing communication in distributed and parallel computing. Many applications have been studied extensively including sparse signal recovery, locality sensitive hashing, and linear regression.

Our goal is to extend these popular techniques and design random projections that preserve combinatorial and group-theoretic structure in large data sets. In this talk, we present recent results on analyzing spectral graph properties and fingerprinting misaligned text. Both results rely on developing homomorphic sketches that support operations on the compressed data that correspond to operations on the uncompressed data. Applications include the first sublinear space algorithms for processing dynamic graph streams.

Includes joint work with Kook Jin Ahn, Alexandr Andoni, Assaf Goldberger, Sudipto Guha, and Ely Porat.

Homomorphic Sketches: Shrinking Big Data without Sacrificing Structure (slides)

Power to the Points: Local Certificates for Clusterings

9:45 am–10:30 am

Speaker: Suresh Venkatasubramanian, University of Utah

We build long chains of inference based on the results of data mining. But how do we certify the results of these inference so that we can say things like "there's a 51% chance you're a US person?" And more importantly, how can we generate compact certificates to give to users to validate labels being assigned to them?

Formally, suppose I'm given a clustering of a set of points into groups. How do I assign a score (or set of scores) to a point that captures the strength of its assignment to a group? In this talk I'll describe a method that draws on Voronoi-based interpolation techniques to define such a score, and uses epsilon-samples and convex body sampling to compute the score efficiently. I'll then illustrate the use of such a score with example applications.

This is joint work with Parasaran Raman.

Power to the Points: Local Certificates for Clusterings (slides)

How Robust are Linear Sketches to Adaptive Inputs?

11:00 am–11:45 am

Speaker: David Woodruff, IBM Almaden

Linear sketches turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. In almost any realistic setting a linear sketch faces the possibility that its inputs are correlated with previous evaluations of the sketch. Known techniques no longer guarantee the correctness of the output in the presence of such correlations. We therefore ask: are linear sketches inherently non-robust to adaptively chosen inputs? We give a strong affirmative answer to this question. Specifically, we show that no linear sketch approximates the Euclidean norm of its input to within an arbitrary multiplicative approximation factor on a polynomial number of adaptively chosen inputs. The result remains true even if the dimension of the sketch is d = n - o(n) and the sketch is given unbounded computation time. Our result is based on an algorithm with running time polynomial in d that adaptively finds a distribution over inputs on which the sketch is incorrect with constant probability. Our result implies several corollaries for related problems including lp-norm estimation and l2/l2 compressed sensing in the presence of adaptively chosen inputs.

Joint work with Moritz Hardt.

How Robust are Linear Sketches to Adaptive Inputs? (slides)

Nimble Algorithms for Cloud Computing

11:45 am–12:30 pm

Speaker: Santosh Vempala, Georgia Institute of Technology

Cloud computing suggests a model where communication between storage/computation devices is a bottleneck resource. An algorithm is nimble if it uses only sublinear (ideally polylogarithmic) communication and polynomial time and space. We expand on the model proposed by Cormode and Muthukrishnan and develop nimble algorithms for computing frequency moments, counting homomorphisms, low-rank approximation and k-means clustering.

This is joint work with Ravi Kannan and David Woodruff.

Nimble Algorithms for Cloud Computing (slides)

Sampling for Subset Selection and Applications

2:00 pm–2:45 pm

Speaker: Amit Deshpande, Microsoft Research India

Row/column sampling techniques were initially introduced to compute low-rank matrix approximation faster than the computation of SVD. In this talk, I will give a quick overview of squared-length sampling, adaptive sampling, volume sampling, relevance-score sampling etc., survey their connections to other topics such as determinantal point processes, rounding Lasserre SDP solutions, graph sparsification, convex geometry, and discuss interesting open problems in these.

Sampling for Subset Selection and Applications (slides)

Spectral Sparsification of Graphs

2:45 pm–3:30 pm

Speaker: Nikhil Srivastava, Microsoft Research India

Many important properties of an undirected graph manifest themselves spectrally in the eigenvalues or quadratic forms of matrices related to the graph. For instance, the connectivity structure, electrical properties, and random walk behavior of a graph are determined by its Laplacian matrix. A spectral sparsifier of a graph G is a sparse graph H on the same set of vertices such that the Laplacians of H and G are close, so that H captures the spectral behavior of G while being much cheaper to store and perform computations on. We survey a line of work showing that a nearly linear size spectral sparsifier exists for every graph and can be computed in the one pass streaming model with small update time per edge, and discuss possible improvements which could make this algorithm more useful in practice.

Fast Testing of Graph Properties

4:00 pm–4:45 pm

Speaker: Artur Czumaj, University of Warwick

We review recent results for sublinear-time testing of fundamental graph properties in sparse graphs.

We consider the problem in which an input graph G is represented as an adjacency matrix and we're asking a question whether G has a given property P or is "far" from any graph satisfying P. The central question in this area is which properties can be tested very quickly, in particular, in constant-time, independently of the size of the input graph.

We will present recent advances in this area, focusing on planar graphs and graphs with poor expansion properties.

This is talk is based on joint papers with Christian Sohler and others.

Fast Testing of Graph Properties (slides)

Inferring Structural Properties: Graphs and Bridges

4:45 pm–5:30 pm

Speaker: Anna Gilbert, University of Michigan

I will discuss two recent projects that strive to measure less but infer some or more about buildings/bridges and graphs. In the first project, our goal is to collect data about buildings and bridges for efficient, effective structural health monitoring. In the second project, our goal is to collect boundary measurements on a graph and to infer its "material" properties, using a mathematical generalization of optical tomography.

Inferring Structural Properties: Graphs and Bridges (slides)

Wednesday, September 18th, 2013

Convex Relaxations for Permutation Problems

9:00 am–9:45 am

Speaker: Alexandre d'Aspremont, CNRS - ENS Paris

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We prove the equivalence between the seriation and the combinatorial 2-sum problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we produce a convex relaxation for the 2-sum problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.

Convex Relaxations for Permutation Problems (slides)

Computational and Statistical Tradeoffs via Convex Relaxation

9:45 am–10:30 am

Speaker: Venkat Chandrasekaran, California Institute of Technology

Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources such as time or space. Our approach to this problem is to define a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efﬁciency and statistical efﬁciency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms to many computationally intractable tasks. In this talk we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to amount of data.

Joint work with Michael Jordan.

Memory Limited, Streaming PCA

11:00 am–11:45 am

Speaker: Constantine Caramanis, University of Texas, Austin

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the {em spiked covariance model}, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spi ke can be recovered when the number of samples, $n$, scales proportionally with the dimension, $p$. Yet, all algorithms that provably achieve this, have memory complexity $O(p^2)$. Meanwhile, algorithms with memory-complexity $O(kp)$ do not have provable bounds on sample complexity comparable to $p$. We present an algorithm that achieves both: it uses $O(kp)$ memory (meaning storage of any kind) and is able to compute the $k$-dimensional spike with $O(p log p)$ sample-complexity.

Memory Limited, Streaming PCA (slides)

Noisy Estimation of Simultaneously Structured Models

11:45 am–12:30 pm

Speaker: Maryam Fazel, University of Washington

Models or signals exhibiting low dimensional behavior (e.g., sparse signals) play an important role in signal processing and machine learning. In this talk, we focus on models that have multiple structures simultaneously; e.g., matrices that are both low rank and sparse, arising in phase retrieval, quadratic compressed sensing, and sparse PCA. We consider estimating such models from observations corrupted by additive Gaussian noise.

We provide tight upper and lower bounds on the mean squared error (MSE) of a convex denoising program that uses a combination of regularizers. In the case of low rank and sparse matrices, we quantify the gap between the MSE of the convex program and the best achievable error, and present a simple (nonconvex) thresholding algorithm that outperforms its convex counterpart and achieves almost optimal MSE.

A Dual Approach to Sparse Optimization

2:00 pm–2:45 pm

Speaker: Michael Friedlander, University of British Columbia

A feature common to many sparse optimization problems is that the number of variables may be significantly larger than the number of constraints—e.g., the standard matrix-lifting approach for binary optimization results in a problem where the number of variables is quadratic in the number of constraints. We consider a duality framework applicable to a wide range of nonsmooth sparse optimization problems that allows us to leverage the relatively small number of constraints. Preliminary numerical results illustrate our approach and its flexibility.

Min-Wise Hashing for Large-Scale Regression and Classification

2:45 pm–3:30 pm

Speaker: Nicolai Meinshausen, University of Oxford and ETH Zürich

We study large-scale regression analysis where both the number of variables, p, and the number of observations, n, may be large and in the order of millions or more. This is very different from the now well-studied high-dimensional regression context of "large p, small n". For example, in our "large p, large n" setting, an ordinary least squares estimator may be inappropriate for computational, rather than statistical, reasons. In general one must seek a compromise between statistical and computational efficiency. Furthermore, in contrast to the common assumption of signal sparsity for high dimensional data, here it is the design matrices which are typically sparse in applications. Our approach for dealing with this large, sparse data is based on b-bit min-wise hashing (Li and Koenig, 2011). This can be viewed as a dimensionality reduction technique for a sparse binary design matrix; our variant, which we call min-wise hash random-sign mapping (MRS mapping) also handles the real-valued case, and is more amenable to statistical analysis. For both linear and the logistic models, we give finite-sample bounds on the prediction error of procedures which perform regression in the new lower-dimensional space after applying MRS mapping. In particular, we show that the average prediction error goes to 0 asymptotically as long as q*l(b)^2/n goes to 0 for n to infinity, where q is the maximal number of non-zero entries in each row of the design matrix and l(b) is the l2-norm of an optimal coefficient for the linear predictor. We also show that ordinary least squares or ridge regression applied to the reduced data can actually allow us to capture interactions in the original data. In addition to the results on prediction, we show how one can recover measures of the importance of the original variables. A simulation study and some applications help illustrate the circumstances under which we can expect MRS mapping to perform well.

Min-Wise Hashing for Large-Scale Regression and Classification (slides)

Variable Selection with Error Control: Another Look at Stability Selection

4:00 pm–4:45 pm

Speaker: Richard Samworth, University of Cambridge

Stability selection was recently introduced by Meinshausen and Buhlmann as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called Complementary Pairs Stability Selection, and derive bounds both on the expected number of variables included by complementary pairs stability selection that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These results require no (e.g. exchangeability) assumptions on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control, and therefore increasing the applicability of the methodology.

Variable Selection with Error Control: Another Look at Stability Selection (slides)

Mixed and Covariate-Dependent Graphical Models

4:45 pm–5:30 pm

Speaker: Liza Levina, University of Michigan

Graphical models are a popular tool for understanding dependency structure of multivariate data, and sparse graphical models can provide an informative and interpretative summary of high-dimensional data. The commonly used graphical models are usually one of two types: Gaussian graphical models (for continuous data) and the Ising models or Markov networks (for binary and discrete data). However, in practice both types of variables are frequently present in the same dataset, creating the need for mixed graphical models. Some models for these were developed in the earlier literature, but none of them scale to high dimensions. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures, and develop a computationally efficient algorithm for fitting the model. We will also discuss another extension of the graphical model that allows the graph to depend on additional covariates, and apply it to data on genetic instability in tumor samples.

Mixed and Covariate-Dependent Graphical Models (slides)

Thursday, September 19th, 2013

Succinct Representations and Efficient Algorithms Based on Linear Programming

8:30 am–9:15 am

Speaker: Sudipto Guha, University of Pennsylvania

While there has been significant strides made in developing small space algorithms in many constrained settings generally referred to massive data algorithms, a natural and ever present question remains: Can these techniques focused on constrained representations allow us to solve problems which we did not know how to solve in an unconstrained setting earlier?

In this talk we investigate classic variants of the Maximum Matching problem. We show that using ideas from succinct representations and linear programming, we can provide near optimal approximation schemed for several variants in near linear time which we did not know before. These ideas also improve the number of passes required by popular algorithmic frameworks such as MapReduce or Semi-Streaming for computing near optimal solutions to the maximum matching problem (as well as variants).

Dimensionality Reduction Via Sparse Matrices

9:15 am–10:00 am

Speaker: Jelani Nelson, Harvard University

This talk will discuss sparse Johnson-Lindenstrauss transforms, i.e. sparse linear maps into much lower dimension which preserve the Euclidean geometry of a set of vectors. Both upper and lower bounds will be presented, as well as applications to certain domains such as numerical linear algebra.

Dimensionality Reduction Via Sparse Matrices (slides)

Probabilistic Hashing Methods for Succinct Data Representations and Applications

10:30 am–11:15 am

Speaker: Ping Li, Rutgers University

The talk will cover a variety of probabilistic hashing methods for compact data representations and their applications in search and learning with massive, high-dimensional, and possibly dynamic streaming data. For example, fitting logistic regression (or SVM) with billion or billion square variables will be challenging and highly useful in the context of search; and the recent development of b-bit minwise hashing and one permutation hashing will be highly effective for this type of applications. Another exciting example is sparse recovery (compressed sensing). This talk will also present new hashing algorithms for accomplishing efficient signal recovery.

Probabilistic Hashing Methods for Succinct Data Representations and Applications (slides)

Optimal Detection of Sparse Signal Segments

11:15 am–12:00 pm

Speaker: Ming Yuan, University of Wisconsin-Madison

In many applications involving high dimensional data, we are interested in identifying sparse segments of signal in a noisy background. The signals can be copy number variation, or objects in video surveillance among others. We study the relationship between the detectability of a signal, and its shape as well as duration; and the implications of such a relationship on detection strategies.

Two Concise Representations for High-Dimensional Visual Data

1:30 pm–2:15 pm

Speaker: John Wright, Columbia University

Succinct data models have been widely used for imaging tasks such as recognition, denoising and superresolution. Their success depends crucially on the ability to learn models that accurately reflect the physical and statistical structure of the processes that generate the data. We describe two situations in which it is possible to do this, with provable performance guarantees. In the context of visual object recognition, we describe concise models which provably identify objects under arbitrary distant illumination. Our results show how to build a model that provably accepts any image that is sufficiently close to an image of the object of interest, and reject any that is not. We prove bounds that relate the complexity of the representation to the complexity of the object to be recognized, as measured by its defect from convexity. Our proofs and practical algorithms leverage t he the structure in the data—in particular, the existence of sparse and low-rank decompositions for the direct illumination operator. For other visual data problems such as denoising or superresolution, the ability to learn sparse models from data appears to be crucial to achieving good practical performance. For this problem, there is well-developed practice, but theoretical guarantees for the efficient algorithms are difficult to obtain, due to the bilinear nature of the problem. Here, again, we describe one situation in which mathematical guarantees are possible—namely, if the target dictionary is square and nonsingular, and the coefficients satisfy an appropriate random model. In this case, a simple heuristic based on convex programming guarantees to efficiently recover the target coefficients and dictionary.

This talk describes joint work with Yuqian Zhang (Columbia University), Cun Mu (Columbia University), Han-Wen Kuo (Columbia University), Dan Spielman (Yale University) and Huan Wang (Yahoo! Research).

Structured Sparsity-Inducing Norms Through Submodular Functions

2:15 pm–3:00 pm

Speaker: Francis Bach, INRIA and ENS Paris

Sparse methods for supervised learning aim at ﬁnding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the L1-norm. In this work, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovasz extension, a common tool in submodular analysis. This deﬁnes a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting speciﬁc submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also deﬁne new norms, in particular ones that can be used as non-factorial priors for supervised learning.

Structured Sparsity-Inducing Norms Through Submodular Functions (slides)

Going Off the Grid

3:30 pm–4:15 pm

Speaker: Benjamin Recht, UC Berkeley

We often model signals from the physical world with continuous parameterizations. Unfortunately, continuous models pose problems for the tools of sparse approximation, and popular discrete approximations are fraught with theoretical and algorithmic issues. In this talk, I will propose a general, convex-optimization framework—called atomic-norm denoising—that obviates discretization and gridding by generalizing sparse approximation to continuous dictionaries.

As an extended example that highlights the salient features of the atomic-norm framework, I will highlight the problem of estimating the frequencies and phases of a mixture of complex exponentials from noisy, incomplete data. I will demonstrate that atomic-norm denoising outperforms state-of-the-art spectral and sparse-approximation methods in both theory and practice. I will then close with a discussion of additional applications of the atomic-norm framework including deconvolution, deblurring, and system identification.

Joint work with Badri Bhaskar, Parikshit Shah, and Gongguo Tang.

Going Off the Grid (slides)