CCSI/GMOS Student Seminar
Room 116
Speaker 1: Eren Kizildag
Title: Points of Shallow Neural Networks with Quadratic Activation Function
Abstract: We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix W∗∈ℝm×d, where m is the width of the hidden layer and d≤m is the data dimension. We study the optimization landscape associated with the empirical and the population squared risk of the problem. Under the assumption the planted weights are full-rank we obtain the following results. First, we establish that the landscape of the empirical risk admits an "energy barrier" separating rank-deficient W from W∗: if W is rank deficient, then its risk is bounded away from zero by an amount we quantify. We then couple this result by showing that, assuming number N of samples grows at least like a polynomial function of d, all full-rank approximate stationary points of the empirical risk are nearly global optimum. These two results allow us to prove that gradient descent, when initialized below the energy barrier, approximately minimizes the empirical risk and recovers the planted weights in polynomial-time. Next, we show that initializing below this barrier is in fact easily achieved when the weights are randomly generated under relatively weak assumptions. We show that provided the network is sufficiently overparameterized, initializing with an appropriate multiple of the identity suffices to obtain a risk below the energy barrier. At a technical level, the last result is a consequence of the semicircle law for the Wishart ensemble and could be of independent interest. Finally, we study the minimizers of the empirical risk and identify a simple necessary and sufficient geometric condition on the training data under which any minimizer has necessarily zero generalization error. We show that as soon as N≥N∗=d(d+1)/2, randomly generated data enjoys this geometric condition almost surely, while that ceases to be true if N is less than N∗.
Joint work with David Gamarnik and Ilias Zadik. Link to paper: https://arxiv.org/abs/1912.01599
Speaker 2: Jeff Xu
Title: Sum of Squares Lower Bounds for Sparse Independent Set
The Sum-of-Squares (SoS) hierarchy of semidefinite programs is a powerful algorithmic paradigm which captures state-of-the-art algorithmic guarantees for a wide array of problems. In the average case setting, SoS lower bounds provide strong evidence of algorithmic hardness or information-computation gaps. Prior to this work, SoS lower bounds have been obtained for problems in the ``dense" input regime, where the input is a collection of independent Rademacher or Gaussian random variables, while the sparse regime has remained out of reach. We make the first progress in this direction by obtaining strong SoS lower bounds for the problem of Independent Set on sparse random graphs.
We prove that with high probability over an \Erdos-R{\'e}nyi random graph $G\sim G_{n,\frac{d}{n}}$ with average degree $d>\log^2 n$, degree-$\dsos$ SoS fails to refute the existence of an independent set of size $k = \Omega\left(\frac{n}{\sqrt{d}(\log n)(\dsos)^{c_0}} \right)$ in $G$ (where $c_0$ is an absolute constant), whereas the true size of the largest independent set in $G$ is $O\left(\frac{n\log d}{d}\right)$.
Joint work with Chris Jones, Aaron Potechin, Goutham Rajendran and Madhur Tulsiani. https://arxiv.org/abs/2111.09250