Deep learning models are often trained successfully using gradient descent, despite the worst-case hardness of the underlying non-convex optimization problem. The key question, then, is under what conditions one can prove that optimization will succeed. Here we provide, for the first time, a result of this kind for a one-hidden-layer ConvNet with non-overlapping filters and ReLU activation. For this architecture we show that learning is hard in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. I will additionally discuss an alternative approach that sidesteps the hardness of deep learning optimization by using improper learning.
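
To make the setting concrete, the following is a minimal sketch, not the authors' implementation, of the kind of model described above: a one-hidden-layer "no-overlap" ConvNet that averages ReLU responses of a single filter applied to disjoint input patches, trained by gradient descent on a squared loss against a teacher filter, with inputs drawn from a standard Gaussian. The specific choices here (patch size `m`, number of patches `k`, names like `w_star`, learning rate, batch size, teacher/student setup) are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of the no-overlap setting: f(x; w) = mean_i ReLU(w . x_i),
# where x_1..x_k are disjoint patches of x, trained with gradient descent
# on squared loss against a teacher filter w_star, under Gaussian inputs.
# Hyperparameters and names are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
m, k = 8, 4                          # filter size, number of disjoint patches
w_star = rng.normal(size=m)          # ground-truth (teacher) filter
w = 0.1 * rng.normal(size=m)         # student filter, small random init


def forward(x, w):
    """No-overlap ConvNet: average of ReLU(w . patch) over disjoint patches."""
    patches = x.reshape(k, m)
    return np.mean(np.maximum(patches @ w, 0.0))


def grad(x, w):
    """Gradient of 0.5 * (f(x; w) - f(x; w_star))^2 with respect to w."""
    patches = x.reshape(k, m)
    pre = patches @ w
    err = np.mean(np.maximum(pre, 0.0)) - forward(x, w_star)
    # d f / d w = mean_i 1[pre_i > 0] * patch_i  (subgradient of ReLU)
    return err * np.mean((pre > 0)[:, None] * patches, axis=0)


lr, steps, batch = 0.1, 2000, 64
for _ in range(steps):
    # Inputs drawn from a standard Gaussian, matching the distributional assumption
    X = rng.normal(size=(batch, k * m))
    g = np.mean([grad(x, w) for x in X], axis=0)
    w -= lr * g

print("distance to teacher filter:", np.linalg.norm(w - w_star))
```

Under the Gaussian-input assumption, the result in the abstract says that this kind of gradient-descent procedure reaches the global optimum (the teacher filter) in polynomial time; the mini-batch averaging here simply approximates the population gradient used in the analysis.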