Talks
Fall 2017
Will Vanishing Gradients Ever Vanish from Deep Learning?
Friday, October 6th, 2017, 9:30 am–10:15 am
Speaker:
Much empirical work in deep learning has gone into avoiding vanishing gradients, a necessary condition for the success of stochastic gradient methods. This raises the question: can we provably rule out vanishing gradients for some expressive model architectures? I will point out several obstacles, as well as positive results for some simplified architectures, specifically linearized residual networks and linear dynamical systems.
Based on joint work with Ma and Recht.
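The contrast the abstract hints at can be made concrete with a small numerical sketch (an illustration only, not material from the talk; the dimensions, depth, and scaling constants are arbitrary choices): in a deep chain of contractive linear layers the end-to-end Jacobian, and hence the backpropagated gradient, shrinks exponentially with depth, while in a linearized residual chain whose layers stay close to the identity it remains of order one.

```python
# Minimal sketch: gradient (Jacobian) norms in a plain deep linear chain
# W_L ... W_1 x versus a linearized residual chain (I + A_L) ... (I + A_1) x
# with small perturbations A_i. All constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 50

# Plain deep linear network: each layer is a contraction with spectral norm 0.9.
Ws = [0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(depth)]

def small_perturbation():
    # Random matrix rescaled to spectral norm 0.5 / depth, so each residual
    # layer I + A_i stays close to the identity.
    G = rng.standard_normal((d, d))
    return (0.5 / depth) * G / np.linalg.norm(G, 2)

As = [small_perturbation() for _ in range(depth)]

def end_to_end_jacobian(layer_jacobians):
    # For a linear chain, the end-to-end Jacobian is just the product of
    # the per-layer Jacobians.
    J = np.eye(d)
    for L in layer_jacobians:
        J = L @ J
    return J

J_plain = end_to_end_jacobian(Ws)
J_resid = end_to_end_jacobian([np.eye(d) + A for A in As])

print("plain chain    ||J||_2 ~", np.linalg.norm(J_plain, 2))  # ~0.9**50, vanishing
print("residual chain ||J||_2 ~", np.linalg.norm(J_resid, 2))  # stays on the order of 1
```

In the residual chain the product of layer Jacobians is bounded between roughly (1 - 0.5/depth)^depth and (1 + 0.5/depth)^depth, so the gradient signal neither vanishes nor explodes as depth grows; this is the kind of near-identity regime the linearized residual-network results in the talk concern.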