Talks

Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent

Monday, December 6th, 2021, 1:00 pm–1:15 pm

Event:

Deep Learning Theory Symposium

Speaker:

Raphaël Berthier (École polytechnique fédérale de Lausanne)

Location:

Calvin Lab Auditorium

Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than theory suggests. I will present an analysis of SGD that uses none of these tricks: we analyze the behavior of the last iterate of fixed-step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from observations of its values at randomly sampled points.
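The kind of online algorithm described above can be sketched as follows. This is a minimal illustration, not the speaker's code. It assumes a hypothetical kernel k(x, y) = 1 + min(x, y) on [0, 1], uniformly drawn inputs, the squared loss, and a constant step size γ with γ·sup k(x, x) < 1. The helper names `kernel` and `plain_sgd` are likewise hypothetical. Each step adds one kernel atom at the new sample point. The returned function is the last iterate, with no averaging, no decreasing step sizes, and no regularization.

```python
import numpy as np


def kernel(x, y):
    # Hypothetical kernel choice (an assumption, not from the talk):
    # k(x, y) = 1 + min(x, y) on [0, 1], so k(x, x) <= 2.
    return 1.0 + np.minimum(x, y)


def plain_sgd(target, n_steps, step_size, noise_std=0.0, seed=0):
    """Fixed-step-size, non-regularized kernel SGD; returns the LAST iterate.

    The iterate is stored as f_t(.) = sum_i coefs[i] * k(xs[i], .).
    Update on the squared loss 0.5 * (f(x) - y)^2:
        f_t = f_{t-1} - step_size * (f_{t-1}(x_t) - y_t) * k(x_t, .)
    """
    rng = np.random.default_rng(seed)
    xs = np.empty(n_steps)
    coefs = np.empty(n_steps)
    for t in range(n_steps):
        x = rng.uniform(0.0, 1.0)  # randomly sampled point in [0, 1]
        y = target(x) + noise_std * rng.standard_normal()  # observed value
        # Evaluate the current iterate at x (0.0 when t == 0, empty sum).
        pred = np.dot(coefs[:t], kernel(xs[:t], x))
        # Gradient step: add one new kernel atom centred at x.
        xs[t] = x
        coefs[t] = -step_size * (pred - y)

    def f_last(z):
        z = np.atleast_1d(z)
        # Kernel matrix of shape (len(z), n_steps) times the coefficients.
        return kernel(xs[None, :], z[:, None]) @ coefs

    return f_last
```

For example, `plain_sgd(lambda x: np.sin(2 * np.pi * x), n_steps=2000, step_size=0.4)` returns the last iterate as a callable that can be evaluated on a grid of points. Storing one coefficient per sample makes step t cost O(t), which is acceptable for illustration but not for long runs.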