Talks
Fall 2018
First-Order Stochastic Optimization
Tuesday, August 28th, 2018, 11:00 am–12:00 pm
Location: Calvin Lab Auditorium
Stochastic Gradient Descent (SGD) is the basic first-order stochastic optimization algorithm behind the powerful deep learning models that are becoming increasingly pervasive in society. In this lecture, we motivate the use of stochastic first-order methods and recall some convergence results for SGD. We then discuss the notion of importance sampling for SGD and how it can improve the convergence rate. Finally, we discuss methods for making SGD more "robust" to its hyper-parameters, such as the step size, via "on the fly" adaptive step size schemes such as AdaGrad, and present some theoretical results.
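To make the ingredients mentioned in the abstract concrete, here is a minimal sketch (not the speaker's actual method or results) that combines plain SGD on a toy least-squares problem with two of the ideas above: importance sampling, where example i is drawn with probability proportional to its squared row norm and the gradient is reweighted to stay unbiased, and an AdaGrad-style per-coordinate step size. The problem setup, the function name adagrad_sgd, and parameters such as eta are illustrative assumptions.

```python
import numpy as np

# Toy least-squares objective: f(x) = (1/2n) * ||A x - b||^2,
# i.e. an average of per-example losses f_i(x) = (1/2)(a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Importance sampling: draw example i with probability p_i proportional to
# ||a_i||^2 (a proxy for the smoothness constant of f_i), and reweight the
# stochastic gradient by 1/(n p_i) so it remains an unbiased estimate.
row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()

def adagrad_sgd(num_iters=5000, eta=0.5, eps=1e-8):
    x = np.zeros(d)
    grad_sq_sum = np.zeros(d)              # running sum of squared gradients
    for _ in range(num_iters):
        i = rng.choice(n, p=probs)         # importance-sampled example index
        residual = A[i] @ x - b[i]
        grad = residual * A[i] / (n * probs[i])   # unbiased reweighted gradient
        grad_sq_sum += grad**2
        # AdaGrad: per-coordinate step size shrinks as gradient mass accumulates,
        # reducing sensitivity to the choice of the base step size eta.
        x -= eta * grad / (np.sqrt(grad_sq_sum) + eps)
    return x

x_hat = adagrad_sgd()
print("distance to x_true:", np.linalg.norm(x_hat - x_true))
```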