Talks
Spring 2022

Policy Gradients in General-Sum Dynamic Games: When Do They Even Converge?

Monday, May 2nd, 2022, 2:30–3:00 pm

Speaker: Eric Mazumdar (Caltech)

Location: Calvin Lab Auditorium

In this talk, I will present work showing that in arguably the simplest class of continuous action- and state-space multi-agent control problems, general-sum linear quadratic (LQ) games, agents using simple policy gradient algorithms have no guarantees of asymptotic convergence, and that proximal point and extra-gradient methods do not resolve these issues. I will then focus on zero-sum LQ games, in which stronger convergence guarantees are possible when agents use independent policy gradients with a finite timescale separation.
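
The second half of the abstract names a concrete algorithmic setup: independent policy gradients with a finite timescale separation in zero-sum LQ games. Purely as a hedged illustration of that setup (not the speaker's algorithm, parameters, or results), the sketch below runs simultaneous gradient descent-ascent on a scalar zero-sum LQ game, where each player estimates the gradient of the shared cost with respect to its own feedback gain only and the maximizing player uses a step size tau times larger. The dynamics (a, b1, b2), penalties (r1, r2), horizon, initialization, and step sizes are all invented for the example.

# Minimal, hypothetical sketch: independent policy gradients with a finite
# timescale separation in a scalar zero-sum linear quadratic (LQ) game.
# All dynamics, costs, gains, and step sizes below are illustrative choices,
# not the speaker's setup.
#
# Dynamics:   x_{t+1} = a*x_t + b1*u_t + b2*w_t
# Stage cost: x_t^2 + r1*u_t^2 - r2*w_t^2
# Player 1 (u) minimizes the total cost; player 2 (w) maximizes it.

a, b1, b2 = 0.9, 1.0, 0.5   # open-loop dynamics
r1, r2 = 1.0, 4.0           # control penalties (r2 large enough for a well-posed game)
T, x0 = 50, 1.0             # rollout horizon and initial state


def cost(k1: float, k2: float) -> float:
    """Finite-horizon cost under linear state feedback u = -k1*x, w = -k2*x."""
    x, total = x0, 0.0
    for _ in range(T):
        u, w = -k1 * x, -k2 * x
        total += x ** 2 + r1 * u ** 2 - r2 * w ** 2
        x = a * x + b1 * u + b2 * w
    return total


def grad(f, k: float, eps: float = 1e-4) -> float:
    """Two-point finite-difference estimate standing in for a policy gradient."""
    return (f(k + eps) - f(k - eps)) / (2.0 * eps)


k1, k2 = 0.8, 0.0   # start from a stabilizing gain for the minimizer
eta = 1e-3          # minimizer's step size
tau = 10.0          # timescale separation: the maximizer updates tau times faster

for _ in range(5000):
    g1 = grad(lambda k: cost(k, k2), k1)   # player 1 differentiates only its own gain
    g2 = grad(lambda k: cost(k1, k), k2)   # player 2 differentiates only its own gain
    k1 -= eta * g1                         # independent gradient descent (minimizer)
    k2 += tau * eta * g2                   # independent gradient ascent (maximizer)

print(f"learned gains: k1 = {k1:.3f}, k2 = {k2:.3f}, cost = {cost(k1, k2):.3f}")

Placing the maximizing player on the faster timescale is an assumption made for this sketch; which player must move faster, and how large the separation must be, is exactly the kind of question the talk addresses.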