Optimizing Average Reward MDPs with a Generative Model

Monday, November 15th, 2021, 9:30 am–10:00 am

Event:

Theory of Reinforcement Learning Reunion

Speaker:

Aaron Sidford (Stanford University)

Location:

Calvin Lab Auditorium

Markov Decision Processes (MDPs) are a fundamental mathematical model for reasoning about uncertainty and play a key role in reinforcement learning theory. In this talk, I will discuss recent advances in optimizing MDPs given by a generative model. Although near-optimal sample complexities are known for approximately optimizing MDPs with discounted reward functions, obtaining a similar characterization for MDPs with average-reward functions has been elusive. I will discuss how to improve the sample complexity for this problem using convex optimization tools, and how to obtain near-optimal sample complexities in certain regimes by reduction to solving discounted MDPs. This talk will feature joint work with Yujia Jin (arXiv:2008.12776 and arXiv:2106.07046).
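To make the idea of reducing average-reward optimization to a discounted problem concrete, here is a minimal sketch in Python. It assumes a small tabular MDP whose transitions can be reached only through a hypothetical sampler, `sample_next_state`, which stands in for the generative model. It uses a plain model-based (plug-in) approach. First, it estimates the transitions from samples. Next, it solves the resulting discounted MDP with value iteration at a discount factor close to 1. Finally, it uses the greedy policy for the average-reward objective. The toy MDP, the per-pair sample count, and `gamma = 0.99` are illustrative assumptions. This is not the algorithm, analysis, or parameter choice from the cited papers, and those are what give the sample-complexity guarantees.

```python
import random


def sample_next_state(P, s, a, rng):
    """Hypothetical generative model: draw one next state s' ~ P[s][a]."""
    return rng.choices(range(len(P[s][a])), weights=P[s][a])[0]


def empirical_model(P, n, rng):
    """Estimate every P[s][a] from n generative-model calls per state-action pair."""
    num_states = len(P)
    P_hat = []
    for s in range(num_states):
        rows = []
        for a in range(len(P[s])):
            counts = [0] * num_states
            for _ in range(n):
                counts[sample_next_state(P, s, a, rng)] += 1
            rows.append([c / n for c in counts])
        P_hat.append(rows)
    return P_hat


def q_value(P, R, V, gamma, s, a):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))


def solve_discounted(P, R, gamma, iters=2000):
    """Value iteration on a discounted MDP; returns a greedy deterministic policy."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(q_value(P, R, V, gamma, s, a) for a in range(len(P[s])))
             for s in range(len(P))]
    return [max(range(len(P[s])), key=lambda a: q_value(P, R, V, gamma, s, a))
            for s in range(len(P))]


def average_reward(P, R, policy, iters=1000):
    """Gain of a policy: its stationary distribution dotted with its rewards.

    Assumes the induced Markov chain has a unique stationary distribution.
    """
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        # Lazy step d <- d (I + P_pi) / 2 keeps power iteration from oscillating
        # on periodic chains without changing the stationary distribution.
        d = [0.5 * d[t] + 0.5 * sum(d[s] * P[s][policy[s]][t] for s in range(n))
             for t in range(n)]
    return sum(d[s] * R[s][policy[s]] for s in range(n))


# Toy 2-state, 2-action MDP (illustrative numbers, not from the talk).
P = [[[0.9, 0.1], [0.1, 0.9]],   # state 0: action 0 mostly stays, action 1 moves to state 1
     [[0.1, 0.9], [0.9, 0.1]]]   # state 1: action 0 mostly stays, action 1 moves to state 0
R = [[0.2, 0.0],
     [1.0, 0.0]]

rng = random.Random(0)
P_hat = empirical_model(P, n=2000, rng=rng)
policy = solve_discounted(P_hat, R, gamma=0.99)
gain = average_reward(P, R, policy)  # evaluated on the true model, not P_hat
print(policy, round(gain, 3))
```

The gain is evaluated on the true model rather than on `P_hat`, so it measures how well the sample-based policy does under the average-reward criterion. For this toy instance, the best average reward is 0.9. It is attained by moving from state 0 to state 1 and then staying in state 1.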