Optimizing Average Reward MDPs with a Generative Model
Aaron Sidford (Stanford University)
Calvin Lab Auditorium
Markov Decision Processes (MDPs) are a fundamental mathematical model for reasoning about uncertainty and play a key role in reinforcement learning theory. In this talk, I will discuss recent advances in optimizing MDPs given by a generative model. Though near-optimal sample complexities are known for approximately optimizing MDPs with discounted reward functions, obtaining a similar characterization for average-reward MDPs has been elusive. I will discuss how to improve the sample complexity for this problem using convex optimization tools and how to obtain near-optimal sample complexities in certain regimes by reduction to solving discounted MDPs. This talk will feature joint work with Yujia Jin (arXiv:2008.12776 and arXiv:2106.07046).