Data-Driven Decision Processes Seminar
Kuang Xu (Stanford)
Room 116
Title: Nonstationary Bandit Learning via Predictive Sampling
Abstract: I will discuss the design of algorithms for non-stationary multi-armed bandits. Taking the popular Thompson sampling algorithm as a starting point, I will illustrate why conventional thinking centered on estimating mean rewards omits a crucial aspect of active learning in a non-stationary environment: it fails to account for the future value of learned information. We propose the predictive sampling algorithm, which appropriately deprioritizes exploration for information that does not offer long-lasting benefits, and we discuss both analytical and numerical results showing how the algorithm can outperform existing approaches in different settings. Joint work with Yueyang Liu and Benjamin Van Roy of Stanford University.
Bio: Kuang Xu is an Associate Professor at the Stanford Graduate School of Business and, by courtesy, an Associate Professor of Electrical Engineering at Stanford. His research focuses on principles for decision-making in stochastic systems, with applications to operations, experimentation, and logistics. He has received First Place in the INFORMS George E. Nicholson Student Paper Competition, a Best Paper Award as well as an Outstanding Student Paper Award at ACM SIGMETRICS, and an ACM SIGMETRICS Rising Star Research Award. He currently serves as an Associate Editor for Operations Research and Management Science. Outside of academia, he serves as the Chief Data Science Advisor for Shipt Inc.