Spring 2022

Learning & Games Reading Group: Equilibrium Computation and Machine Learning

Tuesday, February 8th, 2022, 11:00 am – 12:30 pm

Dimitris Tsipras (Stanford) and Marc Lanctot (DeepMind)


Calvin Lab Room 116

The goal of our first meeting is to dig deeper into, and discuss, the applications of Min-Max Optimization and Game Equilibrium in practice.

Title: Mastering the game of Go with deep neural networks and tree search [zoom]

Abstract: The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. In this talk I will cover AlphaGo, an approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, defeated the human European Go champion by 5 games to 0, and later defeated the professional Go player Lee Sedol. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go. Toward the end of the talk, I will give a short survey of what has happened since this landmark event in AI, and some open challenges for continued work in this area.

Bio: Marc Lanctot is a research scientist at DeepMind. Previously, he was a post-doctoral researcher at the Maastricht University Games and AI Group, working with Mark Winands. During his PhD, he worked at University of Alberta with Michael Bowling on sampling algorithms for equilibrium computation and decision-making in games. Before his PhD, he did an undergrad and Master's at McGill University's School of Computer Science and Games Research @ McGill, under the supervision of Clark Verbrugge. He is interested in general multiagent learning (and planning), computational game theory, reinforcement learning, and game-tree search. 

Paper connected with this talk: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. [paper] [preprint pdf] [blog post]

Title: Thinking about the worst (in machine learning) [in-person]

Abstract: While modern machine learning systems can succeed on challenging benchmarks, they are quite brittle: their performance significantly degrades when exposed to even small variations of their training environments. How can we build ML models that are more robust? In this talk, I will present a framework for reasoning about the worst-case behavior of our machine learning systems. I will then describe how this framework can be used in practice to train models that are invariant to a broad family of worst-case input perturbations. Finally, I will discuss how such robust learning can be fundamentally different from standard learning, presenting a unique set of challenges and opportunities.

Bio: Dimitris Tsipras is a postdoctoral researcher at Stanford, advised by Percy Liang and Greg Valiant. He obtained his PhD from MIT, where he was advised by Aleksander Madry. His work focuses on understanding and improving the reliability of modern machine learning methods.

Papers connected with this talk: