Gabriel Poesia

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play (@ Science 2018)

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis


This is the paper that describes AlphaZero, DeepMind's general game-playing engine that learns through self-play. Algorithmically, AlphaZero is surprisingly simple, although I'm sure many of the seemingly small details turned out to be important for it to work. AlphaZero rests on three components: (i) a neural network that evaluates actions given game states (e.g. assigns a probability to each move in a chess position), (ii) the Monte Carlo Tree Search (MCTS) algorithm for exploring the space of moves, and (iii) a reinforcement learning training scheme based on self-play. The only domain-specific knowledge that AlphaZero has access to is the rules of the game, since during MCTS it only considers valid moves. From there, it starts from random self-play and eventually masters the game (after using a significant amount of compute: 5000 TPUs just to generate the self-play games!).
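To make the interaction between the network and the search a bit more concrete, here is a minimal sketch of how MCTS might pick a move at a single tree node. It is my own illustration, not code from the paper: the function names, the dictionary-based data layout, the constant `c_puct = 1.5`, and the `max(1, ...)` guard for an unvisited node are all assumptions. It restricts the network's move probabilities to the valid moves, then picks the move with the best PUCT-style score, trading off the average value seen so far against the prior and the visit counts.

```python
import math


def mask_and_normalize(priors, legal_moves):
    """Keep only the legal moves from the network's priors and renormalize."""
    total = sum(priors[m] for m in legal_moves)
    if total <= 0:
        # Assumption: fall back to uniform if no mass landed on legal moves.
        return {m: 1.0 / len(legal_moves) for m in legal_moves}
    return {m: priors[m] / total for m in legal_moves}


def puct_select(stats, priors, c_puct=1.5):
    """Return the move maximizing Q + U at one node.

    stats:  move -> (visit_count, total_value) accumulated by the search
    priors: move -> prior probability (already masked to legal moves)
    """
    parent_visits = sum(n for n, _ in stats.values())

    def score(move):
        n, w = stats[move]
        q = w / n if n > 0 else 0.0  # mean value of this move so far
        # Exploration bonus: large for high-prior, rarely visited moves.
        # max(1, ...) is an assumption so priors matter on the first visit.
        u = c_puct * priors[move] * math.sqrt(max(1, parent_visits)) / (1 + n)
        return q + u

    return max(stats, key=score)


# Hypothetical network output that also puts mass on an illegal move.
raw = {"e4": 0.5, "d4": 0.3, "Ke2": 0.2}
priors = mask_and_normalize(raw, ["e4", "d4"])

fresh = {"e4": (0, 0.0), "d4": (0, 0.0)}
print(puct_select(fresh, priors))  # e4: highest prior, nothing visited yet

after = {"e4": (10, -5.0), "d4": (0, 0.0)}
print(puct_select(after, priors))  # d4: e4 looked bad, d4 is unexplored
```

In a full system, the visit counts at the root after many simulations would become the training target for the network's move probabilities, which is what closes the self-play loop.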

Recently, I've been working on a project that is sort of an AlphaZero for educational domains (e.g. solving equations, logic problems, or simplifying fractions). The broader question I've been working on is: given an AI expert in some domain (be it a game or something more traditional like high-school math), how can you use it to teach a human to do the same?