This is the paper that describes AlphaZero: DeepMind's general game-playing engine that learns through self-play. Algorithmically, AlphaZero is surprisingly simple, although I'm sure many of the seemingly small details turned out to be important for it to work. At the base of AlphaZero are three components: (i) a neural network that evaluates actions given game states (e.g., assigns a probability to each candidate move in a chess position), (ii) the Monte Carlo Tree Search (MCTS) algorithm for exploring the space of moves, and (iii) a reinforcement learning training scheme based on self-play. The only domain-specific knowledge that AlphaZero has implicit access to is the game rules, since during MCTS it only considers valid moves. From that, it starts from random self-play and eventually masters the game (after using a significant amount of compute: 5000 TPUs!).
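To make the "only considers valid moves" point concrete, here is a minimal sketch of how the game rules could enter the search: the network's move probabilities are restricted to legal moves and renormalized, and the search then picks the move with the best PUCT-style score (value estimate plus an exploration bonus weighted by the prior). This is my own illustration, not code from the paper. The function names, the dictionary-based representation, the `+ 1` inside the square root, and the value `c_puct=1.5` are all assumptions made for the example.

```python
import math


def mask_and_normalize(priors, legal_moves):
    """Keep only the priors of legal moves and rescale them to sum to 1.

    `priors` maps move -> probability from a (hypothetical) policy network;
    `legal_moves` is whatever the game rules allow in the current state.
    """
    if not legal_moves:
        return {}  # terminal state: nothing to choose from
    legal = {m: priors.get(m, 0.0) for m in legal_moves}
    total = sum(legal.values())
    if total <= 0.0:
        # The network put no mass on any legal move: fall back to uniform.
        return {m: 1.0 / len(legal_moves) for m in legal_moves}
    return {m: p / total for m, p in legal.items()}


def select_move(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the move with the highest Q + U score (a PUCT-style rule).

    Q is the mean value observed for the move so far; U is an exploration
    bonus that is large for moves with a high prior and few visits.
    c_puct is an illustrative constant, not a value taken from the paper.
    """
    total_visits = sum(visit_counts.get(m, 0) for m in priors)

    def score(move):
        n = visit_counts.get(move, 0)
        q = value_sums.get(move, 0.0) / n if n > 0 else 0.0
        # The "+ 1" keeps the bonus nonzero before any visits (an assumption).
        u = c_puct * priors[move] * math.sqrt(total_visits + 1) / (1 + n)
        return q + u

    return max(priors, key=score)
```

For example, with network priors over moves `"a"`, `"b"`, `"c"` where only `"a"` and `"c"` are legal, `mask_and_normalize` drops `"b"` and rescales the rest; `select_move` then favors `"a"` until its observed values turn out poor, at which point the exploration bonus shifts the search toward `"c"`.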
Recently, I've been working on a project that is a sort of AlphaZero for educational domains (e.g., solving equations, logic problems, simplifying fractions, etc.). The broader question that I've been working on is: given that you have an AI expert in some domain (be it a game or something more traditional like high-school math), how can you use it to teach a human to do the same?