AI has made remarkable progress in general game playing in the last decade, with AlphaZero being a prime example. Given a properly defined game -- a set of possible states, a finite set of valid moves, and a notion of rewards -- modern deep reinforcement learning can master a range of games without requiring human data. This base methodology has been adapted to rather nonstandard games, such as protein folding and finding matrix multiplication algorithms.
One of the most important "games" that humans play is mathematics. Math feels like a game in several ways: many people have fun with it, and its rules are, in principle, well-defined. One of my main research questions is: how can we train agents to play the game of mathematics, in a general fashion, without relying on human-written proofs?
Interactive theorem provers, like Lean, Coq and Isabelle, get very close to properly defining a general game of mathematics. In any of them, we can state arbitrary definitions, axioms, and theorems, express proofs, and get a reward once we find a proof of a proposition. In fact, interacting with them is quite fun and does feel like a game. But one of the main differences between them and a game like chess is that their action space -- the set of valid moves at a given state -- is infinite. This pretty much rules out learning from scratch à la AlphaZero, since even proposing valid actions for a given state becomes challenging.
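To make this concrete, here is an illustrative Lean 4 snippet (my example, not taken from any system mentioned here): tactics like `exact` accept an arbitrary term as an argument, so even a trivial goal admits unboundedly many distinct valid moves.

```lean
-- Three different valid "moves" that close the same trivial goal.
-- Since `exact` accepts any term of the goal's type, the full set
-- of valid actions at this state is infinite.
example : 2 + 2 = 4 := by rfl
example : 2 + 2 = 4 := by exact rfl
example : 2 + 2 = 4 := by exact (Eq.symm rfl).symm
```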
There are trivial ways to get a finite action space. For example, each action could "add a character" to the current proof, with an additional special action that submits it to Lean, producing a reward if the proof type-checks. But rewards in this space are far too sparse -- even the simplest possible Lean proof (refl) would be extremely unlikely to ever be generated.
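A minimal sketch of this character-level environment (with a stub standing in for the actual Lean type-checker; all names here are illustrative) shows why uniform exploration is hopeless:

```python
import random
import string

# One action per printable character, plus a special "submit" action.
ACTIONS = list(string.printable) + ["<submit>"]

def random_episode(check_proof, max_steps=20):
    """Take uniformly random actions; reward is 1 iff the submitted
    string type-checks as a proof (here, `check_proof` stands in for
    a call to Lean)."""
    proof = ""
    for _ in range(max_steps):
        action = random.choice(ACTIONS)
        if action == "<submit>":
            return (1.0 if check_proof(proof) else 0.0), proof
        proof += action
    return 0.0, proof

# Hitting even the 4-character proof "refl" requires the exact
# sequence 'r','e','f','l',<submit>: probability (1/101)^5, roughly
# 10^-10 per episode under uniform exploration.
reward, attempt = random_episode(lambda p: p == "refl")
```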
My main research project, Peano, is a theorem-proving environment based on dependent type theory where the action space is finite. Unlike existing languages, which were designed for humans to write proofs, Peano is designed with automated proof search in mind, while keeping proofs high-level and highly readable by humans (unlike, e.g., first-order resolution-based theorem provers). Peano derives actions directly from the type system, rather than taking arbitrary strings that the Lean runtime accepts, as in environments such as lean-gym. In an ongoing continuation of our first paper, I wrote a Peano version of the Natural Number Game, a popular project used to teach Lean, and I am using it as a first target for an agent that learns theorem proving from scratch. Even this simple set of theorems is challenging for fully automated proof search, since it constantly involves induction and higher-order reasoning.
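For a flavor of why, here is one of the early Natural Number Game lemmas, sketched in Lean 4 using the built-in Nat type rather than the game's own numbers. One direction holds by definition, but the symmetric one already requires induction, which is exactly where blind search tends to get stuck:

```lean
-- Holds definitionally: Nat.add recurses on its second argument.
example (n : Nat) : n + 0 = n := rfl

-- The symmetric statement needs induction on n.
example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```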
In the original Peano paper, we formalized sections of the Khan Academy algebra curriculum in Peano, and trained agents to solve problems without any training data. Making progress in this setting requires learning abstractions (tactics) from the agent's own solutions, much like DreamCoder did for program synthesis. In an ongoing collaboration, we're extending tactic induction to Coq.
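As a rough illustration of the idea (a drastic simplification of the actual algorithm), tactic induction looks for action patterns that recur across the agent's own solutions and promotes them to reusable abstractions:

```python
from collections import Counter

def induce_tactic(solutions, length=2):
    """Return the most frequent length-n action subsequence across
    solutions, as a candidate tactic. `solutions` is a list of action
    sequences; the action names below are made up."""
    counts = Counter()
    for actions in solutions:
        for i in range(len(actions) - length + 1):
            counts[tuple(actions[i:i + length])] += 1
    (pattern, _), = counts.most_common(1)
    return pattern

solutions = [
    ["rewrite add_comm", "apply eq_symm", "rewrite zero_add"],
    ["intro h", "rewrite add_comm", "apply eq_symm"],
]
print(induce_tactic(solutions))  # ('rewrite add_comm', 'apply eq_symm')
```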
My work typically leverages language models to learn policies and value functions. I have worked on methods for reinforcement learning in settings with variable action spaces and sparse binary rewards, like those that arise in mathematical reasoning (Contrastive Policy Learning, NeurIPS 2021), as well as on constrained decoding from pre-trained language models (Synchromesh, ICLR 2022), which samples outputs that satisfy a set of constraints, such as a grammar or even basic typing rules. More recently, we've introduced a class of tools for language models called guides, where the model itself decides when to use constrained generation to guarantee the validity of its outputs. In this work on Certified Reasoning, we used Peano and Synchromesh to construct a "logic guide", allowing language models to formalize assumptions given in natural language and then ensure that their logical deductions are correct, yielding a logically sound chain-of-thought. The framework of guides is quite flexible and allows many new tools to be integrated with language models; we're working on other guides, including one for information retrieval.
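To give a sense of the core mechanism (a toy sketch under my own simplifying assumptions, not Synchromesh's actual API): at each decoding step, only the tokens that keep the partial output valid are eligible, and the model's distribution is renormalized over them before sampling.

```python
import math
import random

def constrained_sample(lm, valid_next, max_len=8):
    """Sample token by token, masking out tokens that would violate
    the constraint. `lm` and `valid_next` are illustrative stand-ins."""
    out = []
    for _ in range(max_len):
        allowed = valid_next(out)  # e.g., derived from a grammar
        if not allowed:
            break
        logits = lm(out)
        # Renormalize the LM's distribution over allowed tokens only.
        weights = [math.exp(logits.get(t, -30.0)) for t in allowed]
        token = random.choices(allowed, weights=weights)[0]
        if token == "<eos>":
            break
        out.append(token)
    return "".join(out)

def toy_lm(prefix):
    # Stand-in for a real LM: uniform scores over a toy vocabulary.
    return {t: 0.0 for t in list("0123456789abc") + ["<eos>"]}

def digits_only(prefix):
    # Constraint: the output must be a nonempty digit string.
    return list("0123456789") + (["<eos>"] if prefix else [])

print(constrained_sample(toy_lm, digits_only))  # e.g. "4871"
```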
Ultimately, one of my main goals is to use formal theorem-proving environments in math education. I strongly believe these tools can help demystify what constitutes a valid proof, one of the major learning challenges when students start their journey in higher-level mathematics. They can also provide instant, rich feedback on your reasoning, which is otherwise expensive and extremely delayed when taking classes and submitting problem sets in LaTeX (imagine learning programming without being able to run your programs!). The dream is to transform as much of mathematics as possible into (literally) a game that anyone can play, appreciate, enjoy and create.