Gabriel Poesia

Autoformalization with Large Language Models (@ arXiv 2022)

Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy


This paper investigating the use of recent large language models (namely, OpenAI Codex and Google's recent PaLM) to autoformalize mathematical statements: translating them from natural language into Isabelle. Codex seems to do better than PaLM at this task, despite being smaller, but likely because of it's training data distribution. They mostly focus on translating math olympiad problems to formal statements, showing that these LLMs have some capability to perform this task from few-shot examples. It's not obvious they would, as the kind of relevant paired data for formal proofs is almost non-existent on the Web, so this is an interesting empirical observation.

As a side note, translating olympiad problems also shows one mismatch of formal theorem provers to the kind of challenge that is presented for students. Many of the exercises are about computing the value of a very complicated expression, which one needs to decompose or make several observations and reductions so that that is doable by hand, without a computer, in a short amount of time. At the end, though, the student reaches a number, which is obviously not told beforehand. Since theorem statements in formal languages come before their proofs, it's not straightforward to translate the problem of "find the value of $\cdots$" into a formal theorem statement. In this paper, they do the translation by appending the answer to the problem statement. This is a valid choice for this experimental investigation, but shows that existing languages and environments might not be natural for expressing the kinds of exercises we are used to in educational settings.