They present GPT-f, a transformer-based model that guides a search algorithm for proofs in the Metamath system. Technically, the approach is simple (which is good): they straight use a language model to produce terms that are likely to come next in the proof. They tightly integrate that with a Metamath implementation (which is an advantage of using Metamath instead of Coq, Lean et al: much easier to implement and faster to check).

Technical results are in the paper, so I’ll focus on two impressions. First: I find it very surprising that pre-training helps as much as it does. They pre-train the model on mathematical text, like math papers, math.stackexchange and Github (less math-like, but certainly more than average Web page). Second, it’s hard to argue with the results when their system contributed new proofs that were accepted by the Metamath community. In the end, the paper has quotes from people involved with Metamath, and although I’m not in a position to actually understand the proofs, the reception makes it clear that they are meaningful. In short, language models are powerful, even for highly structured outputs. Let’s use more of them!