# Proof Artifact Co-training for Theorem Proving with Language Models (@ ICLR 2022)

### Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas Polu

This paper describes a simple data augmentation scheme for training language models
for theorem proving. The main challenge is that theorem-proving data is scarce
compared with other domains: you can get millions of Python repositories from
GitHub, for instance, but far fewer sources of Lean proofs (the theorem proving
language the authors use). Their idea is to mine proof objects to create a number
of auxiliary tasks, all cast as text-to-text problems so that a single language
model can be trained on them uniformly. Since every task requires learning the
relevant structure of proofs and of the underlying language, this co-training
scheme improves downstream proving performance.
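
As a rough sketch of the idea (not the paper's actual pipeline), each proof artifact could be serialized into several (prompt, completion) pairs, one per task; the task names and record fields below are hypothetical:

```python
# Hypothetical sketch of PACT-style data augmentation: one proof record
# yields several text-to-text examples that all train the same model.
# Field names ("goal", "tactic", "term", "subterm") are illustrative,
# not the paper's actual schema.

def make_cotraining_examples(proof):
    """Derive auxiliary (prompt, completion) pairs from one proof artifact."""
    return [
        # Primary task: predict the next tactic from the goal state.
        ("NEXT_TACTIC " + proof["goal"], proof["tactic"]),
        # Auxiliary task: predict the full proof term for the goal.
        ("PROOF_TERM " + proof["goal"], proof["term"]),
        # Auxiliary task: recover a masked subterm of the proof term.
        ("MASKED_TERM " + proof["term"].replace(proof["subterm"], "<mask>"),
         proof["subterm"]),
    ]

proof = {
    "goal": "n + 0 = n",
    "tactic": "simp",
    "term": "nat.add_zero n",
    "subterm": "nat.add_zero",
}
for prompt, completion in make_cotraining_examples(proof):
    print(prompt, "=>", completion)
```

Because every example is just a pair of strings, the auxiliary tasks can be mixed into the training stream with no changes to the model architecture, which is where the uniformity of language modeling pays off.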