This paper shows that pre-training models on simple artificial tasks that capture primitive notions of reasoning can be useful for transferring to downstream tasks with very different surface-level data (but that in principle include those reasoning primitives).
The core idea is nice, and is based on the typology of reasoning patterns due to Charles Peirce: one can reason deductively (deriving new facts from known facts and inference rules), inductively (deriving the underlying rules that imply observed facts), or abductively (deriving premises that, given known deductive rules, would imply observed facts). These three modes of reasoning are then realized as three sequence-to-sequence tasks based on string substitutions. In deduction, the model performs the substitutions; in induction, it infers the substitution rules given the starting and end strings; in abduction, it infers the starting strings given the results and the substitution rules.
The downstream tasks are several language modeling tasks connected to theorem proving. It is interesting that transfer does happen, since these tasks usually have limited training data. This is a cool technical finding, although I remain generally unconvinced about such proxy tasks.