# Proving Theorems using Incremental Learning and Hindsight Experience Replay (@ arXiv 2021)

### Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

This paper proposes a simple and neat idea: using Hindsight Experience Replay in training neural theorem provers
to generate useful training data from failed proof attempts. In their case, they target first-order logic proofs
without equality, a subset of the TPTP benchmark. The reviewers' main complaint was about the experimental comparison
against E, which doesn't show clear overall improvements and doesn't include any other learning-based baselines.

Their search algorithm is interesting: instead of basing it on top of an off-the-shelf FOL prover like E,
they propose a simple three-priority-queue approach that guarantees every clause is eventually visited,
while still prioritizing clauses ranked highly by the neural model.
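A minimal sketch of that selection scheme, under my own assumptions about the queue orderings (age, clause size, and model score are plausible choices; the paper's exact priorities may differ): the age queue alone guarantees fairness, because a round-robin over the queues pops from it every third step, so no clause waits forever.

```python
import heapq
from itertools import count

class ClauseSelector:
    """Round-robin clause selection over three priority queues.

    Queue 0 orders clauses by age (FIFO), which by itself guarantees
    that every pushed clause is eventually selected; queues 1 and 2
    layer heuristic bias (clause size, learned model score) on top of
    that fairness guarantee. Orderings here are illustrative.
    """
    def __init__(self, score_fn):
        self.score_fn = score_fn
        self.tick = count()            # age counter, also a tie-breaker
        self.queues = [[], [], []]     # by age, by size, by -score
        self.selected = set()          # clauses already handed out

    def push(self, clause):
        age = next(self.tick)
        heapq.heappush(self.queues[0], (age, age, clause))
        heapq.heappush(self.queues[1], (len(clause), age, clause))
        heapq.heappush(self.queues[2], (-self.score_fn(clause), age, clause))

    def pop(self, turn):
        q = self.queues[turn % 3]      # round-robin over the queues
        while q:
            _, _, clause = heapq.heappop(q)
            if clause not in self.selected:  # lazily drop stale entries
                self.selected.add(clause)
                return clause
        return None
```

Each clause lives in all three heaps at once; the `selected` set lazily discards copies that were already popped from another queue, which is cheaper than deleting from the middle of a heap.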

Experimental doubts aside, I like this idea, and had thought of something similar in a related context. This is one advantage of
deductive proof systems: *every time* you apply a deduction rule, you prove *something*, even if it does not move you
towards the goal. This is not the case in top-down approaches, where one tries to reduce the goal to the axioms
but doesn't necessarily have a complete proof of anything along the way (except for sub-goals that have been closed).
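To make the contrast concrete, here is a toy forward-chaining loop over propositional Horn rules (entirely illustrative; the paper works with first-order resolution, not this fragment): every iteration adds a fact that is fully proved from the axioms, whether or not the stated goal is ever reached, so each one is a valid hindsight target.

```python
def forward_chain(facts, rules, max_steps=100):
    """Saturate a set of propositional facts with Horn rules.

    Each rule is (premises, conclusion). Every conclusion added to
    `proved` is a complete theorem of the initial facts, even if the
    run never derives the intended goal.
    """
    proved = set(facts)
    for _ in range(max_steps):
        new = {c for ps, c in rules if set(ps) <= proved and c not in proved}
        if not new:
            break  # saturated: nothing new is derivable
        proved |= new
    return proved

facts = {"a"}
rules = [(("a",), "b"), (("b",), "c"), (("x",), "goal")]
# "goal" is unreachable from "a", yet the attempt still proves "b" and "c".
print(forward_chain(facts, rules))
```

In a top-down prover, an analogous failed run would leave only a tree of open sub-goals, none of which constitutes a proof of anything.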