This paper proposes a simple and neat idea: using Hindsight Experience Replay when training neural theorem provers, so that useful training data can be generated from failed proof attempts. They target first-order logic proofs without equality, a subset of the TPTP benchmark. The reviewers mainly complained about the experimental comparison against E, which does not show an overall improvement and does not include any other learning-based method.
Their search algorithm is interesting: instead of building on top of an off-the-shelf FOL prover like E, they propose a simple three-priority-queue approach that guarantees every clause is eventually visited, while still prioritizing clauses ranked highly by the neural model.
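For concreteness, here is a minimal sketch of what such a fair selection loop could look like. The specific queue orderings (age, clause size, neural score) and all names here are my assumptions for illustration, not the paper's actual implementation; the point is that the age queue alone makes the search fair, so the learned heuristic can be arbitrarily bad without sacrificing the eventual-visit guarantee.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Clause:
    text: str
    processed: bool = field(default=False)

class FairClauseSelector:
    """Round-robin over three priority queues (hypothetical sketch).
    The age queue guarantees fairness; the other two let symbolic and
    neural heuristics steer the search."""

    def __init__(self) -> None:
        self._ids = itertools.count()   # unique tie-breaker per push
        self._queues = [[], [], []]     # 0: age, 1: size, 2: neural score

    def push(self, clause: Clause, size: int, neural_score: float) -> None:
        uid = next(self._ids)
        heapq.heappush(self._queues[0], (uid, uid, clause))            # age = insertion order
        heapq.heappush(self._queues[1], (size, uid, clause))
        heapq.heappush(self._queues[2], (-neural_score, uid, clause))  # highest score first

    def pop(self, step: int) -> Optional[Clause]:
        # Cycle through the queues starting at step % 3; each clause sits
        # in all three queues, so skip entries already selected elsewhere.
        for i in range(3):
            q = self._queues[(step + i) % 3]
            while q:
                _, _, clause = heapq.heappop(q)
                if not clause.processed:
                    clause.processed = True
                    return clause
        return None

if __name__ == "__main__":
    sel = FairClauseSelector()
    sel.push(Clause("p(X) | q(X)"), size=2, neural_score=0.9)
    sel.push(Clause("~q(a)"), size=1, neural_score=0.1)
    step = 0
    while (c := sel.pop(step)) is not None:
        print(step, c.text)   # every pushed clause is eventually selected
        step += 1
```

Plain round-robin is the simplest fairness schedule; a weighted ratio (e.g., drawing from the neural queue more often) would preserve the guarantee as long as the age queue is consulted infinitely often.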
Experimental doubts aside, I like this idea, and I had thought of something similar in a related context. It exploits an advantage of deductive proof systems: every time you apply a deduction rule, you prove something, even if it does not move in the direction of the goal. This is not the case in top-down approaches, where one tries to reduce the goal to the axioms but does not necessarily have a complete proof of anything along the way (except when some sub-goals are closed).
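To make the hindsight relabeling concrete, here is a schematic sketch of how a failed saturation trace could be turned into training data; the data structures and the exact form of the examples are my guesses, not the paper's pipeline. Each derived clause is treated as a goal that *was* reached, and the steps of its own derivation become positive examples for that goal.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Step:
    premises: Tuple[str, ...]   # clauses the inference consumed
    conclusion: str             # clause the inference produced

def hindsight_examples(trace: List[Step]) -> List[Tuple[str, Step]]:
    """Relabel every clause derived during a failed attempt as a
    hindsight goal, paired with each step of its derivation."""
    produced_by: Dict[str, Step] = {s.conclusion: s for s in trace}
    examples: List[Tuple[str, Step]] = []
    for step in trace:
        goal = step.conclusion              # something we *did* prove
        # Walk the derivation of `goal` backwards through its premises.
        frontier, seen = [goal], set()
        while frontier:
            clause = frontier.pop()
            if clause in seen or clause not in produced_by:
                continue                    # axioms have no producing step
            seen.add(clause)
            s = produced_by[clause]
            examples.append((goal, s))      # positive example for this goal
            frontier.extend(s.premises)
    return examples
```

In a bottom-up prover this relabeling is essentially free, because each conclusion is a genuine theorem; in a top-down prover the analogous trick would only apply to sub-goals that were actually closed, which is exactly the asymmetry noted above.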