When doing almost any task, people clearly think at multiple levels of detail: planning a visit to a friend, your first plan might be "get my bike, then bike to their place"; "get my bike" then expands into "find the key to the bike lock" and "find my helmet", and so on, down to the lowest-level procedural skill of actually riding the bike. The idea that artificial agents should plan their actions in this hierarchical fashion has also existed in Reinforcement Learning for decades, originally in grid worlds and other simple environments. Scaling it up to complex tasks, with minimal supervision on how to build the hierarchy, largely remains a challenge.
This paper shows one way to do this in three search problems, within a really appealing framework: they simply train a deep generative model to propose subgoals that are approximately $k$ steps ahead of the current state and that should lie on a path to the solution. The generator is trained from expert trajectories, which are available in the three problems they choose: Rubik's Cube (where training data is produced by generating solutions in reverse), Sokoban, and INT. At test time, execution is straightforward: query the generator for a set of candidate subgoals, which should be close to the current state; run a standard search algorithm to try to reach one of those subgoal states; repeat until you reach an actual goal state.
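The test-time loop can be sketched in a few lines. Here is a minimal, illustrative Python version with a toy hand-coded stand-in for the learned subgoal generator (all names and the 1-D navigation domain are my own, not from the paper; a real system would sample subgoals from the trained generative model and use a stronger low-level search):

```python
from collections import deque

def bfs_to_subgoal(start, subgoal, neighbors, max_depth):
    """Low-level search: shortest path of states from `start` to `subgoal`."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == subgoal:
            return path
        if len(path) >= max_depth:
            continue
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # subgoal unreachable within the depth budget

def subgoal_search(start, goal, propose_subgoals, neighbors,
                   max_depth=8, max_iters=100):
    """High-level loop: repeatedly jump to the first reachable proposed subgoal."""
    state, trajectory = start, [start]
    for _ in range(max_iters):
        if state == goal:
            return trajectory
        for sg in propose_subgoals(state, goal):
            path = bfs_to_subgoal(state, sg, neighbors, max_depth)
            if path is not None:
                trajectory += path
                state = sg
                break
        else:
            return None  # no proposed subgoal was reachable
    return None

# Toy stand-in for the learned generator: on a 1-D line, propose the state
# K steps toward the goal (hypothetical; the paper trains a neural generator).
K = 4
propose = lambda s, g: [s + max(-K, min(K, g - s))]
neighbors = lambda s: [s - 1, s + 1]

print(subgoal_search(0, 10, propose, neighbors))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The point of the sketch is the division of labor: the generator proposes short-horizon targets, and a generic search procedure only ever has to bridge roughly $k$ steps at a time.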
This strategy is simple and seems to work surprisingly well, provided you have enough data to train the subgoal generator. That is fine for these puzzles and for INT, which comes with an infinite generator of theorems and proofs, but it could be a real obstacle to generalizing the idea. Nevertheless, it's interesting to see that, at least in this ideal data regime, it can be made to work.