Gabriel Poesia

Learning Macro-Actions in Reinforcement Learning (@ NeurIPS 1998)

Jette Randløv

Link

This is a classic paper on learning ``macro-actions'', a simpler version of what is more recently known as ``skills''. The idea of the paper is quite simple: an agent typically learns a policy, which maps states to actions. The author proposes to also learn an action-to-action policy, which maps the last action to the next. Since this action-to-action policy cannot see the state, it can only capture common sequences of actions. If you learn both, the test-time policy can then be a linear combination of your regular state-conditioned policy and this action-to-action policy.

In the case where there are inherent macro-actions in the environment, it's easy to see how this can dramatically speed up learning, since the action-to-action policy automatically generalizes to unseen states. The paper shows this in a few environments. It also does not work on an environment with just 3 actions, which makes sense: a deterministic action-to-action policy is a mapping from 3 previous actions to 3 next actions, so the space of such policies is too limited ($3^3 = 27$ possible policies).