This is one of several attempts to replace the attention mechanism in Transformers with something that scales better. Here, the authors construct an attention block that is linear in sequence length by combining a fixed-size local sliding window around each token with global attention on a small number of task-specific tokens.
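As a rough illustration (not the authors' code), here is a minimal NumPy sketch of how such a combined attention pattern could look. The function names `sliding_window_global_mask` and `masked_attention` are made up for this example, and the mask is built densely purely for clarity; the whole point of the method is that a real implementation uses banded/sparse kernels so it never materialises the full n×n matrix.

```python
import numpy as np

def sliding_window_global_mask(seq_len, window, global_idx):
    """Boolean mask: True where query i may attend to key j.

    Combines a fixed-size sliding window around each token with full
    ("global") attention for a few task-specific positions. Dense here
    for illustration only; linear scaling requires a sparse/banded layout.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = np.abs(i - j) <= window // 2      # local sliding window
    mask[global_idx, :] = True               # global tokens attend everywhere
    mask[:, global_idx] = True               # every token attends to global tokens
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)    # block disallowed pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 16 tokens, window of 4, token 0 (e.g. a [CLS]-style token) is global.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((16, 8))
out = masked_attention(q, k, v, sliding_window_global_mask(16, 4, [0]))
print(out.shape)  # (16, 8)
```

Because each token only attends to a constant-size window plus a constant number of global tokens, the number of attended pairs grows linearly with sequence length rather than quadratically.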