Gabriel Poesia

Longformer: The Long-Document Transformer (@ arXiv)

Iz Beltagy, Matthew E. Peters, Arman Cohan

Link

This is one of several attempts to replace the attention mechanism in Transformers with something that scales better. In this case, they construct a linear-time attention block by combining a fixed-size sliding window of local attention around each token with global attention on just a small number of task-specific tokens.
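To make the combined pattern concrete, here is a minimal sketch (not the authors' implementation) of the attention mask this scheme induces: each token sees its local window, while a few designated global tokens see everything and are seen by everything. The function name and the dense boolean mask are illustrative only; in practice the mask is never materialized, since the point is to compute only the allowed pairs, whose count grows linearly in sequence length.

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_idx):
    """Boolean mask where mask[i, j] is True if query i may attend to key j.

    Combines a fixed-size sliding window around each token with global
    attention on a handful of task-specific positions (e.g. a [CLS] token).
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Local attention: each token attends to neighbors within +/- window.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Global attention: global tokens attend to all positions,
    # and all positions attend to the global tokens (symmetric).
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True

    return mask

if __name__ == "__main__":
    m = longformer_attention_mask(seq_len=16, window=2, global_idx=[0])
    # The number of allowed pairs is O(n * window + n * num_global),
    # i.e. linear in sequence length for a fixed window and global set,
    # versus O(n^2) for full attention.
    print(m.sum(), "allowed attention pairs out of", m.size)
```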