Nyströmformer: Approximating self-attention in linear time and memory via the Nyström method
Introduction
Transformers have exhibited remarkable performance on various Natural Language Processing and Computer Vision tasks. Their success can be attributed to the self-attention mechanism, which captures the pairwise interactions between all the tokens in an input. However, the standard self-attention mechanism has a time and memory complexity of \(O(n^2)\) (where \(n\) is the length of the input sequence), making it expensive to train on long input sequences.
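To make the quadratic cost concrete, below is a minimal, single-head PyTorch sketch of standard self-attention (the function name and shapes are illustrative, not from the paper). Materializing the explicit \(n \times n\) score matrix is what drives the \(O(n^2)\) time and memory cost:

```python
import torch

def standard_self_attention(Q, K, V):
    # Q, K, V: (n, d) query, key, and value matrices for a single head
    d = Q.shape[-1]
    # Materializing the full n x n score matrix is the O(n^2) bottleneck
    scores = Q @ K.transpose(-2, -1) / d**0.5  # shape (n, n)
    attn = torch.softmax(scores, dim=-1)       # shape (n, n)
    return attn @ V                            # shape (n, d)

# Doubling n quadruples the size of `scores`
n, d = 1024, 64
Q, K, V = (torch.randn(n, d) for _ in range(3))
print(standard_self_attention(Q, K, V).shape)  # torch.Size([1024, 64])
```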

The Nyströmformer is one of many efficient Transformer models that approximates standard self-attention with \(O(n)\) complexity. Nyströmformer exhibits competitive performance on various downstream NLP and CV tasks while improving upon the efficiency of standard self-attention. The aim of this blog post is to give readers an overview of the Nyström method and how it can be adapted to approximate self-attention.
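Before diving into the method, here is a minimal usage sketch with the HuggingFace Transformers library, assuming the `uw-madison/nystromformer-512` checkpoint is available on the Hub:

```python
from transformers import pipeline

# Masked-language-modeling pipeline backed by a Nyströmformer encoder
# (assumes the `uw-madison/nystromformer-512` checkpoint is available on the Hub)
unmasker = pipeline("fill-mask", model="uw-madison/nystromformer-512")
print(unmasker("Paris is the [MASK] of France."))
```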