The Reformer - Pushing the limits of language modeling

How the Reformer uses less than 8GB of RAM to train on sequences of half a million tokens
The Reformer model, as introduced by Kitaev, Kaiser et al. (2020), is one of the most memory-efficient transformer models for long sequence modeling to date.

Recently, long sequence modeling has experienced a surge of interest, as can be seen by the many submissions from this year alone: Beltagy et al. (2020), Roy et al. (2020), Tay et al. (2020), and Wang et al. (2020), to name a few. The motivation behind long sequence modeling is that many NLP tasks, e.g. summarization and question answering, require the model to process longer input sequences than models such as BERT are able to handle. In such tasks, long sequence models do not have to truncate the input sequence to avoid memory overflow and have thus been shown to outperform standard "BERT"-like models, cf. Beltagy et al. (2020).