Faster Text Generation with TensorFlow and XLA
TL;DR: Text generation on 🤗 transformers using TensorFlow can now be compiled with XLA. It is up to 100x faster than before, and even faster than PyTorch -- check the colab below!

Text Generation
As the quality of large language models increased, so did our expectations of what those models could do. Especially since the release of OpenAI's GPT-2, models with text generation capabilities have been in the spotlight. And for legitimate reasons -- these models can be used to summarize, translate, and they have even demonstrated zero-shot learning capabilities on some language tasks. This blog post will show how to make the most of this technology with TensorFlow.

The 🤗 transformers library started with NLP models, so it is natural that text generation is of utmost importance to us. It is part of Hugging Face's democratization efforts to ensure it is accessible, easily controllable, and efficient. There is a previous blog post about the different types of text generation. Nevertheless, below there's a quick recap of the core functionality -- feel free to skip it if you're familiar with our generate function and want to jump straight into the TensorFlow specifics.

Let's start with the basics. Text generation can be deterministic or stochastic, depending on the do_sample flag. By default it's set to False, which makes the output deterministic -- this is also known as greedy decoding. When it's set to True, also known as sampling, the output will be stochastic, but you can still obtain reproducible results through the seed argument (with the same format as in stateless TensorFlow random number generation). As a rule of thumb, you want deterministic generation if you wish to obtain factual information from the model, and stochastic generation if you're aiming for more creative outputs.
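To make this concrete, here is a minimal sketch of both modes. The model choice (GPT-2), the prompt, and the generation lengths are illustrative; the seed is passed as a pair of integers, matching the stateless TensorFlow RNG format mentioned above:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer(["TensorFlow is"], return_tensors="tf")

# Deterministic generation: do_sample defaults to False (greedy decoding)
greedy_output = model.generate(**inputs, max_new_tokens=16)

# Stochastic generation: do_sample=True samples from the token distribution;
# passing `seed` (two integers, stateless TF RNG format) makes it reproducible
sampled_output = model.generate(
    **inputs, do_sample=True, seed=[42, 0], max_new_tokens=16
)

print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
print(tokenizer.decode(sampled_output[0], skip_special_tokens=True))
```

Running the greedy call twice will always produce the same text; the sampling call will too, as long as the same seed is passed.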