Optimizing a Text-To-Speech model using 🤗 Transformers🤗...

Optimizing a Text-To-Speech model using 🤗 Transformers

🤗 Transformers provides many of the latest state-of-the-art (SoTA) models across domains and tasks. To get the best performance from these models, they need to be optimized for inference speed and memory usage.

The 🤗 Hugging Face ecosystem offers precisely such ready & easy to use optimization tools that can be applied across the board to all the models in the library. This makes it easy to reduce memory footprint and improve inference with just a few extra lines of code.

In this hands-on tutorial, I'll demonstrate how you can optimize Bark, a Text-To-Speech (TTS) model supported by 🤗 Transformers, based on three simple optimizations. These optimizations rely solely on the Transformers, Optimum and Accelerate libraries from the 🤗 ecosystem.

This tutorial is also a demonstration of how one can benchmark a non-optimized model and its varying optimizations.

For a more streamlined version of the tutorial with fewer explanations but all the code, see the accompanying Google Colab.

This blog post is organized as follows:https://github.com/huggingface/blog/blob/main/optimizing-bark.md

GitHub

blog/optimizing-bark.md at main · huggingface/blog

Public repo for HF blog posts. Contribute to huggingface/blog development by creating an account on GitHub.