Optimizing Stable Diffusion for Intel CPUs with NNCF and 🤗 Optimum
Latent Diffusion models are game changers when it comes to solving text-to-image generation problems. Stable Diffusion is one of the best-known examples and has been widely adopted in both the community and industry. The idea behind Stable Diffusion is simple and compelling: you generate an image from a noise vector in multiple small steps, each one refining the noise toward a latent image representation. This approach works very well, but because the denoising network runs once per step, generating an image can take a long time if you do not have access to powerful GPUs.
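The step-by-step refinement can be illustrated with a deterministic DDIM-style sampling loop, which is one of several samplers used with Stable Diffusion. This is a minimal NumPy sketch under stated assumptions: the linear noise schedule, the 20-step count, the 4x64x64 latent shape, and `oracle_noise_predictor` are hypothetical stand-ins. In Stable Diffusion the noise prediction comes from a text-conditioned U-Net, and the final latent is decoded to pixels by a separate decoder; both are omitted here.

```python
import numpy as np

def alpha_bar_schedule(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; the cumulative product gives alpha_bar_t
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_sample(predict_noise, shape, num_steps=20, seed=0):
    alpha_bar = alpha_bar_schedule()
    # Evenly spaced subset of timesteps, from noisiest (999) down to cleanest (0)
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_steps).round().astype(int)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise in latent space
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last timestep, step to a fully clean latent (alpha_bar = 1)
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, t)  # one network call per step: this dominates latency
        x0_pred = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1 - a_prev) * eps
    return x

# Hypothetical "clean" latent that the toy predictor steers toward
target = np.full((4, 64, 64), 0.5)
alpha_bar = alpha_bar_schedule()

def oracle_noise_predictor(x, t):
    # Hypothetical stand-in for the U-Net: returns the exact noise separating x from target
    a_t = alpha_bar[t]
    return (x - np.sqrt(a_t) * target) / np.sqrt(1 - a_t)

latent = ddim_sample(oracle_noise_predictor, target.shape)
```

The loop makes the latency problem concrete: the denoising network is invoked once per step, so total generation time is roughly the per-call cost multiplied by the number of steps, which is why per-call speedups matter so much on CPUs.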

Over the past five years, the OpenVINO Toolkit has incorporated many features for high-performance inference. Initially designed for computer vision models, it still dominates this domain, showing best-in-class inference performance for many contemporary models, including Stable Diffusion. However, optimizing Stable Diffusion models for resource-constrained applications requires going far beyond runtime optimizations alone. This is where the model optimization capabilities of the OpenVINO Neural Network Compression Framework (NNCF) come into play.

In this blog post, we outline the challenges of optimizing Stable Diffusion models and propose a workflow that substantially reduces their latency on resource-constrained hardware such as CPUs. In particular, we achieved a 5.1x inference speedup and a 4x reduction in model footprint compared to PyTorch.
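A 4x footprint reduction is consistent with storing weights as 8-bit integers instead of 32-bit floats (32 / 8 = 4). This paragraph does not say which compression technique produced the figure, so treat that link as an assumption. The minimal NumPy sketch below shows symmetric per-tensor INT8 weight quantization; it is not NNCF's API or its exact quantization scheme, and the 320x320 weight matrix is hypothetical.

```python
import numpy as np

def quantize_int8_symmetric(w):
    # Per-tensor symmetric quantization: map [-max|w|, max|w|] onto [-127, 127]
    scale = np.float32(np.abs(w).max() / 127.0)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an FP32 approximation of the original weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((320, 320)).astype(np.float32)  # hypothetical FP32 layer weights
q, scale = quantize_int8_symmetric(w)

ratio = w.nbytes / q.nbytes  # 4 bytes per FP32 value vs 1 byte per INT8 value
max_err = np.abs(w - dequantize(q, scale)).max()
print(f"size ratio: {ratio:.1f}x, max abs error: {max_err:.4f}")
```

The single FP32 scale per tensor adds a negligible 4 bytes, so the storage ratio stays at 4x; the price is a rounding error of at most half a quantization step per weight, which is why accuracy-aware optimization workflows are needed on top of plain quantization.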