Riffusion: Optimized for Mobile Deployment
State-of-the-art generative AI model that produces spectrogram images from any text input. These spectrograms can then be converted into audio clips.
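To make the spectrogram-to-audio step concrete, here is a minimal sketch in plain Python. It is not the method used by Riffusion or by this repository: it swaps in a simple oscillator bank (additive synthesis), which sidesteps phase reconstruction entirely, whereas real pipelines typically invert a mel-scaled spectrogram and estimate phase with an algorithm such as Griffin-Lim. The function name `spectrogram_to_audio`, its parameters, and the linear frequency spacing are all assumptions made for illustration.

```python
import math


def spectrogram_to_audio(spectrogram, sample_rate=8000,
                         samples_per_frame=400, max_freq_hz=2000.0):
    """Turn a magnitude spectrogram into audio samples (hypothetical helper).

    spectrogram[t][b] is the magnitude of frequency bin b in time frame t.
    Each bin drives one sine oscillator; bins are assumed to be linearly
    spaced up to max_freq_hz (an illustrative choice, not a mel scale).
    """
    n_bins = len(spectrogram[0])
    freqs = [max_freq_hz * (b + 1) / n_bins for b in range(n_bins)]

    audio = []
    for t, frame in enumerate(spectrogram):
        for i in range(samples_per_frame):
            # A global sample index keeps each oscillator's phase
            # continuous across frame boundaries.
            n = t * samples_per_frame + i
            time_s = n / sample_rate
            audio.append(sum(mag * math.sin(2 * math.pi * f * time_s)
                             for mag, f in zip(frame, freqs)))

    # Normalize to [-1, 1] unless the signal is silent.
    peak = max((abs(x) for x in audio), default=0.0)
    return [x / peak for x in audio] if peak > 0 else audio


# Two frames, two bins: the high bin sounds first, then the low bin.
clip = spectrogram_to_audio([[0.0, 1.0], [1.0, 0.0]])
```

The resulting list of floats could be written to a WAV file with the standard `wave` module after scaling to 16-bit integers.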
Generates high-resolution spectrogram images from text prompts using a latent diffusion model. The model uses CLIP ViT-L/14 as the text encoder, a U-Net-based latent denoiser, and a VAE-based decoder to generate the final image.
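The data flow through those three components can be sketched as follows. This is a toy illustration only: `encode_text`, `predict_noise`, `decode_latent`, and `generate` are hypothetical stand-ins written in plain Python, not the CLIP, U-Net, or VAE networks and not the scripts shipped in this repository. The fixed-step update also stands in for a real diffusion scheduler, which is considerably more involved.

```python
import random


def encode_text(prompt, dim=4):
    # Stand-in for the text encoder: a deterministic pseudo-embedding
    # derived from the prompt string (assumption, not CLIP).
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def predict_noise(latent, embedding):
    # Stand-in for the U-Net: treats the gap between the current latent
    # and a text-conditioned target as the "noise" to remove.
    return [z - e for z, e in zip(latent, embedding)]


def decode_latent(latent, height=2):
    # Stand-in for the VAE decoder: maps each latent value to a pixel
    # in [0, 1] and repeats the row to form a small 2-D "image".
    row = [min(1.0, max(0.0, (z + 1.0) / 2.0)) for z in latent]
    return [list(row) for _ in range(height)]


def generate(prompt, steps=20, step_size=0.2, seed=0):
    embedding = encode_text(prompt)              # 1. encode the prompt
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in embedding]  # 2. start from noise
    for _ in range(steps):                       # 3. iterative denoising
        noise = predict_noise(latent, embedding)
        latent = [z - step_size * n for z, n in zip(latent, noise)]
    return decode_latent(latent)                 # 4. decode to an image


image = generate("funky jazz saxophone")
```

The point of the sketch is the ordering: the prompt is encoded once, denoising happens repeatedly in the small latent space, and the decoder runs only once at the end to produce the full-size image.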

This model is an implementation of Riffusion found here. This repository provides scripts to run Riffusion on Qualcomm® devices. More details on model performance across various devices can be found here.