Introducing Parler TTS v1 - 885M (Mini) & 2.2B (Large) - fully open-source Text-to-Speech models!
> Trained on 45,000 hours of open speech (datasets released as well)
> Up to 4x faster generation than the previous v0.1 release, thanks to torch.compile & static KV cache
> Mini is trained with a larger text encoder; Large is trained with both a larger text encoder & a larger decoder
> Also supports SDPA & Flash Attention 2 for an added speed boost
> Built-in streaming: we provide a dedicated streaming class optimised for time to first audio
> Better speaker consistency: choose from more than a dozen speakers, or write your own speaker description prompt
> Not convinced by a speaker? You can fine-tune the model on your own dataset (only a couple of hours would do)
Apache 2.0 licensed codebase, weights and datasets!
Can't wait to see what y'all build with this!
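
If you want to try the speaker description prompt idea, here is a minimal sketch. It assumes the `parler_tts`, `transformers`, `torch` and `soundfile` packages are installed and that the Mini checkpoint is published as `parler-tts/parler-tts-mini-v1`. `build_description` and `synthesize` are hypothetical helpers written for this example (not part of the library), and the speaker name "Jon" is an assumption too.

```python
def build_description(speaker=None, pace="moderate", quality="very clear"):
    """Compose a free-text voice description (hypothetical helper, not a library API)."""
    who = f"{speaker}'s voice" if speaker else "A speaker's voice"
    return f"{who} is delivered at a {pace} pace, with {quality} audio and almost no background noise."


def synthesize(text, description, model_id="parler-tts/parler-tts-mini-v1", out_path="out.wav"):
    """Generate speech for `text` in the voice described by `description` and save it as a WAV file."""
    # Heavy imports stay local so build_description works without them installed.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # model_id is an assumed checkpoint name; swap in the one you want to use.
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    sf.write(out_path, audio.cpu().numpy().squeeze(), model.config.sampling_rate)
    return out_path
```

Something like `synthesize("Hey, how are you doing today?", build_description("Jon"))` should then write `out.wav`. The first call downloads the weights, so expect it to take a while.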