Whisper Medusa
Whisper is an advanced encoder-decoder model for speech transcription and translation: it encodes the input audio and then decodes it into text. Because the model is large and its inference is slow, optimization strategies such as Faster-Whisper and Speculative Decoding have been proposed to improve performance. Our Medusa model builds on Whisper by predicting multiple tokens per iteration, which significantly improves speed at the cost of a small degradation in WER. We train and evaluate our model on the LibriSpeech dataset (https://huggingface.co/datasets/openslr/librispeech_asr), where it demonstrates these speed improvements.
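To make "predicting multiple tokens per iteration" concrete, here is a minimal toy sketch of the general Medusa-style draft-and-verify idea. It is not the actual Whisper Medusa implementation: `toy_base_model`, `toy_medusa_heads`, and both decode functions are hypothetical stand-ins that work on small integers rather than audio features or a real vocabulary. The sketch also assumes exact greedy verification, so its output matches plain greedy decoding, whereas the real model reports a small WER degradation and may accept tokens differently.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy stand-ins, NOT the real Whisper decoder or Medusa heads.
BaseModel = Callable[[Sequence[int]], int]         # prefix -> next token
DraftHeads = Callable[[Sequence[int]], List[int]]  # prefix -> guesses for later tokens


def toy_base_model(prefix: Sequence[int]) -> int:
    """Toy 'decoder': the next token is always (last token + 1) mod 10.

    Assumes a non-empty prefix.
    """
    return (prefix[-1] + 1) % 10


def toy_medusa_heads(prefix: Sequence[int]) -> List[int]:
    """Toy extra heads guessing the 2nd, 3rd and 4th upcoming tokens.

    The last guess is deliberately corrupted so that some drafts are rejected.
    """
    last = prefix[-1]
    guesses = [(last + 2 + k) % 10 for k in range(3)]
    guesses[-1] = (guesses[-1] + 1) % 10
    return guesses


def plain_greedy_decode(
    base: BaseModel, prompt: Sequence[int], max_new_tokens: int
) -> List[int]:
    """Baseline: one new token per iteration."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(base(tokens))
    return tokens


def medusa_greedy_decode(
    base: BaseModel, heads: DraftHeads, prompt: Sequence[int], max_new_tokens: int
) -> Tuple[List[int], int]:
    """Draft several tokens per iteration and keep the verified prefix.

    Returns the decoded tokens and the number of iterations used.
    """
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    iterations = 0
    while len(tokens) < limit:
        iterations += 1
        # The base prediction for the next token is always accepted.
        accepted = [base(tokens)]
        # Verify each head's guess against what the base model would emit.
        # A real system would score all guesses in one batched forward pass;
        # this sketch calls the toy base model repeatedly for clarity.
        for guess in heads(tokens):
            if base(tokens + accepted) != guess:
                break
            accepted.append(guess)
        tokens.extend(accepted)
    return tokens[:limit], iterations
```

With these toy functions, decoding 9 new tokens from the prompt `[0]` takes 3 iterations with `medusa_greedy_decode` instead of 9 steps with `plain_greedy_decode`, because each iteration accepts the base token plus two verified guesses. In a real model the speedup depends on how often the extra heads guess correctly.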