“The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI wrote in the GitHub repo for Whisper, from which several versions of the system can be downloaded. “[The models] show strong ASR results in ~10 languages. They may exhibit additional capabilities … if fine-tuned on certain tasks like voice activity detection, speaker classification or speaker diarization but have not been robustly evaluated in these areas.”
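For developers, the repository ships a small Python package with a transcription API. The following is a minimal sketch, assuming the `openai-whisper` package is installed (`pip install -U openai-whisper`, which also requires `ffmpeg`) and that a local file named `audio.mp3` exists:

```python
import whisper

# Load one of the released checkpoints; "base" is a small, fast option,
# while larger checkpoints such as "medium" or "large" trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe a local audio file; Whisper handles decoding and segmentation.
result = model.transcribe("audio.mp3")
print(result["text"])
```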
Whisper has its limitations, particularly in the area of text prediction. Because the system was trained on a large amount of “noisy” data, OpenAI cautions that Whisper may include words in its transcriptions that weren’t actually spoken, possibly because it is simultaneously trying to predict the next word in the audio and to transcribe the recording itself. Moreover, Whisper doesn’t perform equally well across languages: it suffers higher error rates for speakers of languages that are underrepresented in its training data. Despite this, OpenAI sees Whisper’s transcription capabilities being used to improve existing accessibility tools.
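One practical consequence of the uneven per-language performance is that automatic language detection can go wrong for underrepresented languages. A minimal sketch of working around that, assuming the spoken language is known in advance and a hypothetical input file `sample.wav`:

```python
import whisper

model = whisper.load_model("base")

# Audit Whisper's own language detection on the first 30-second window.
audio = whisper.load_audio("sample.wav")
audio = whisper.pad_or_trim(audio)  # Whisper operates on 30-second windows
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# If detection is unreliable for an underrepresented language, pinning the
# language explicitly avoids transcribing in the wrong one entirely.
result = model.transcribe("sample.wav", language="es")
print(result["text"])
```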