Whisper
Voice & Audio
OpenAI's open-source speech recognition model that converts spoken audio to text with high accuracy across 99 languages.
Whisper is OpenAI's speech-to-text model, released as open source in September 2022. It transcribes audio into text with high accuracy and is robust to accents, background noise, and technical vocabulary, in many cases matching or beating commercial alternatives.
Because Whisper is open source, it can be run locally (audio never leaves your machine, and there are no API fees) or accessed via OpenAI's hosted API. It supports 99 languages and can translate non-English audio directly into English. It accepts common audio formats and remains usable on poor-quality recordings.
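As a rough illustration of the local option, here is a minimal sketch that assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The names `whisper_task` and `transcribe_locally` are made up for this example, and `base` is only one of the published model sizes.

```python
def whisper_task(translate_to_english: bool) -> str:
    """Map a yes/no choice onto Whisper's two task names."""
    return "translate" if translate_to_english else "transcribe"


def transcribe_locally(audio_path: str, model_name: str = "base",
                       translate_to_english: bool = False) -> str:
    """Run Whisper on this machine and return the recognized text."""
    # Imported lazily so this module still loads where the package is absent.
    import whisper  # assumption: installed via `pip install openai-whisper`

    # Weights are downloaded on first use, then cached locally.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, task=whisper_task(translate_to_english))
    return result["text"].strip()
```

Calling `transcribe_locally("meeting.mp3", translate_to_english=True)` (a hypothetical file name) would return English text for a non-English recording. Larger model sizes generally trade speed for accuracy.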
Whisper and similar models power many AI applications: meeting transcription (the category of tools such as Otter.ai, tl;dv, and Fathom), podcast transcription, subtitle generation, voice input for chatbots, and accessibility tools. Because Whisper is open source, accurate speech recognition is now essentially free apart from compute costs.
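Subtitle generation is one of the simpler use cases to sketch. The example below assumes transcription output arrives as a list of segments with `start`, `end` (both in seconds), and `text` keys, which is the shape the `openai-whisper` package returns under `result["segments"]`. The function names and the sample data are illustrative only.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn segments with start/end/text keys into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each SRT cue is: counter, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up sample segments, standing in for real transcription output.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
    {"start": 2.5, "end": 61.04, "text": " Let's review the agenda."},
]
srt_text = segments_to_srt(sample)
```

Writing `srt_text` to a `.srt` file next to the video is usually enough for media players to pick it up as a subtitle track.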
Real-World Example
Speech recognition models like Whisper power many of the transcription tools on Coda One, from Otter.ai's meeting notes to tl;dv's recording summaries.
FAQ
What concepts are related to Whisper?
Key related concepts include Voice AI and Open Source (AI). Understanding these together gives a more complete picture of how Whisper fits into the AI landscape.