Accurate and efficient speech-to-text transcription for audio and video data.