OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision

Overview

This course examines Whisper, OpenAI's robust automatic speech recognition (ASR) system. It begins with an overview of the research paper and of how the large-scale, weakly supervised training dataset was collected. It then covers problems with evaluation metrics, in particular Word Error Rate (WER), followed by effective robustness, scaling laws, and the heuristics involved in decoding. A code walk-through relates the model architecture diagram to the implementation. The transcription task is covered in detail, including loading the audio, computing mel spectrograms, and detecting the language, along with suppressing token logits and voice activity detection. The course closes with decoding and its heuristics.
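
For readers unfamiliar with the metric named above, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the course or from the Whisper codebase.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only
    and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(word_error_rate("Hello world", "hello world"))
```

Because this sketch does no normalization, a difference in capitalization alone counts as a full word error in the second call. That is one plausible example of the kind of metric issue the course topic refers to.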

Syllabus

Intro
Paper overview
Collecting a large scale weakly supervised dataset
Evaluation metric issues WER
Effective robustness
Scaling laws in progress
Decoding is hacky
Code walk-through
Model architecture diagram vs code
Transcription task
Loading the audio, mel spectrograms
Language detection
Transcription task continued
Suppressing token logits
Voice activity detection
Decoding and heuristics
Outro

Taught by

Aleksa Gordić - The AI Epiphany

