Developing and understanding Automatic Speech Recognition (ASR) systems is an inter-disciplinary activity, taking expertise in linguistics, computer science, mathematics, and electrical engineering.
When a human speaks a word, they cause their voice to make a time-varying pattern of sounds. These sounds are waves of pressure that propagate through the air. The sounds are captured by a sensor, such as a microphone or microphone array, and turned into a sequence of numbers representing the pressure change over time. The automatic speech recognition system converts this time-pressure signal into a time-frequency-energy signal. It has been trained on a curated set of labeled speech sounds, and labels the sounds it is presented with. These acoustic labels are combined with a model of word pronunciation and a model of word sequences, to create a textual representation of what was said.
Instead of exploring one part of this process deeply, this course is designed to give an overview of the components of a modern ASR system. In each lecture, we describe a component's purpose and general structure. In each lab, the student creates a functioning block of the system. At the end of the course, we will have built a speech recognition system almost entirely out of Python code.
MOOCs stand for Massive Open Online Courses. These arefree online courses from universities around the world (eg. StanfordHarvardMIT) offered to anyone with an internet connection.
How do I register?
To register for a course, click on "Go to Class" button on the course page. This will take you to the providers website where you can register for the course.
How do these MOOCs or free online courses work?
MOOCs are designed for an online audience, teaching primarily through short (5-20 min.) pre recorded video lectures, that you watch on weekly schedule when convenient for you. They also have student discussion forums, homework/assignments, and online quizzes or exams.