Speech recognition, often called automatic speech recognition, is the process by which a computer identifies the words a person has spoken. If you have encountered speech recognition, it was probably in a telephone-based application. If you've ever called a company and a computer asked you to say the name of the person you want to talk to, the computer recognized the name you said through speech recognition.

This is very different from a computer actually understanding what you said. When two people speak to one another, they both recognize the words and the meaning behind them. Computers, on the other hand, are capable only of the first: they can recognize individual words and phrases, but they do not understand speech the way humans do.

Speech recognition is still useful, however, because we don't need computers to carry on conversations with us; we just need to give them commands. When you type a command, the computer doesn't understand English either. It matches what you typed against the commands it knows, and software tells the computer what to do when a command is recognized.

The same is true of speech recognition software. Users speak commands that are recognized by a piece of software called the ASR, for Automatic Speech Recognizer. The ASR then tells the speech application what the user said, and the application determines what to do next.
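The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the command phrases, the `ACTIONS` table, and the `handle_recognition` function are hypothetical names made up for this example, and the recognizer itself is assumed to have already turned the audio into text.

```python
# Hypothetical sketch: the ASR has already reported the recognized text,
# and the application decides what to do with it.

def open_mail():
    return "opening mail"

def check_balance():
    return "reading account balance"

# Command phrases this imaginary application knows how to act on.
ACTIONS = {
    "open mail": open_mail,
    "check balance": check_balance,
}

def handle_recognition(recognized_text):
    """Map the text reported by the ASR to an application action."""
    action = ACTIONS.get(recognized_text.strip().lower())
    if action is None:
        # The words were recognized, but they are not a known command.
        return "sorry, I did not catch that"
    return action()
```

Calling `handle_recognition("Open Mail")` runs the mail action, while an unknown phrase falls through to the fallback message. The application never needs to understand the words; it only looks them up.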

In speech applications such as dictation software, the application's response to hearing a recognized word may be to write it in a word processor. In an interactive voice response system, the speech application might recognize a person's name and route a caller to that person's phone.

Speech recognition is also different from voice recognition, though many people use the terms interchangeably. In a technical sense, voice recognition is strictly about trying to recognize individual voices, not what the speaker said. It is a form of biometrics, the process of identifying a specific individual, often used for security applications.

Because we all have distinct speaking styles (this is why you can tell your mom's voice from your favorite radio talk show host's), computers can take a sample of speech and analyze it for distinct characteristics, creating a "voice print" that is unique to an individual in the same way a fingerprint is. A common voice recognition system might ask the user to speak a password. It would then compare the speaker's voice print to a stored voice print and authenticate the user if the two matched.
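As a rough sketch of the comparison step, suppose a voice print is stored as a short list of numbers summarizing the speaker's characteristics. That representation, the cosine similarity measure, the 0.95 threshold, and the function names below are all assumptions chosen for illustration, not a description of how any particular product works.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two equal-length feature lists point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero print cannot match anything
    return dot / (norm_a * norm_b)

def authenticate(sample_print, stored_print, threshold=0.95):
    """Accept the speaker only if the two voice prints are similar enough.

    The threshold is an arbitrary illustrative value.
    """
    return cosine_similarity(sample_print, stored_print) >= threshold

# Made-up numbers standing in for real voice prints.
stored = [0.2, 0.7, 0.1]
same_speaker = [0.21, 0.69, 0.12]
other_speaker = [0.9, 0.1, 0.3]
```

With these made-up values, `authenticate(same_speaker, stored)` succeeds and `authenticate(other_speaker, stored)` fails. Note that nothing here looks at which words were spoken, only at who appears to be speaking.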

Though speech recognition uses some of the same fundamental technology as voice recognition, it is different because it does not try to identify individuals. Rather, it tries to recognize what individuals say. It's the difference between knowing who is speaking and knowing what is said.

Though recognizers vary greatly, they generally follow a similar process to figure out what a speaker said:

  1. The ASR loads a list of words to be recognized. This list of words is called a grammar.
  2. The ASR loads audio from the speaker. This audio is represented as a waveform, essentially the mathematical representation of sound.
  3. The ASR compares the waveform to its own acoustic models. These are databases that contain information about the waveforms of individual sounds and are what allow the engine to recognize speech.
  4. The ASR compares the words in the grammar to the results it obtained from searching its acoustic models.
  5. It then determines which words in the grammar the audio most closely matches and returns a result.
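The five steps above can be sketched as a toy program. This is a deliberately simplified illustration: the "acoustic models" here are just one made-up reference waveform per word, the waveforms are four numbers long, and `distance` and `recognize` are hypothetical names. A real engine's models and matching are far more sophisticated.

```python
def distance(a, b):
    """Sum of squared differences between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy stand-in for acoustic models: one reference waveform per word.
ACOUSTIC_MODELS = {
    "yes":   [0.1, 0.9, 0.4, 0.0],
    "no":    [0.8, 0.2, 0.1, 0.7],
    "maybe": [0.5, 0.5, 0.9, 0.3],
}

def recognize(grammar, waveform):
    """Return the grammar word whose model best matches the waveform."""
    # Steps 1 and 2: the grammar and the audio are loaded (passed in here).
    # Step 3: compare the waveform to every acoustic model.
    scores = {word: distance(waveform, ref)
              for word, ref in ACOUSTIC_MODELS.items()}
    # Step 4: keep only the results for words that are in the grammar.
    candidates = {word: scores[word] for word in grammar if word in scores}
    if not candidates:
        return None  # nothing in the grammar could be matched
    # Step 5: the closest match (smallest distance) is the result.
    return min(candidates, key=candidates.get)

audio = [0.2, 0.8, 0.5, 0.1]  # made-up samples that resemble "yes"
```

With the grammar `["yes", "no"]` this audio is recognized as "yes". With the grammar `["no", "maybe"]` the same audio comes back as "maybe", because the recognizer can only return words the grammar contains.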
