Timeline

 

Speech Recognition (SR) has interested scientists for more than a century. Its study began with extensive research into the complex biological nature of the human voice. Applications of these findings on audible speech started to take shape with the help of the computer. Today, SR systems are quickly becoming reliable and accurate programs for recognizing and interpreting the spoken word.

 

Important Events in the Evolution of Speech Recognition

 

1874: Alexander Graham Bell proves that frequency harmonics from an electrical signal can be divided. This eventually leads to the digitization of speech.

1952: Bell Labs develops the first effective speech recognizer (97% accurate) using simple frequency-splitter technology similar to that developed by Alexander Graham Bell 78 years earlier.

1969: Computer systems such as the Vicens System and the Medress System are developed with limited vocabularies in memory. Intermittent speaking, or discrete speech, is required for the voice to be recognized.

1971-1976: U.S. Defense Advanced Research Projects Agency (DARPA) funds projects for Speech Understanding Research (SUR).

The major findings from this initiative include improvements that reduce the problem of variability in the voice:

Japanese scientists Itakura, Sakoe, and Chiba develop dynamic programming techniques for speech, which become the standard for optimal non-linear time alignment.

Markov Modeling by Jim Baker and IBM’s Fred Jelinek - a mathematical model for locating invariant information in the speech signal[1].

Mid 1970’s:

Itakura conducts research and develops a product based on the premise that sounds may seem similar but are actually different. His product tests 97.3% accurate with a vocabulary of about 200 words.

Bell Labs develops a system with a 97.1% accuracy rate that can interpret voices from different people, although its vocabulary is very small.

Late 1970’s: Commercial Speech Recognition packages become available. Prices range from $259 to $100,000.

1980’s: The SR market splits in two directions: 1) call center Speech Recognition systems and 2) speech-to-text applications.

1990’s: The processing power of the personal computer reaches the level needed for SR software to be used effectively by the ordinary user (about 200 MHz).

1999: SR programs can understand continuous speech from multiple users with up to 99% accuracy. The time needed for a system to learn a user’s voice is under 10 minutes. Vocabulary databases exceed 250,000 terms and 160,000 active words.

 
 
Copyright © 1999 Ira Greenberg and Andrew Bate.  All Rights Reserved.