Limitations and Potential

 


In the past, the major constraint on developing the perfect Speech Recognition system was the limited processing power of the computer's microprocessor. Once this obstacle was overcome with the development of more powerful microchips, the true limitation of SR technology became visible: the difficulty of developing algorithms sophisticated enough to understand, interpret, and respond to voice commands almost perfectly. The answer to this problem still eludes the most successful research institutions. For instance, some systems can understand input from a variety of users but only with a limited vocabulary bank. Conversely, other systems recognize over 200,000 words but from only a very limited number of users. No program yet exists that can comprehend an extensive vocabulary from a wide variety of speakers.

The commercial programs available today require a "training session" with each user, which may last over an hour. During this time, not only does the user have to learn how to speak to the machine, but the computer also needs to become accustomed to the user's voice. The lost hours are a constraint on productivity, and the training requirement also points to a larger problem: systems will need to learn to understand multiple users in a short time, or even instantly. For instance, when we go to the McDonald's drive-thru window and order a burger with ketchup, we will expect the computer system to recognize our verbal input immediately. It would not be fast food if we had to train the Speech Recognition program for half an hour!

Another limitation of current SR tools is that a nearly unlimited number of variables shape the sound of a voice. For example, when we answer a phone call just as we wake from sleep, our voice sounds different than it does after we have cheered all night at a basketball game. Background noise also limits the effectiveness of SR technology. It is relatively easy for the computer to filter out background noise when we are speaking in a quiet office, but if we say the same phrase on a busy street, the SR system is likely to be confused.

Even though current SR systems have limitations, significant progress has been made toward developing a perfectly reliable SR program. Once these frustrating hindrances are overcome, the potential for SR technology is enormous. Traditional methods of inputting data into a computer, such as the mouse and keyboard, will become obsolete. Furthermore, the interaction between the user and the computer will commonly be speak/listen/speak… as opposed to mechanical-input/read/mechanical-input… For a further look into the future and potential of Speech Recognition technology, see the "Future" section.

Current commercial SR products are not yet capable of being used on a wide scale. Whether now is an appropriate time to adopt an SR system depends on the application. For instance, SR technology may be effectively implemented to reduce costs in call centers. However, it is perhaps not yet at a level suitable for increasing the productivity of office tasks. At the rate SR software applications are developing, it will soon be beneficial to employ Speech Recognition technology in everyday functions.
 
Copyright © 1999 Ira Greenberg and Andrew Bate.  All Rights Reserved.