Emerging Technology Review: Speech Recognition

Anne Kurtzahn

Patrick Samsel


What is Speech Recognition?

Speech recognition (SR) is an emerging technology that will impact the convergence of the telephony, television and computing industries. SR technology has been available for many years. However, it has not been practical due to the high cost of applications and computing resources and the lack of common standards to integrate SR technology with software applications.

The business community has not yet fully embraced SR: voice-to-text (dictation) applications generated only $48 million in US revenue during 1996. According to William Meisel of the "Speech Recognition Update" newsletter, business has not yet moved to SR technology due to:

However, these concerns are being addressed. The SR area should experience significant growth and have a substantial impact on business and society over the next 10 to 20 years, particularly in telephony (call center management, voice mail, PDAs) and voice-to-text (VTT) applications.

Speech Recognition Technology and Applications

Speech recognition is an enabling technology that may radically change the interface between humans and computers (and other devices having computational abilities). The current interface with these devices is the keyboard/keypad and mouse. However, SR is a complex technological challenge. In order to achieve SR a computer must perform the following functions:

SR requires a software application "engine" with built-in logic to decipher and act on the spoken word. Numerous engines exist, with a range of capabilities. Most available SR engines share three main weaknesses:

  1. Inability to decipher conversational speech. Most engines can interpret only words that are spoken clearly, with a specific cadence, in an environment free of significant background noise. This weakness requires users to develop SR "computing skills": the user needs to learn the language of the specific SR engine. Work is being done at Stanford University's Applied Speech Technology Laboratory to develop Conversational Speech Recognition (CSR). Conversational means that the engine is able to interpret a user's skills and ask appropriate questions to ensure that the correct commands are being executed. In essence, CSR adapts to the user instead of making the user adapt to it.

  2. Lack of standards for quick and economical application development. The Speech Recognition Programming Interface Committee (SRAPI) is working with a consortium of technology firms to develop standards that will bring the capabilities of SR into mainstream acceptance and use.

  3. Inability to interpret the context of the speaker. This is a critical limitation of current technology, because it is difficult to program an engine to recognize and interpret speaker context. Victor Zue at MIT is working to improve this situation by developing engines that can operate within the context of specified content domains. An article in the Economist describes Zue's work and the limitations that context imposes on SR applications.
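To make the idea of a "specified content domain" concrete, the following is a minimal sketch of how a narrow domain can help an engine choose between similar-sounding transcriptions. It is an illustration only, not a description of Zue's systems or of any particular engine: the `Hypothesis` class, the `FLIGHT_DOMAIN` vocabulary, the scores, and the `domain_weight` parameter are all hypothetical, and a real engine would use far richer language models.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str              # candidate transcription from a (hypothetical) engine
    acoustic_score: float  # higher means a closer acoustic match


# Hypothetical content domain: words expected in a flight-booking task.
FLIGHT_DOMAIN = {"book", "a", "flight", "to", "boston", "from", "denver", "on", "monday"}


def domain_coverage(text: str, vocabulary: set[str]) -> float:
    """Return the fraction of words in `text` that belong to the domain vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


def pick_best(
    hypotheses: list[Hypothesis],
    vocabulary: set[str],
    domain_weight: float = 1.0,
) -> Hypothesis:
    """Return the hypothesis with the highest combined acoustic and domain score."""
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")
    return max(
        hypotheses,
        key=lambda h: h.acoustic_score
        + domain_weight * domain_coverage(h.text, vocabulary),
    )


# Two made-up candidates: the first sounds slightly closer to the audio,
# but the second fits the flight-booking domain.
candidates = [
    Hypothesis("book a fright to boss ton", 0.62),
    Hypothesis("book a flight to boston", 0.58),
]
best = pick_best(candidates, FLIGHT_DOMAIN)
print(best.text)
```

With the domain term switched off (`domain_weight=0.0`) the acoustically stronger but nonsensical candidate wins; with it on, the in-domain sentence is preferred. The same constraint also shows the limitation noted above: anything the speaker says outside the domain vocabulary is penalized.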

The implications of this technology will be far-reaching once the cost of computing resources becomes reasonable, standards are developed, and competent SR engines are available. For the most part, these conditions have already been met. Costs have been reduced: SR products can now run on existing Pentium-level PCs. Standards have been developed but still need improvement. Numerous engines have been developed with a broad range of capabilities. These advancements have resulted in products like:

  1. Interactive Voice Response systems (IVRs), including call center activities and voice mail systems. Cyber Voice Incorporated is one of many firms that are using SR technology to improve customer call center operations.

  2. Voice-to-text (VTT) applications that take dictation of words and numbers, insert them automatically into word processors and spreadsheets, and carry out common computing commands. IBM and other software providers offer applications with similar features.

The range of other commercially available SR applications shows the potential of SR technology.

Benefits of SR Applications

Potential Future Applications

Related Links

The Applied Speech Laboratory at CSLI
MIT Lincoln Laboratory Speech Systems Technology Group
Commercial Speech Recognition
References and Books on Speech Recognition

If you have any questions or comments, please contact:
Anne Kurtzahn or Patrick Samsel