Commercial Applications

The first commercial applications of computer-aided voice recognition came in the medical and legal fields. Physicians and attorneys used to dictate case notes to an answering service, and a secretary would type the report. As computer hardware and software grew more powerful, the computer's speech recognition capabilities became sufficient to transcribe these dictations. Rather than having someone retype the entire report, a human was needed only to proofread the document after the computer had produced a rough draft. Soon the need for a human proofreader will vanish as the technology becomes even more powerful. The need for an accurate and efficient method of transcription provided the impetus for today’s commercial voice recognition software.
There are three major players in end-user commercial speech recognition: IBM, Lernout & Hauspie (L&H), and Dragon Systems. These three companies provide software packages that convert spoken words into digital data that computer applications can use. IBM's ViaVoice, L&H's Voice Xpress, and Dragon Systems' NaturallySpeaking are very similar products, comparable in price, ease of use, and features. The deluxe version of each program costs about $150 and has a vocabulary of over 200,000 words. The programs convert voice input into usable data for most popular software applications and have customized interfaces for the Microsoft suite of applications. They are designed to recognize and correctly interpret dates, currency and numbers. The user can control the operations of the computer (such as opening and closing files and browsing the Web) through voice commands and macros. The software will also read text and numbers to the user in a human voice.
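As a rough illustration of what "interpreting numbers" involves once the words themselves have been recognized, the sketch below turns a phrase of English number words into an integer. It is not how ViaVoice, Voice Xpress or NaturallySpeaking work internally; the function name `words_to_number` and the word tables are hypothetical, and only cardinal numbers up to the millions are handled.

```python
# Hypothetical sketch: convert recognized English number words to an integer.
# Not taken from any commercial product; names and tables are assumptions.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text: str) -> int:
    """Return the integer spelled out by a phrase such as 'twenty-one'."""
    total = 0    # sum of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # filler word, as in "one hundred and fifty"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current
```

For example, `words_to_number("two hundred thousand")` returns `200000`. Dates and currency need further rules on top of this (for instance, recognizing "dollars" or a month name next to the number), which the sketch leaves out.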
All of these voice recognition programs require an intensive training session (from 15 minutes to an hour) to learn the specific patterns of an individual's voice. As computer processor speeds have improved, so have the accuracy and speed of these voice recognition applications.

VoiceXML
On March 2, 1999, twenty leading speech, Internet and communications technology companies announced the formation of the Voice eXtensible Markup Language (VXML) Forum to develop a standard in voice recognition technology. The VXML Forum "aims to drive the market for voice- and phone-enabled Internet access by promoting a standard specification for VXML, a computer language used to create Web content and services that can be accessed by phone."[1] Once a standard is established in the computer community, third-party software developers are likely to adopt voice recognition technology more widely. Even simple programs will be able to incorporate voice recognition without a large investment in development time and skill.

Natural Language Speech Assistant by Motorola and Unisys
The Natural Language Speech Assistant (NLSA) is a developer’s toolkit for building software that lets customers access the data they need using their own everyday language (or natural language), rather than restricting their responses to keypad entries or single-word answers. The NLSA equips developers with the tools necessary for writing speech-enabled applications, which eliminates the need to learn the details of programming a speech recognizer. In addition, it protects developers' investment by making it easier to migrate to a different speech recognizer. The NLSA is also expected to improve existing Internet voice recognition applications and to support new, more sophisticated ones by capitalizing on the speech technology available today.

A number of large corporations have dedicated immense budgets to researching SR technologies. Here are some of the prominent players:

Unisys Corporation
Unisys is developing technology for reaching customers through a strategy that incorporates multiple media (the press, TV, the Internet, phone, fax, mail and face-to-face contact) through "call centers." By responding to the customer's voice commands, the call center can find the solution that best fits the customer's need through one of these media. Unisys claims that intelligent call centers represent the company well and promote efficient communication across an organization.

VoxGateway by Motorola
VoxGateway software (part of Motorola's Mobile Internet Exchange) is a server product that runs applications developed in the VoxML programming language. The software is also licensed in an OEM version to technology partners that create products, systems and applications enabling voice access to the Internet. VoxGateway will support applications written using the Unisys tools, in Motorola's VoxML language, and in the emerging VoiceXML specification. VoxGateway and Natura
Copyright © 1999 Ira Greenberg and Andrew Bate. All Rights Reserved.