Text-to-Speech Technology
Introduction: The Value of Text-to-Speech Technology

Speech synthesis, or text-to-speech (TTS) processing, is the conversion of electronic text into spoken words. You type in your stand-up comedy routine and the friendly computer reads it back, verbatim. In essence, the system converts ASCII text into synthetic speech and plays it back to the listener. The best systems employ an expert system or knowledge base that helps the computer pronounce words, abbreviations, acronyms, and symbols that do not follow regular phonetic rules. The technology can serve as an educational tool, assist people with mental or physical impairments, and help overcome language barriers.
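A pronunciation knowledge base of this kind is often applied as a text-normalization pass that runs before ordinary letter-to-sound conversion. The Python sketch below assumes a tiny, hypothetical dictionary of abbreviations, letter-by-letter acronyms, and symbols, and spells out numbers below 100; the names `normalize` and `number_to_words` are illustrative only. Real engines use far larger dictionaries plus context rules (for example, deciding whether "St." means "Street" or "Saint").

```python
import re

# Hypothetical, deliberately tiny pronunciation knowledge base.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "etc.": "et cetera"}
SPELLED_ACRONYMS = {"PC", "TTS", "URL"}          # read letter by letter
SYMBOLS = {"&": "and", "%": "percent", "+": "plus", "@": "at"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# A token is a word (optionally ending in a period) or a single symbol.
TOKEN = re.compile(r"\w+\.?|[^\w\s]")


def number_to_words(n: int) -> str:
    """Spell out 0-99; larger numbers are read digit by digit (a simplification)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    return " ".join(ONES[int(d)] for d in str(n))


def normalize(text: str) -> str:
    """Rewrite text so that every token is an ordinary, pronounceable word."""
    words = []
    for tok in TOKEN.findall(text):
        if tok in ABBREVIATIONS:
            words.append(ABBREVIATIONS[tok])
        elif tok in SYMBOLS:
            words.append(SYMBOLS[tok])
        else:
            word = tok.rstrip(".")          # drop a sentence-final period
            if word.isdecimal():
                words.append(number_to_words(int(word)))
            elif word in SPELLED_ACRONYMS:
                words.append(" ".join(word))
            elif any(ch.isalnum() for ch in word):
                words.append(word)
            # Anything else is punctuation and is simply dropped here.
    return " ".join(words)
```

For example, `normalize("Dr. Smith has 2 PC & 1 TTS kit at 42% off.")` returns `"Doctor Smith has two P C and one T T S kit at forty-two percent off"`, which a letter-to-sound stage can then handle without special cases.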
With the integration of computer telephony and new desktop applications that include telephone access, remote users can easily retrieve textual information from their PCs or workstations over the telephone, whether wireless or wireline. Most of today’s networked PCs and workstations are voice-enabled, and users are beginning to expect software developers to include text-to-speech in their applications as a simple means of retrieving textual information over the phone from any remote location.
History
The text-to-speech project began in 1980 under the direction of Professor Munoz. Two years earlier, in 1978, a prototype time-domain synthesizer with a limited vocabulary had been used in a Spanish talking calculator.
Text-to-speech conversion involves two tasks performed in real time: textual analysis and speech synthesis. Although text-to-speech has been commercially available for over 12 years, few developers could afford to include it in their applications, because these tasks once required expensive, specialized hardware that drove up application costs and discouraged widespread use of the technology.
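The split between textual analysis and speech synthesis can be pictured as a two-stage pipeline. In this minimal Python sketch, the front end looks words up in a hypothetical two-word lexicon of ARPAbet-style phonemes, with a crude letter-by-letter fallback, and the back end performs time-domain concatenation of stored waveform units. The units are placeholder sine bursts standing in for recorded speech segments, so the output is not intelligible speech; only the structure of the pipeline is meant to be illustrative, and all names and constants are assumptions.

```python
import math

SAMPLE_RATE = 8000      # telephone-band sampling rate, chosen for illustration
UNIT_SAMPLES = 640      # 80 ms per unit at 8 kHz (placeholder duration)

# Hypothetical pronunciation lexicon using ARPAbet-style phoneme symbols.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_analysis(text: str) -> list[str]:
    """Front end: turn text into a phoneme sequence, with a pause after each word."""
    phonemes = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'")
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            # Crude stand-in for letter-to-sound rules: one symbol per letter.
            phonemes.extend(ch.upper() for ch in word if ch.isalpha())
        phonemes.append("SIL")
    return phonemes


def make_unit(phoneme: str) -> list[float]:
    """Placeholder for a recorded unit: silence, or a sine burst whose pitch
    is derived from the symbol so that different phonemes sound different."""
    if phoneme == "SIL":
        return [0.0] * UNIT_SAMPLES
    freq = 200 + 20 * (sum(map(ord, phoneme)) % 20)
    return [0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(UNIT_SAMPLES)]


def synthesize(phonemes: list[str]) -> list[float]:
    """Back end: time-domain concatenation of the stored units."""
    inventory = {p: make_unit(p) for p in set(phonemes)}
    samples: list[float] = []
    for p in phonemes:
        samples.extend(inventory[p])
    return samples


def text_to_speech(text: str) -> list[float]:
    """Run both stages: textual analysis, then synthesis."""
    return synthesize(text_analysis(text))
```

Keeping the stages separate means the lexicon and the unit inventory can be swapped independently, which is also why both stages once competed for the same scarce processing power.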
Today, the cost of computer-based telephony hardware has declined significantly and ample CPU power is readily available, allowing applications to be implemented more cost-effectively. Furthermore, industry-standard speech APIs and innovative platforms are now available. In short, the earlier barriers to deploying text-to-speech have disappeared, and this powerful technology is poised for rapid growth in many markets.
Current Applications for Text-to-Speech Technology
Future applications may be nearly limitless, but current technology does not yet fully meet their requirements. The following are some current applications of text-to-speech.
Interactive Language Learning Software
When learning a language, it is important to progress at one’s own pace. Learners need to be able to speed up and slow down the speech, pause and resume it, and hear the correct pronunciation on demand.
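One way to pass speed and pause controls to a speech engine is the W3C Speech Synthesis Markup Language (SSML), whose `prosody` element takes a `rate` attribute and whose `break` element inserts a timed pause. The Python sketch below assumes an engine that accepts SSML 1.1; the function `lesson_ssml` and its parameters are hypothetical. Pausing and resuming playback itself is normally handled by the audio player, so it is not shown.

```python
import re
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def lesson_ssml(text: str, rate_percent: int = 100, pause_ms: int = 0,
                lang: str = "en-US") -> str:
    """Build an SSML document that slows down or speeds up the text and
    inserts a pause between sentences so the learner can repeat them."""
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    # Split after sentence-final punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    separator = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else " "
    # Escape &, <, > so learner-supplied text cannot break the markup.
    body = separator.join(escape(s) for s in sentences)
    return (f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="{lang}">'
            f'<prosody rate="{rate_percent}%">{body}</prosody></speak>')
```

For example, `lesson_ssml("Hola. ¿Cómo estás? Bien & tú.", rate_percent=80, pause_ms=1500, lang="es-ES")` asks for Spanish at 80% of the normal rate with a 1.5-second gap after each of the first two sentences.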
Speech over the Internet
Recently, multimedia firms have shown considerable interest in sending speech over the Internet. A talking page on the World Wide Web is a powerful and effective way to present messages and views.
For anyone who wishes to incorporate text-to-speech into a talking home page, intelligibility and pronunciation accuracy are key requirements. Support for multiple voices and foreign languages can also be useful for achieving different effects.
Electronic Mail and Fax Reading for Unified Messaging Applications
Many on-line information services are becoming available, and callers want access to them. These services commonly provide access to forums, databases, and e-mail stored in text format. Although such systems are usually accessed from a data terminal, text-to-speech offers an alternative means of access by voice. Most text-to-speech engines can read such text accurately, provided it is grammatically well formed.
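E-mail rarely arrives as clean prose: headers, quoted replies, and signatures would sound odd if read verbatim. The following is a minimal sketch, using only Python's standard `email` package, of turning a raw message into a speakable string. The announcement wording, the rule of skipping lines that start with `>`, and stopping at the conventional `-- ` signature delimiter are assumptions; a production system would also need to handle HTML-only messages and attachments.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr


def plain_text_body(msg: Message) -> str:
    """Return the first text/plain part of the message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def email_to_speech_text(raw: str) -> str:
    """Build a speakable summary: sender, subject, then the new body text."""
    msg = message_from_string(raw)
    name, addr = parseaddr(msg.get("From", ""))
    sender = name or addr or "an unknown sender"
    subject = msg.get("Subject", "no subject")
    spoken = []
    for line in plain_text_body(msg).splitlines():
        if line.rstrip() == "--":           # "-- " conventionally starts a signature
            break
        if line.lstrip().startswith(">"):   # skip text quoted from earlier messages
            continue
        if line.strip():
            spoken.append(line.strip())
    return f"Message from {sender}. Subject: {subject}. " + " ".join(spoken)
```

The resulting string can then go through the same normalization and synthesis stages as any other text.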
Games and Entertainment
For games and entertainment applications, the availability of a variety of synthetic voices is a primary requirement. Letting end users choose whether a string of text is spoken in a male, female, child, or cartoon-character voice can significantly increase the entertainment value.
In this type of software, text-to-speech is offered purely for fun and enjoyment, not for information transfer. Here, intelligibility and pronunciation accuracy are less crucial than the number of user-selectable voices offered. End users expect to be able to adjust the pitch, rate, volume, and quality of any voice through a simple yet powerful graphical user interface. In addition to tuning voices, game users may wish to customize the pronunciation of certain words through a user-definable dictionary.
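These requirements map naturally onto a small settings object that a graphical interface can bind its sliders to. The Python sketch below is hypothetical: the voice names come from the paragraph above, but the class name, the parameter ranges, and the respelling approach to a user-definable dictionary are assumptions rather than any particular engine's API.

```python
import re
from dataclasses import dataclass, field, replace

VOICES = ("male", "female", "child", "cartoon")


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass
class VoiceSettings:
    """Hypothetical user-adjustable voice settings; ranges are illustrative."""
    voice: str = "female"
    pitch: float = 1.0    # multiplier on the voice's base pitch
    rate: float = 1.0     # multiplier on the normal speaking rate
    volume: float = 0.8   # 0.0 (silent) to 1.0 (full scale)
    user_dictionary: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice {self.voice!r}; choose from {VOICES}")
        # Clamp slider values so the GUI can never push the engine out of range.
        self.pitch = _clamp(self.pitch, 0.5, 2.0)
        self.rate = _clamp(self.rate, 0.5, 3.0)
        self.volume = _clamp(self.volume, 0.0, 1.0)

    def adjusted(self, **changes) -> "VoiceSettings":
        """Return a copy with new values; __post_init__ re-validates them."""
        return replace(self, **changes)

    def apply_dictionary(self, text: str) -> str:
        """Swap words for user-defined respellings before synthesis,
        leaving punctuation and unknown words untouched."""
        return re.sub(r"\b\w+\b",
                      lambda m: self.user_dictionary.get(m.group(0).lower(), m.group(0)),
                      text)
```

Making `adjusted` return a new, re-validated object rather than mutating fields in place keeps every settings instance within range, whichever slider the user moves.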
Conclusion
The widespread availability of powerful and inexpensive hardware for servers and desktop PCs has allowed text-to-speech applications to become more complex and demanding. The suitability of a particular text-to-speech product for an application depends on more than just the sound of the voice. Ease of integration into an application and the availability of a software development kit are as important as voice quality and intelligibility.

Technology and application vendors need to be aware of the individual requirements of their target markets. Unlike ten years ago, text-to-speech product features must now be tailored to each market; one size no longer fits all.