A Glimpse of Voice Technology

By:
Vishad Garg
Momentum India Pvt. Ltd.
vishadg@momentum-tech.com
vishadgarg@gmail.com
91-9611077772
September 12, 2001

January, 2002

Momentum Confidential
Agenda
- Automated Voice Processing
- Voice Portal
- Voice XML
- Voice Portal at Work

Definition
“Automated Voice Processing is the act of answering,
routing, and handling phone calls with a computer-based
system. The call processing system answers and
processes calls according to the needs of the caller and
the person and/or company being called.”

Applications of Voice Processing
Interactive Voice Response (IVR)
Voice Mail
Automatic Call Distribution (ACD)
Audiotext
Predictive Dialer
Voice Portal

Interactive Voice Response (IVR)
“IVR systems facilitate people-to-computer/database
communications. They automate the handling of calls by
interacting with one or more online databases.”
An IVR system works on the premise of:
- Data Capture
- Information Delivery
- Computer Telephony Integration (CTI) Link

Voice Mail
“Voice mail enhances people-to-people communication.
Voice mail is an umbrella covering a variety of automated
voice processing features, including voice mailboxes for
storing and forwarding messages, voice menus for routing
and responding to calls, recorded announcements for
selectively disseminating information, and information
access to databases.”

Automatic Call Distribution (ACD)
“ACD facilitates the distribution of incoming calls,
according to a routing algorithm, to a group of people
(agents) who can answer them. It uses ANI (Automatic
Number Identification) and DNIS (Dialed Number
Identification Service) to do so.”
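As a rough illustration of the routing idea, the sketch below assumes one possible policy: DNIS (the number dialed) selects the agent group, the longest-idle agent in that group takes the call, and ANI (the caller's number) is used only to flag known callers. The class names, phone numbers, and agent names are hypothetical, and real ACD products support many other algorithms.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Call:
    ani: str   # caller's number (Automatic Number Identification)
    dnis: str  # number the caller dialed (Dialed Number Identification Service)


@dataclass
class AgentGroup:
    name: str
    # Idle agents, longest-idle first (hypothetical policy for this sketch).
    idle_agents: deque = field(default_factory=deque)


class Distributor:
    """Toy ACD: DNIS picks the group, the longest-idle agent gets the call."""

    def __init__(self, dnis_to_group, vip_callers=()):
        self.dnis_to_group = dnis_to_group
        self.vip_callers = set(vip_callers)

    def route(self, call):
        group = self.dnis_to_group[call.dnis]
        if not group.idle_agents:
            return None  # a real ACD would hold the call in a queue here
        agent = group.idle_agents.popleft()
        priority = "vip" if call.ani in self.vip_callers else "normal"
        return group.name, agent, priority


# Made-up groups, agents, and numbers, for illustration only.
sales = AgentGroup("sales", deque(["asha", "ravi"]))
support = AgentGroup("support", deque(["meena"]))
acd = Distributor(
    {"8005550100": sales, "8005550199": support},
    vip_callers={"4155550123"},
)

first = acd.route(Call(ani="4155550123", dnis="8005550100"))
second = acd.route(Call(ani="4155550999", dnis="8005550199"))
third = acd.route(Call(ani="4155550999", dnis="8005550199"))
```

Here the first call reaches the sales group, the second takes the only idle support agent, and the third finds no idle agent and would have to wait.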

Audiotext
“Audiotext is a service that allows callers to access
prerecorded information on a topic of interest to them. It
allows multiple callers to retrieve recorded
announcements containing information that would
otherwise have been given by a person. The information
retrieved is general and not specific to each caller.”

Predictive Dialer
“A predictive dialer launches outbound calls and monitors
their progress. Only connected calls are passed to
agents.”

Agenda
- Automated Voice Processing
- Voice Portal
- Voice XML
- Voice Portal at Work

Definition
“The convergence of the richness of the web and the
accessibility of the phone is forming a vast new network,
a voice portal, where internet content can be accessed
from any phone, anywhere, using the human voice.”
“Speech-enabled access to web-based information.”

Voice Portal vs. Web Portal
“Leverages the Internet for application development and
delivery.”
Phone instead of PC
VoiceXML instead of HTML
A voice browser instead of an ordinary web browser.

Why bring the internet to voice applications?
- A standard language enables portability.
- A high-level, domain-specific language simplifies
  application development.
- Voice and web applications can be consolidated.
- The cost of creating a speech-based portal platform
  continues to decline.
- The Internet has raised public expectations: people have
  grown used to having information at their fingertips
  when they want it. Once people are accustomed to
  immediate news, weather reports, or stock quotes over
  the Internet, the transition to the phone makes perfect
  sense.
Voice Portal Key Components
- Automatic Speech Recognition (ASR)
- Voice Browser
- Text-To-Speech (TTS)
- VoiceXML

Automatic Speech Recognition
- Automatic Speech Recognition (ASR) is the
  technology that allows a machine to understand
  human speech.
- It takes human speech input and digitizes it.
- A component called a recognizer then matches the
  digitized speech against the phrases it expects to
  identify what the speaker said, and returns the result
  as a machine-readable string of text.

Voice Browser/Interpreter
[Diagram: Document Server, VXML Browser, Implementation Platform]
- A document server processes requests from a client
  application, the VoiceXML interpreter. The server
  produces VoiceXML documents in reply, which are
  processed by the VoiceXML interpreter.
- The VoiceXML interpreter is responsible for detecting
  an incoming call, acquiring the initial VoiceXML
  document, and answering the call.
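The request/reply cycle between interpreter and document server can be sketched as a small fetch-and-interpret loop. This is only an illustration: the document server is replaced by an in-memory dictionary, the URLs and prompts are made up, and the "interpreter" handles nothing beyond prompt and goto elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a document server: URL -> VoiceXML text.
DOCUMENTS = {
    "http://example.com/welcome.vxml": """<vxml version="1.0">
  <form>
    <block>
      <prompt>Welcome to the travel portal.</prompt>
      <goto next="http://example.com/menu.vxml"/>
    </block>
  </form>
</vxml>""",
    "http://example.com/menu.vxml": """<vxml version="1.0">
  <form>
    <block>
      <prompt>Main menu.</prompt>
    </block>
  </form>
</vxml>""",
}


def fetch(url):
    """Pretend HTTP request to the document server."""
    return DOCUMENTS[url]


def run_call(initial_url):
    """Fetch and process documents until no document asks for another one."""
    spoken = []
    url = initial_url
    while url is not None:
        root = ET.fromstring(fetch(url))
        next_url = None
        for element in root.iter():  # document order
            if element.tag == "prompt":
                spoken.append(element.text.strip())  # would go to TTS
            elif element.tag == "goto":
                next_url = element.get("next")       # next document to request
        url = next_url
    return spoken


prompts = run_call("http://example.com/welcome.vxml")
```

Each pass through the loop is one request to the server and one document processed by the interpreter, which mirrors how a web browser follows links between pages.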
Text-To-Speech (TTS)
- TTS converts text-string input into spoken output.
- TTS is increasingly being used to speak e-mail and
  Web-based text to callers.

Agenda
- Automated Voice Processing
- Voice Portal
- Voice XML
- Voice Portal at Work

What is VXML?
- Voice Extensible Markup Language
- A language for specifying voice/audio dialogs
- Voice dialogs use audio prompts and text-to-speech
  (TTS) for output; touch-tone keys (DTMF) and
  automatic speech recognition (ASR) for input.
- The main input/output device (initially) is the phone.
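To make the output/input split concrete, here is a minimal sketch of a VoiceXML menu, held in a Python string so it can be checked for well-formedness. The prompt wording and the target file names are hypothetical; the helper merely lists which spoken phrases and touch-tone keys the menu accepts and is not a voice browser.

```python
import xml.etree.ElementTree as ET

# A small VoiceXML 1.0 style menu; the content is made up for illustration.
MENU_DOCUMENT = """<vxml version="1.0">
  <menu>
    <prompt>Say flights or press 1. Say hotels or press 2.</prompt>
    <choice dtmf="1" next="flights.vxml">flights</choice>
    <choice dtmf="2" next="hotels.vxml">hotels</choice>
  </menu>
</vxml>"""


def summarize(vxml_text):
    """Return the prompt text (output) and the accepted inputs of the menu."""
    menu = ET.fromstring(vxml_text).find("menu")
    prompt = menu.findtext("prompt").strip()
    inputs = {}
    for choice in menu.findall("choice"):
        target = choice.get("next")
        inputs[choice.text.strip()] = target  # spoken phrase, matched by ASR
        inputs[choice.get("dtmf")] = target   # touch-tone key (DTMF)
    return prompt, inputs


prompt, inputs = summarize(MENU_DOCUMENT)
```

The prompt would be rendered through TTS or a recorded audio file, while each choice can be selected either by voice or by key press.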

Goal of VXML
- Bring the full power of web development and content
  delivery to voice response applications.
- Shield authors from low-level programming and
  platform-specific details.
- Enable integration of voice services with data
  services using the client-server paradigm.
- A voice service is viewed as a sequence of interaction
  dialogs between a user and an implementation
  platform.

Scope of VXML
- Output of synthesized speech
- Output of audio files
- Recognition of spoken input
- Recognition of DTMF input
- Recording of spoken input
- Telephony features such as call transfer and
  disconnect

VXML Concepts
- Application
- Dialog/Sub-dialog
- Session
- Grammar
- Events

Application
- A set of documents sharing the same
  application root document.
- The root document's variables and grammars
  remain available when transitioning to another
  document.
[Diagram: Root document linked to documents D1, D2, and D3]
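One way to picture how root-document variables stay visible in every document is a layered lookup, sketched here with Python's ChainMap. The variable names and values are hypothetical, and the sketch covers variables only, not grammars.

```python
from collections import ChainMap

# Variables declared in the application root document (made-up values).
root_variables = {"company": "Momentum", "language": "en-US"}


def document_scope(root_vars, document_vars):
    """Look up names in the document first, then fall back to the root."""
    return ChainMap(document_vars, root_vars)


# D1 adds its own variable; D2 shadows one of the root's variables.
d1 = document_scope(root_variables, {"destination": "Goa"})
d2 = document_scope(root_variables, {"language": "hi-IN"})

company_in_d1 = d1["company"]    # inherited from the root
language_in_d1 = d1["language"]  # inherited from the root
language_in_d2 = d2["language"]  # D2's own value wins
```

Moving from D1 to D2 discards D1's own variables, but anything declared in the shared root is still there.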
Dialog/Sub-dialog
- A dialog is an interaction with the user: play a prompt
  or a menu and collect some input.
- There are two kinds of dialogs: forms and menus.
- A sub-dialog is like a function call.
- Example use of a sub-dialog: a database query.

Session
- A session begins when the user starts to interact with a
  VoiceXML interpreter, continues as documents are
  loaded and processed, and ends when requested by
  the user.

Grammar
- A grammar is a set of phrases that a caller is
  expected to say during a dialog in response to a
  particular prompt.
- A grammar can be as simple as “yes” versus “no” or as
  large as a list of all the names of people living in a
  city.
- A grammar file is a text file with the file
  extension .grammar
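The idea of a grammar as a fixed set of expected phrases can be sketched as a lookup table. This is not the syntax of a .grammar file and does not model a real recognizer, which works on audio rather than text; the phrases and the function names are assumptions for illustration.

```python
def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def compile_grammar(phrases_to_value):
    """Map each expected phrase to the value the application should receive."""
    return {normalize(phrase): value for phrase, value in phrases_to_value.items()}


def recognize(grammar, utterance):
    """Return the value for an in-grammar utterance, or None if it is not covered."""
    return grammar.get(normalize(utterance))


# A tiny yes/no grammar with a couple of alternative phrasings.
yes_no = compile_grammar({
    "yes": True,
    "yes please": True,
    "no": False,
    "no thanks": False,
})

accepted = recognize(yes_no, "  Yes   please ")
declined = recognize(yes_no, "No")
unknown = recognize(yes_no, "maybe")
```

A city-wide name list would use the same structure with far more entries, which is why large grammars are normally kept in separate files.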

Events
- VXML defines a mechanism for handling events not
  covered by the form mechanism.
- Events are thrown by the platform under a variety of
  circumstances: the user does not respond, the response
  is not recognized, the user asks for help, etc.
- Events are caught by catch elements.
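The throw-and-catch behaviour can be sketched as a small retry loop. The event names noinput, nomatch, and help are VoiceXML's own; the messages, the function names, and the use of text in place of audio are assumptions made for this sketch.

```python
# Messages a catch element might play for each event (hypothetical wording).
CATCH = {
    "noinput": "Sorry, I did not hear you.",
    "nomatch": "Sorry, I did not understand.",
    "help": "Please say yes or no.",
}


def classify(utterance, grammar):
    """Return the event the platform would throw, or None if input is valid."""
    if utterance is None:
        return "noinput"  # the caller stayed silent
    text = utterance.strip().lower()
    if text == "help":
        return "help"
    if text not in grammar:
        return "nomatch"  # heard something outside the grammar
    return None


def run_field(utterances, grammar):
    """Process caller turns until one is valid; collect the catch messages."""
    played = []
    for utterance in utterances:
        event = classify(utterance, grammar)
        if event is None:
            return utterance.strip().lower(), played
        played.append(CATCH[event])
    return None, played


answer, played = run_field([None, "maybe", "help", "Yes"], {"yes", "no"})
```

In this run the caller is silent, then says something outside the grammar, then asks for help, and finally answers, so three catch messages play before the field is filled.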

Agenda
- Automated Voice Processing
- Voice Portal
- Voice XML
- Voice Portal at Work

Momentum Voice Portal Development Services
Momentum provides voice portal development services
using the latest speech-recognition and text-to-speech
technology, including Nuance, SpeechWorks, and Fonix.

How We Do It?
[Process diagram: Requirement Analysis, Prototype, VUI Design,
Application Development, Testing, Deployment]
Momentum Travel Voice Portal
Momentum has developed a voice portal demo
application, the Momentum Travel Voice Portal
(MTVP). MTVP provides a voice user interface
for reserving and purchasing travel packages.

Nuance in MTVP
Momentum is using the complete suite of Nuance voice
technology, which includes:
- Nuance 7.0.3 for voice recognition, call control, and
  recording of prompts.
- V-Builder for developing the voice user interface (VUI)
  that defines the flow of interaction.
- Grammar-Builder for writing grammars that represent
  valid responses.
- Nuance Speech Objects, a set of reusable components
  implemented as Java beans.
VXML is used as the development language for the VUI.
Demo
To try the MTVP demo, dial either of the following phone
numbers in the US:
(800) 303-9987
(415) 869-6909
When the system asks you to enter a PIN, you can dial
one of the following PINs: 823272 / 823273 / 823274

Future Plans

We are also planning to embark upon voice-driven
E-commerce applications, i.e. V-Commerce, Voice-Enabled
Intranet, and Unified Messaging.

