VoiceXML Language
Using Prerecorded Audio Files

Prerecorded audio files must be:

   •    an 8KHz 8-bit mu-law .au file
   •    an 8KHz 8-bit mu-law .wav file
   •    an 8KHz 8-bit a-law .au file
   •    an 8KHz 8-bit a-law .wav file
   •    an 8KHz 16-bit linear .au file
   •    an 8KHz 16-bit linear .wav file
In a deployment (telephony) environment, the VoiceXML browser supports persistent
caching for grammars, documents, and audio files.
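As a sketch of how an application might take advantage of this caching (the fetchhint attribute is defined in the VoiceXML 1.0 specification; the file name is hypothetical), a frequently used audio prompt can be marked for prefetching:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- fetchhint="prefetch" asks the browser to fetch (and cache) the
           audio file when the page is loaded, rather than when the prompt
           is played. "welcome.au" is a hypothetical file name. -->
      <prompt>
        <audio src="welcome.au" fetchhint="prefetch"/>
      </prompt>
    </block>
  </form>
</vxml>
```

Note that, as documented for the <property> element, the streaming setting of fetchhint is not supported in this release.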
Preventing Caching

You can use the <meta> element to prevent caching of a VoiceXML document by
specifying http-equiv="expires".
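A minimal sketch, based on the <meta> support described in the element table (the content value of "0" is an assumption; the value your server expects may vary):

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- http-equiv="expires" with a content of "0" marks the document as
       immediately expired, so the browser will not reuse a cached copy. -->
  <meta http-equiv="expires" content="0"/>
  <form>
    <block>
      <prompt>This document is re-fetched on every visit.</prompt>
    </block>
  </form>
</vxml>
```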
The following lines are the remnant of a table describing when the VoiceXML
browser throws certain events and the default actions taken; most entries are
truncated in the source:

   •   … prevents it from jumping to the next URI.
   •   error.noauthoriz… -- the VoiceXML browser throws this event when it …;
       the default action plays the message "Sorry, must exit due to
       processing error" and then exits.
   •   … the user says something that is …; adhere to the guidelines for
       self-revealing help, …
   •   … unconditionally transferred to …; the interpreter exits. The default
       event handler …
<noinput count="2">
   <prompt>For example, to call Joe Smith in Dallas, say, "Joe Smith in
Dallas"</prompt>
</noinput>


… and User to User Information functionality are not available.

  ...
provide this type of information. Table 10 documents the VoiceXML browser's support for
shadow variables.

Table 10. Shadow Variables

(The table itself is truncated in the source; a surviving fragment reads:
"… deployment environment, not the desktop development environment …")
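As an illustration of how shadow variables are typically consulted (a sketch: the field name and the 0.5 threshold are hypothetical; the name$.confidence shadow variable is defined in the VoiceXML 1.0 specification):

```xml
<field name="city">
  <prompt>What city?</prompt>
  <filled>
    <!-- city$.confidence holds the recognizer's confidence (0.0 to 1.0)
         for the value just assigned to the city field. -->
    <if cond="city$.confidence &lt; 0.5">
      <prompt>I'm not sure I heard you correctly.</prompt>
      <clear namelist="city"/>
      <reprompt/>
    <else/>
      <prompt>Got it: <value expr="city"/>.</prompt>
    </if>
  </filled>
</field>
```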
Each menu or form field in a VoiceXML application must define a set of acceptable user
responses. The menu uses <choice> elements …
At this point, the system has collected the two fields needed to complete the login, so it
executes the <filled> element, ...
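A sketch of what such a form-level <filled> element might look like (the field names account and pin are hypothetical; the /servlet/login URI appears earlier in this chapter; mode="all" waits until both fields are filled):

```xml
<form id="login">
  <field name="account" type="digits">
    <prompt>Say your account number.</prompt>
  </field>
  <field name="pin" type="digits">
    <prompt>Say your PIN.</prompt>
  </field>
  <!-- Executed once both fields have values; submits them to the server. -->
  <filled mode="all" namelist="account pin">
    <submit next="/servlet/login" namelist="account pin"/>
  </filled>
</form>
```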
A JSGF grammar file consists of a grammar header and a grammar body. The grammar
header declares the version of JSGF and t...
• + to indicate that the previous item may occur one or more times
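Putting the header and body together, a minimal JSGF grammar might look like this (the grammar and rule names are hypothetical):

```
#JSGF V1.0;
grammar sizes;

// Body: | separates alternatives, [ ] marks an optional item,
// and + means the previous item may occur one or more times.
public <size>   = [a] (small | medium | large);
public <digits> = (zero | one | two | three)+;
```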

Comments in Grammars

You can specify comments in the head...
An inline grammar that contains XML reserved terms or non-terminals must be
       enclosed within a PCDATA block.
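For example (a sketch; the rule content is hypothetical), a CDATA section keeps the JSGF angle-bracket non-terminals from being parsed as XML:

```xml
<grammar type="application/x-jsgf">
  <![CDATA[
    <drink> = coffee | tea | milk
  ]]>
</grammar>
```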

DTMF G...
In this example, the first grammar (for drinks) consists of a single rule, specified inline. In
contrast, the second gramm...
an environment variable that contains the pathname of the installation directory, and where
<locale> is en_US for US English).
Disabling Active Grammars

You can temporarily disable active grammars, including the VoiceXML browser's
grammars for built-in …
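One way to achieve this in VoiceXML 1.0 is the modal attribute on a form item (a sketch; the field name is hypothetical). When modal="true", all grammars other than the form item's own are temporarily disabled:

```xml
<field name="confirmation" type="boolean" modal="true">
  <!-- While this field is being collected, only its own grammar (here,
       the built-in boolean grammar) is active; document- and
       application-level grammars are temporarily disabled. -->
  <prompt>Is that correct? Please say yes or no.</prompt>
</field>
```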
#JSGF V1.0;
grammar citystate;
public <cityandstate> =
   <city> {this.city=$} [<state> {this.state=$}] [please] |
   <sta...
Table 13. Completetimeout

Scenario                                          Outcome
The utterance is a complete and termi...
<vxml version="1.0">
  <form id="ChargeCallers">
  <block>
  <!-- Obtaining the phone number from which the user phoned. -...
•    Track responses to advertising. You can list different telephone numbers in different
        ads, and then track the...
•   a human operator
    •   another VoiceXML application

This release of the WebSphere Voice Server supports only "blind" call transfers.
This chapter provides a brief introduction to basic VoiceXML concepts and constructs, and describes IBM's implementation of version 1.0 of VoiceXML. For a complete description of the functionality of the language, refer to the VoiceXML 1.0 specification; the information in this chapter is NOT a substitute for thoroughly reading the VoiceXML 1.0 specification. You can access the VoiceXML 1.0 specification from the Windows Start menu by choosing Programs -> IBM WebSphere Voice Server SDK -> VoiceXML 1.0 Specification, or you can download a copy from the VoiceXML Forum Web site at http://www.voicexml.org.

This chapter discusses the following topics:

   •   Compatibility with the VoiceXML 1.0 Specification
   •   VoiceXML Sample Applications
   •   VoiceXML Overview
   •   A Simple VoiceXML Example
   •   Grammars
   •   Timeout Properties
   •   Telephony Functionality

Compatibility with the VoiceXML 1.0 Specification

The VoiceXML browser in the WebSphere Voice Server products supports a subset of the VoiceXML Version 1.0 specification, as documented in this chapter. Additionally, IBM has extended the VoiceXML 1.0 specification with a set of features to enhance the usability and functionality of the WebSphere Voice Server products; these extensions are also documented in this chapter.

VoiceXML Sample Applications

The WebSphere Voice Server SDK includes the sample VoiceXML applications shown in Table 1. The samples are located in subdirectories of %IBMVS%/samples/<locale> (where %IBMVS% is an environment variable that contains the pathname of the installation directory, and where <locale> is en_US for US English). For French, German, Italian, Japanese, Simplified Chinese, Spanish, and UK English language versions, see the appropriate appendixes. You can run the Audio Sample from the Windows Start menu by choosing Programs -> IBM WebSphere Voice Server SDK -> Audio Sample, and then selecting the Audio
Sample program for the desired language. To run the other samples, refer to their respective documentation.

Note: The GrammarBuilder sample requires that you have a version of the IBM DB2 Universal Database in the same language because the database and table names are different for some language versions of DB2. For example, the US English GrammarBuilder sample will fail if you try to use it to access the French EXEMPLE database because it is expecting the US English database called SAMPLE. You can download a fully-functional trial version of DB2 in the desired language from: http://www.ibm.com/software/data/db2/udb/downloads.html

VoiceXML Overview

VoiceXML is an XML-based application, meaning that it is defined as a set of XML tags. This section introduces the following VoiceXML concepts and constructs:

   •   VoiceXML Elements and Attributes
   •   Dialog Structure
   •   Built-In Field Types and Grammars
   •   VoiceXML DTD
   •   Recorded Audio
   •   Document Fetching and Caching
   •   Events
   •   Variables and Expressions

Speech recognition grammars are a key component of VoiceXML application design; they are discussed separately in Grammars. The VoiceXML browser supports some telephony features for use in a deployment environment; they are discussed in Telephony Functionality.

VoiceXML Elements and Attributes

Table 4 lists the VoiceXML elements (including IBM extensions) and provides implementation details specific to the VoiceXML browser in the WebSphere Voice Server products.

Table 4. Summary of VoiceXML Elements and Attributes

<assign>
    Assigns a value to a variable. Supported as documented in the VoiceXML 1.0 specification.

<audio>
    Plays an audio file within a prompt. Supported as documented in the VoiceXML 1.0 specification. The supported audio formats are:
       •   an 8KHz 8-bit mu-law .au file
       •   an 8KHz 8-bit mu-law .wav file
       •   an 8KHz 8-bit a-law .au file
       •   an 8KHz 8-bit a-law .wav file
       •   an 8KHz 16-bit linear .au file
       •   an 8KHz 16-bit linear .wav file

    Note: Use an 8-bit format whenever possible; 16-bit linear files take twice as much storage and require twice as long to download.

    The Voice Server SDK includes JMF and uses it to record and play back audio. Voice Server for DirectTalk uses the DirectTalk system for audio playback. Voice Server uses the telephony H.323 stack for audio playback.

<block>
    Specifies a form item containing non-interactive executable content. Supported as documented in the VoiceXML 1.0 specification.

<break>
    Inserts a pause in TTS output. Supported as documented in the VoiceXML 1.0 specification.

<catch>
    Catches an event. Supported as documented in the VoiceXML 1.0 specification.

<choice>
    Specifies a menu item. Supported as documented in the VoiceXML 1.0 specification.

<clear>
    Clears a variable. Supported as documented in the VoiceXML 1.0 specification.

<disconnect>
    Causes the VoiceXML browser to disconnect from a user telephone session. Supported as documented in the VoiceXML 1.0 specification. Applicable only to a telephony deployment environment shown in Table 2. On the desktop development environment (SDK) it is simulated by throwing a telephone.disconnect.hangup event.
<div>
    Identifies the type of text for TTS output. Supported as documented in the VoiceXML 1.0 specification.

<dtmf>
    Specifies a DTMF grammar. Supported types are "application/x-jsgf", "application/jsgf", "text/x-jsgf", and "text/jsgf" as defined in Appendix D of the VoiceXML 1.0 specification.

<else>
    Conditional statement used with the <if> element. Supported as documented in the VoiceXML 1.0 specification.

<elseif>
    Conditional statement used with the <if> element. Supported as documented in the VoiceXML 1.0 specification.

<emp>
    Specifies the emphasis for TTS output. Supported as documented in the VoiceXML 1.0 specification.

<enumerate>
    Shorthand construct that causes the VoiceXML browser to speak the text of each <choice> element when presenting the list of menu selections to the user. When using the <enumerate> element to play menu choices via the text-to-speech engine, you do not have to add punctuation to control the length of the pauses between <choice> elements; the VoiceXML browser will automatically add the appropriate pauses and intonations when speaking the prompts.

<error>
    Catches an error event. Supported as documented in the VoiceXML 1.0 specification.

<exit>
    Exits a VoiceXML browser session. As per the VoiceXML 1.0 specification, the <exit> element returns control to the interpreter, which determines what action to take. The namelist attribute is ignored. DirectTalk supports the use of the expr attribute to associate data with the call.

<field>
    Defines an input field in a form. Supported as documented in the VoiceXML 1.0 specification.

<filled>
    Specifies an action to execute when a field is filled. Supported as documented in the VoiceXML 1.0 specification.

<form>
    Specifies a dialog for presenting and gathering information. Supported as documented in the VoiceXML 1.0 specification. The VoiceXML browser also supports mixed initiative dialogs using a mechanism based on ECMAScript Action Tags.
    See Mixed Initiative Application and Form-Level Grammars for details.

<goto>
    Specifies a transition to another dialog or document. Supported as documented in the VoiceXML 1.0 specification.

<grammar>
    Defines a speech recognition grammar. Supported types are "application/x-jsgf", "application/jsgf", "text/x-jsgf", and "text/jsgf" as defined in Appendix D of the VoiceXML 1.0 specification.

<help>
    Catches a help event. Supported as documented in the VoiceXML 1.0 specification.
<ibmlexicon>
    Element added by IBM to serve as a container for one or more <word> elements. The syntax for this element is:

        <ibmlexicon>
            one or more <word> elements
        </ibmlexicon>

    These elements are generally used to change the text-to-speech and speech recognition pronunciations for an entire document or application. In contrast, the <sayas> element is generally used to change pronunciations on a per-prompt basis and is for text-to-speech only (not for speech recognition).

    Note: This element is not available in the DirectTalk deployment environment.

<ibmvoice>
    Element added by IBM to allow application developers to control the text-to-speech engine's synthesized voice on a per-prompt basis. This element takes two attributes, as shown in Figure 3:

    gender
        Valid values are male (the default) or female
    age
        Valid values are child, middle_adult (the default), or older_adult

    Note: This element is not available in the DirectTalk deployment environment.

<if>
    Defines a conditional statement. Supported as documented in the VoiceXML 1.0 specification.

<initial>
    Prompts for form-wide information in a mixed-initiative form. Supported as documented in the VoiceXML 1.0 specification.

<link>
    Specifies a transition to a new document or throws an event, when activated. Supported as documented in the VoiceXML 1.0 specification.

<menu>
    Specifies a dialog for selecting from a list of choices. Supported as documented in the VoiceXML 1.0 specification.

<meta>
    Specifies meta data about the document. The http-equiv value supports the expires attribute and any content that comes with it as documented in the VoiceXML 1.0 specification. The VoiceXML browser ignores the name attribute and any content specified with it; you can use these attributes to identify and assign values to the properties of a document, as defined by the HTML 4.0 specification (http://www.w3.org/TR/REC-html40/) and the HTTP 1.1 specification
    (http://www.ietf.org/rfc/rfc2616.txt).

<noinput>
    Catches a noinput event. Supported as documented in the VoiceXML 1.0 specification.

<nomatch>
    Catches a nomatch event. Supported as documented in the VoiceXML 1.0 specification.

<object>
    Specifies platform-specific objects. The WebSphere Voice Server products do not provide any objects. If the VoiceXML browser encounters this element, it will throw an error.unsupported.object error event.

<option>
    Specifies a field option. Supported as documented in the VoiceXML 1.0 specification.

<param>
    Specifies a parameter in an object or subdialog. Supported as documented in the VoiceXML 1.0 specification.

<prompt>
    Plays TTS or recorded audio output. Supported as documented in the VoiceXML 1.0 specification.

<property>
    Controls various aspects of the behavior of the implementation platform. The confidencelevel, inputmodes, sensitivity, and speedvsaccuracy properties are not implemented in this release. All other properties are supported as documented in the VoiceXML 1.0 specification. On DirectTalk, fetchaudio is not supported.

    Note: The streaming setting of the fetchhint property is not supported. If you specify a value of stream, safe is used instead.

<pros>
    Controls the prosody of TTS output. Supported as documented in the VoiceXML 1.0 specification except for the DirectTalk deployment environment.

<record>
    Records spoken user input. The SDK and WebSphere Voice Server record the audio as an 8kHz .au file. The <grammar> element contained in the <record> element is supported. This allows the application developer to specify a recognition grammar to terminate recording. The user must pause briefly before and after speaking the termination phrase. The finalsilence attribute is not implemented in this release.
    On the DirectTalk deployment platform: This element is supported, except that speech recognition is not possible at the same time as recording. Therefore, interrupt using speech is not possible, and the <grammar> element is not supported. The caller's voice can be recorded as audio and exported. The beep parameter determines whether a beep tone is played to the caller immediately before recording begins. The default is beep=true (a beep is played). The maxtime parameter defaults to 60 seconds. Other parameters are not user-definable. The type attribute may take the following values:

    audio/basic
        Creates a headerless recording of 8kHz, 8-bit u-law encoding.
    audio/a-alaw-basic
        Creates a headerless recording of 8kHz, 8-bit a-law encoding.
    audio/wav
        Creates a Microsoft wav file of 8kHz, 16-bit, linear PCM encoding.

    Only the duration and termchar shadow variables have been implemented for <record>. Size is not available due to the fact that recordings are held in the DirectTalk voice segment format. The finalsilence attribute is a function of the base DirectTalk system and cannot be set by a VoiceXML application.

<reprompt>
    Causes the form interpretation algorithm to queue and play a prompt when entering a form after an event. Supported as documented in the VoiceXML 1.0 specification.

<return>
    Returns from a subdialog. Supported as documented in the VoiceXML 1.0 specification.

<sayas>
    Controls pronunciation of words or phrases in TTS output. Supported values for the class attribute are digits and literal. On DirectTalk, the following class attribute values are supported for all available languages:
       •   phone
       •   digits
       •   currency
       •   number
       •   literal

<script>
    Specifies a block of ECMAScript code. Supported as documented in the VoiceXML 1.0 specification.

<subdialog>
    Invokes a new dialog as a subdialog of the current one, in a new execution context. The modal attribute is not implemented because the VoiceXML Forum has clarified that this attribute was not intended to be included in the VoiceXML 1.0 specification.

<submit>
    Submits a list of variables to the document server. IBM has extended this element to support the HTML enctype value of "multipart/form-data" for the enctype attribute. To submit the results of a <record> element, you must use this new attribute value, as shown in Figure 4.

<throw>
    Throws an event. Supported as documented in the VoiceXML 1.0 specification.

<transfer>
    Connects the telephone caller to a third party. Applicable only to a telephony deployment environment shown in Table 2. On the desktop development environment (SDK), it is simulated by throwing a telephone.disconnect.transfer event.

    Supports blind call transfer without status and without dialing rules. The transfer destination may only be specified as a sequence of digits. For example, to transfer to the local number 5553859, you can specify 05553859. To transfer to the long-distance number 9545553859, you can specify 019545553859.

    Note: The prefix 0 is used here. Your site may or may not need a prefix. Please contact your telephony infrastructure site administrator.

    The bridge, connecttimeout, and maxtime attributes are not implemented in this release. The <grammar> and <filled> elements are not supported within a transfer because they are not relevant without the bridge attribute. In the DirectTalk deployment environment, bridged transfer can be supported if an associated Computer Telephony Integration (CTI) product such as CallPath is being used.
In this case, if no speech or DTMF grammars are specified, then, if bridge="true" is specified, a supervised transfer
    will be performed, and the "busy" and "noanswer" statuses may be returned. If DTMF or speech grammars are specified, the dialog will resume when a grammar is matched.

    Note: The VoiceXML browser disables all speech recognition grammars during a <transfer> operation.

<value>
    Embeds a variable in a prompt. The class, mode, and recsrc attributes are not implemented in this release. Local audio prompt lookups using the builtin: protocol, and playback of concatenated audio prompts via the <value mode="recorded"> element with the recsrc attribute, are ignored by the VoiceXML browser.

<var>
    Declares a variable. Refer to Table 8 for information on variable scope and Table 10 for information on shadow variables.

<vxml>
    Top-level container for all other VoiceXML elements in a document. All VoiceXML documents must specify version="1.0"; if the version attribute is missing, the VoiceXML browser will throw an error.badfetch event. On DirectTalk, the lang attribute is supported; for more information, refer to the DirectTalk publication Application Developing using Java and VoiceXML. On all other platforms, the lang attribute is not implemented in this release. Instead, the VoiceXML browser uses the value of the vxml.locale Java system property specified when the browser was started.

<word>
    Element added by IBM to allow application developers to specify:

       •   A pronunciation override, if the default text-to-speech pronunciation for a word is not acceptable, or
       •   An alternate pronunciation, if the speech recognition engine is having difficulty recognizing a word

    The syntax for this element is:

        <word spelling="text" pronunciation="IPA_representation"/>
    or
        <word spelling="text" sounds-like="Soundslike_representation"/>

    The attributes for this element:

    spelling
        A spelling of the word.
    pronunciation
        A pronunciation of the word. The International Phonetic Alphabet (IPA) representation must be used to specify pronunciations. For more information see: http://www2.arts.gla.ac.uk/IPA/ipachart.html and http://charts.unicode.org/Unicode.charts/normal/U0250.html
    sounds-like
        An alternate spelling that indicates how the word sounds. For example, the sounds-like spelling "I triple E" may be specified for a spelling of "IEEE." If you use the Compose Pronunciations feature of the Voice Toolkit, described in WebSphere Voice Toolkit, the Unicode decimal representation for the string "I triple E" (shown in a syntactically correct <word> tag) is:

            <word spelling="IEEE" pronunciation="aɪ ˈtrɪ.pəl i"/>

    The pronunciation and sounds-like attributes are mutually exclusive.

    Note: This element is not available on the DirectTalk deployment platform. Language-specific limitations and considerations are given in the French, German, Italian, Japanese, Simplified Chinese, and Spanish language appendixes.

Examples

Figure 3. <ibmvoice> example

    <ibmvoice gender="female" age="child"> This text will be read using a
    female child synthesized voice.</ibmvoice>

Figure 4. <submit> example
    <submit next="http://voiceserver.somewhere.com/servlet/submitrecording"
        method="post" enctype="multipart/form-data"/>

Figure 5. <ibmlexicon> and <word> example

    <ibmlexicon>
        <word spelling="IEEE" sounds-like="I triple E"/>
    </ibmlexicon>

Figure 6. <transfer> examples

    1. Transfer to a 4-digit extension
       <transfer name="myphone" dest="04111" bridge="false"/>
    2. Transfer to a local number
       <transfer name="myphone" dest="09454111" bridge="false"/>
    3. Transfer to a long distance number
       <transfer name="myphone" dest="01845226411" bridge="false"/>

Note: The prefix 0 is used in Figure 6. Your site may or may not need a prefix. Please contact your telephony infrastructure site administrator.

Dialog Structure

VoiceXML documents are composed primarily of top-level elements called dialogs. There are two types of dialogs defined in the language: <form> and <menu>.

Forms and Form Items

Forms allow the user to provide voice or DTMF input by responding to one or more <field> elements.

Fields

Each field can contain one or more <prompt> elements that guide the user to provide the desired input. You can use the count attribute to vary the prompt text based on the number of times that the prompt has been played. Fields can also specify a type attribute or a <grammar> or <dtmf> element to define the valid input values for the field, and any <catch> elements necessary to process the events that might occur. Fields may also contain <filled> elements, which specify code to execute when a value is assigned to a field. You can reset one or more form items using the <clear> element.
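The field features described above can be combined as in the following sketch (the form and field names, prompt wording, and values are illustrative, not taken from the product documentation):

```xml
<form id="order">
  <field name="quantity" type="number">
    <!-- count varies the prompt with the number of times it has played. -->
    <prompt count="1">How many would you like?</prompt>
    <prompt count="2">Please say a number, such as three.</prompt>
    <catch event="nomatch noinput">
      Sorry, I didn't catch that.
      <reprompt/>
    </catch>
    <filled>
      <prompt>You asked for <value expr="quantity"/>.</prompt>
    </filled>
  </field>
</form>
```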
Subdialogs

Another type of form item is the <subdialog> element, which creates a separate execution context to gather information and return it to the form. For more information, see Subdialogs.

Blocks

If your form requires prompts or computation that do not involve user input (for example, welcome information), you can use the <block> element. This element is also a container for the <submit> element, which specifies the next URI to visit after the user has completed all the fields in the form. You can also jump to another form item in the current form, another dialog in the current document, or another document using the <goto> element.

Types of Forms

There are two types of form dialogs:

   •   Machine-directed forms -- simple forms where each field or other form item is executed once and in a sequential order, as directed by the system.
   •   Mixed-initiative forms -- more robust forms in which the system or the user can direct the dialog. When coding mixed-initiative forms, you can use form-level grammars (<form scope="dialog">) to allow the user to fill in multiple fields from a single utterance, or document-level grammars (<form scope="document">) to allow the form's grammars to be active in any dialog in the same VoiceXML document; if the document is the application root document, then the form's grammars are active in any dialog in any document within the application.

You can use the <initial> element to prompt for form-wide information in a mixed-initiative dialog, before the user is prompted on a field-by-field basis. For more information, see Mixed Initiative Application and Form-Level Grammars.

Menus

A menu is essentially a simplified form with a single field. Menus present the user with a list of choices, and associate with each choice a URI identifying a VoiceXML page or element to visit if the user selects that choice.
The grammar for a menu is constructed dynamically from the menu entries, which you specify using the <choice> element or the shortcut <enumerate/> construction; you can use the <menu> element's scope attribute to control the scope of the grammar.

Note: The <enumerate> element instructs the VoiceXML browser to speak the text of each menu <choice> element when presenting the list of available selections to the user. If you want more control over the exact wording of the prompts (such as the ability to add words between menu items or to hide active entries in your menu), simply leave off the <enumerate> tag.
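For instance (a sketch with hypothetical URIs), <enumerate/> lets the browser read the choices for you:

```xml
<menu>
  <prompt>Please choose one of: <enumerate/></prompt>
  <choice next="news.vxml">news</choice>
  <choice next="weather.vxml">weather</choice>
  <!-- Omitting <enumerate/> and writing the prompt text by hand
       gives full control over the wording. -->
</menu>
```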
Menus can accept voice and/or DTMF input; you can specify the acceptable type(s) of input using the construct <property name="inputmodes" value="mode">, where mode is "dtmf", "voice", or "dtmf voice" (the default). If desired, you can implicitly assign DTMF key sequences to menu choices based on their position in the list of choices by using the construct <menu dtmf="true">. Figure 7 shows an example of a menu that accepts voice and/or DTMF input.

Figure 7. Voice and/or DTMF Combination

    <?xml version="1.0"?>
    <vxml version="1.0">
      <menu>
        <prompt>Welcome to the main menu. For news, press 1 or say one,
          news, or current news: </prompt>
        <choice next="http://www.voicexml.org/news/main.vxml">
          <grammar type="application/x-jsgf"> news | current news | one </grammar>
          <dtmf>1</dtmf>
        </choice>
        <prompt>For weather, press 2 or say two, weather, or current
          weather:</prompt>
        <choice next="http://www.voicexml.org/weather/main.vxml">
          <grammar type="application/x-jsgf"> weather | current weather | two </grammar>
          <dtmf>2</dtmf>
        </choice>
      </menu>
    </vxml>

Flow Control

When the VoiceXML browser starts, it uses the URI you specify to request an initial VoiceXML document. Based on the interaction between the application and the user, the VoiceXML browser may jump to another dialog in the same VoiceXML document, or fetch and process a new VoiceXML document. VoiceXML provides a number of ways of managing flow control. For example:

    <link event="help">
    <link next="http://www.yourbank.example/locations.vxml/">
    <goto nextitem="field1"/>
    <submit next="/servlet/login"/>
    <choice next="http://www.yourbank.example/locations.vxml">
    <throw event="exit">

When a dialog does not specify a transition to another dialog, the VoiceXML browser exits and the session terminates.

Subdialogs

VoiceXML subdialogs are roughly the equivalent of function or method calls. A subdialog is an entire form that is executed, the result of which is used to provide one input field to another form. You can use subdialogs to provide a disambiguation or confirmation dialog (as described in Managing Errors), as well as to create reusable dialog components for data collection and other common tasks. Executing a <subdialog> element temporarily suspends the execution context of the calling dialog and starts a new execution context, passing in the parameters specified by the <param> element. When the subdialog exits, it uses a <return> element to resume the calling dialog's execution context. The calling dialog accesses the results of the subdialog through the variable defined by the <subdialog> element's name attribute.

Comments

Information within a comment is ignored by the VoiceXML browser. Single line comments in VoiceXML documents use the following syntax:
    <!--comment-->

Comments can also span multiple lines:

    <!--start of multi-line comment
        more comments-->

Built-In Field Types and Grammars

The VoiceXML browser supports the complete set of built-in field types that are defined in the VoiceXML 1.0 specification. These built-in field types specify the built-in grammar to load (which determines valid spoken and DTMF input) and how the TTS engine will pronounce the value in a prompt, as documented in Table 5. Additional examples of the types of input you might specify when using the built-in field types are shown in Table 41. For information on writing your own, application-specific grammars, see Grammars.

Table 5. Built-In Types for US English

boolean
    Users can say affirmative responses such as yes, okay, sure, and true, or negative responses such as no, false, and negative. Users can also provide DTMF input: 1 is yes, and 2 is no. The return value sent is a boolean true or false. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak either yes or no.

currency
    Users can say US currency values in dollars and cents from 0 to $999,999,999,999.99, including common constructs such as "a buck fifty" or "nine ninety nine." Users can also provide DTMF input using the numbers 0 through 9 and optionally the * key (to indicate a decimal point), and must terminate DTMF entry using the # key. The return value sent is a string in the format UUUdddddddddddd.cc, where UUU is a currency indicator or null; currently, the only supported currency type is USD (for US dollars). If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak the currency value. For example, the TTS engine would speak "a buck fifty" as one dollar and fifty cents and "nine ninety nine" as nine ninety nine.
date      Users can say a date using months, days, and years, as well as the words yesterday, today, and tomorrow. Common constructs such as "March 3rd, 2000" or "the third of March 2000" are supported. Users can also provide DTMF input in the form yyyymmdd.

          Note: The date grammar does not perform leap year calculations; February 29th is accepted as a valid date regardless of the year. If desired, your application or servlet can perform the required calculations.

          The return value sent is a string in the format yyyymmdd, with the VoiceXML browser returning a ? in any positions omitted in the spoken input. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak the date.

digits    Users can say numeric integer values as individual digits (0 through 9). For example, a user could say 123456 as "1 2 3 4 5 6." Users can also provide DTMF input using the numbers 0 through 9, and must terminate DTMF entry using the # key.

          The return value sent is a string of one or more digits. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak individual digits from 0 through 9. In the above example, the TTS engine would speak 123456 as "1 2 3 4 5 6."

          Note: Use this type instead of the number type if you require very high recognition accuracy for your numeric input.

number    Users can say natural numbers (that is, positive and negative integers and decimals) from 0 to 999,999,999,999, as well as the words point or dot (to indicate a decimal point) and minus or negative (to indicate a negative number). Numbers can be spoken individually or in groups. For example, a user could say 123456 as "one hundred twenty-three thousand four hundred and fifty-six" or as "one, twenty-three, four, fifty-six." Users can also provide DTMF input using the numbers 0 through 9 and optionally the * key (to indicate a decimal point), and must terminate DTMF entry using the # key.
          Note: Only positive numbers can be entered using DTMF.

          The return value sent is a string of one or more digits, 0 through 9, with a decimal point and a + or - sign as applicable. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak the natural number. In the above example, the TTS engine would speak 123456 as "one hundred twenty-three thousand four hundred fifty-six."

          Note: Use this type to provide flexibility when collecting numbers; users can speak long numbers as natural numbers, single digits, or groups of digits. If you want to force the TTS engine to speak the number back as a string of digits, use the <sayas class="literal"> markup tag.

phone     Users can say a telephone number, including the optional word extension. Users can also provide DTMF input using the numbers 0 through 9 and optionally the * key (to represent the word "extension"), and must terminate DTMF entry using the # key.

          The return value sent is a string of digits without hyphens, and includes an x if an extension was specified. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak the telephone number, including any extension.

          Note: For tips on minimizing recognition errors that are due to user pauses during input, see Using the Built-In Phone Grammar.

time      Users can say a time using hours and minutes in either 12- or 24-hour format, as well as the word now. Users can also provide DTMF input using the numbers 0 through 9.

          The return value sent is a string in the format hhmmx, where x is a for AM, p for PM, h for 24-hour format, or ? if unspecified or ambiguous; for DTMF input, the return value will always be h or ?, since there is no mechanism for specifying AM or PM. If the field name is subsequently used in a value attribute within a prompt, the TTS engine will speak the time.

VoiceXML DTD

The VoiceXML DTD is located on the distribution media. You can use it with a validating XML parser to perform syntax checking on your VoiceXML documents.

Recorded Audio

VoiceXML supports the use of recorded audio files as output.
The VoiceXML browser plays an audio file when the corresponding URI (<audio src="file">) is encountered in a VoiceXML document.
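For example, a minimal document that plays a prerecorded greeting might look like the following sketch (the audio URI is hypothetical); the text inside the <audio> element is spoken by the TTS engine if the file cannot be retrieved:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        <!-- If welcome.wav cannot be fetched, the browser renders
             the alternate text inside the <audio> element instead -->
        <audio src="http://www.yourbank.example/audio/welcome.wav">
          Welcome to Your Bank Online.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```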
Using Prerecorded Audio Files

Prerecorded audio files must be one of the following:

• an 8KHz 8-bit mu-law .au file
• an 8KHz 8-bit mu-law .wav file
• an 8KHz 8-bit a-law .au file
• an 8KHz 8-bit a-law .wav file
• an 8KHz 16-bit linear .au file
• an 8KHz 16-bit linear .wav file

Note: 16-bit linear files take twice as much storage and download time; for better performance, 8-bit files are recommended.

Recording Spoken User Input

You can use the VoiceXML <record> element to capture spoken input; the recording ends either when the user presses any DTMF key, or when the time you specified in the maxtime attribute is exceeded. To allow a spoken command to terminate the recording, you can specify a <grammar> element within the <record> element; users must pause briefly before and after speaking an utterance from this grammar. All other grammars are turned off while recording is active.

Playing and Storing Recorded User Input

You can play the recorded input back to the user immediately (using a <value> element), or submit it to the server (using <submit next="URI" method="post" enctype="multipart/form-data"/>) to be saved as an audio file.

Document Fetching and Caching

The VoiceXML browser uses caching to improve performance when fetching documents and other resources (such as audio files, grammars, and scripts); see Fetching and Caching Resources for Improved Performance.

Configuring Caching

By default, caching is enabled and the files are stored on the local file system in the directory specified by the vxml.cache.dir Java system property. You can turn caching off by specifying the Java system property -Dvxml.cache=false when you start the VoiceXML browser. These Java system properties are described in Table 16.

Note: In the DirectTalk deployment environment, caching parameters are factory-set.
In a deployment (telephony) environment, the VoiceXML browser supports persistent caching for grammars, documents, and audio clips; it stores these files locally to permit them to be shared between multiple browser instances. Refer to the documentation for the applicable deployment platform for implementation details.

Controlling Fetch and Cache Behavior

Table 6 presents the attributes for controlling document fetching and caching; these are available with various elements, including <audio>, <choice>, <goto>, <grammar>, <object>, <prompt>, <property>, <script>, <subdialog>, and <submit>.

Table 6. Attributes for Document Fetching and Caching

Attribute     Description                         Implementation Details

caching       Specifies whether to retrieve       Valid values are fast (the default) and safe. The
              the content from cache or from      fast caching policy is similar to that used by HTML
              the server.                         browsers: if a requested document is unexpired in
                                                  the cache, or expired but not modified, the cached
                                                  copy is used; if the requested document is expired
                                                  and modified, or is not in the cache, the document
                                                  is fetched from the server. You can specify the
                                                  safe caching policy to force a query to fetch the
                                                  document from the server.

fetchaudio    Specifies the URI of an audio       Supported as documented in the VoiceXML 1.0
              file to play while fetching a       specification. fetchaudio can be used to play
              document.                           music, avoiding prolonged silence while fetching a
                                                  new VoiceXML document. It is less useful for
                                                  playing advertisements, since playback stops as
                                                  soon as the document is retrieved.

fetchhint     Specifies when to retrieve and      Valid values are prefetch (fetch when the page is
              process the content.                loaded), safe (fetch when the content is needed),
                                                  and stream (process as the content arrives). In
                                                  this release, stream is not implemented; if you
                                                  specify stream, safe is used instead.

fetchtimeout  Specifies how long to wait for      Supported as documented in the VoiceXML 1.0
              the content before throwing an      specification.
              error.badfetch event.
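As a sketch of how these attributes combine on a single element (the URIs and the "10s" timeout value are hypothetical), a <goto> might play hold music while forcing a fresh fetch from the server:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- Play hold audio during the fetch; caching="safe"
           bypasses the cache and queries the server -->
      <goto next="http://www.yourbank.example/rates.vxml"
            caching="safe"
            fetchtimeout="10s"
            fetchaudio="http://www.yourbank.example/audio/holdmusic.wav"/>
    </block>
  </form>
</vxml>
```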
Preventing Caching

You can use the <meta> element to prevent caching of a VoiceXML document by specifying http-equiv="expires" and content="0". To specify when the VoiceXML browser should fetch a fresh copy of the document, use the content attribute to specify the date and time. For example:

<meta http-equiv="expires" content="Tue, 20 Aug 2002 14:00:00 GMT"/>

Events

The VoiceXML browser throws an event when it encounters a <throw> element or when certain specified conditions occur; these may be normal error conditions (such as a recognition error) or abnormal error conditions (such as an invalid page reference). Events are caught by <catch> elements, which can be specified within other VoiceXML elements in which the event can occur, or inherited from higher-level elements.

Predefined Events

The VoiceXML browser supports the complete set of predefined events documented in the VoiceXML 1.0 specification, as described in Table 7:

Table 7. Predefined Events and Event Handlers for US English

cancel
    Description: The VoiceXML browser throws this event when the user says a valid word or phrase from the Quiet/Cancel grammar. See Built-In Commands.
    Implementation Details: The default event handler stops the playback of the current audio prompt.

error.badfetch
    Description: The VoiceXML browser throws this event when it detects a problem (such as an unknown audio type, an invalid URI, or the expiration of a fetchtimeout) that prevents it from jumping to the next URI.
    Implementation Details: The default event handler plays the message "Sorry, must exit due to processing error" and then exits.

error.noauthorization
    Description: The VoiceXML browser throws this event when the user attempts to perform an action for which he/she is not authorized, such as transferring to an unauthorized telephone number.
    Implementation Details: The default event handler plays the message "Sorry, must exit due to processing error" and then exits.

error.semantic
    Description: The VoiceXML browser throws this event when it detects an error within the content of a VoiceXML document (such as a reference to a non-existent application root document or an ill-formed ECMAScript expression) or when a critical error (such as an error communicating with the speech recognition engine) occurs during operation.
    Implementation Details: The default event handler plays the message "Sorry, must exit due to processing error" and then exits.

error.unsupported.element
    Description: The VoiceXML browser throws this event when it encounters a reference to an unsupported element.
    Implementation Details: For example, if a document contains a <transcribe> element, the VoiceXML browser will throw an error.unsupported.transcribe event. The default event handler plays the message "Sorry, must exit due to processing error" and then exits.

error.unsupported.format
    Description: The VoiceXML browser throws this event when it encounters a reference to an unsupported resource format, such as an unsupported MIME type, grammar format, or audio file format.
    Implementation Details: The default event handler plays the message "Sorry, must exit due to processing error" and then exits.

exit
    Description: The VoiceXML browser throws this event when it processes a <throw event="exit">.
    Implementation Details: The VoiceXML browser performs some cleanup and then exits.

help
    Description: The VoiceXML browser throws this event when the user says a valid word or phrase from the Help grammar. See Built-In Commands.
    Implementation Details: The default event handler plays the message "Sorry, no help available" and then reprompts the user. For guidelines on choosing a scheme for your own help messages, see Choosing Help Mode or Self-Revealing Help.

noinput
    Description: The VoiceXML browser throws this event when the timeout interval (as defined by a timeout attribute on the current prompt, the timeout property, or the vxml.timeout Java system property) is exceeded.
    Implementation Details: The default event handler reprompts the user; if you adhere to the guidelines for self-revealing help, this event can use the same messages as the help event. See Implementing Self-Revealing Help for details.

nomatch
    Description: The VoiceXML browser throws this event when the user says something that is not in any of the active grammars.
    Implementation Details: The default event handler plays the message "Sorry, I didn't understand" and then reprompts the user; if you adhere to the guidelines for self-revealing help, this event can use the same messages as the help event. See Implementing Self-Revealing Help for details.

    If the user pauses after uttering a partial response, and the silence period exceeds the duration specified by the incompletetimeout property, the VoiceXML browser's response depends on whether the user's partial utterance matches a valid utterance from any active grammar:

    • If the partial user utterance is itself a valid utterance, the VoiceXML browser returns a result. For example, if the built-in phone grammar is active and the user pauses after uttering 7 digits of a 10-digit telephone number, the VoiceXML browser will return the 7-digit number.
    • If the partial user utterance is not itself a valid utterance (for example, if the user pauses in the middle of a word), the VoiceXML browser throws a nomatch event.

telephone.disconnect.hangup
    Description: The VoiceXML browser throws this event when the user hangs up the telephone or when a <disconnect> element is encountered.
    Implementation Details: Supported as documented in the VoiceXML 1.0 specification. The default event handler exits; however, you can override this handler if application-specific cleanup is desired.

telephone.disconnect.transfer
    Description: The VoiceXML browser throws this event when the user has been unconditionally transferred to another line (via a <transfer> element) and will not return.
    Implementation Details: Supported as documented in the VoiceXML 1.0 specification. The event is thrown after the transfer is initiated and before the interpreter exits. The default event handler exits; however, you can override this handler if application-specific cleanup is desired.

Note: UK English uses the same default event handlers. For French, German, Italian, Japanese, Simplified Chinese, and Spanish language versions, see the appropriate appendixes.

If desired, you can override the predefined event handlers by specifying your own <catch> element to handle the events. For example:

<catch event="error.badfetch">
  Caught bad fetch.
  <goto nextitem="field1"/>
</catch>

Application-Specific Events

In addition, you can define application-specific events and event handlers. Error types should be of the form error.packagename.errortype, where packagename follows the Java convention of starting with the company's reversed Internet domain name (for example, com.ibm.mypackage).

Recurring Events

If desired, you can use the <catch> element's count attribute to vary the message based on the number of times that an event has occurred.

Note: This mechanism is used for self-revealing help and tapered prompts.

For example:

<prompt>Name and location?</prompt>
<noinput count="1">
  <prompt>Please state the name and location of the employee you wish to call.</prompt>
</noinput>
<noinput count="2">
  <prompt>For example, to call Joe Smith in Dallas, say, "Joe Smith in Dallas"</prompt>
</noinput>

Variables and Expressions

You can specify an executable script within a <script> element or in an external file.

Using ECMAScript

The scripting language of VoiceXML is ECMAScript, an industry-standard programming language for performing computations in Web applications. For information on ECMAScript, refer to the ECMAScript Language Specification, available at: http://www.ecma.ch/ecma1/stand/ECMA-262.htm.

ECMAScript is also used for Action Tags in mixed-initiative forms; see Mixed Initiative Application and Form-Level Grammars for details.

Declaring Variables

You can declare variables using the <var> element or the name attribute of various form items. Table 8 lists the possible variable scopes:

Table 8. Variable Scopes

Scope        Description                              Implementation Details

anonymous    Used by <block>, <filled>, and <catch>   Supported as documented in the VoiceXML 1.0
             elements for variables declared in       specification.
             those elements.

application  Variables are visible to the             Supported as documented in the VoiceXML 1.0
             application root document and any        specification.
             other loaded application documents
             that use that root document.

dialog       Variables are visible only within the    Supported as documented in the VoiceXML 1.0
             current <form> or <menu> element.        specification.

document     Variables are visible from any dialog    Supported as documented in the VoiceXML 1.0
             within the current document.             specification.

session      Variables that are declared and set      The supported session variables are
             by the interpreter context.              session.telephone.ani (Automatic Number
                                                      Identification) and session.telephone.dnis
                                                      (Dialed Number Identification Service). II
                                                      Digit (Information Indicator Digit) and User
                                                      to User Information functionality are not
                                                      available. The VoiceXML browser sets the
                                                      value of the remaining standard session
                                                      variables (as described in section 9.4 of the
                                                      VoiceXML 1.0 specification) to undefined.

To ensure that your application is portable, you should check whether session variables are defined before attempting to use them:

<if cond="session.x==undefined">
  <exit/>
<else/>
  ok to access the session variable
</if>

Assigning and Referencing Variables

After declaring a variable, you can assign it a value using the <assign> element, and you can reference variables in the cond and expr attributes of various elements. Because VoiceXML is an XML-based application, you must use escape sequences to specify relational operators (such as greater than, less than, etc.). Table 9 lists the escape sequences.

Table 9. Examples of Relational Operators

Relational Operator  Escape Sequence

<                    &lt;
>                    &gt;
<=                   &lt;=
>=                   &gt;=
&                    &amp;
&&                   &amp;&amp;

Using Shadow Variables

The result of executing a field item is stored in the field's name attribute; however, there may be occasions in which you need other types of information about the execution, such as the string of words recognized by the speech recognition engine. Shadow variables
provide this type of information. Table 10 documents the VoiceXML browser's support for shadow variables.

Table 10. Shadow Variables

Element       Shadow Variable  Description                        Implementation Details

<field>       x$.confidence    The recognition confidence level   The VoiceXML browser always returns
                               from the speech recognition        a value of 1.0.
                               engine.

              x$.inputmode     The input mode for a given         Supported as documented in the
                               utterance: voice or DTMF.          VoiceXML 1.0 specification.

              x$.utterance     A string that represents the       The actual value of the shadow
                               actual words that were recognized  variable depends on how the grammar
                               by the speech recognition engine.  was written.

<record>      x$.duration      The duration of the recording.     Supported as documented in the
                                                                  VoiceXML 1.0 specification.

              x$.size          The size of the recording, in      Supported as documented in the
                               bytes.                             VoiceXML 1.0 specification.

              x$.termchar      The DTMF key used to terminate     Supported as documented in the
                               the recording.                     VoiceXML 1.0 specification.

<transcribe>  x$.confidence    The confidence level of the        Not implemented in this release. The
                               transcription.                     VoiceXML browser always returns a
                                                                  value of 1.0.

              x$.termchar      The DTMF key used to terminate     Supported as documented in the
                               the transcription.                 VoiceXML 1.0 specification.

              x$.utterance     A string that represents the       Not implemented in this release.
                               actual words that were recognized
                               by the speech recognition engine.

<transfer>    x$.duration      The duration of a successful       Supported as documented in the
                               call.                              VoiceXML 1.0 specification. This
                                                                  variable is meaningful only in the
                                                                  telephony deployment environment,
                                                                  not the desktop development
                                                                  environment. See Table 2.

A Simple VoiceXML Example

Figure 8 illustrates the basic dialog capabilities of VoiceXML, using a menu and a form:

Figure 8. VoiceXML Example Using Menu and Form

<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Welcome to Your Bank Online speech demo.</prompt>
    <prompt>Please choose <enumerate/></prompt>
    <choice next="http://www.yourbank.example/locations.vxml">
      Branch Locations
    </choice>
    <choice next="http://www.yourbank.example/interest.vxml">
      Interest Rates
    </choice>
    <choice next="#invest">
      Investment Information
    </choice>
  </menu>
  <form id="invest">
    <field name="investment_amount" type="number">
      <prompt>How much would you like to invest?</prompt>
    </field>
    <field name="zip_code">
      <dtmf src="builtin:dtmf/digits"/>
      <prompt>Use your telephone keypad to enter the five-digit ZIP code
        of your location, followed by the # key.</prompt>
    </field>
    <filled>
      <submit next="/servlet/invest"/>
    </filled>
  </form>
</vxml>
Each menu or form field in a VoiceXML application must define a set of acceptable user responses. The menu uses <choice> elements to do this, while the <field> elements in the above example use the type attribute and the <dtmf> element to specify a built-in grammar (here, number and digits). You can also create your own, application-specific grammars. (See Grammars.)

The resulting dialog can proceed in numerous ways, depending on the user's responses. Two possible interactions are described next.

Static Content

One sample interaction between the system and the user might sound like this:

System: Welcome to Your Bank Online speech demo. Please choose: Branch Locations, Interest Rates, Account Information.
User: Branch Locations.

At this point, the system retrieves and interprets the VoiceXML document located at http://www.yourbank.example/locations.vxml; this new document would specify the next segment of the interaction between the system and the user, namely a dialog to locate the nearest bank branch.

This interaction visits the <menu> element, but not the <form> element, from the sample VoiceXML document. It illustrates a level of capability similar to that provided for visual applications by a set of static HTML hyperlinks. However, static linking of information is merely the basic function of the World Wide Web; the real power of the Web is in dynamic distributed services, which begin with forms.

Dynamic Content

This interaction visits both the <menu> and the <form> from the above VoiceXML example:

System: Welcome to Your Bank Online speech demo. Please choose: Branch Office Services, Interest Rates, Account Information.
User: Branch Office Services.
System: What kind of service do you require?
User: Safety deposit box.
System: Use your telephone keypad to enter the first three digits of your ZIP code, followed by the # key.
User: (presses telephone keys corresponding to ZIP code, and terminates data entry using the # key)
At this point, the system has collected the two fields needed to complete the request, so it executes the <filled> element, which contains a <submit> element that causes the collected information to be submitted to a remote server for processing. For example, the servlet might access a database (either locally or on a remote server) to locate the branch office offering safety deposit boxes for rental in the user's ZIP code, after which a new VoiceXML document might present dynamically generated data related to this particular user's request.

Grammars

A grammar is an enumeration, in compact form, of the set of utterances (words and phrases) that constitute the acceptable user response to a given prompt.

The VoiceXML browser requires all speech and DTMF grammars to be specified using the Java Speech Grammar Format (JSGF), as described in Appendix D of the VoiceXML 1.0 specification. Additionally, the VoiceXML browser supports grammars that combine JSGF syntactic tagging with ECMAScript Action Tags that specify semantic actions. This mechanism was proposed jointly by IBM and Sun(R) Microsystems and is outside the scope of the VoiceXML language; for more information, refer to the document ECMAScript Action Tags for JSGF, located in the %IBMVS%/docs/specs directory (where %IBMVS% is an environment variable that contains the pathname of the installation directory).

When you write your application, you can use the built-in grammars and/or create one or more of your own; in either case, you must decide when each grammar should be active. The speech recognition engine uses only the active grammars to define what it listens for in the incoming speech.
This section discusses the following topics:

• Grammar Syntax
• Static Grammars
• Dynamic Grammars
• Grammar Scope
• Hierarchy of Active Grammars
• Mixed Initiative Application and Form-Level Grammars

Grammar Syntax

A complete discussion of grammar syntax is beyond the scope of this document; please refer to the JSGF documentation at http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/index.html. The information provided in this section is merely to acquaint you with the basics of writing JSGF grammars.

Grammar Header
A JSGF grammar file consists of a grammar header and a grammar body. The grammar header declares the version of JSGF and the grammar name, and (optionally) any imported grammars or rules. The form of the grammar name can be either a "simple grammar name" (that is, grammarName) or a "full grammar name" (that is, packageName.simpleGrammarName). To import all public rules from another grammar, specify fullGrammarName.*; to import specific public rules from another grammar, use a "fully-qualified rule name" (that is, fullGrammarName.rulename).

A simple grammar header might be:

#JSGF V1.0;
grammar citystate;

Grammar Body

The grammar body consists of one or more rules that define the valid set of utterances. The syntax for grammar rules is:

public <rulename> = options;

where:

public
    is an optional declaration indicating that the rule can be used as an active rule by the speech recognition engine and can be referenced by rules in other grammars.
rulename
    is a unique name identifying the grammar rule.
options
    can be any combination of text that the user can speak, other rules, and delimiters such as:

    |    to separate alternatives
    []   to enclose optional words, phrases, or rules
    ()   to group words, phrases, or rules
    {}   to associate an arbitrary tag with a word or phrase
    *    to indicate that the previous item may occur zero or more times
    +    to indicate that the previous item may occur one or more times

Comments in Grammars

You can specify comments in the header or body of a grammar using the standard Java format:

/* ignore this text */

or

// ignore from here to the end of the line

External and Inline Grammars

You can specify grammars in an external file. For example:

#JSGF V1.0;
grammar employees;
public <name> = Jonathan | Jon {Jonathan} | Larry | Susan | Melissa;

To access an external grammar, you reference either the URI of the grammar file (to use all public rules) or the URI of the grammar file plus a terminating #rule (to use only the specified public rule).

You can also specify grammar rules inline, by using VoiceXML's <grammar> element. For example:

<grammar>
  Jonathan | Jon {Jonathan} | Larry | Susan | Melissa;
</grammar>

Note:
An inline grammar that contains XML reserved terms or non-terminals must be enclosed within a CDATA section.

DTMF Grammars

The VoiceXML browser also uses JSGF as the format for DTMF grammars. For example, the following defines an inline DTMF grammar that allows the user to make a selection by pressing the numbers 1 through 4, the asterisk, or the pound sign on the DTMF Simulator (during desktop testing) or on a telephone keypad (when the application is deployed):

<dtmf type="application/x-jsgf">
  1 | 2 | 3 | 4 | "*" | "#"
</dtmf>

Static Grammars

The following example illustrates the use of grammars in a hypothetical Web-based voice application for a restaurant:

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>What would you like to drink?</prompt>
      <grammar>
        coffee | tea | orange juice | milk | nothing
      </grammar>
    </field>
    <field name="sandwich">
      <prompt>What type of sandwich would you like?</prompt>
      <grammar src="sandwich.gram"/>
    </field>
    <filled>
      <submit next="/servlet/order"/>
    </filled>
  </form>
</vxml>
In this example, the first grammar (for drinks) consists of a single rule, specified inline. In contrast, the second grammar (for sandwiches) is contained in an external grammar file, shown below:

#JSGF V1.0;
grammar sandwich;
<ingredient> = ham | turkey | swiss [cheese];
<bread> = rye | white | [whole] wheat;
public <sandwich> = <ingredient> ([and] <ingredient>)* [on <bread>];

Here, the ingredient and bread rules are private, meaning that they can be accessed only by other rules in this grammar file or in other grammar files that have imported this grammar file. The sandwich rule is public, meaning that it can be accessed by specifying this grammar file or by referencing the fully-qualified name of the sandwich rule. The public rule in this grammar allows the user to build a sandwich by specifying, in order:

<ingredient>
    one item from the ingredient rule
([and] <ingredient>)*
    zero or more (denoted by the *) additional ingredients, optionally (denoted by the [...]) separated by the word "and"
[on <bread>]
    optionally, the word "on" followed by a type of bread from the bread rule

So, for example, a typical dialog might sound like this:

System: What would you like to drink?
User: Coffee.
System: What type of sandwich would you like?
User: Turkey and cheese on rye.

At this point, the system has collected the two fields needed to complete the order, so it executes the <filled> element, which contains a <submit> that causes the collected information to be submitted to a remote server for processing.

Dynamic Grammars

In the restaurant example, the contents of the inline and external grammars were static. However, it is possible to create grammars dynamically, such as by using information from a back-end database located on an enterprise server. You can use the same types of server-side logic to build dynamic VoiceXML as you would use to build dynamic HTML: CGI scripts, Java Beans, servlets, ASPs, JSPs, etc.
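As a sketch of the idea (the servlet path and query parameter below are invented for illustration), a field can point its grammar src at a server-side resource that generates the JSGF rules from a database at request time:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="branch">
      <prompt>Which branch office would you like?</prompt>
      <!-- The hypothetical servlet returns a JSGF grammar built from
           the current list of branch offices in the database;
           caching="safe" forces a fresh copy on each call -->
      <grammar src="/servlet/branchgrammar?region=southwest"
               caching="safe"/>
    </field>
    <filled>
      <submit next="/servlet/locate"/>
    </filled>
  </form>
</vxml>
```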
For more information, refer to the Grammar Builder application located in the %IBMVS%/samples/<locale> directory (where %IBMVS% is an environment variable that contains the pathname of the installation directory, and where <locale> is en_US for US English. For French, German, Italian, Japanese, Simplified Chinese, Spanish, and UK English language versions, see the appropriate appendixes.)

Grammar Scope

Form grammars can specify one of the scopes shown in Table 11.

Table 11. Form Grammar Scope

Scope: dialog
    Description: The grammar is accessible only within the current <form> or <menu> element.
    Implementation Details: Supported as documented in the VoiceXML 1.0 specification.

Scope: document
    Description: The grammar is accessible from any dialog within the current document; if the document is the application root document, the grammar is accessible to all loaded application documents that use that root document.
    Implementation Details: Supported as documented in the VoiceXML 1.0 specification. You can improve performance by using a combination of fetching and caching to prime the cache with all of the referenced grammar files before your application starts, as described in "Fetching and Caching Resources for Improved Performance".

Field, link, and menu choice grammars cannot specify a scope:

•   Field grammars are accessible only within the specified field.
•   Link grammars are accessible only within the element containing the link.
•   Grammars in menu choices are accessible only within the specified menu choice.

Hierarchy of Active Grammars

When the VoiceXML browser receives a recognized word or phrase from the speech recognition engine, the browser searches the active grammars, in the following order, looking for a match:

1.  Grammars for the current field, including grammars contained in links in that field.
2.  Grammars for the current form, including grammars contained in links in that form.
3.  Grammars contained in links in the current document, and grammars for menus and other forms in the current document that have document scope.
4.  Grammars contained in links in the application root document, and grammars for menus and other forms in the application root document that have document scope.
5.  Grammars for the VoiceXML browser itself (that is, built-in commands).
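To illustrate the scoping rules above, here is a minimal sketch that combines a document-scoped grammar with a field grammar; the file name commands.gram, the form and field names, and the prompt are hypothetical, not taken from the product samples:

```xml
<vxml version="1.0">
  <!-- Document scope: this grammar is active in every dialog of this document. -->
  <grammar scope="document" src="commands.gram" type="application/x-jsgf"/>
  <form id="order">
    <field name="size">
      <!-- Field grammar: no scope attribute is allowed; it is active only
           while this field is being visited, and it is searched first in
           the hierarchy described above. -->
      <grammar type="application/x-jsgf">
        small | medium | large
      </grammar>
      <prompt>What size would you like?</prompt>
    </field>
  </form>
</vxml>
```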
Disabling Active Grammars

You can temporarily disable active grammars, including the VoiceXML browser's grammars for built-in commands, by using the modal attribute on various form items. When modal is set to true, all grammars are temporarily disabled except the grammar for the current form item.

Resolving Ambiguities

Ambiguities can arise when a user's utterance matches more than one of the following:

•   Phrases in one or more active grammars
•   Items in a menu or form
•   Links within a single document

You should exercise care to avoid using the same word or phrase in multiple grammars that could be active concurrently; the VoiceXML browser resolves any ambiguities by using the first matching value. Understanding how disambiguation works is especially important when multiple phrases in concurrently active grammars contain the same key words. See Matching Partial Phrases for details.

Mixed Initiative Applications and Form-Level Grammars

In a machine-directed application, the computer controls all interactions by sequentially executing each form item a single time. However, VoiceXML also supports mixed-initiative applications in which either the system or the user can direct the conversation. One or more grammars in a mixed-initiative application may be active outside the scope of their own dialogs; to achieve this, use the <link> element and code the grammars as either form-level grammars (scope="dialog") or document-level grammars (scope="document") defined in the application root document. If the user's utterance matches an active grammar outside the current dialog, the application jumps to the dialog specified by the <link>. When you code a mixed-initiative application, you may also use one or more <initial> elements to prompt for form-wide information before the user is prompted on a field-by-field basis.
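For instance, a <link> defined in the application root document can make a "main menu" command active throughout the application. In this sketch, the dialog name mainmenu and the inline grammar phrases are assumptions for illustration only:

```xml
<!-- In the application root document: this link's grammar is active in
     all loaded application documents that use this root document. -->
<link next="#mainmenu">
  <grammar type="application/x-jsgf">
    main menu | go back to the main menu
  </grammar>
</link>
```

If the user says "main menu" at any point while the link is active, control jumps to the mainmenu dialog rather than continuing the current form.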
Form-level grammars allow greater flexibility and more natural responses than field-level grammars, because the user can fill in the fields of the form in any order and can fill more than one field with a single utterance. For example:
#JSGF V1.0;

grammar citystate;

public <cityandstate> = <city> {this.city=$} [<state> {this.state=$}] [please]
                      | <state> {this.state=$} [<city> {this.city=$}] [please];

<city> = Los Angeles | San Francisco | Yorktown Heights;
<state> = California | New York;

Timeout Properties

The information in this section is intended as a clarification of the information in the VoiceXML 1.0 specification.

Incompletetimeout

The incompletetimeout property specifies how long to wait after a user has spoken a partial utterance in the scenarios shown in Table 12. The default is 1000 ms (one second). The maximum allowable value is 5000 ms.

Table 12. Incompletetimeout

Scenario: The partial utterance is not a complete match for any entry in any active grammar.
Outcome: When the incompletetimeout period expires, the speech recognition engine will reject the partial utterance and return a nomatch event.

Scenario: The partial utterance is a complete match for an entry in an active grammar; however, the user could speak additional words and still match an entry in an active grammar. (Note: This requires incompletetimeout to be greater than completetimeout.)
Outcome: When the incompletetimeout period expires, the speech recognition engine will consider the utterance complete and will return the matching entry from the active grammar; if more than one entry matches, the disambiguation scheme described in Matching Partial Phrases applies.

Note: An inappropriately long incompletetimeout value will slow down the dialog, while an incompletetimeout value that is too short may not give the user enough time to complete an utterance, especially one that tends to be spoken in segments with intervening pauses (such as a telephone number).

Completetimeout

The completetimeout property specifies how long to wait after the user has spoken an utterance in the scenario shown in Table 13. The default is 1000 ms (one second). The maximum allowable value is 5000 ms.
Table 13. Completetimeout

Scenario: The utterance is a complete and terminal match for an entry in an active grammar; that is, there are no additional words that the user could speak and still match an entry in an active grammar.
Outcome: When the completetimeout period expires, the speech recognition engine will return the matching entry from the active grammar; if more than one entry matches, the disambiguation scheme described in Resolving Ambiguities applies.

Telephony Functionality

The VoiceXML browser supports the following telephony features for use in a deployment environment:

•   Automatic Number Identification
•   Dialed Number Identification Service
•   Call Transfer

Automatic Number Identification

If your central office and telephone switches provide ANI information, your VoiceXML application can use the session.telephone.ani variable to access the caller's telephone number. You can use the ANI information to personalize the caller's experience by coding your VoiceXML application to:

•   Determine whether the caller is using the system for the first time
•   Greet the caller by name
•   Retrieve information about the caller
•   Charge and bill for services
•   Log when various customers are most likely to call

Note: If the ANI information is not received from the central office, the WebSphere Voice Server uses the IP address of the Voice Server as a default ANI value. Your application should check for this default value before attempting to use the value of the session.telephone.ani variable to identify the caller, and you should design your application so that it can function when no ANI information is available.

Figure 9 shows an example of a VoiceXML application for a pay-per-use Customer Support line. For each incoming call, the VoiceXML application uses the ANI information to identify which customer to charge for the call.

Figure 9. ANI Example
<vxml version="1.0">
  <form id="ChargeCallers">
    <block>
      <!-- Obtain the phone number from which the user phoned. -->
      <var name="ChargeCallToNum" expr="session.telephone.ani"/>
      <prompt>
        The phone number, <value expr="ChargeCallToNum"/>, will be
        charged for this call to our support line.
      </prompt>
      <!-- Submit the user's phone number to a servlet for processing. -->
      <submit namelist="ChargeCallToNum"
              next="http://SupportLine.com/servlet/ChargeUserForCall"/>
      <prompt>
        Now transferring to the support line.
      </prompt>
    </block>
    <transfer name="SupportLine" dest="15555551212" bridge="false"/>
  </form>
</vxml>

Dialed Number Identification Service

If your central office and telephone switches provide DNIS information, your VoiceXML application can use the session.telephone.dnis variable to access the telephone number that the caller dialed to reach the application.

Note: The WebSphere Voice Server platforms do not enforce any rules or validation for DNIS information; they simply provide the raw data received from the central office to the VoiceXML applications.

You can use the DNIS information to:

•   Distribute incoming calls based on the telephone number that the user dialed (that is, assign a different telephone number to each application). This is useful for voice portals.
•   Provide one telephone number, and distribute incoming calls among multiple copies of your VoiceXML application running on one or more Voice Servers.
•   Direct all incoming telephone calls to the same top-level VoiceXML application, which routes the calls to the appropriate sub-application based on the DNIS number (either by transferring the call to another VoiceXML browser or to a human, or by pulling up a specific VoiceXML page).
•   Eliminate the first application decision point by assigning a separate telephone number to each choice on what had been your top-level menu. For example, rather than have a Main Menu of:

        System: Thank you for calling Our Company.
                Choose one:
                    Sales
                    Marketing
                    Accounting

    you could assign a separate telephone number to each department and then use the DNIS information to route the calls appropriately.
•   Track responses to advertising. You can list different telephone numbers in different ads, and then track the response rates based on which number the user dialed.

Note: DNIS numbers are site-dependent because your location determines which users need to dial a country code, area code, city code, long distance access code, etc. to reach your application.

In the example shown in Figure 10, a company wants to track which users are calling in response to an advertisement on the Web site and which in response to a product catalog. The company used different telephone numbers in the two ads: (800) 555-9999 in the Web ad, and (800) 555-9990 in the print ad. The VoiceXML application uses the DNIS information to determine which ad the user is responding to and increments a counter on a server that is performing the data mining.

Figure 10. DNIS Example

<vxml version="1.0">
  <var name="AddOneCaller" expr="1"/>
  <form id="RouteCallers">
    <block>
      <!-- Obtain the phone number the user dialed. -->
      <var name="NumDialed" expr="session.telephone.dnis"/>
      <if cond="NumDialed == 8005559999">
        <goto next="#UserFromWebsite"/>
      <elseif cond="NumDialed == 8005559990"/>
        <goto next="#UserFromCatalog"/>
      </if>
    </block>
  </form>
  <form id="UserFromWebsite">
    <block>
      <!-- Add one to the count of WebUsers. -->
      <submit namelist="AddOneCaller"
              next="http://SupportLine.com/servlet/TrackWebUsers"/>
    </block>
  </form>
  <form id="UserFromCatalog">
    <block>
      <!-- Add one to the count of CatalogUsers. -->
      <submit namelist="AddOneCaller"
              next="http://SupportLine.com/servlet/TrackCatalogUsers"/>
    </block>
  </form>
</vxml>

Call Transfer

Typically, you would use the call transfer feature to transfer a call to:
•   a human operator
•   another VoiceXML application

This release of the WebSphere Voice Server supports only "blind transfer without dialing rules and without status."² The VoiceXML browser simply initiates the call transfer request and then disconnects from the call; the system cannot verify whether the telephone number for the call transfer has been correctly configured in the gateway. In addition, no information is returned on the status of the transfer, so there is no way to know whether the transfer was successful, the line was busy, the telephone number was invalid, or the call was dropped.

Note: What the user hears if the transfer is unsuccessful depends on your central office and your telephony configuration. Even a standard setup on the gateway may not provide consistent behavior from all central office service providers; depending on the information sent by the central office, a user might hear a busy signal, silence, or no answer.

One interesting way that you might use call transfer is to route calls to different language versions of the same application, based on the caller's DTMF response to an introductory, prerecorded audio prompt. For example:

        System: For English, press 1. (The call will be transferred to the English application.)
                Pour français, appuyez sur 2. (The call will be transferred to the French application.)

Note: You could not use TTS prompts or spoken input in this initial prompt, since these require the application to be running in the desired language version of the VoiceXML browser.

The two potential drawbacks of this design are:

•   You will not know if the transfer was unsuccessful.
•   You lose the ANI and DNIS information from the original call. (The ANI information for the transferred call would be the IP address of the Voice Server that initiated the transfer, and the DNIS information would be the telephone number to which the call was transferred.)
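Such a language-selection dialog might be sketched as follows. This is an illustration only: the audio file name, the destination numbers, and the use of the built-in digits type for the DTMF response are all assumptions, not part of the product documentation:

```xml
<form id="picklanguage">
  <field name="lang" type="digits">
    <prompt>
      <!-- Prerecorded audio: "For English, press 1. Pour français, appuyez sur 2." -->
      <audio src="chooselanguage.wav"/>
    </prompt>
    <filled>
      <if cond="lang == '1'">
        <goto nextitem="toenglish"/>
      <elseif cond="lang == '2'"/>
        <goto nextitem="tofrench"/>
      </if>
    </filled>
  </field>
  <!-- Blind transfers: the browser disconnects after initiating the
       transfer, and no transfer status is returned. -->
  <transfer name="toenglish" dest="18005550001" bridge="false"/>
  <transfer name="tofrench" dest="18005550002" bridge="false"/>
</form>
```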
Footnotes:

² The DirectTalk deployment environment provides additional call transfer capabilities. For more information, refer to the DirectTalk for AIX documentation.
