Introduction to VoiceXML
and Voice Web Architecture
             Ken Rehor




         © 2007 Ken Rehor. All Rights Reserved.   1
Session Overview
• Voice Web Architecture
   – Components of a Voice Web Application
• Voice Standards
   – W3C Speech Interface Framework
• VoiceXML
   – Language features
   – Execution model - Form Interpretation Algorithm (FIA)
• Application Design Techniques
   – Static vs. dynamic VoiceXML
   – Performance Considerations
• CCXML, VoiceXML and VoIP
• Application Deployment Models
• New Technologies
   – Speaker Biometrics, Video, Multimodal, VoiceXML 3.0




Simplifying Voice Services programming
• Web-based architecture for interactive speech services
  – Exploit web technologies to simplify voice service creation and deployment
  – Enable consolidation of voice and web services
  – Separate service logic from user interaction


• High-level programming languages
  – Control speech and telephony resources in uniform manner
  – Shield application programmers from implementation details
     • No need to know ASR, TTS, telephony APIs
  – Create portable applications
     • Run on enterprise system or in telephone network
     • Run on a variety of platforms, ASR agnostic




Voice Web Application Architecture




Key Ideas


• Standard/Common high-level language
  – Designed for the task
• Leverage open, known technology
  – Web protocols, servers, networks, development tools, expertise
• Phone number mapped to URL
  – Phone number associated with URL of voice service




Voice / Web Application Architecture

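[Diagram: any phone reaches a VoiceXML browser over the PSTN or VoIP, while a web browser serves the visual channel. Both browsers fetch content over HTTP, across the Internet or an intranet, from the same application (web) server.]

• VoiceXML browser fetches <vxml> documents plus:
   – Grammars (<grxml>)
   – Audio files (.wav)
   – Scripts
• Web browser fetches <html> documents plus:
   – Images
   – Audio files
   – Scripts
• Application (web) server
   – Application logic
   – Content and data
   – Transaction processing
   – Database interface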
Voice Application Architecture and Components


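[Diagram: the caller connects over the PSTN to a VoiceXML platform, which fetches <vxml> documents, <grxml> grammars, and .wav audio over HTTP, across the Internet or an intranet, from a web server. Sample exchange: "Welcome to Acme products …" / "Customer service, please…"]

• VoiceXML platform
   – VoiceXML interpreter
   – Middleware
   – OA&M
   – Telephony
   – DTMF, Audio, ASR, TTS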
Application Backend Architecture

[Diagram: the application (web) server delivers <vxml> documents, grammars, audio files, and scripts over HTTP across the Internet or an intranet, and reaches its back-end systems over an intranet or the Internet.]

• Application (web) server
   – Application logic
   – Content and data
   – Transaction processing
   – Database interface
• Back-end systems
   – Transaction server
   – Database (content)
   – Web service
Components of a Voice Solution
• Traditional phone, VoIP phone, mobile phone, or multimodal device
• Telephone network
    – Circuit-switched PSTN or packet-switched VoIP
    – Connects caller’s telephone with Telephony Server

• Voice User Interface
    – Dialog structure / flow
    – Prompts – what the application says to the user
    – Speech grammars – what the user can say

• Application logic that executes on an application server
    – Web "back-end"
    – Database, or database interface

• VoiceXML Server that executes dialogs
    – Controls resources such as ASR, SIV (speaker identification and verification), TTS, etc.

• Data network to connect application server and VoiceXML server
Inbound or Outbound calls

• VoiceXML application works the same for inbound and
  outbound calls
   – Additional call progress detection generally required for outbound

• Simple protocol for initiating outbound calls
   – No firm standards, but most vendors follow similar techniques
   – HTTP, Web Services, etc.




Standards




Value of Open Standards

• Non-proprietary interfaces between components

• Allow choice of best components for the task

• User interface languages
  – W3C Speech Interface Framework: VoiceXML, SRGS, SSML, SISR
  – W3C: HTML, XHTML, SMIL, X+V
  – OMA: WAP


• Communication protocols
  – W3C: CCXML for 3rd-party telephony call control
  – W3C: SOAP, WSDL
  – IETF: HTTP, HTTPS, SIP, MRCP, MSCP
  – 3GPP: IMS
  – ITU: T1, ISDN

Visual vs. Voice markup

Web app UI                                    Voice Web app UI
• HTML – Structure                            •     VoiceXML – Structure
     – Layout                                         – Dialog flow
     – Input declaration                              – Input declaration
     – Transitions                                    – Transitions
•   Images                                    •     Audio files
•   Audio files / streams                     •     Video, Images
•   Video                                     •     Text (for TTS)
•   Text                                      •     Scripts
•   Scripts




Protocols
Web applications                     Voice Web applications
•   HTTP, HTTPS                      •     HTTP, HTTPS
•   RTP                              •     RTP
•   SOAP                             •     SOAP
•   WSDL                             •     WSDL
•   …                                •     SIP
                                     •     …




Voice Standards Activities


• Speech Interface Framework

• Network protocols
   – SIP, MRCP v2, etc.

• Platform Certification, Developer Certification,
  Speaker Biometrics, Architecture, Tools




Voice Application Standards
[Diagram: standards used at each interface of a voice platform, from the caller through the phone network and VoIP gateway to the CCXML browser, the VoiceXML browser, and the speech servers. The original figure flagged some items as standards in progress; that marking did not survive extraction.]

• Application layer: CCXML call control application, VoiceXML application, SOAP
• Phone network: T1 / E1, ISDN, SS7
• VoIP gateway: SIP, RTP, RFC 2833
• Conference / media server and media mixer / server
   – Media control interface: SIP Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.
• CCXML browser
   – CCXML documents and scripts fetched over HTTP / HTTPS
   – Telephony control interface: SIP, etc.
   – Dialog control interface: SIP, MSCP, etc.
• VoiceXML browser
   – VoiceXML 2.0, VoiceXML 2.1, ECMAScript (ECMA-262)
   – VXML, GRXML, scripts, audio, and SSML fetched over HTTP / HTTPS
   – DTMF: GRXML
   – Audio: G.711, WAV, .au, mp3, etc.
• MRCP client to speech servers: MRCP v1, MRCP v2
   – TTS server: SSML
   – ASR server: GRXML
   – SIV server
W3C Speech Interface Framework




Voice Application Components

• Dialog – flow control of the inputs, outputs, next steps

• Input grammars
   – Control input constraints for DTMF and speech recognition


• Output formatting
   – Pronunciation, timing, sequencing




W3C Speech Interface Framework
•   VoiceXML
•   SRGS
•   SSML
•   Semantic Interpretation
•   Pronunciation Lexicon
•   Call Control




For more information, see:
W3C Voice Browser Working Group         http://www.w3.org/Voice/

Voice User Interface - Dialog
• W3C VoiceXML 2.0
  – W3C Recommendation March 2004
  – Widely implemented
     • Approximately 4 dozen platforms
     • Many service providers worldwide
  – VoiceXML Forum certification program
     • Nearly two dozen certified platforms, more coming


• W3C VoiceXML 2.1
  – Candidate Recommendation Sept 2006
  – Test suite under development; Certification Program to follow
  – Many platform vendors are implementing

• W3C VoiceXML 3.0
  – Early stages of development
  – SCXML – state chart markup language designed as a controller for
    VoiceXML 3.0 and CCXML 2.0 (Working Draft, Jan 2006)


User Interaction – Input / Output Control
• Input grammars: W3C SRGS 1.0
  – W3C Recommendation
  – Widely implemented

• Output formatting: W3C SSML 1.0
  – W3C Recommendation
  – Widely implemented, but with little real support
    (most TTS engines ignore the SSML instructions)


• Semantic Interpretation for Speech Recognition: W3C SISR 1.0
  – Nearing Candidate Recommendation
  – Implementation gaining acceptance




W3C Speech Interface Framework
                       Semantic Interpretation




W3C Speech Recognition Grammar Specification
 • Markup language to control input constraints
    – Finite-state speech recognition
    – DTMF recognition


 • Two variations
    – XML (GRXML)
    – ABNF


 • Version 1.0: W3C Recommendation – March 2004

 • Implemented and supported by numerous vendors



GRXML ASR example
<grammar type="application/srgs+xml" root="r2" version="1.0"
         xml:lang="en-US" xmlns="http://www.w3.org/2001/06/grammar">
 <rule id="r2" scope="public">

  <one-of>
   <item>coffee</item>
   <item>tea</item>
   <item>milk</item>
   <item>nothing</item>
  </one-of>
 </rule>

</grammar>




GRXML DTMF example
<?xml version="1.0"?>

<grammar mode="dtmf" version="1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xmlns="http://www.w3.org/2001/06/grammar">

<rule id="digit">
 <one-of>
   <item> 0 </item>
   <item> 1 </item>
   <item> 2 </item>
   <item> 3 </item>
   <item> 4 </item>
   <item> 5 </item>
   <item> 6 </item>
   <item> 7 </item>
   <item> 8 </item>
   <item> 9 </item>
 </one-of>
</rule>

<rule id="pin" scope="public">
 <one-of>
   <item>
     <item repeat="4"><ruleref uri="#digit"/></item>
     #
   </item>
 </one-of>
</rule>

</grammar>



W3C Speech Synthesis Markup Language
• Markup language to control spoken and audio output

• Version 1.0: W3C Recommendation – Sept 2004

• Implemented and supported by numerous vendors

• Version 1.1: under development
   – Adds support for tonal languages
   – First public Working Draft published January 2007




SSML Functions

• Audio output
  – <audio>
• Text-to-Speech output
  – Contained within SSML constructs
• Pronunciation controls
  – <say-as>
     • Interpret-as
     • Format
     • Detail
  – <emphasis>
• Timing
  – <break>



SSML Functions (cont’d)

• Spoken language
  – xml:lang
• Voice and style – voice control
  – <voice>
     • Gender
     • Age
     • Name
• Prosody
  – <prosody>
     • Pitch
     • Contour
     • Range
     • Rate
     • Duration
     • Volume

SSML Functions (cont’d)

• Sentence structure
  – <p>
  – <s>
• Modify text
  – <phoneme> - phonetic pronunciation
  – <sub> - substitute text
• Location identification
  – <mark>
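
The SSML functions listed on these slides can be combined in one document. The following is a minimal sketch, not taken from the original deck: the wording, the file name goodbye.wav, and the mark name are made up for illustration, and the interpret-as value "digits" is an assumption, since SSML 1.0 does not fix the set of say-as values and engines differ in what they accept.

```xml
<?xml version="1.0"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <!-- Sentence structure: <p> and <s> -->
  <p>
    <s>Welcome to <emphasis level="strong">Acme</emphasis>.</s>
    <s>
      Your order number is
      <!-- "digits" is an assumed, commonly seen value -->
      <say-as interpret-as="digits">40721</say-as>.
    </s>
  </p>

  <!-- Timing and location identification -->
  <break time="500ms"/>
  <mark name="after_order"/>

  <!-- Voice selection and prosody -->
  <voice gender="female">
    <prosody rate="slow" volume="loud">
      Please hold while we look up your order.
    </prosody>
  </voice>

  <!-- <sub> speaks the alias instead of the written text -->
  This service uses <sub alias="speech synthesis markup language">SSML</sub>.

  <!-- <audio> plays a file; the enclosed text is the fallback -->
  <audio src="goodbye.wav">Goodbye.</audio>
</speak>
```

How much of this a given TTS engine honors varies, so it is worth listening to the rendered output on the target platform.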




VoiceXML 2.x




VoiceXML Scope
• Human-machine interaction provided by voice response
  systems:
  – Output
     • play audio files
     • produce synthesized speech
  – Input
     • record spoken input
     • recognize spoken input
     • collect character input
  – Control flow
  – Telephony
     • transfer a user to another destination, such as a live agent
     • disconnect a user




VoiceXML Goals
• Separate user interaction from service logic
  – Creates new possible business models
     • Service developer can be separate from telephony platform provider
• Enable service portability across implementation platforms
  – Assume common set of platform capabilities
  – Provide common language for:
     • Content providers, Tool providers, Platform providers
• Safely handle shared network-based applications
  – deterministic behavior
• Easy to build common types of applications
• Features to build complex types of applications
• Shield application authors from low-level platform-specific
  details
  – Promotes portability, ease of service creation
VoiceXML 2.0 Basic Functions

• Input
  – <field>, <menu>            recognition
  – <record>                   audio recording
• Output
  – <prompt>                   container for TTS or prerecorded audio
  – <audio>                    prerecorded audio
• Control Flow
  – <if>, <else>, <elseif>     basic conditional logic
  – <script>                   complex scripts using ECMAScript
  – <goto>                     transition to another form item, dialog, or document
  – <submit>                   submit data to a web application
• Telephony
  – <disconnect>
  – <transfer>


VoiceXML Execution Model

• Form Interpretation Algorithm (FIA) drives each <form>
• Execution is synchronous (mostly)
  – Disconnect events are handled (somewhat) asynchronously
• Audio is queued
  – Played only when execution reaches a waiting state
• Processing is always in one of two states:
  – Waiting for input in an input item
     • such as <field>, <record>, or <transfer>
  – Transitioning between input items in response to an input
• Event-driven
  – <catch>, <throw>: generalized event mechanism
  – <nomatch>, <noinput>: shorthand user-input event handling
  – <error>: shorthand error event handling
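
To make the execution model concrete, here is a small sketch of a form as the FIA would walk it. It is illustrative only: the field names, grammar files (drink.grxml, size.grxml), and the submit target order.jsp are hypothetical and do not come from the original deck.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Form-level handlers apply to every input item below -->
    <noinput> Sorry, I didn't hear anything. <reprompt/> </noinput>
    <nomatch> Sorry, I didn't understand. <reprompt/> </nomatch>

    <!-- 1. The FIA selects the first form item whose variable is
            undefined, queues its prompt, and waits for input -->
    <field name="drink">
      <prompt> Would you like coffee, tea, or milk? </prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
    </field>

    <!-- 2. Once "drink" is filled, the next unfilled item is selected -->
    <field name="size">
      <prompt> Small, medium, or large? </prompt>
      <grammar src="size.grxml" type="application/srgs+xml"/>
    </field>

    <!-- 3. With both fields filled, the block runs and submits them -->
    <block>
      <submit next="order.jsp" namelist="drink size" method="get"/>
    </block>
  </form>
</vxml>
```

The author writes no loop here: selecting the next item, collecting input, and dispatching events are all default FIA behavior.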

Key Points

• Architecture leverages all things "internet"
   – Languages, protocols, servers, developers, etc.
• Separation of concerns
   – Application logic / database vs. telephony / speech resources
   – Enables new business models
      • Voice ASP
      • Prepackaged applications
• URL (application) associated with phone number
   – Calling party or Called party
   – Share resources among many applications (VoiceASP)
• High-level languages, specific to domain / task
   – Simplify development and maintenance



VoiceXML <form> and <field>

• <form>
  – Dialog container
  – "Form Interpretation Algorithm" (FIA) specifies default behavior
• <field>
  – Collect input from caller
  – <grammar> specifies input 'constraints'
• <prompt>
  – Container for <audio> and text




Example
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

 <form>

  <field name="main_menu">
   <prompt>
    <audio src="welcome.wav"> Welcome to Acme.
      You can choose sales, repair, or order status.</audio>
   </prompt>
   <grammar src="main_menu.grxml"/>
  </field>

  <block>
   <submit next="http://acme.com/route... " method="get"/>
  </block>

 </form>
</vxml>
                                    main.vxml
Note: Code simplified for demonstration purposes…
User Input - Grammars

• Grammars can be speech or DTMF (touchtone)
  – Both types can be active simultaneously

• Specified by SRGS
  – XML form (aka GRXML) must be supported by every VoiceXML platform
  – ABNF form is more concise but more complex to author; platform support is optional

• Grammars may be specified inline or sourced externally

• External grammars are referenced by URI

• Multiple grammars may be active simultaneously.
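
As a sketch of these points, the field below activates an inline speech grammar and an external DTMF grammar at the same time. The field name, rule name, and the file dept_dtmf.grxml are hypothetical. Without semantic interpretation the two grammars would likely return different values for the same choice (a word versus a digit), so a real application would normalize them.

```xml
<field name="dept">
  <prompt>
    Say sales or repair, or press 1 for sales or 2 for repair.
  </prompt>

  <!-- Inline speech grammar -->
  <grammar mode="voice" version="1.0" root="dept_voice"
           xml:lang="en-US" type="application/srgs+xml">
    <rule id="dept_voice" scope="public">
      <one-of>
        <item>sales</item>
        <item>repair</item>
      </one-of>
    </rule>
  </grammar>

  <!-- External DTMF grammar, referenced by URI and active together
       with the speech grammar above -->
  <grammar src="dept_dtmf.grxml" mode="dtmf" type="application/srgs+xml"/>
</field>
```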



Grammars can get very complicated:
       There are many ways to say the same thing…

Sales
 I'd like to place an order
 I need to talk to a salesman
Repair
 repair department
 service
 service department
 customer service
Order status
 where's my order?
 track my order
 track my shipment
 where the hell is my stuff?



Basic GRXML grammar example


<grammar …xml:lang="en-US" version="1.0">

<rule id="dept" scope="public">
  <one-of>
    <item>sales</item>
    <item>repair</item>
    <item>order status</item>
  </one-of>
</rule>

</grammar>




                         main_menu.grxml

VoiceXML example – next step

<form>

 <field name="sales_menu">
  <prompt>
   <audio src="sales_menu.wav">
     You've reached Acme's sales department.
     To place an order, say sales. To speak to
     an associate, say I'd like to speak to someone.
   </audio>
  </prompt>
  <grammar src="sales_menu.grxml"/>
 </field>

 <block>
  <submit next="http://acme.com/... " method="get"/>
 </block>

</form>

                        sales.vxml
VoiceXML example with error handling

<form>

 <field name="main_menu">
  <prompt>
   <audio src="welcome.wav"> Welcome to Acme.
     You can choose sales, repair, or order status.</audio>
  </prompt>
  <grammar src="main_menu.grxml"/>
 </field>

  <noinput> You must say something. </noinput>

  <block>
   <submit next="http://acme.com/route... " method="get"/>
  </block>

</form>

                        newmain.vxml
VoiceXML example with error handling

<form>

 <field name="main_menu">
  <prompt>
   <audio src="welcome.wav"> Welcome to Acme.
     You can choose sales, repair, or order status.</audio>
  </prompt>
  <grammar src="main_menu.grxml"/>
 </field>

  <noinput> You must say something. </noinput>
  <nomatch> I didn't understand you. Please try again. </nomatch>

  <block>
   <submit next="http://acme.com/route... " method="get"/>
  </block>

</form>
                        newmain.vxml
VoiceXML example with error handling
<form>

 <field name="main_menu">
  <prompt>
   <audio src="welcome.wav"> Welcome to Acme.
     You can choose sales, repair, or order status.</audio>
  </prompt>
  <grammar src="main_menu.grxml"/>
 </field>

  <help> You can say sales, repair, or order status. </help>
  <noinput> You must say something. </noinput>
  <nomatch> I didn't understand you. Please try again. </nomatch>

  <block>
   <submit next="http://acme.com/route... " method="get"/>
  </block>

</form>
                        newmain.vxml
Basic VoiceXML menu using <option>


<field name="maincourse">
  <prompt>
      Please select an entree. Today, we are featuring <enumerate/>
  </prompt>

  <option dtmf="1" value="fish"> swordfish </option>
  <option dtmf="2" value="beef"> roast beef </option>
  <option dtmf="3" value="chicken"> frog legs </option>

  <filled>
    <submit next="/cgi-bin/maincourse.cgi"
         method="post" namelist="maincourse"/>
  </filled>
</field>



                            maincourse.vxml
Set platform features via <property>

• Input modes: type of input from a caller
   DTMF-only   <property name="inputmodes" value="dtmf"/>
   Voice-only  <property name="inputmodes" value="voice"/>
   Both        <property name="inputmodes" value="dtmf voice"/>


• Timeouts
   <property name="timeout" value="1450ms"/>
   <property name="termtimeout" value="2500ms"/>
   ...




Call processing: <transfer>

• Blind
   – Go somewhere but don't return


• Bridge
   – Add on another party, resume
     execution when done talking




Call processing: <transfer>

 • Blind transfer


<form id="xfer">

   <block>
     <prompt> Calling Riley. Please wait. </prompt>
   </block>

      <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="false"/>

</form>




Call processing: <transfer>

 • Bridge transfer


<form id="xfer">
 <block> <prompt> Calling Riley. Please wait. </prompt> </block>

 <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true" >

</transfer>
</form>




Call processing: <transfer>

• Bridge transfer with cancel feature


<form id="xfer">
 <block> <prompt> Calling Riley. Please wait. </prompt> </block>

 <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true" >

  <prompt> Say cancel at any time to disconnect this call.</prompt>
  <grammar src="cancel.grxml" type="application/srgs+xml"/>

</transfer>
</form>




Call processing: <transfer>

<form id="xfer">
 <!-- mydur must be declared before <assign> can set it -->
 <var name="mydur" expr="0"/>
 <block> <prompt> Calling Riley. Please wait. </prompt> </block>

 <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true" >

  <prompt> Say cancel at any time to disconnect this call.</prompt>
  <grammar src="cancel.grxml" type="application/srgs+xml"/>

   <filled>
    <assign name="mydur" expr="mycall$.duration"/>
     <if cond="mycall == 'busy'">
      <prompt> Riley's line is busy. Try again later. </prompt>
     <elseif cond="mycall == 'noanswer'"/>
      <prompt> Riley didn't answer the phone.
               Please call back another time. </prompt>
     </if>
   </filled>

 </transfer>
</form>

Call processing: <transfer>
<form id="xfer">
 <!-- mydur must be declared before <assign> can set it -->
 <var name="mydur" expr="0"/>
 <block> <prompt> Calling Riley. Please wait. </prompt> </block>

 <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true"
   transferaudio="music.wav" connecttimeout="60s" >

  <prompt> Say cancel at any time to disconnect this call.</prompt>
  <grammar src="cancel.grxml" type="application/srgs+xml"/>

   <filled>
    <assign name="mydur" expr="mycall$.duration"/>
     <if cond="mycall == 'busy'">
      <prompt> Riley's line is busy. Try back later. </prompt>
     <elseif cond="mycall == 'noanswer'"/>
      <prompt> Riley didn't answer the phone. Please call
            back another time. </prompt>
     </if>
   </filled>

 </transfer>
</form>

New Features in VoiceXML 2.1
• Dynamically referencing grammars and scripts
   – <grammar srcexpr="…"> <script srcexpr="…">
• Detect barge-in during prompt playback: enhance SSML 1.0 <mark>
   – Add nameexpr attribute
   – Add markname and marktime to application.lastresult$ object
• Fetch (XML) data without transition: <data>
   – Uses read-only subset of DOM
• Dynamically concatenate prompts: <foreach>
   – Iterate through ECMAScript array and execute content
• Record user’s utterance while attempting ASR
   – recordutterance property
   – Add shadow variables: recording, recordingsize, recordingduration
• Send data upon disconnect
   – <disconnect namelist="…">
• Additional <transfer> types
   – <transfer type="…" …/>
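
Several of these 2.1 additions can be seen together in one short document. This is a sketch under assumptions, not the author's example: the variable names, the grammar file naming scheme, and the prompt wording are invented, and the phone number is the placeholder used elsewhere in this deck.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Keep the caller's audio while recognition runs -->
  <property name="recordutterance" value="true"/>

  <var name="lang" expr="'en-US'"/>
  <var name="chosen_city"/>

  <form id="city_form">
    <field name="city">
      <prompt>
        Which city? <mark name="list_start"/> Boston, Chicago, or Denver.
      </prompt>
      <!-- srcexpr: the grammar URI is computed at runtime -->
      <grammar srcexpr="'cities_' + lang + '.grxml'"
               type="application/srgs+xml"/>
      <filled>
        <assign name="chosen_city" expr="city"/>
        <!-- Barge-in position relative to the last <mark> reached,
             plus the size of the recorded utterance -->
        <log> mark: <value expr="application.lastresult$.markname"/>
              ms: <value expr="application.lastresult$.marktime"/>
              bytes: <value expr="city$.recordingsize"/> </log>
        <goto next="#xfer"/>
      </filled>
    </field>
  </form>

  <form id="xfer">
    <!-- type is the 2.1 alternative to the 2.0 bridge attribute -->
    <transfer name="agent" dest="tel:+1-555-123-4567" type="consultation">
      <filled>
        <!-- Expected to run only if the transfer did not complete -->
        <disconnect namelist="chosen_city"/>
      </filled>
    </transfer>
  </form>
</vxml>
```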

Dynamic Applications




VoiceXML Application Structure
• Static
   – User experience is the same for everyone
      • Information doesn’t change frequently
      • No customization per user, time of day, etc.
      • Pages are created once and used many times
• Dynamic
   – User experience is customized by:
      • User: e.g. my.yahoo.com, amazon.com (especially once you log in)
      • Situation: e.g. travel specials on expedia.com
   – Data driven, e.g. inventory system, airline reservations
   – Generated by a program at runtime
      • JSP, ASP
      • App servers such as BEA, IBM WebSphere, Oracle 9iAS


VoiceXML 2.1 and AJAX

• VoiceXML + ECMAScript + <data> + XML
• <data> element allows retrieval of arbitrary XML data
  without document transition
• Static VoiceXML document can fetch user-specific data at
  runtime
• Decouple presentation layer from business logic
• Performance improvements due to:
   – Cache-able VoiceXML
   – No need to generate entirely new pages for each dialog when only the
     content is new
   – Less network traffic
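
A static page using this pattern might look like the sketch below. The URL specials.xml and the shape of its response are assumptions made for illustration; only the <data> and <foreach> elements and the read-only DOM access come from VoiceXML 2.1.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="specials">
    <block>
      <!-- Fetch XML without leaving this (cacheable) document.
           Assumed response shape:
           <specials><item>soup</item><item>salad</item></specials> -->
      <data name="menu" src="specials.xml"/>

      <!-- Copy the item text into an ECMAScript array
           using the read-only DOM -->
      <script><![CDATA[
        var names = [];
        var items = menu.documentElement.getElementsByTagName("item");
        for (var i = 0; i < items.length; i++) {
          names.push(items.item(i).firstChild.data);
        }
      ]]></script>

      <!-- Build the prompt from the array at runtime -->
      <prompt>
        Today's specials are
        <foreach item="dish" array="names">
          <value expr="dish"/> <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>
```

Only the small XML payload changes per caller, so the VoiceXML page itself can stay in the platform cache.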




Dynamic Application Considerations
         Execution of VoiceXML is running a program on your server…


• Must guarantee quality of dynamically-generated VoiceXML
  documents and ASR grammars
  – Catch parse errors, execution errors
  – What does the caller hear if there is an error?
     • not “Could not parse VoiceXML document”
• Runtime performance
  – Parse and interpretation time of large documents
  – Inefficient scripts and speech grammars
• Security implications
  – Exploit a bug in a particular implementation? Make free phone calls?
  – Could there be a VoiceXML virus? Will all platforms protect against them?
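One way to control what the caller hears on an error is a catch-all handler in the application root document. The sketch below assumes a hypothetical apology recording (audio/sorry.wav) and assumes every leaf document names this file in its application attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Application root document: handlers here apply to every leaf document
     that references this file in its application attribute -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Matches error.badfetch, error.semantic, and any other error.* event -->
  <catch event="error">
    <!-- Log the technical detail for operators, not for the caller -->
    <log>Unhandled <value expr="_event"/>: <value expr="_message"/></log>
    <prompt>
      <!-- Hypothetical recording; the enclosed text is the TTS fallback -->
      <audio src="audio/sorry.wav">
        We are sorry, we are having technical difficulties. Please call again later.
      </audio>
    </prompt>
    <exit/>
  </catch>
</vxml>
```

The handler keeps diagnostic text in the log and gives the caller a plain apology instead of a parser message.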


 Careful application design, testing, and monitoring are essential
Dynamic Application Considerations
• A mix of different simultaneous applications means variable
  platform load and execution profile
   – Parse time of VoiceXML document
   – Fetching VoiceXML documents, grammars, audio from remote web servers
   – Load balancing
   – How to protect the platform from a harmful application (intentional or otherwise)?
        • Max size of document
        • Max size of grammar
        • Complexity measurement of document or grammar (statically checked before
           execution?)



 Platforms, networks, and applications must be carefully engineered




Performance Considerations




Load Balancing for Performance and Reliability

• CPU/memory utilization
  – Grammar compilation
  – ASR load
  – TTS load
• Telephony Network
  – Channel balancing
  – Dead channel
• Incoming/Outgoing channel assignment / mix




Performance: Caching

• Fetched documents, grammars, audio files, streams
• Local or distributed cache?
• Effects of prefetching
• Where to cache generated grammars?
    – Per system
    – In-network
• Use external grammar compilation server?
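On the application side, caching and prefetching are steered with the standard VoiceXML fetch properties and attributes. The following is a sketch with illustrative values; the grammar and audio URLs are hypothetical, and suitable ages depend on how often each resource actually changes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Illustrative values: accept cached copies up to this many seconds old -->
  <property name="documentmaxage" value="60"/>
  <property name="grammarmaxage" value="3600"/>
  <property name="audiomaxage" value="86400"/>
  <!-- Allow the platform to fetch audio ahead of time when the page loads -->
  <property name="audiofetchhint" value="prefetch"/>
  <form id="menu">
    <field name="choice">
      <!-- Hypothetical grammar URL -->
      <grammar src="grammars/menu.grxml" type="application/srgs+xml"/>
      <prompt>
        <!-- Per-resource override of the page-wide audio setting -->
        <audio src="audio/menu.wav" maxage="600">Say sales or support.</audio>
      </prompt>
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Where the cache lives (per system or in-network) remains a platform decision; these settings only state what the application will tolerate.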




Application Management




Application Monitoring and Maintenance
  • Runtime logs
      – Web / application server
      – Voice server
      – Call Detail Reporting
  • Utterance recordings and logs
      – Useful for grammar and dialog tuning
      – Security of recordings may be an issue
      – Disk space: full-call recordings may be prohibitively large
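Utterance capture for tuning can be sketched with the VoiceXML 2.1 recordutterance property and its shadow variables. The grammar URL and the upload endpoint below are assumptions; a real deployment would also need to address the security and disk-space points above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Ask the platform to keep the audio of each recognized utterance -->
  <property name="recordutterance" value="true"/>
  <form id="city">
    <field name="town">
      <!-- Hypothetical grammar URL -->
      <grammar src="grammars/city.grxml" type="application/srgs+xml"/>
      <prompt>Which city?</prompt>
      <filled>
        <!-- Shadow variable holding the captured audio -->
        <var name="utt" expr="town$.recording"/>
        <log>city=<value expr="town"/> ms=<value expr="town$.recordingduration"/></log>
        <!-- Hypothetical tuning endpoint; audio is posted as multipart form data -->
        <submit next="http://www.example.com/tuning" method="post"
                enctype="multipart/form-data" namelist="town utt"/>
      </filled>
    </field>
  </form>
</vxml>
```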




 Usage data must be continually monitored to improve user experience




Operations, Administration, Maintenance, Provisioning
• System Monitoring
   – Interfacing to existing Telco OSSs
   – Web-based for ISP environment
• Provisioning
   – Application, Customer
       • DN-URI mapping
   – Telephony
       • Call origination/transfer
       • Max call timeout
       • Max number of concurrent calls
   – Platform-specific VoiceXML features
       • ECMAScript allowed?
       • Telephony control allowed?
       • Max grammar size


Billing
 Logging and charging for usage of resources
    • "platform time"
         – Usage of server resources
    • Toll Free usage
         – It's toll free, not free
    • Transferred calls
         – Inbound minutes
         – Outbound minutes
         – Network features, e.g. Network Redirect
    • Outbound calls
 Accurate billing information is a critical factor in application cost or profitability



Application Deployment Models
Build-your-own network vs. Outsourcing




Build vs. Outsource?
 Deployment options enable a variety of business models

• Completely in-house
   – Maintain complete control for security
   – Development and deployment systems can be identical

• Outsourced VoiceXML/Telephony
   – Large-scale distributed networks without major capital investment
   – Grow quickly and incrementally

• Completely outsourced hosting
   – All components and systems managed by 3rd party


• Packaged software
   – VoiceXML application integrated with existing apps




Completely In-House


• Local control of all systems
• Voice server, app server, database can be on local network
• Development and deployment systems can be identical
• Physical security: in-house team “owns” it
• Failover, reliability, scalability must be locally managed
• Redundant power, networks, etc. are required




VoiceXML On-premises Deployment
         using TDM or VoIP carrier connection




[Figure: on-premises deployment. Labels: PSTN; VoIP "pipe"; TDM (DS3, multiple PRI, etc.); VoIP gateway, PBX, etc.; VoiceXML browsers; Cisco IPCC; ASR servers; web applications; database; co-location facility.]




Outsourced VoiceXML / Telephony

• Telephony and VoiceXML servers outsourced to "Voice
  Service Provider" (VSP)

• Application remains in your data center(s)
   – Geographically distributed
   – May be dedicated to specific customers


• Many carrier-grade vendors to choose from




Outsourced VoiceXML / Telephony

• Architecture is identical to in-house deployment
• Secure IP connection used between facilities


[Figure: outsourced deployment. Voice Service Provider (carrier-grade outsourcing facility): PSTN, VoIP gateway, VoiceXML browsers, Cisco IPCC, ASR servers. Connected over the Internet to the co-location facility: web applications, database.]




Advantages of Outsourcing to a VSP
• Choice of many vendors: one for all customers, or choose the
  best one for each customer
• Add capacity by adding multiple vendors
• No capital investment
• Pay-as-you-go pricing models
• Failover, reliability, scalability simplified
• Physical security of equipment and networks managed by VSP
• VPN or dedicated data connection to your backend systems




Distribute Load to Multiple VSPs

[Figure: the PSTN feeds several VSP sites, each with VoiceXML browsers (Cisco IPCC label) and ASR servers, connected over the Internet to the customer co-location facility (web applications, database). Note on the figure: multiple co-lo facilities can be deployed for geographic redundancy and enhanced capacity.]


Completely Outsourced

• Deploy hardware & software systems at customer-
  managed co-location facilities

• Deploy complete systems at co-location facilities managed
  by 3rd party

• Deploy pre-packaged VoiceXML application integrated
  with customer's call center (managed by customer)




Combination of In-house and Outsourced
                Several ways to balance resources


• Primary in-house, with overflow or failover to a VSP
   – Local control of resources
   – Overflow to VSP during peak usage
   – Backup for failover / disaster recovery


• In-house development, with primary deployment via VSP
   – In-house development and trials
   – “Push to the network” when ready to deploy




CCXML, VoiceXML, and VoIP
       3rd-Party Call Control




Inbound call using TDM connections

• 1st-party call control: VoiceXML server handles call
  routing/setup/answer




[Figure: Caller, PSTN, VoiceXML Server.]




Inbound call using VoIP (SIP and RTP)

• 1st-party call control: VoIP gateway routes call to VoiceXML
  server, which handles call routing/setup/answer




[Figure: customer, PSTN, VoIP Gateway, VoiceXML Server. Steps shown between the VoIP Gateway and the VoiceXML Server: 1. INVITE, 2. RTP.]




Why VoIP?

• Flexible network topology

• Simplified integration of voice dialog resources

• Vendor independence for network elements

• Separation of concerns: voice dialog resources vs. call
  control




Inbound Call using 3rd Party Call Control

• 3rd party application handles call routing/setup/answer


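[Figure: caller, PSTN, VoIP Gateway, Call Routing Application, VoiceXML Server. Steps shown: 1. INVITE and 2. INVITE via the Call Routing Application, 3. RTP between the VoIP Gateway and the VoiceXML Server.]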




Outbound call using 3rd Party Call Control
• 3rd party application handles outbound call
  initiation/setup/routing
• “Attaches” VoiceXML dialog to connection
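[Figure: caller, PSTN, VoIP Gateway, Outbound Calling Application, VoiceXML Server. Steps shown: 1. INVITE and 2. INVITE via the Outbound Calling Application, 3. RTP between the VoIP Gateway and the VoiceXML Server.]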
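Outbound calling with an attached dialog could be sketched in CCXML as follows. This assumes the syntax of the final CCXML 1.0 Recommendation (for example the implicit event$ object; working drafts of the period named the event variable differently), and the destination number and VoiceXML URL are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <var name="callstate" expr="'init'"/>
  <eventprocessor statevariable="callstate">
    <!-- Place the outbound call as soon as the CCXML document has loaded -->
    <transition state="init" event="ccxml.loaded">
      <assign name="callstate" expr="'dialing'"/>
      <!-- Hypothetical destination -->
      <createcall dest="'tel:+15555550100'"/>
    </transition>
    <!-- Called party answered: attach a VoiceXML dialog to the new connection -->
    <transition state="dialing" event="connection.connected">
      <assign name="callstate" expr="'talking'"/>
      <dialogstart src="'http://www.example.com/reminder.vxml'"
                   connectionid="event$.connectionid"/>
    </transition>
    <!-- Busy, no answer, or another failure -->
    <transition state="dialing" event="connection.failed">
      <log expr="'Call failed: ' + event$.reason"/>
      <exit/>
    </transition>
    <!-- Dialog finished or the far end hung up: end the session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```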




What is CCXML?

• XML-based language that manages the connections and
  resources used in phone calls

• Designed for 3rd-party call control applications

• Integrates easily with back-end web applications, using a model
  very similar to VoiceXML’s

• Uses the finite state machine model
   – Event handlers move from one state to the next using markup tags


• CCXML provides commands to run a “dialog” on a call leg
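The finite state machine model can be illustrated with a small inbound-call handler. This is a sketch under the assumption of final CCXML 1.0 Recommendation syntax; the state names, the connid variable, and the dialog URL are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <!-- The state variable selects which transitions are eligible -->
  <var name="callstate" expr="'idle'"/>
  <var name="connid"/>
  <eventprocessor statevariable="callstate">
    <!-- Incoming call: remember the connection and answer it -->
    <transition state="idle" event="connection.alerting">
      <assign name="connid" expr="event$.connectionid"/>
      <assign name="callstate" expr="'answering'"/>
      <accept connectionid="connid"/>
    </transition>
    <!-- Call answered: run a dialog on this call leg -->
    <transition state="answering" event="connection.connected">
      <assign name="callstate" expr="'indialog'"/>
      <dialogstart src="'http://www.example.com/welcome.vxml'"
                   connectionid="connid"/>
    </transition>
    <!-- Dialog finished: hang up -->
    <transition state="indialog" event="dialog.exit">
      <assign name="callstate" expr="'done'"/>
      <disconnect connectionid="connid"/>
    </transition>
    <!-- Valid in any state: the call is gone, so end the session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```

Each transition pairs a state with an event, and moving to the next state is just an assignment to the state variable.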

Why is CCXML Needed?

• VoiceXML was designed primarily for voice dialogs
   – 1st-party call control: <disconnect> and several predefined common
     <transfer> types


• Connection management requires full asynchronous event
  handling
   – Connection/telephony events can occur any time during a call and must be
     handled
   – VoiceXML specifically limits asynchronous events to simplify the execution
     and programming model


• 1st-party Call Control can be useful but has limited flexibility
   – VoiceXML 2.1 <transfer> adds "consultation" feature for network
     redirect

CCXML System Architecture


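[Figure: a Telephony Web Application serves CCXML over HTTP to the CCXML Server; a Voice Web Application serves VXML over HTTP to the Dialog Server. The CCXML Server connects to the Telephony Interface (PSTN, Caller) through the Telephony Control Interface and to the Dialog Server through the Dialog Control Interface; a Conference Server is also shown. Media flows between the Telephony Interface and the Dialog Server.]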




CCXML features

• Telephony channel control: voice paths and signaling
   – <createcall>, <accept>, <disconnect>,
     <reject>, <redirect>


• Media control: Conference Bridges and Mixers
   – <join>, <unjoin>, <createconference>,
     <destroyconference>


• Dialog control: Add a VoiceXML (or other dialog)
  resource to a connection
   – <dialogstart>, <dialogprepare>,
     <dialogterminate>


Integration of CCXML and VoiceXML
• Dialogs are created using <dialogstart>
   – You pass the URL of the document that you want to run


• Dialogs can be ended using <dialogterminate>
   – This allows CCXML to end a dialog based on an external event such as
     someone calling you on a second line


• Dialogs can return data back to the CCXML platform
   – In VoiceXML use <exit namelist="a b c"/>
   – This is exposed in the CCXML dialog.exit event
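The data hand-off can be sketched as a pair of documents. The field name, the URL, and the log text are illustrative assumptions; the CCXML side assumes final CCXML 1.0 Recommendation syntax, where returned values appear under event$.values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- survey.vxml (hypothetical): collect one value and return it to CCXML -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- Built-in digits grammar; platform support for built-ins is assumed -->
    <field name="rating" type="digits">
      <prompt>On a scale of one to five, how was your call?</prompt>
    </field>
    <block>
      <var name="completed" expr="true"/>
      <exit namelist="rating completed"/>
    </block>
  </form>
</vxml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- CCXML side: start the dialog, then read the values it returned -->
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <transition event="connection.connected">
      <dialogstart src="'http://www.example.com/survey.vxml'"/>
    </transition>
    <!-- Each name in the VoiceXML namelist becomes a property of event$.values -->
    <transition event="dialog.exit">
      <log expr="'rating=' + event$.values.rating"/>
      <exit/>
    </transition>
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```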




W3C CCXML 1.0 status
• Nearing "Candidate Recommendation" status
  – Language complete
  – Test suite under development
  – Certification Program under consideration


• Growing support throughout the world

• Several open source projects underway
  – See http://www.sourceforge.net




Next-Generation Technologies




Next-Generation Technologies
• Speaker Biometrics-based authentication
  – Speaker Identification
  – Speaker Verification

• Video IVR: VoiceXML augmented with video
  – Early stages of commercial deployment now
  – Simple extension to standard platforms
  – Straightforward step towards full multimodal


• Multimodal
  – Multiple input modalities: speech recognition, keypad, handwriting,
    biometrics (voice, fingerprint, iris, etc.), geolocation, motion
  – Multiple output modalities: graphics, audio (speech, TTS, music,
    polyphonic tones)




Speaker Biometrics




Why Speaker Biometrics?

• Identify an individual for remote transactions

• Text / DTMF PINs are inadequate
   – Easily compromised
   – Easily forgotten
   – Do not identify an individual


• US Federal Regulations
   – FFIEC guidelines for financial services




Speaker Identification and Verification (SIV)

• Authentication
   – The process of confirming one or more identities.


• Speaker Identification (one-to-many)
   – Authentication with multiple identity claims.


• Speaker Verification (one-to-one)
   – Authentication with a single identity claim.




Types of SIV
• Text independent
   – SIV technology that can operate on any freeform or structured spoken input.



• Text dependent
   – SIV technology (usually verification technology) that requires the voice input
     of one or more specific passwords or pass phrases that were previously enrolled.


• Text prompted
   – SIV technology (usually verification) that randomly selects words and/or
     phrases and prompts the speaker to repeat them. Also called
     challenge-response.




Fundamental Phases of SIV

• Enrollment
   – Capture one or more user utterances to ‘train’ the system


• Verification
   – Capture one or more user utterances to make an identity claim


• Adaptation & Scoring
   – Judge how close the user’s verification utterance is to the enrolled
     utterance
   – Refine the existing enrolled utterance with information from the
     verification utterance




Video and Multimodal




“Video” VoiceXML

• Video extensions to VoiceXML
   – 3G Wireless
   – VoIP phones
• VoiceXML is just a dialog language
   – Initially only for voice input/output
• Example
   – Videomail is a dialog application very similar to voicemail
• Video and audio are somewhat analogous
   – VoiceXML can be ‘hacked’ to handle video now:
      • <audio src="foo.au"/> could “play” a video file
         via <audio src="foo.mpeg4"/>
   – VoiceXML 3.0 might add a new language feature
      • e.g. <video src="foo.avi"> or <media src="foo.mpeg4">


“Video” VoiceXML
        Deployment and Standardization
• Simple extension to standard platforms
   – Easy integration with current platforms
   – Doesn’t “break” existing functionality
   – Well aligned with “VoiceXML model”
• Early stages of commercial deployment
   – Several vendors have deployed large-scale commercial systems
• Step towards full multimodal




Multimodal Applications
• W3C Multimodal Interaction Working Group
   – Defining new standards based on extensive industry experience

• IBM / Motorola / Opera X+V 1.2
   – Early stages of commercial deployment
   – Freely available from Opera http://dev.opera.com/articles/voice/




For more information, see:
W3C Multimodal Interaction Working Group http://www.w3.org/2002/mmi



VoiceXML 3.0




VoiceXML 3.0

• Modularization
   – Cleanly separate functions to enable integration with other modalities
   – Enables code reuse
• New media processing
   – Video
   – Voice processing
   – Navigation
   – Speaker biometrics
• Separation of data, control flow and presentation
   – Control flow embodied in new language: SCXML
• Clean data model



References
• W3C Voice Browser Working Group http://www.w3.org/voice
   – VoiceXML 2.0 Recommendation
      • http://www.w3.org/TR/voicexml20/
   – VoiceXML 2.1 Working Draft
      • http://www.w3.org/TR/voicexml21/
   – Semantic Interpretation Working Draft
      • http://www.w3.org/TR/semantic-interpretation/
   – SRGS 1.0 Recommendation
      • http://www.w3.org/TR/speech-grammar/
   – SSML
      • 1.0 Recommendation http://www.w3.org/TR/speech-synthesis/
      • 1.1 Working Draft http://www.w3.org/TR/speech-synthesis11/
   – CCXML 1.0
      • http://www.w3.org/TR/ccxml/
   – SCXML
      • http://www.w3.org/TR/scxml/
• IETF http://www.ietf.org

Ken Rehor
http://www.kenrehor.com




VoiceXML Forum
Co-founder and past-Chair
Chair, VoiceXML Forum Conformance Committee
Co-Chair, VoiceXML Forum Speaker Biometrics Committee

W3C
Co-editor: VoiceXML 1.0, 2.0, 2.1, 3.0
Co-editor: CCXML 1.0





More Related Content

What's hot

Introduction to VoIP, RTP and SIP
Introduction to VoIP, RTP and SIP Introduction to VoIP, RTP and SIP
Introduction to VoIP, RTP and SIP ThousandEyes
 
Java multi threading
Java multi threadingJava multi threading
Java multi threadingRaja Sekhar
 
Java Server Pages(jsp)
Java Server Pages(jsp)Java Server Pages(jsp)
Java Server Pages(jsp)Manisha Keim
 
Basics of React Hooks.pptx.pdf
Basics of React Hooks.pptx.pdfBasics of React Hooks.pptx.pdf
Basics of React Hooks.pptx.pdfKnoldus Inc.
 
SIP Trunking
SIP TrunkingSIP Trunking
SIP Trunkingorionnow
 
Advance Java Topics (J2EE)
Advance Java Topics (J2EE)Advance Java Topics (J2EE)
Advance Java Topics (J2EE)slire
 
Android animation
Android animationAndroid animation
Android animationKrazy Koder
 
Java oops and fundamentals
Java oops and fundamentalsJava oops and fundamentals
Java oops and fundamentalsjavaease
 
Intro to Flutter SDK
Intro to Flutter SDKIntro to Flutter SDK
Intro to Flutter SDKdigitaljoni
 

What's hot (20)

Client side scripting
Client side scriptingClient side scripting
Client side scripting
 
Android intent
Android intentAndroid intent
Android intent
 
Laravel intake 37 all days
Laravel intake 37 all daysLaravel intake 37 all days
Laravel intake 37 all days
 
Introduction to VoIP, RTP and SIP
Introduction to VoIP, RTP and SIP Introduction to VoIP, RTP and SIP
Introduction to VoIP, RTP and SIP
 
Swift Introduction
Swift IntroductionSwift Introduction
Swift Introduction
 
Java multi threading
Java multi threadingJava multi threading
Java multi threading
 
Java Server Pages(jsp)
Java Server Pages(jsp)Java Server Pages(jsp)
Java Server Pages(jsp)
 
Event handling
Event handlingEvent handling
Event handling
 
Basics of React Hooks.pptx.pdf
Basics of React Hooks.pptx.pdfBasics of React Hooks.pptx.pdf
Basics of React Hooks.pptx.pdf
 
SIP Trunking
SIP TrunkingSIP Trunking
SIP Trunking
 
Tomcat server
 Tomcat server Tomcat server
Tomcat server
 
Advance Java Topics (J2EE)
Advance Java Topics (J2EE)Advance Java Topics (J2EE)
Advance Java Topics (J2EE)
 
05 intent
05 intent05 intent
05 intent
 
Android animation
Android animationAndroid animation
Android animation
 
Java oops and fundamentals
Java oops and fundamentalsJava oops and fundamentals
Java oops and fundamentals
 
Android Networking
Android NetworkingAndroid Networking
Android Networking
 
Intro to Flutter SDK
Intro to Flutter SDKIntro to Flutter SDK
Intro to Flutter SDK
 
Servlets
ServletsServlets
Servlets
 
Jsp ppt
Jsp pptJsp ppt
Jsp ppt
 
Jboss Tutorial Basics
Jboss Tutorial BasicsJboss Tutorial Basics
Jboss Tutorial Basics
 

Viewers also liked

Hybrid Learning
Hybrid Learning Hybrid Learning
Hybrid Learning ramseyr39
 
Designing Online Learning, Web 2.0 and Online Learning Resources
Designing Online Learning, Web 2.0 and Online Learning ResourcesDesigning Online Learning, Web 2.0 and Online Learning Resources
Designing Online Learning, Web 2.0 and Online Learning ResourcesSanjaya Mishra
 
Genesys voice portal whitepaper
Genesys voice portal whitepaperGenesys voice portal whitepaper
Genesys voice portal whitepaperRanjit Patel
 
Interactive voice response
Interactive voice responseInteractive voice response
Interactive voice responseAnswerPhoneUSA
 
Proactive Performance Monitoring for Genesys Call Centers
Proactive Performance Monitoring for Genesys Call CentersProactive Performance Monitoring for Genesys Call Centers
Proactive Performance Monitoring for Genesys Call CentersPerficient, Inc.
 
IVR (Interactive Voice Response) system & technology
IVR (Interactive Voice Response) system & technologyIVR (Interactive Voice Response) system & technology
IVR (Interactive Voice Response) system & technologyVijay Sharma
 
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...Amazon Web Services
 
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...Radisys Corporation
 
Call Centre Architecture
Call Centre ArchitectureCall Centre Architecture
Call Centre Architectureapoorva tyagi
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHPBradley Holt
 
Genesys SIP Server Architecture
Genesys SIP Server ArchitectureGenesys SIP Server Architecture
Genesys SIP Server ArchitectureRanjit Patel
 

Viewers also liked (16)

Voicexml ppt
Voicexml pptVoicexml ppt
Voicexml ppt
 
VoiceXML
VoiceXMLVoiceXML
VoiceXML
 
Hybrid Learning
Hybrid Learning Hybrid Learning
Hybrid Learning
 
Voicexml
VoicexmlVoicexml
Voicexml
 
Designing Online Learning, Web 2.0 and Online Learning Resources
Designing Online Learning, Web 2.0 and Online Learning ResourcesDesigning Online Learning, Web 2.0 and Online Learning Resources
Designing Online Learning, Web 2.0 and Online Learning Resources
 
Genesys voice portal whitepaper
Genesys voice portal whitepaperGenesys voice portal whitepaper
Genesys voice portal whitepaper
 
Interactive voice response
Interactive voice responseInteractive voice response
Interactive voice response
 
IVR presentation
IVR  presentationIVR  presentation
IVR presentation
 
Proactive Performance Monitoring for Genesys Call Centers
Proactive Performance Monitoring for Genesys Call CentersProactive Performance Monitoring for Genesys Call Centers
Proactive Performance Monitoring for Genesys Call Centers
 
IVR (Interactive Voice Response) system & technology
IVR (Interactive Voice Response) system & technologyIVR (Interactive Voice Response) system & technology
IVR (Interactive Voice Response) system & technology
 
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...
Announcing Amazon Polly - Turn Text into Lifelike Speech - December 2016 Mont...
 
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...
Leveraging IMS for VoLTE and RCS Services in LTE Networks Presented by Adnan ...
 
Call Centre Architecture
Call Centre ArchitectureCall Centre Architecture
Call Centre Architecture
 
Ivrs architecture
Ivrs architectureIvrs architecture
Ivrs architecture
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
 
Genesys SIP Server Architecture
Genesys SIP Server ArchitectureGenesys SIP Server Architecture
Genesys SIP Server Architecture
 

Similar to Introduction to VoiceXml and Voice Web Architecture

Standards' Perspective - MPEG DASH overview and related efforts
Standards' Perspective - MPEG DASH overview and related effortsStandards' Perspective - MPEG DASH overview and related efforts
Standards' Perspective - MPEG DASH overview and related effortsIMTC
 
Ebu mpeg dash-webinar043
Ebu mpeg dash-webinar043Ebu mpeg dash-webinar043
Ebu mpeg dash-webinar043mc_killah
 
HTML5, Silverlight & Kinect
HTML5, Silverlight & KinectHTML5, Silverlight & Kinect
HTML5, Silverlight & KinectFrank La Vigne
 
Mike Taulty TechDays 2010 Silverlight 4 - What's New?
Mike Taulty TechDays 2010 Silverlight 4 - What's New?Mike Taulty TechDays 2010 Silverlight 4 - What's New?
Mike Taulty TechDays 2010 Silverlight 4 - What's New?ukdpe
 
HTML5 and Timed Media Playback
HTML5 and Timed Media PlaybackHTML5 and Timed Media Playback
Introduction to VoiceXml and Voice Web Architecture

  • 1. Introduction to VoiceXML and Voice Web Architecture Ken Rehor © 2007 Ken Rehor. All Rights Reserved. 1
  • 2. Session Overview
      • Voice Web Architecture
          – Components of a Voice Web Application
      • Voice Standards
          – W3C Speech Interface Framework
      • VoiceXML
          – Language features
          – Execution model: Form Interpretation Algorithm (FIA)
      • Application Design Techniques
          – Static vs. dynamic VoiceXML
          – Performance considerations
      • CCXML, VoiceXML and VoIP
      • Application Deployment Models
      • New Technologies
          – Speaker Biometrics, Video, Multimodal, VoiceXML 3.0
  • 3. Simplifying Voice Services programming
      • Web-based architecture for interactive speech services
          – Exploit web technologies to simplify voice service creation and deployment
          – Enable consolidation of voice and web services
          – Separate service logic from user interaction
      • High-level programming languages
          – Control speech and telephony resources in a uniform manner
          – Shield application programmers from implementation details
              • No need to know ASR, TTS, telephony APIs
          – Create portable applications
              • Run on enterprise system or in telephone network
              • Run on a variety of platforms, ASR agnostic
  • 4. Voice Web Application Architecture
  • 5. Key Ideas
      • Standard/Common high-level language
          – Designed for the task
      • Leverage open, known technology
          – Web protocols, servers, networks, development tools, expertise
      • Phone number mapped to URL
          – Phone number associated with URL of voice service
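The "phone number mapped to URL" idea can be pictured as a simple lookup performed when a call arrives. The following is only a minimal sketch: the phone numbers, URLs, and the `start_url` helper are hypothetical and do not come from the slides or from any particular platform.

```python
# Hypothetical routing table: dialed number -> URL of the voice service.
# Every number and URL here is made up for illustration.
ROUTES = {
    "+18005550100": "http://example.com/acme/main.vxml",
    "+18005550101": "http://example.com/acme/sales.vxml",
}

# Hypothetical fallback document for numbers with no configured service.
DEFAULT_URL = "http://example.com/not-in-service.vxml"


def start_url(dialed_number: str) -> str:
    """Return the URL of the first VoiceXML document to fetch for a call."""
    return ROUTES.get(dialed_number, DEFAULT_URL)


print(start_url("+18005550100"))
```

In a real deployment this association would typically live in platform configuration rather than in application code; the sketch only shows the shape of the mapping.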
  • 6. Voice / Web Application Architecture
      [Diagram: any phone connects over the PSTN or VoIP to a VoiceXML browser; a web browser is shown alongside it. Both browsers reach the application (web) server over HTTP, via the Internet or an intranet.]
      • VoiceXML browser content: <vxml>, <grxml>, .wav
          – Grammars, audio files, scripts
      • Web browser content: <html>
          – Images, audio files, scripts
      • Application (web) server
          – Application logic
          – Content and data
          – Transaction processing
          – Database interface
  • 7. Voice Application Architecture and Components
      [Diagram: a caller reaches the VoiceXML platform over the PSTN; the platform fetches <vxml>, <grxml>, and .wav content from a web server over HTTP, via the Internet or an intranet. Sample dialog text in the figure: "Welcome to Acme products, … please…" and "Customer service".]
      • VoiceXML platform components: VoiceXML interpreter, OA&M, middleware, telephony, DTMF, audio, ASR, TTS
  • 8. Application Backend Architecture
      [Diagram: the application (web) server delivers <vxml> documents, grammars, audio files, and scripts over HTTP, via the Internet or an intranet, and connects over the Internet or an intranet to a transaction server, a database (content), and a web service.]
      • Application (web) server
          – Application logic
          – Content and data
          – Transaction processing
          – Database interface
  • 9. Components of a Voice Solution
      • Traditional phone, VoIP phone, mobile phone, or multimodal device
      • Telephone network
          – Circuit-switched PSTN or packet-switched VoIP
          – Connects caller's telephone with Telephony Server
      • Voice User Interface
          – Dialog structure / flow
          – Prompts: what the application says to the user
          – Speech grammars: what the user can say
      • Application logic that executes on an application server
          – Web "back-end"
          – Database, or database interface
      • VoiceXML Server that executes dialogs
          – Controls resources such as ASR, SIV, TTS, etc.
      • Data network to connect application server and VoiceXML server
  • 10. Inbound or Outbound calls
      • VoiceXML application works the same for inbound and outbound calls
          – Additional call progress detection generally required for outbound
      • Simple protocol for initiating outbound calls
          – No firm standards, but most vendors follow similar techniques
          – HTTP, Web Services, etc.
  • 11. Standards
  • 12. Value of Open Standards
      • Non-proprietary interfaces between components
      • Allow choice of best components for the task
      • User interface languages
          – W3C Speech Interface Framework: VoiceXML, SRGS, SSML, SI
          – W3C: HTML, XHTML, SMIL, X+V
          – OMA: WAP
      • Communication protocols
          – W3C: CCXML for 3rd-party telephony call control
          – W3C: HTTP, HTTPS, SOAP, WSDL
          – IETF: SIP, MRCP, MSCP
          – 3GPP: IMS
          – ITU: T1, ISDN
  • 13. Visual vs. Voice markup
      • Web app UI
          – HTML: structure, layout, input declaration, transitions
          – Images
          – Audio files / streams
          – Video
          – Text
          – Scripts
      • Voice Web app UI
          – VoiceXML: structure, dialog flow, input declaration, transitions
          – Audio files
          – Video, images
          – Text (for TTS)
          – Scripts
  • 14. Protocols
      • Web applications: HTTP, HTTPS, RTP, SOAP, WSDL, …
      • Voice Web applications: HTTP, HTTPS, RTP, SOAP, WSDL, SIP, …
  • 15. Voice Standards Activities
      • Speech Interface Framework
      • Network protocols
          – SIP, MRCP v2, etc.
      • Platform Certification, Developer Certification, Speaker Biometrics, Architecture, Tools
  • 16. Voice Application Standards
      [Diagram: a caller's phone reaches a VoIP gateway over the telephone network (T1 / E1, ISDN, SS7); SIP and RTP connect the gateway to a CCXML browser, a VoiceXML browser, and a conference / media server (audio mixer); the call control application and the VoiceXML application are fetched over HTTP / HTTPS; an MRCP client connects to MRCP servers for TTS (SSML), ASR (GRXML), and SIV.]
      • Languages: CCXML, VoiceXML 2.0, VoiceXML 2.1, ECMAScript 262, SSML, GRXML
      • Media control: SIP, Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, SOAP, etc.
      • Telephony Control Interface: SIP, etc.
      • Dialog Control Interface: SIP, MSCP, etc.
      • DTMF: RFC 2833
      • Media: G.711, WAV, .au, mp3, etc.
      • Speech resources: MRCP v1, MRCP v2
      • ** standards in progress **
  • 17. W3C Speech Interface Framework
  • 18. Voice Application Components
      • Dialog
          – Flow control of the inputs, outputs, next steps
      • Input grammars
          – Control input constraints for DTMF and speech recognition
      • Output formatting
          – Pronunciation, timing, sequencing
  • 19. W3C Speech Interface Framework
      • VoiceXML
      • SRGS
      • SSML
      • Semantic Interpretation
      • Pronunciation Lexicon
      • Call Control
      For more information, see: W3C Voice Browser Working Group http://www.w3.org/Voice/
  • 20. Voice User Interface - Dialog
      • W3C VoiceXML 2.0
          – W3C Recommendation March 2004
          – Widely implemented
              • Approximately 4 dozen platforms
              • Many service providers worldwide
          – VoiceXML Forum certification program
              • Nearly two dozen certified platforms, more coming
      • W3C VoiceXML 2.1
          – Candidate Recommendation Sept 2006
          – Test suite under development; Certification Program to follow
          – Many platform vendors are implementing
      • W3C VoiceXML 3.0
          – Early stages of development
          – SCXML: state chart markup language designed as a controller for V3 and CCXML 2.0 ("Working Draft" Jan 2006)
  • 21. User Interaction – Input / Output Control
      • Input grammars: W3C SRGS 1.0
          – W3C Recommendation
          – Widely implemented
      • Output formatting: W3C SSML 1.0
          – W3C Recommendation
          – Widely implemented, but real support is limited (most TTS engines ignore the SSML instructions)
      • Semantic Interpretation for Speech Recognition: W3C SISR 1.0
          – Nearing Candidate Recommendation
          – Implementation gaining acceptance
  • 22. W3C Speech Interface Framework Semantic Interpretation
  • 23. W3C Speech Recognition Grammar Specification
      • Markup language to control input constraints
          – Finite-state speech recognition
          – DTMF recognition
      • Two variations
          – XML (GRXML)
          – ABNF
      • Version 1.0: W3C Recommendation
          – March 2004
      • Implemented and supported by numerous vendors
  • 24. GRXML ASR example
      <grammar type="application/srgs+xml" root="r2" version="1.0">
        <rule id="r2" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>milk</item>
            <item>nothing</item>
          </one-of>
        </rule>
      </grammar>
  • 25. GRXML DTMF example
      <?xml version="1.0"?>
      <grammar mode="dtmf" version="1.0"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                                   http://www.w3.org/TR/speech-grammar/grammar.xsd"
               xmlns="http://www.w3.org/2001/06/grammar">
        <rule id="digit">
          <one-of>
            <item> 0 </item>
            <item> 1 </item>
            <item> 2 </item>
            <item> 3 </item>
            <item> 4 </item>
            <item> 5 </item>
            <item> 6 </item>
            <item> 7 </item>
            <item> 8 </item>
            <item> 9 </item>
          </one-of>
        </rule>
        <rule id="pin" scope="public">
          <one-of>
            <item>
              <item repeat="4"><ruleref uri="#digit"/></item>
              #
            </item>
          </one-of>
        </rule>
      </grammar>
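To make the constraint in the DTMF grammar concrete, the sketch below restates the public "pin" rule as a regular expression: exactly four digits followed by the pound key. This is only an illustration of which key sequences the rule accepts, not how a VoiceXML platform evaluates SRGS grammars; `PIN_PATTERN` and `matches_pin` are hypothetical names.

```python
import re

# Assumption: the "pin" rule is read as four repetitions of the "digit"
# rule followed by a literal "#", which is what repeat="4" plus the
# trailing "#" token express in the grammar.
PIN_PATTERN = re.compile(r"[0-9]{4}#")


def matches_pin(keys: str) -> bool:
    """Return True if the key sequence would satisfy the "pin" rule."""
    return PIN_PATTERN.fullmatch(keys) is not None


for keys in ["1234#", "123#", "12345#", "1234"]:
    print(keys, matches_pin(keys))
```

Only the first sequence satisfies the rule; too few digits, too many digits, or a missing "#" all fail, which on a VoiceXML platform would normally surface as a nomatch event.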
  • 26. W3C Speech Synthesis Markup Language
      • Markup language to control spoken and audio output
      • Version 1.0: W3C Recommendation
          – Sept 2004
      • Implemented and supported by numerous vendors
      • Version 1.1: under development
          – Adds support for tonal languages
          – First public Working Draft published January 2007
  • 27. SSML Functions
      • Audio output
          – <audio>
      • Text-to-Speech output
          – Contained within SSML constructs
      • Pronunciation controls
          – <say-as>
              • interpret-as
              • format
              • detail
          – <emphasis>
      • Timing
          – <break>
  • 28. SSML Functions (cont'd)
      • Spoken language
          – xml:lang
      • Prosody and Style: voice control
          – <voice>
              • Gender
              • Age
              • Name
      • Prosody
          – <prosody>
              • Pitch
              • Contour
              • Range
              • Rate
              • Duration
              • Volume
  • 29. SSML Functions (cont'd)
      • Sentence structure
          – <p>
          – <s>
      • Phonetic pronunciation
          – <phoneme>
      • Modify text
          – <sub>: substitute text
      • Location identification
          – <mark>
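As a rough illustration of how several of the SSML elements listed above combine in one document, the sketch below embeds a small SSML fragment in a Python string and parses it to confirm it is well-formed. The sentence text, the order number, and the `interpret-as="digits"` value are assumptions made for the example; SSML 1.0 does not standardize `interpret-as` values, so what a given TTS engine accepts may differ.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML document; the wording and attribute values are made up.
SSML = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>
    <s>Your order <sub alias="number">no.</sub>
       <say-as interpret-as="digits">4417</say-as> has shipped.</s>
    <break time="500ms"/>
    <s><prosody rate="slow" volume="loud">Thank you</prosody>
       for calling <emphasis>Acme</emphasis>.</s>
  </p>
  <mark name="end_of_message"/>
</speak>
"""

# Parsing only checks well-formedness; it does not validate against the
# SSML schema or tell us how a TTS engine will render the markup.
root = ET.fromstring(SSML.strip())

# Collect the local names of the elements used, without the namespace part.
used = sorted({element.tag.split("}")[1] for element in root.iter()})
print(used)
```

A well-formedness check like this is a cheap sanity test for dynamically generated prompts, but it says nothing about whether the engine honors each element.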
  • 30. VoiceXML 2.x
  • 31. VoiceXML Scope
      • Human-machine interaction provided by voice response systems:
          – Output
              • Play audio files
              • Produce synthesized speech
          – Input
              • Record spoken input
              • Recognize spoken input
              • Collect character input
          – Control flow
          – Telephony
              • Transfer a user to another destination, such as a live agent
              • Disconnect a user
  • 32. VoiceXML Goals
      • Separate user interaction from service logic
          – Creates new possible business models
              • Service developer can be separate from telephony platform provider
      • Enable service portability across implementation platforms
          – Assume common set of platform capabilities
          – Provide common language for:
              • Content providers, tool providers, platform providers
      • Safely handle shared network-based applications
          – Deterministic behavior
      • Easy to build common types of applications
      • Features to build complex types of applications
      • Shield application authors from low-level platform-specific details
          – Promotes portability, ease of service creation
  • 33. VoiceXML 2.0 Basic Functions
      • Input
          – <field>, <menu>: recognition
          – <record>: audio recording
      • Output
          – <prompt>: container for TTS or prerecorded audio
          – <audio>: prerecorded audio
      • Control Flow
          – <if>, <else>, <elseif>: basic conditional logic
          – <script>: complex scripts using ECMAScript
          – <goto>: transition to another dialog or document
          – <submit>: submit data to a web application
      • Telephony
          – <disconnect>
          – <transfer>
  • 34. VoiceXML Execution Model
      • Form Interpretation Algorithm <form>
      • Execution is synchronous (mostly)
          – Disconnect events are handled (somewhat) asynchronously
      • Audio is queued
          – Played only when encountering a waiting state
      • Processing is always in one of two states:
          – Waiting for input in an input item
              • Such as <field>, <record>, or <transfer>
          – Transitioning between input items in response to an input
      • Event-driven
          – <catch>, <throw>: generalized event mechanism
          – <nomatch>, <noinput>: short-hand user-input event handling
          – <error>: short-hand error event handling
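The execution model above can be pictured as a loop that selects the next unfilled input item, queues its prompt and waits for input, then either fills the item or raises an event. The following is a heavily simplified sketch under stated assumptions, not the Form Interpretation Algorithm as specified: `Field`, `run_form`, and the `max_failures` cutoff are hypothetical, grammars are reduced to sets of exact phrases, and guard conditions, `<block>` items, prompt counters, and real event handlers are all left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Field:
    """Simplified stand-in for a VoiceXML <field>."""
    name: str
    prompt: str
    grammar: Set[str]  # exact utterances the field accepts


def run_form(
    fields: List[Field],
    inputs: Iterable[Optional[str]],
    max_failures: int = 3,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Run a toy select / collect / process loop over the fields.

    `inputs` supplies caller utterances in order; None stands for silence.
    Returns the filled values and a log of prompts and events.
    """
    values: Dict[str, str] = {}
    log: List[Tuple[str, str]] = []
    failures = 0
    answers = iter(inputs)
    while failures < max_failures:
        # Select: the first field whose variable is still unfilled.
        item = next((f for f in fields if f.name not in values), None)
        if item is None:
            break  # every field is filled, so the form is complete
        # Collect: queue the prompt, then enter the waiting state.
        log.append(("prompt", item.prompt))
        heard = next(answers, None)
        # Process: fill the field, or record the matching event.
        if heard is None:
            log.append(("event", "noinput"))
            failures += 1
        elif heard not in item.grammar:
            log.append(("event", "nomatch"))
            failures += 1
        else:
            values[item.name] = heard
    return values, log


menu = Field(
    name="main_menu",
    prompt="You can choose sales, repair, or order status.",
    grammar={"sales", "repair", "order status"},
)
values, log = run_form([menu], [None, "pizza", "repair"])
print(values)
print(log)
```

In this run the caller is silent once, says something outside the grammar once, and then answers "repair", so the prompt is queued three times before the field is filled.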
  • 35. Key Points
      • Architecture leverages all things "internet"
          – Languages, protocols, servers, developers, etc.
      • Separation of concerns
          – Application logic / database vs. telephony / speech resources
          – Enables new business models
              • Voice ASP
              • Prepackaged applications
      • URL (application) associated with phone number
          – Calling party or called party
          – Share resources among many applications (Voice ASP)
      • High-level languages, specific to domain / task
          – Simplify development and maintenance
  • 36. VoiceXML <form> and <field>
      • <form>
          – Dialog container
          – "Form Interpretation Algorithm" (FIA) specifies default behavior
      • <field>
          – Collect input from caller
          – <grammar> specifies input 'constraints'
      • <prompt>
          – Container for <audio> and text
  • 37. Example (main.vxml)
      <?xml version="1.0"?>
      <vxml version="2.0">
        <form>
          <field name="main_menu">
            <prompt>
              <audio src="welcome.wav">
                Welcome to Acme. You can choose sales, repair, or order status.</audio>
            </prompt>
            <grammar src="main_menu.grxml"/>
          </field>
          <block>
            <submit next="http://acme.com/route... " method="get"/>
          </block>
        </form>
      </vxml>
      Note: Code simplified for demonstration purposes…
  • 38. User Input - Grammars
      • Grammars can be speech or DTMF (touchtone)
          – Both types can be active simultaneously
      • Specified by SRGS
          – XML grammars are normative (aka GRXML)
          – ABNF grammars are more concise but more complex to author
      • Grammars may be specified inline or sourced externally
      • External grammars are referenced by URI
      • Multiple grammars may be active simultaneously
  • 39. Grammars can get very complicated: There are many ways to say the same thing…
      • Sales
          – I'd like to place an order
          – I need to talk to a salesman
      • Repair
          – repair department
          – service
          – service department
          – customer service
      • Order status
          – where's my order?
          – track my order
          – track my shipment
          – where the hell is my stuff?
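One way to picture what a grammar with many alternative phrasings has to accomplish is a table that maps each phrasing to a single department value. The sketch below uses a subset of the phrases from the slide; the `PHRASES` table and `interpret` helper are hypothetical, and exact string lookup is a deliberate simplification of what a speech recognizer and semantic interpretation actually do.

```python
from typing import Dict, List, Optional

# Subset of the phrasings from the slide, grouped by the value the
# application ultimately cares about.
PHRASES: Dict[str, List[str]] = {
    "sales": [
        "sales",
        "i'd like to place an order",
        "i need to talk to a salesman",
    ],
    "repair": [
        "repair",
        "repair department",
        "service",
        "service department",
        "customer service",
    ],
    "order status": [
        "order status",
        "where's my order",
        "track my order",
        "track my shipment",
    ],
}

# Invert the table so each phrasing points at its department.
LOOKUP: Dict[str, str] = {
    phrase: dept for dept, phrases in PHRASES.items() for phrase in phrases
}


def interpret(utterance: str) -> Optional[str]:
    """Return the department for an utterance, or None if nothing matches."""
    key = utterance.lower().strip().rstrip("?.!")
    return LOOKUP.get(key)


print(interpret("Where's my order?"))
print(interpret("customer service"))
print(interpret("pizza"))
```

The table grows quickly as phrasings are added, which is the complexity the slide points at; an unmatched utterance returns None here, the analogue of a nomatch.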
  • 40. Basic GRXML grammar example <grammar …xml:lang="en-US" version="1.0"> <rule id="dept" scope="public"> <one-of> <item>sales</item> <item>repair</item> <item>order status</item> </one-of> </rule> </grammar> main_menu.grxml © 2007 Ken Rehor. All Rights Reserved. 40
  • 41. VoiceXML example – next step
      <form>
        <field name="sales_menu">
          <prompt>
            <audio src="sales_menu.wav">
              You've reached Acme's sales department.
              To place an order, say sales.
              To speak to an associate, say I'd like to speak to someone.
            </audio>
          </prompt>
          <grammar src="sales_menu.grxml"/>
        </field>
        <block>
          <submit next="http://acme.com/... " method="get"/>
        </block>
      </form>
      sales.vxml
  • 42. VoiceXML example with error handling
      <form>
        <field name="main_menu">
          <prompt>
            <audio src="welcome.wav">
              Welcome to Acme. You can choose sales, repair, or order status.
            </audio>
          </prompt>
          <grammar src="main_menu.grxml"/>
        </field>
        <noinput> You must say something. </noinput>
        <block>
          <submit next="http://acme.com/route... " method="get"/>
        </block>
      </form>
      newmain.vxml
  • 43. VoiceXML example with error handling
      <form>
        <field name="main_menu">
          <prompt>
            <audio src="welcome.wav">
              Welcome to Acme. You can choose sales, repair, or order status.
            </audio>
          </prompt>
          <grammar src="main_menu.grxml"/>
        </field>
        <noinput> You must say something. </noinput>
        <nomatch> I didn't understand you. Please try again. </nomatch>
        <block>
          <submit next="http://acme.com/route... " method="get"/>
        </block>
      </form>
      newmain.vxml
  • 44. VoiceXML example with error handling
      <form>
        <field name="main_menu">
          <prompt>
            <audio src="welcome.wav">
              Welcome to Acme. You can choose sales, repair, or order status.
            </audio>
          </prompt>
          <grammar src="main_menu.grxml"/>
        </field>
        <help> You can say sales, repair, or order status. </help>
        <noinput> You must say something. </noinput>
        <nomatch> I didn't understand you. Please try again. </nomatch>
        <block>
          <submit next="http://acme.com/route... " method="get"/>
        </block>
      </form>
      newmain.vxml
  • 45. Basic VoiceXML menu using <option>
      <field name="maincourse">
        <prompt>
          Please select an entree. Today, we are featuring <enumerate/>
        </prompt>
        <option dtmf="1" value="fish"> swordfish </option>
        <option dtmf="2" value="beef"> roast beef </option>
        <option dtmf="3" value="chicken"> frog legs </option>
        <filled>
          <submit next="/cgi-bin/maincourse.cgi" method="post" namelist="maincourse"/>
        </filled>
      </field>
      maincourse.vxml
  • 46. Set platform features via <property>
      • Input modes: type of input from a caller
          – DTMF-only: <property name="inputmodes" value="dtmf"/>
          – Voice-only: <property name="inputmodes" value="voice"/>
          – Both: <property name="inputmodes" value="dtmf voice"/>
      • Timeouts
          <property name="timeout" value="1450ms"/>
          <property name="termtimeout" value="2500ms"/>
          ...
  • 47. Call processing: <transfer>
      • Blind
          – Go somewhere but don't return
      • Bridge
          – Add on another party, resume execution when done talking
  • 48. Call processing: <transfer>
      • Blind transfer
      <form id="xfer">
        <block>
          <prompt> Calling Riley. Please wait. </prompt>
        </block>
        <transfer name="mycall" dest="tel:+1-555-123-4567">
        </transfer>
      </form>
  • 49. Call processing: <transfer>
      • Bridge transfer
      <form id="xfer">
        <block>
          <prompt> Calling Riley. Please wait. </prompt>
        </block>
        <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true">
        </transfer>
      </form>
  • 50. Call processing: <transfer>
      • Bridge transfer with cancel feature
      <form id="xfer">
        <block>
          <prompt> Calling Riley. Please wait. </prompt>
        </block>
        <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true">
          <prompt> Say cancel at any time to disconnect this call.</prompt>
          <grammar src="cancel.grxml" type="application/srgs+xml"/>
        </transfer>
      </form>
  • 51. Call processing: <transfer>
      <form id="xfer">
        <block>
          <prompt> Calling Riley. Please wait. </prompt>
        </block>
        <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true">
          <prompt> Say cancel at any time to disconnect this call.</prompt>
          <grammar src="cancel.grxml" type="application/srgs+xml"/>
          <filled>
            <assign name="mydur" expr="mycall$.duration"/>
            <if cond="mycall == 'busy'">
              <prompt> Riley's line is busy. Try again later. </prompt>
            <elseif cond="mycall == 'noanswer'"/>
              <prompt> Riley didn't answer the phone. Please call back another time. </prompt>
            </if>
          </filled>
        </transfer>
      </form>
  • 52. Call processing: <transfer>
      <form id="xfer">
        <block>
          <prompt> Calling Riley. Please wait. </prompt>
        </block>
        <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true"
                  transferaudio="music.wav" connecttimeout="60s">
          <prompt> Say cancel at any time to disconnect this call.</prompt>
          <grammar src="cancel.grxml" type="application/srgs+xml"/>
          <filled>
            <assign name="mydur" expr="mycall$.duration"/>
            <if cond="mycall == 'busy'">
              <prompt> Riley's line is busy. Try back later. </prompt>
            <elseif cond="mycall == 'noanswer'"/>
              <prompt> Riley didn't answer the phone. Please call back another time. </prompt>
            </if>
          </filled>
        </transfer>
      </form>
  • 53. Call processing: <transfer>
  • 54. New Features in VoiceXML 2.1
      • Dynamically referencing grammars and scripts
          – <grammar srcexpr="…"> <script srcexpr="…">
      • Detect Barge-in During Prompt Playback: enhance SSML 1.0 <mark>
          – Add nameexpr attribute
          – Add markname and marktime to application.lastresult$ object
      • Fetch (XML) data without transition: <data>
          – Uses read-only subset of DOM
      • Dynamically concatenate prompts: <foreach>
          – Iterate through ECMAScript array and execute content
      • Record user's utterance while attempting ASR
          – recordutterance property
          – Add shadow variables: recording, recordingsize, recordingduration
      • Send data upon disconnect
          – <disconnect namelist="…">
      • Additional <transfer> types
          – <transfer type="…" …/>
  • 55. Dynamic Applications
  • 56. VoiceXML Application Structure
      • Static
          – User experience is the same for everyone
              • Information doesn't change frequently
              • No customization per user, time of day, etc.
              • Pages are created once and used many times
      • Dynamic
          – User experience is customized by:
              • User: e.g. my.yahoo.com, amazon.com (especially once you log in)
              • Situation: e.g. travel specials on expedia.com
          – Data driven, e.g. inventory system, airline reservations
          – Generated by a program at runtime
              • JSP, ASP
              • App servers such as BEA, IBM WebSphere, Oracle 9iAS
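
As an illustration of "generated by a program at runtime", the following Python sketch builds an <option> menu like the one on slide 45 from a list of rows that could come from a database. The function name, its parameters and the sample data are hypothetical; a JSP or ASP page would do the same job with a template.

```python
import xml.etree.ElementTree as ET


def build_menu_vxml(field_name, intro, options, submit_url):
    """Build a one-field VoiceXML menu from (dtmf, value, label) tuples."""
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form")
    field = ET.SubElement(form, "field", {"name": field_name})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = intro + " "
    ET.SubElement(prompt, "enumerate")  # reads the option labels back
    for dtmf, value, label in options:
        option = ET.SubElement(field, "option", {"dtmf": dtmf, "value": value})
        option.text = label
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "method": "post",
                                     "namelist": field_name})
    # Serializing through ElementTree escapes characters such as "&" in
    # labels, so data-driven text cannot break the markup.
    return ET.tostring(vxml, encoding="unicode")
```

Building the page through an XML library rather than string concatenation is a deliberate choice here: it keeps the generated document well-formed whatever the data contains.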
  • 57. VoiceXML 2.1 and AJAX
      • VoiceXML + ECMAScript + <data> + XML
      • <data> element allows retrieval of arbitrary XML data without document transition
      • Static VoiceXML document can fetch user-specific data at runtime
      • Decouple presentation layer from business logic
      • Performance improvements due to:
          – Cache-able VoiceXML
          – No need to generate entirely new pages for each dialog when only the content is new
          – Less network traffic
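
The split described above can be sketched as a static, cacheable VoiceXML page plus a small server-side function that returns only the data. This is a sketch under assumptions: the URL, the `caller_id` variable, the `<account>`/`<balance>` element names and the `account_xml` function are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Static page: identical for every caller, so it can be cached.
# The <data> element fetches user-specific XML without a page transition.
STATIC_VXML = """<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <var name="caller_id" expr="session.connection.remote.uri"/>
  <form>
    <block>
      <data name="acct" src="http://example.com/account" namelist="caller_id"/>
      <var name="balance"
           expr="acct.documentElement.getElementsByTagName('balance').item(0).firstChild.data"/>
      <prompt>Your balance is <value expr="balance"/>.</prompt>
    </block>
  </form>
</vxml>"""


def account_xml(balance_by_caller, caller_id):
    """Return the small XML payload that the <data> element would fetch."""
    root = ET.Element("account")
    balance = ET.SubElement(root, "balance")
    balance.text = balance_by_caller.get(caller_id, "unknown")
    return ET.tostring(root, encoding="unicode")
```

Only the few bytes produced by `account_xml` change per caller; the dialog itself is fetched once and reused.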
  • 58. Dynamic Application Considerations
      Execution of VoiceXML is running a program on your server…
      • Must guarantee quality of dynamically-generated VoiceXML documents and ASR grammars
          – Catch parse errors, execution errors
          – What does the caller hear if there is an error?
              • not "Could not parse VoiceXML document"
      • Runtime performance
          – Parse and interpretation time of large documents
          – Inefficient scripts and speech grammars
      • Security implications
          – Exploit a bug in a particular implementation? Make free phone calls?
          – Could there be a VoiceXML virus? Will all platforms protect against them?
      Careful application design, testing and monitoring are essential
  • 59. Dynamic Application Considerations
      • A mix of different simultaneous applications means variable platform load and execution profile
          – Parse time of VoiceXML document
          – Fetching VoiceXML documents, grammars, audio from remote web servers
          – Load Balancing
          – How to protect platform from harmful application? (intentional or otherwise?)
              • Max size of document
              • Max size of grammar
              • Complexity measurement of document or grammar (statically checked before execution?)
      Platforms, networks, and applications must be carefully engineered
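
Two of the safeguards on the preceding slides, catching parse errors and capping document size, can be sketched as a check that runs before a generated page is served. The size limit and the wording of the fallback prompt are assumptions; the point is that the caller hears a friendly message rather than a parser error.

```python
import xml.etree.ElementTree as ET

VXML_ROOT = "{http://www.w3.org/2001/vxml}vxml"
MAX_DOCUMENT_BYTES = 64 * 1024  # hypothetical platform limit

# Static page played when the generated page cannot be trusted.
FALLBACK_VXML = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>We're sorry, this service is temporarily unavailable. Please call back later.</prompt>
      <exit/>
    </block>
  </form>
</vxml>"""


def safe_vxml(generated):
    """Return `generated` if it passes basic checks, else the fallback page."""
    if len(generated.encode("utf-8")) > MAX_DOCUMENT_BYTES:
        return FALLBACK_VXML
    try:
        root = ET.fromstring(generated)
    except ET.ParseError:
        return FALLBACK_VXML
    if root.tag != VXML_ROOT:
        return FALLBACK_VXML
    return generated
```

A well-formedness check is only a first line of defense. It does not validate against the VoiceXML schema or catch runtime errors in scripts, so testing and monitoring are still needed.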
  • 60. Performance Considerations
  • 61. Load Balancing for Performance and Reliability
      • CPU/memory utilization
          – Grammar compilation
          – ASR load
          – TTS load
      • Telephony Network
          – Channel balancing
          – Dead channel
      • Incoming/Outgoing channel assignment / mix
  • 62. Performance: Caching
      • Fetched documents, grammars, audio files, streams
      • Local or distributed cache?
      • Effects of prefetching
      • Where to cache generated grammars?
          – Per system
          – In-network
      • Use external grammar compilation server?
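
A local cache for fetched documents, grammars and audio can be sketched in a few lines of Python. This is a simplified, per-system cache with a single maximum age, loosely in the spirit of VoiceXML's `maxage` fetch attribute; the class name and the injectable `fetch` and `clock` parameters are assumptions made so the sketch is easy to test.

```python
import time


class FetchCache:
    """Cache fetched resources by URI and refetch once they are too old."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch            # callable: uri -> content
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so tests can control time
        self._entries = {}             # uri -> (fetched_at, content)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: no network round trip
        content = self._fetch(uri)
        self._entries[uri] = (now, content)
        return content
```

A distributed or in-network cache would replace the dictionary with a shared store, at the cost of the coordination questions the slide raises.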
  • 63. Application Management
  • 64. Application Monitoring and Maintenance
      • Runtime logs
          – Web / application server
          – Voice server
          – Call Detail Reporting
      • Utterance recordings and logs
          – Useful for grammar and dialog tuning
          – Security of recordings may be an issue
          – Disk space: full-call recordings may be prohibitively large
      Usage data must be continually monitored to improve user experience
  • 65. Operations, Administration, Maintenance, Provisioning
      • System Monitoring
          – Interfacing to existing Telco OSSs
          – Web-based for ISP environment
      • Provisioning
          – Application, Customer
              • DN-URI mapping
          – Telephony
              • Call origination/transfer
              • Max call timeout
              • Max number of concurrent calls
          – Platform-specific VoiceXML features
              • ECMAScript allowed?
              • Telephony control allowed?
              • Max grammar size
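
The DN-URI mapping and the concurrent-call limit listed above could be represented as a small provisioning table. The following Python sketch is one possible shape, not a description of any particular platform; all class, method and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Provision:
    app_uri: str                # URL of the VoiceXML application
    max_concurrent_calls: int   # provisioned limit for this number
    active_calls: int = 0


class ProvisioningTable:
    """Maps a dialed number (DN) to an application URI with a call limit."""

    def __init__(self):
        self._by_dn = {}

    def provision(self, dialed_number, app_uri, max_concurrent_calls):
        self._by_dn[dialed_number] = Provision(app_uri, max_concurrent_calls)

    def route(self, dialed_number):
        """Return the app URI for an inbound call, or None to reject it."""
        entry = self._by_dn.get(dialed_number)
        if entry is None or entry.active_calls >= entry.max_concurrent_calls:
            return None
        entry.active_calls += 1
        return entry.app_uri

    def release(self, dialed_number):
        """Free one call slot when a call on this number ends."""
        entry = self._by_dn.get(dialed_number)
        if entry is not None and entry.active_calls > 0:
            entry.active_calls -= 1
```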
  • 66. Billing
      Logging and Charging for usage of resources
      • "platform time"
          – Usage of server resources
      • Toll Free usage
          – It's toll free, not free
      • Transferred calls
          – Inbound minutes
          – Outbound minutes
          – Network features, e.g. Network Redirect
      • Outbound calls
      Accurate billing information is a critical factor in application cost or profitability
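
To show how these usage categories combine into a charge, here is a small worked sketch in Python. The per-minute rates, the rounding rule (each leg rounded up to a whole minute) and the record fields are all assumptions chosen for the example, not figures from any provider.

```python
from dataclasses import dataclass

# Hypothetical rates in cents per minute.
RATES_CENTS_PER_MINUTE = {"platform": 2, "toll_free": 3, "outbound": 4}


@dataclass
class CallRecord:
    inbound_seconds: int
    outbound_seconds: int = 0   # bridged transfer or outbound leg
    toll_free: bool = False


def billable_minutes(seconds):
    """Round a duration up to whole minutes (an assumed billing rule)."""
    return (seconds + 59) // 60


def charge_cents(record, rates=RATES_CENTS_PER_MINUTE):
    """Sum platform time, toll-free usage and outbound minutes for one call."""
    inbound = billable_minutes(record.inbound_seconds)
    outbound = billable_minutes(record.outbound_seconds)
    total = inbound * rates["platform"]
    if record.toll_free:
        total += inbound * rates["toll_free"]
    total += outbound * rates["outbound"]
    return total
```

For example, a 120-second toll-free call that bridges out for 30 seconds is billed as 2 inbound minutes and 1 outbound minute: 2 × 2 + 2 × 3 + 1 × 4 = 14 cents under these assumed rates.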
  • 67. Application Deployment Models
      Build-your-own network vs. Outsourcing
  • 68. Build vs. Outsource?
      Deployment Options Enable a Variety of Business Models
      • Completely in-house
          – Maintain complete control for security
          – Development and deployment systems can be identical
      • Outsourced VoiceXML/Telephony
          – Large-scale distributed networks without major capital investment
          – Grow quickly and incrementally
      • Completely outsourced hosting
          – All components and systems managed by 3rd party
      • Packaged software
          – VoiceXML application integrated with existing apps
  • 69. Completely In-House
      • Local control of all systems
      • Voice server, app server, database can be on local network
      • Development and deployment systems can be identical
      • Physical security: in-house team "owns" it
      • Failover, reliability, scalability must be locally managed
      • Redundant power, networks, etc. are required
  • 70. VoiceXML On-premises Deployment using TDM or VoIP carrier connection
      [Diagram labels: PSTN; VoIP "pipe"; TDM: DS3, multiple PRI, etc.; VoIP gateway, PBX, etc.; VoiceXML browsers; ASR servers; Cisco IPCC; web applications; database; co-location facility]
  • 71. Outsourced VoiceXML / Telephony
      • Telephony and VoiceXML servers outsourced to "Voice Service Provider" (VSP)
      • Application remains in your data center(s)
          – Geographically distributed
          – May be dedicated to specific customers
      • Many carrier-grade vendors to choose from
  • 72. Outsourced VoiceXML / Telephony
      • Architecture is identical to in-house deployment
      • Secure IP connection used between facilities
      [Diagram labels: Voice Service Provider (carrier-grade outsourcing facility): PSTN, VoIP gateway, VoiceXML browsers, ASR servers, Cisco IPCC; Internet; co-location facility: web applications, database]
  • 73. Advantages of Outsourcing to a VSP
      • Choice of many vendors: one for all customers, or choose the best one for each customer
      • Add capacity by adding multiple vendors
      • No capital investment
      • Pay-as-you-go pricing models
      • Failover, reliability, scalability simplified
      • Physical security of equipment and networks managed by VSP
      • VPN or dedicated data connection to your backend systems
  • 74. Distribute Load to Multiple VSPs
      [Diagram labels: PSTN; several VSP sites, each with VoiceXML browsers, ASR servers and Cisco IPCC; Internet; customer co-location facility: web applications, database]
      Multiple co-lo facilities can be deployed for geographic redundancy and enhanced capacity.
  • 75. Completely Outsourced
      • Deploy hardware & software systems at customer-managed co-location facilities
      • Deploy complete systems at co-location facilities managed by 3rd party
      • Deploy pre-packaged VoiceXML application integrated with customer's call center (managed by customer)
  • 76. Combination of In-house and Outsourced
      Several ways to balance resources
      • Primary in-house, with overflow or failover to a VSP
          – Local control of resources
          – Overflow to VSP during peak usage
          – Backup for failover / disaster recovery
      • In-house development, with primary deployment via VSP
          – In-house development and trials
          – "Push to the network" when ready to deploy
  • 77. CCXML, VoiceXML, and VoIP
      3rd-Party Call Control
  • 78. Inbound call using TDM connections
      • 1st-party call control: VoiceXML server handles call routing/setup/answer
      [Diagram labels: Caller; PSTN; VoiceXML Server]
  • 79. Inbound call using VoIP (SIP and RTP)
      • 1st-party call control: VoIP gateway routes call to VoiceXML server, which handles call routing/setup/answer
      [Diagram labels: customer; PSTN; VoIP Gateway; VoiceXML Server; steps: 1. INVITE, 2. RTP]
  • 80. Why VoIP?
      • Flexible network topology
      • Simplified integration of voice dialog resources
      • Vendor independence for network elements
      • Separation of concerns: voice dialog resources vs. call control
  • 81. Inbound Call using 3rd Party Call Control
      • 3rd party application handles call routing/setup/answer
      [Diagram labels: caller; PSTN; VoIP Gateway; Call Routing Application; VoiceXML Server; steps: 1. INVITE, 2. INVITE, 3. RTP]
  • 82. Outbound call using 3rd Party Call Control
      • 3rd party application handles outbound call initiation/setup/routing
      • "Attaches" VoiceXML dialog to connection
      [Diagram labels: caller; PSTN; VoIP Gateway; Outbound Calling Application; VoiceXML Server; steps: 1. INVITE, 2. INVITE, 3. RTP]
  • 83. What is CCXML?
      • XML-based language that manages the connections and resources used in phone calls
      • Designed for 3rd-party call control applications
      • Allows easy integration with back-end web applications, very similar to VoiceXML's model
      • Uses the finite state machine model
          – Event handlers move from one state to the next using markup tags
      • CCXML provides commands to run a "dialog" on a call leg
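
The finite state machine model can be illustrated outside CCXML with a short Python analogy: each incoming event is looked up against the current state, which selects the next state and an action to issue. The state names are invented, and the event and action names are modeled on CCXML's connection and dialog events and tags purely for illustration; this is not CCXML and not a complete call flow.

```python
class CallSession:
    """Event-driven call handling in the style of a CCXML state machine."""

    # (current state, event) -> (next state, action to issue)
    TRANSITIONS = {
        ("init", "connection.alerting"): ("alerting", "accept"),
        ("alerting", "connection.connected"): ("in_dialog", "dialogstart"),
        ("in_dialog", "dialog.exit"): ("done", "disconnect"),
    }

    def __init__(self):
        self.state = "init"
        self.actions = []

    def handle(self, event):
        if self.state == "done":
            return
        if event == "connection.disconnected":
            # The caller can hang up at any time, whatever the state.
            self.state = "done"
            self.actions.append("exit")
            return
        transition = self.TRANSITIONS.get((self.state, event))
        if transition is not None:
            self.state, action = transition
            self.actions.append(action)
        # Events with no handler in the current state are ignored.
```

The hang-up branch is the part that matters for call control: a telephony event can arrive in any state, which is exactly the asynchronous handling that a dialog language tends to avoid.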
  • 84. Why is CCXML Needed?
      • VoiceXML was designed primarily for voice dialogs
          – 1st-party call control: <disconnect> and several predefined common <transfer> types
      • Connection management requires full asynchronous event handling
          – Connection/telephony events can occur any time during a call and must be handled
          – VoiceXML specifically limits asynchronous events to simplify the execution and programming model
      • 1st-party Call Control can be useful but has limited flexibility
          – VoiceXML 2.1 <transfer> adds "consultation" feature for network redirect
  • 85. CCXML System Architecture
      [Diagram labels: Telephony Web Application (CCXML over HTTP); Voice Web Application (VXML over HTTP); CCXML Server; Conference Server; Telephony Control Interface; Dialog Control Interface; Telephony Interface; Dialog Server; Media; PSTN; Caller]
  • 86. CCXML features
      • Telephony channel control: voice paths and signaling
          – <createcall>, <accept>, <disconnect>, <reject>, <redirect>
      • Media control: Conference Bridges and Mixers
          – <join>, <unjoin>, <createconference>, <destroyconference>
      • Dialog control: Add a VoiceXML (or other dialog) resource to a connection
          – <dialogstart>, <dialogprepare>, <dialogterminate>
  • 87. Integration of CCXML and VoiceXML
      • Dialogs are created using <dialogstart>
          – You pass the URL of the document that you want to run
      • Dialogs can be ended using <dialogterminate>
          – This allows CCXML to end a dialog based on an external event such as someone calling you on a second line
      • Dialogs can return data back to the CCXML platform
          – In VoiceXML use <exit namelist="a b c"/>
          – This is exposed in the CCXML dialog.exit event
  • 88. W3C CCXML 1.0 status
      • Nearing "Candidate Recommendation" status
          – Language complete
          – Test suite under development
          – Certification Program under consideration
      • Growing support throughout the world
      • Several open source projects underway
          – See http://www.sourceforge.net
  • 89. Next-Generation Technologies
  • 90. Next-Generation Technologies
      • Speaker Biometrics-based authentication
          – Speaker Identification
          – Speaker Verification
      • Video IVR: VoiceXML augmented with video
          – Early stages of commercial deployment now
          – Simple extension to standard platforms
          – Straightforward step towards full multimodal
      • Multimodal
          – Multiple input modalities: speech recognition, keypad, handwriting, biometrics (voice, fingerprint, iris, etc.), geolocation, motion
          – Multiple output modalities: graphics, audio (speech, TTS, music, polyphonic tones)
  • 91. Speaker Biometrics
  • 92. Why Speaker Biometrics?
      • Identify an individual for remote transactions
      • Text / DTMF PINs are inadequate
          – Easily compromised
          – Easily forgotten
          – Do not identify an individual
      • US Federal Regulations
          – FFIEC guidelines for financial services
  • 93. Speaker Identification and Verification (SIV)
      • Authentication
          – The process of confirming one or more identities.
      • Speaker Identification (one-to-many)
          – Authentication with multiple identity claims.
      • Speaker Verification (one-to-one)
          – Authentication with a single identity claim.
  • 94. Types of SIV
      • Text independent
          – SIV technology that can operate on any freeform or structured spoken input.
      • Text dependent
          – SIV technology (usually verification technology) that requires the voice input of one or more specific passwords or pass phrases (having been enrolled).
      • Text prompted
          – SIV technology (usually verification) that randomly selects words and/or phrases and prompts the speaker to repeat them. The term is also called challenge-response.
  • 95. Fundamental Phases of SIV
      • Enrollment
          – Capture one or more user utterances to 'train' the system
      • Verification
          – Capture one or more user utterances to make an identity claim
      • Adaptation & Scoring
          – Judge how close the user's verification utterance is to the enrolled utterance
          – Refine the existing enrolled utterance with information from the verification utterance
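
The three phases can be traced in a toy Python sketch that treats each utterance as a short list of numbers. This is only an analogy under strong assumptions: real SIV engines extract acoustic features and use statistical speaker models, whereas here the "voiceprint" is a plain average, the score is cosine similarity, and the threshold and adaptation weight are arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(utterances):
    """Enrollment: average several utterance vectors into a voiceprint."""
    count = len(utterances)
    return [sum(column) / count for column in zip(*utterances)]


def verify(voiceprint, utterance, threshold=0.9):
    """Scoring: return (score, accepted) for a single identity claim."""
    score = cosine(voiceprint, utterance)
    return score, score >= threshold


def adapt(voiceprint, utterance, weight=0.1):
    """Adaptation: nudge the voiceprint toward an accepted utterance."""
    return [(1 - weight) * v + weight * u for v, u in zip(voiceprint, utterance)]
```

In use, `adapt` would be called only after `verify` accepts the utterance, so that an impostor's speech is never blended into the enrolled voiceprint.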
  • 96. Video and Multimodal
  • 97. "Video" VoiceXML
      • Video extensions to VoiceXML
          – 3G Wireless
          – VoIP phones
      • VoiceXML is just a dialog language
          – Initially only for voice input/output
      • Example
          – Videomail is a dialog application very similar to voicemail
      • Video and audio are somewhat analogous
          – VoiceXML can be 'hacked' to handle video now:
              • <audio src="foo.au"/> could "play" a video file via <audio src="foo.mpeg4"/>
          – VoiceXML 3.0 might add a new language feature
              • e.g. <video src="foo.avi"> or <media src="foo.mpeg4">
  • 98. "Video" VoiceXML Deployment and Standardization
      • Simple extension to standard platforms
          – Easy integration with current platforms
          – Doesn't "break" existing functionality
          – Well aligned with "VoiceXML model"
      • Early stages of commercial deployment
          – Several vendors have deployed large-scale commercial systems
      • Step towards full multimodal
  • 99. Multimodal Applications
      • W3C Multimodal Interaction Working Group
          – Defining new standards based on extensive industry experience
      • IBM / Motorola / Opera X+V 1.2
          – Early stages of commercial deployment
          – Freely available from Opera http://dev.opera.com/articles/voice/
      For more information, see: W3C Multimodal Interaction Working Group http://www.w3.org/2002/mmi
  • 100. VoiceXML 3.0
  • 101. VoiceXML 3.0
      • Modularization
          – Cleanly separate functions to enable integration with other modalities
          – Enables code reuse
      • New media processing
          – Video
          – Voice processing
          – Navigation
          – Speaker biometrics
      • Separation of data, control flow and presentation
          – Control flow embodied in new language: SCXML
      • Clean data model
  • 102. References
      • W3C Voice Browser Working Group http://www.w3.org/voice
          – VoiceXML 2.0 Recommendation
              • http://www.w3.org/TR/voicexml20/
          – VoiceXML 2.1 Working Draft
              • http://www.w3.org/TR/voicexml21/
          – Semantic Interpretation Working Draft
              • http://www.w3.org/TR/semantic-interpretation/
          – SRGS 1.0 Recommendation
              • http://www.w3.org/TR/speech-grammar/
          – SSML
              • 1.0 Recommendation http://www.w3.org/TR/speech-synthesis/
              • 1.1 Working Draft http://www.w3.org/TR/speech-synthesis11/
          – CCXML 1.0
              • http://www.w3.org/TR/ccxml/
          – SCXML
              • http://www.w3.org/TR/scxml/
      • IETF http://www.ietf.org
  • 103. Ken Rehor
      http://www.kenrehor.com
      • VoiceXML Forum
          – Co-founder and past-Chair
          – Chair, VoiceXML Forum Conformance Committee
          – Co-Chair, VoiceXML Forum Speaker Biometrics Committee
      • W3C
          – Co-editor: VoiceXML 1.0, 2.0, 2.1, 3.0
          – Co-editor: CCXML 1.0

Editor's Notes

  1. This DTMF grammar accepts a 4-digit PIN followed by a pound terminator