Ken Rehor's presentation at eComm 2008 - Presentation Transcript
Alphabet Soup: Sorting out Emerging Telephony and Speech Standards
Ken Rehor
Co-founder, VoiceXML Forum
Founder, Harken Systems, LLC
Voice Web Telephony Architecture
Benefits of Open Interfaces, Protocols, Languages
Status and Deployment
Components of a Voice Solution
Break out of the monolithic systems trap
Modernize existing proprietary applications without starting from scratch
Develop new apps, and incrementally add features in a modular fashion
Advantages
Faster development
Less expensive to develop and maintain
Path towards modern, open standards architecture
Voice / Web Application Architecture
[Diagram: phone user, HTTP, app server]
Application logic
Content and data
Transaction processing
Database interface
VoiceXML platform TDM or VoIP
Grammars
Audio / SSML
Scripts
Images
Media
Scripts
[Diagram: any phone, web user, Internet or intranet, HTTP; documents served: <html>, .wav, <grxml>, <vxml>]
Voice App Architecture and Standards
[Diagram. Components: caller, phone network, VoIP gateway, CCXML browser, CCXML call control application, VoiceXML browser, VoiceXML application, MRCP client, conference / media server, media mixer / server, ASR server, TTS server, SIV server.
Interfaces: telephony control interface (SIP, etc.), dialog control interface (SIP, MSCP, etc.), media control interface.
Protocols and formats: HTTP, HTTPS, SOAP, VoiceXML 2.0, VoiceXML 2.1, CCXML, ECMAScript 262, GRXML, SSML, MRCP v1, MRCP v2, T1 / E1, ISDN, SS7, SIP, RFC 2833, RTP, DTMF, Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.; audio in G.711, WAV, .au, mp3, etc.]
Why Standards?
Grow an industry
Interoperation
Lower cost of goods
Innovation and evolution
Disrupt proprietary markets
Ecosystems develop around every open interface
Everyone benefits through joint work: reduces design effort
Promote technology to the next level
Sell more due to larger market
Open Interfaces Enable Innovation
Migration: from proprietary hardware-based solutions, to proprietary software-based solutions, to open software
New Business Models
e.g. Voice Service Provider: Separate application from Telephony/Speech resources
Separation of concerns
Evolve components without starting from scratch
Concentrate on innovation rather than duplication
Move up the value chain
Leverage open, known technology
Web protocols, servers, networks, development tools, expertise
Distributed Client-Server Architecture
Enables new business models and efficient resource utilization
Standard/Common high-level language
Designed for voice dialogs and telephony
Phone number mapped to URL
Phone number associated with URL of voice application
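The phone-number-to-URL idea above can be sketched as a simple lookup table. This is only an illustration: the phone numbers, the example.com URLs, and the names NUMBER_TO_APP_URL, normalize, and app_url_for are all hypothetical, and a real platform would typically hold this mapping in its provisioning configuration.

```python
from typing import Optional

# Hypothetical provisioning table: dialed number -> URL of the voice
# application's start document. Numbers and URLs are made-up examples.
NUMBER_TO_APP_URL = {
    "+18005550100": "http://apps.example.com/banking/start.vxml",
    "+18005550101": "http://apps.example.com/weather/start.vxml",
}


def normalize(dialed: str) -> str:
    """Keep only the digits so '1 (800) 555-0100' and '+18005550100' match."""
    digits = "".join(ch for ch in dialed if ch.isdigit())
    return "+" + digits


def app_url_for(dialed: str) -> Optional[str]:
    """Return the application URL for a dialed number, or None if unmapped."""
    return NUMBER_TO_APP_URL.get(normalize(dialed))
```

When a call arrives, the platform would look up the dialed number and fetch the returned URL over HTTP, much as a web browser fetches a page.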
Voice Web Fundamental Concepts
Visual vs. Voice markup
Web app UI
HTML – Structure
Layout
Input declaration
Transitions
Images
Audio
Video
Text
Scripts
Voice Web app UI
VoiceXML – Structure
Dialog flow
Input declaration
Transitions
Audio
Video, Images
Text (for TTS)
Scripts
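As a rough illustration of the VoiceXML side of this comparison, the sketch below has an application server build a minimal VoiceXML document with one form, one field, a prompt, a grammar reference, and a transition. The function name build_vxml, the prompt text, and the URLs are assumptions made for the example and are not taken from the talk.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml(prompt_text: str, grammar_src: str, next_url: str) -> str:
    """Build a one-field VoiceXML dialog and return it as a string."""
    # Structure: <vxml> is the document root, <form> holds the dialog.
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    # Input declaration: a <field> collects one value from the caller.
    field = ET.SubElement(form, "field", {"name": "choice"})
    # Text (for TTS): the prompt is spoken to the caller.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    # The grammar constrains what the caller may say.
    ET.SubElement(field, "grammar",
                  {"src": grammar_src, "type": "application/srgs+xml"})
    # Transition: once the field is filled, submit it to the next URL.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": next_url, "namelist": "choice"})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml("Say yes or no.", "yesno.grxml",
                     "http://apps.example.com/next"))
```

The parallel with HTML is that the server returns markup over HTTP and the browser, here a VoiceXML browser, renders it; only the rendering medium differs.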
Protocols
Web applications
HTTP, HTTPS
SIP
RTP
SOAP
WSDL
…
Voice Web applications
HTTP, HTTPS
SIP
RTP
SOAP
WSDL
…
The Telecom Trilogy
User Interaction
Voice user interface
Multimodal user interface
Switching
Connecting endpoints
Moving connections
Signaling
Media processing
ASR, SIV, TTS, Record / Play
Conferencing, Mixing, Echo cancellation
Endpointing, Coding / Format conversion
Ecosystem at Every Interface
[Diagram: proprietary dialog XML <xml>; VoiceXML, GRXML, SSML, scripts, etc.; MRCP client; MRCP server; VSP: telephony, speech, apps]
Application Developers
VUI designers
Voice platforms
Tools
Service Providers
Application Servers
[Diagram: audio engine (.wav), ASR engine (<grxml>), TTS engine (<ssml>), VoiceXML browser (<vxml>), application server, code generator, GUI tool / SDE]
Industry Standards – Global Adoption
VoiceXML Forum
Nearly 100 member organizations worldwide
Platform Certification
Speaker Biometrics
Collaborating with W3C, ANSI, ISO
W3C Speech Interface Framework
VoiceXML 2.0/2.1, SRGS 1.0, SSML 1.0, CCXML 1.0
SISR 1.0, PLS 1.0
Coming: VoiceXML 3.0, SSML 1.1
IETF
Media Resource Control Protocol (MRCPv2)
SIP / VoiceXML media server spec (MEDIACTRL)
W3C Speech Interface Framework
VoiceXML
SRGS
SSML
Semantic Interpretation
Call Control
Pronunciation Lexicon
SCXML
For more information, see: W3C Voice Browser Working Group http://www.w3.org/Voice/
W3C Speech Interface Framework
W3C VoiceXML 2.0
W3C Recommendation March 2004
Widely implemented
Approximately 4 dozen platforms
Many service providers worldwide
Many tools, countless applications
VoiceXML Forum Platform Certification Program
24 certified platforms, more coming
W3C VoiceXML 2.1
W3C Recommendation April 2007
Most platform vendors support it
Certification Program and Test suite in progress
W3C VoiceXML 3.0
Spec in early stages of development
W3C Speech Interface Framework
Call Control W3C CCXML 1.0
W3C Working Draft Jan 2007
Implementations increasing
Pronunciation Lexicon W3C PLS 1.0
Used to describe phonetic information for use in speech recognition and synthesis
2nd Last Call Working Draft Oct 2006
W3C Speech Interface Framework
Input grammars SRGS 1.0
W3C Recommendation March 2004
Widely implemented
Output formatting SSML 1.0, 1.1
SSML 1.0 - W3C Recommendation March 2004
Widely implemented, but real support is limited (most TTS engines ignore the SSML instructions)
SSML 1.1 – W3C Working Draft June 2007
Adds support for Asian, Eastern European, and Middle Eastern languages
Semantic Interpretation for Speech Recognition SISR 1.0
W3C Recommendation April 2007
Implementations increasing
Required for new Platform Certification
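To make the grammar and output-formatting formats on this slide more concrete, here is a small sketch that builds an SRGS 1.0 XML grammar and an SSML 1.0 prompt. The helper names build_srgs and build_ssml, the phrases, and the pause length are assumptions chosen for illustration. As noted above, a given TTS engine may ignore some of the SSML markup.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_srgs(rule_id, phrases, lang="en-US"):
    """Build an SRGS XML grammar whose root rule accepts any one phrase."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        XML_LANG: lang,
        "root": rule_id,
        "mode": "voice",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def build_ssml(text, pause_ms, emphasized, lang="en-US"):
    """Build an SSML prompt: plain text, a pause, then emphasized text."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        XML_LANG: lang,
    })
    speak.text = text
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    ET.SubElement(speak, "emphasis").text = emphasized
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs("yesno", ["yes", "no"]))
    print(build_ssml("Your balance is", 300, "forty dollars"))
```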
What's Next?
VoiceXML 3.0
Video
Multimodal integration
Speaker Biometrics
Cleaner Modularity
SCXML 1.0
State Chart Markup Language
Separate logic from presentation
W3C Working Draft Feb 2007
Several implementations available
Commercial, educational, open source
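The idea of separating logic from presentation can be sketched with a plain transition table. This is not SCXML itself, only a flat state machine in Python that shows the dialog flow living apart from any prompts or markup; the state names and event names are hypothetical.

```python
# Hypothetical dialog flow as a transition table: (state, event) -> next state.
# No prompts or markup appear here; presentation would live elsewhere
# (for example in VoiceXML pages keyed by state name).
TRANSITIONS = {
    ("greeting", "done"): "ask_account",
    ("ask_account", "filled"): "confirm",
    ("ask_account", "noinput"): "ask_account",
    ("confirm", "yes"): "goodbye",
    ("confirm", "no"): "ask_account",
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start="greeting"):
    """Feed a sequence of events through the machine; return visited states."""
    state = start
    visited = [state]
    for event in events:
        state = step(state, event)
        visited.append(state)
    return visited
```

Because the flow is just data, the same table could in principle drive a voice, web, or mobile front end, which is the kind of reuse the slide points at.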
Web / Voice ++
Standards enable easy integration with other technologies
Mashups, SOA, Multi-Channel/Modal
[Diagram: POTS, PSTN or VoIP, VXML browser, Voice UI app; mobile, mobile web, IP, Mobile UI app; PC, IP, Web UI app; presentation logic; business logic]
For more information:
http://www.kenrehor.com
http://www.voicexml.org
http://www.w3.org/voice
3rd Party Call Control: CCXML and SIP
[Diagram: caller, PSTN, telephony interface, media server, media; CCXML server, telephony control interface, HTTP, telephony web application (CCXML); VoiceXML server, dialog control interface, HTTP, voice web application (VXML)]
Voice Web Application Architecture
[Diagram: PSTN or IP network; VoiceXML browser with MRCP client; MRCP server with ASR engine (<grxml>) and TTS engine (<ssml>); voice web application server (<vxml>, audio .wav, <record> audio); database]
VSOA Interfaces
Services can use a combination of interfaces
SIP / RTP for media services
With data carried in SIP messages
VoiceXML / HTTP for dialog services
CCXML / HTTP for switching control services
All can use SOAP or other web services interfaces
An eComm 2008 presentation – http://eCommMedia.com for more