
Dynamic calls with Text To Speech


Speaker: Patrick Dexter
ElastixWorld 2015
October 7th, 2015
Bogota - Colombia

Published in: Technology

  1. Fundamentals of Text To Speech in UC, by Patrick Dexter. Thank you. My name is Patrick Dexter, with a company called Cepstral, and today I’ll be talking about Text To Speech voices. We’ll discuss how a TTS voice is made, the component parts of text to speech software, and how that fits into Unified Communications software like Elastix.
  2. Cepstral: Text To Speech innovator; founded in 2001; focus on North and South America; Elastix partner since 2011. To give you some background information, Cepstral is a commercial company spun out of Carnegie Mellon University in Pittsburgh, Pennsylvania. We have customers all around the world, from announcements at train stations in New Zealand and Australia to thousands of concurrent ports in large call centers in Canada. Our main customer base is in North and South America, and we’ve been a proud partner of Elastix since 2011.
  3. @Cepstral_LLC: Our marketing department wouldn’t let me do this presentation without giving you our Twitter address. But it is also useful: if you have any questions about this presentation or TTS in general, tweet them to me and I’ll respond.
  4. What is Text To Speech? Text To Speech is the ability to create audio that was never recorded before. There are far too many words to record them all, and new ones are being created every day. We see this all the time in telephone systems. You need to tell a caller the amount of money they have in an account, or that their package will be delivered to a specific address. Information is constantly changing, so you need a way to get it to your callers.
  5. Fun History of TTS: Before we dive into more details about Text To Speech, I want to show you one of the earliest speech synthesis devices.
  6. The machine pictured on this slide is a replica of the first speech synthesizer, originally developed by Wolfgang von Kempelen in the late 1700s. Interestingly, this machine from the 1840s was viewed and studied by Alexander Graham Bell, who created his own version and used many of the ideas when he invented the telephone! So speech synthesis and the telephone have been used together since the very beginning.
  7. Text To Speech Technologies: There are several different competing technologies that are used to create Text To Speech voices.
  8. Formant; Diphone; Statistical Parametric. If you’re familiar with Text To Speech you’ve heard of some of these. Formant synthesis creates mathematical models of the tissue in the mouth and lungs. It has a very small footprint but requires a great deal of computing power to operate. To me it sounds like an opera singer doing scales of “aaaaahhhhhh”s. Formant synthesizers are good at doing vowel sounds in a range of pitches. Diphone voices are quite robotic. This is the Stephen Hawking voice: it’s easily understood but doesn’t sound like a person. Statistical parametric voices are sometimes called HMM voices, for their use of Hidden Markov Models to create a model of speech based on a corpus and then use that model to generate the new audio.
  9. Unit Selection Synthesis: The primary technology being used in commercial Text To Speech systems today is Unit Selection synthesis. You’re already familiar with this if your mobile phone talks back to you. It’s what Siri and Google Now use. Unit Selection voices provide the most human-like experience today, and that’s because they are made with the recordings of actual people. These recordings are identified and labeled to create a database of sounds. In this sense Unit Selection is similar to a ransom note.
  10. We’ve all seen these in movies and TV shows. You cut up letters and rearrange them to form new words. At the most basic level this is what Unit Selection Synthesis is all about. In English there are 26 letters, in Spanish 29, so all we have to do is record about 30 things and we have a Unit Selection Text To Speech voice, right? Well, no. Unfortunately it’s not this easy.
  11. Unit Selection Synthesis: You can’t just record the alphabet, because human speech is not made up of letters. We use letters to write down speech, but when spoken, speech is made up of sounds which we call phonemes. How these phonemes are pronounced can vary quite a lot depending on what you’re saying and, importantly, where in the sentence that sound occurs. So much so that there are specialized alphabets specifically for phonemes.
  12. agua: Let’s take a look at phonemes. I tried to modify this presentation with Spanish examples when possible. Agua is a word that even with my limited Spanish skills I can pronounce.
  13. a1 g xu0 a0: Here is the phonetic spelling of that word. If you’re curious, this is based on the Carnegie Mellon University phonetic set. There are several other phoneme alphabets (IPA and SAMPA are popular), but we use a variation of the CMU alphabet for our voices.
  14. a1 g xu0 a0 (a1 highlighted): Looking at this first phoneme, we are depicting the “ah” sound with a 1, which denotes that the vowel is stressed.
  15. a1 g xu0 a0 (g highlighted): The “guh” sound is represented by the g.
  16. a1 g xu0 a0 (xu0 highlighted): Here’s one that should be new: xu with a 0. This is where phonetic alphabets start to differ from a regular alphabet. If we just used the u to describe this sound it wouldn’t work.
  17. aquí: Here’s a very similar word. It has a u in it, but it’s pronounced completely differently: aquí versus agua. Do you hear that “whuu” sound?
  18. a1 g xu0 a0 (xu0 highlighted): That “whuuu” sound is identified by this XU phone, and the 0 marks it as being unstressed.
  19. a1 g xu0 a0 (a0 highlighted): The a0 now completes the word to give us the “whhhuuaaa” sound.
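A phonetic spelling like the one on these slides can be treated as plain data. The following is only a minimal sketch, assuming the notation shown in the talk (lowercase phone symbols, with an optional trailing 1 for stressed or 0 for unstressed vowels); the function name `parse_phones` is hypothetical and is not part of any Cepstral tool.

```python
import re

def parse_phones(spelling):
    """Split a spelling such as 'a1 g xu0 a0' into (phone, stress) pairs."""
    phones = []
    for token in spelling.split():
        match = re.fullmatch(r"([a-z]+)([01])?", token)
        if match is None:
            raise ValueError(f"unrecognised phone token: {token!r}")
        symbol, digit = match.groups()
        # Consonants carry no stress digit, so their stress is None.
        stress = int(digit) if digit is not None else None
        phones.append((symbol, stress))
    return phones

agua = parse_phones("a1 g xu0 a0")
```

Here `agua` holds four pairs: a stressed `a`, a `g` with no stress mark, and an unstressed `xu` and `a`.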
  20. Create a new unit selection TTS voice: I think the really cool thing about Unit Selection voices is that we need to select a specific person to record a new voice, and that their voice will live on, theoretically forever.
  21. So how do we grab all of those phonemes? We lock that lucky voice actor in a sound booth and force them to record hours and hours’ worth of carefully worded scripts. These scripts are designed to capture as many of the phoneme interactions as possible.
  22. “Nowadays I wish I used cheese to coax them out, because bacon can be awkward.” Here’s an actual sentence from our English script. This sentence is grammatically correct, which helps the voice actor to read it in a natural tone. Once this has been recorded, the audio file is segmented into individual phonemes, and the location of the phonemes in the syllables, words, common phrases, and then sentences is noted. This allows us to better match sounds when the software runs. We’ll see an example of this in a minute.
  23. Labeling: Pitch; Duration; Position; Diphones. The phonemes are then labelled with acoustic parameters or context factors like the fundamental frequency (pitch), time duration, position in the syllable, and neighboring phonemes. Because of all of these different possible interactions, when we’re done we’ll have hundreds of thousands of examples, or units, of each phoneme in our database.
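To make the labeling step concrete, here is a rough sketch of what one labeled unit and a per-phoneme index might look like. The field names, the position labels, and every number are invented for illustration; the talk does not describe Cepstral’s actual database format.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledUnit:
    phone: str              # phoneme symbol, e.g. "xu"
    pitch_hz: float         # fundamental frequency
    duration_ms: float      # time duration
    syllable_position: str  # assumed labels: "onset", "nucleus", "coda"
    previous_phone: str     # left neighbour ("#" marks silence here)
    next_phone: str         # right neighbour

def build_index(units):
    """Group units by phoneme so candidates can be looked up quickly."""
    index = defaultdict(list)
    for unit in units:
        index[unit.phone].append(unit)
    return index

# Made-up measurements for three units cut from recordings.
units = [
    LabeledUnit("xu", 118.0, 70.0, "onset", "g", "a"),
    LabeledUnit("a", 110.0, 95.0, "nucleus", "xu", "#"),
    LabeledUnit("a", 125.0, 80.0, "nucleus", "#", "g"),
]
index = build_index(units)
```

A real voice would hold a very large number of such entries per phoneme, which is what gives the selection step something to choose from.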
  24. agua, salida, sola, jamón, hora: Now that we have a database filled with units, we can select them to create new audio that was never said by the voice talent. Here’s a group of words, most likely recorded at different times of the day, or even days or years apart. We have to use the same voice talent for a single voice. So based on our own testing and customer feedback, we’ll record new material and build that into the voice in order to provide more natural synthesis. Let’s go through this and create a new word by selecting units from these recordings.
  25. agua, salida, sola, jamón, hora (units so far: g xu a): Starting off with our old friend agua, we’ll take the last bit of that. You can see that we’re grabbing several phonemes at the same time. The TTS engine is looking for units that will match up best, so if it can find phonemes that were already near each other, the audio will probably sound more natural. What we want is for the phonemes from different recordings to join together smoothly. If they don’t, there’s a jump that the two sounds will have to make, and that’s when you hear the glitches in TTS audio that make it sound robotic. So getting smooth joins is of paramount importance. That’s one of the reasons why we need hundreds of thousands of these phonemes.
  26. agua, salida, sola, jamón, hora (units so far: g xu a d a): Continuing along with our example, now we’ll add in a group of phonemes from the next word.
  27. agua, salida, sola, jamón, hora (units so far: g xu a d a l a): We’ll continue to do this to build out the new word.
  28. agua, salida, sola, jamón, hora (units so far: g xu a d a l a x a): In an actual TTS engine, this selection will only take milliseconds. This is how the software can be used in a telephone or unified communications system. It operates faster than real time. The engine will also be performing a number of other calculations that we’ll look at in a minute. To me it’s still amazing that Text To Speech even works at all.
  29. agua, salida, sola, jamón, hora (units so far: g xu a d a l a x a r a): And finishing up gives us
  30. g xu a0 d a0 l a1 x a0 r a, Guadalajara: The Mexican city of Guadalajara. Now, this is just a simple example of creating a new word. In real life a TTS engine is looking at features like phrase boundaries: does the phoneme occur in the beginning, middle, or end of a word? Going even further, where in the original recording was the word? All of these attributes influence how that phoneme is said.
  31. “Hay agua en Marte.” and “Beber el agua.”: The “whuuuuuu” phonemes from agua in these two sentences are labeled differently. Typically at the end of a sentence the pitch descends: “Beber el agua.” I know my Spanish is very, very bad, but in “el agua” the l bleeds into the a. It would be difficult to take that a phoneme and use it in “Marte”, for example. The perfect unit required at synthesis time may not be available in the database, so a selection must be performed to choose, from amongst the many slightly mismatched units, the best available sequence of units to concatenate. The more units we have, the greater the chance that we’ll find that perfect unit.
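Choosing “the best available sequence of units” is commonly framed as a shortest-path search that balances how well each unit fits the request (a target cost) against how smoothly neighbours join (a join cost). The sketch below is a simplified illustration of that idea, not the engine’s real algorithm: the cost functions compare pitch only, the rule that units adjacent in the same recording join for free is an assumption drawn from the “phonemes that were already near each other” remark, and all data is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    phone: str
    pitch_hz: float
    source: str  # recording the unit was cut from
    index: int   # position of the unit inside that recording

def target_cost(unit, wanted_pitch_hz):
    return abs(unit.pitch_hz - wanted_pitch_hz)

def join_cost(left, right):
    # Assumption: neighbours from the same recording join perfectly.
    if left.source == right.source and right.index == left.index + 1:
        return 0.0
    return abs(left.pitch_hz - right.pitch_hz)

def select_units(targets, database):
    """targets: list of (phone, wanted_pitch_hz); database: phone -> [Unit].

    Dynamic programming over candidates; every phone needs >= 1 candidate.
    """
    best = []  # best[i][j] = (cheapest total cost, backpointer)
    for i, (phone, wanted) in enumerate(targets):
        row = []
        for unit in database[phone]:
            cost = target_cost(unit, wanted)
            if i == 0:
                row.append((cost, None))
                continue
            previous = database[targets[i - 1][0]]
            path_cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(previous)
            )
            row.append((path_cost + cost, back))
        best.append(row)
    # Trace the cheapest path back from the last phone.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(database[targets[i][0]][j])
        j = best[i][j][1]
    return path[::-1]

database = {
    "g": [Unit("g", 120.0, "agua", 1), Unit("g", 100.0, "gato", 0)],
    "xu": [Unit("xu", 118.0, "agua", 2)],
    "a": [Unit("a", 115.0, "agua", 3), Unit("a", 100.0, "salida", 5)],
}
chosen = select_units([("g", 110.0), ("xu", 110.0), ("a", 110.0)], database)
```

With this toy data the search keeps all three units from the same “agua” recording, because their free joins outweigh the pitch mismatch of the alternatives.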
  32. User Lexicons: We’ve been talking about the research end of speech synthesis, but there are production applications where knowing all of this will help. Being familiar with phonemes and phonetic alphabets gives both you and the end users of Text To Speech software the ability to customize the voice through a user lexicon.
  33. User Lexicons (word = phonemes): User lexicons are lookup tables that replace words in the text with user-defined pronunciations. These can be specialized acronyms that are specific to a company, or people’s names. They are often very useful when using an English voice to pronounce names from Spanish or other languages. Lexicons fine-tune the audio to make sure that it’s as understandable as possible.
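The “word = phonemes” idea can be sketched as a dictionary lookup in which user entries take priority over the built-in ones. This is only an illustration: the real lexicon file format is not shown in the talk, the function and variable names are hypothetical, and the two pronunciations are the ones that appear on the earlier slides.

```python
def pronounce(words, user_lexicon, builtin_lexicon):
    """Return a phonetic spelling per word, preferring user entries."""
    result = []
    for word in words:
        key = word.lower()
        if key in user_lexicon:
            result.append(user_lexicon[key])
        elif key in builtin_lexicon:
            result.append(builtin_lexicon[key])
        else:
            # A real engine would fall back to letter-to-sound rules here.
            result.append(None)
    return result

builtin_lexicon = {"agua": "a1 g xu0 a0"}
user_lexicon = {"guadalajara": "g xu a0 d a0 l a1 x a0 r a"}

spoken = pronounce(["Agua", "Guadalajara", "Elastix"],
                   user_lexicon, builtin_lexicon)
```

In this run the third word has no entry in either table, so it comes back as `None` to mark where a fallback would be needed.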
  34. Text Normalization: I mentioned earlier that in milliseconds the engine performs other calculations as well. One of the more important calculations is called Text Normalization. This actually happens first in the process. So let’s take a look at what that means and why it is a challenge for all Text To Speech engines.
  35. 7/10: Let’s say you had a piece of text that said this. What could it mean?
  36. 7/10, “7 de octubre”: Here in Colombia it could be today’s date, October 7th.
  37. 7/10, MM/DD/YYYY: In the United States the date format is different.
  38. 7/10, July 10th: So this exact same text would be July 10th to me. It’s a bit absurd. I think we’re one of the only countries to use this format; it probably has something to do with our hatred of the metric system as well. I guess we just like to be difficult. Moving on.
  39. 7/10, “7 dividido por 10”: Or we could look at it another way and it would be a math problem. These are the types of issues that Text Normalization has to solve. Unless the software knows what the sentence means, it can’t properly pronounce the words to convey that information clearly. One of the keys to figuring these things out is identifying and analyzing them in the context of the sentence as a whole.
  40. “This is a courtesy call to remind Patrick Dexter of an appointment with Dr. Steel on 10/07/2015 at 10:30 am.” About a week before my dentist appointment I receive a phone call reminding me to floss so I don’t get yelled at by the guy with sharp stabby things in my mouth. If you have a service that provides outbound phone calls, this message is exactly the type of automation that Text To Speech is perfect for.
  41. (Same message.) In a high call volume environment there’s no way you could record every name. And on the call you really do want to identify a specific person, in this case Patrick Dexter. You don’t want the person to show up to the appointment with their son or daughter when it’s actually their own appointment.
  42. (Same message.) But even with saying the person’s name, there’s so much here that a computer can easily make a mistake on. The text to speech software needs to identify the text “D R period” not as “durrrr” followed by the end of the sentence, but as the abbreviation for Doctor. “Doctor Steel” is what a person reading this would think, so that’s what the engine has to say.
  43. (Same message.) So, getting back to our first example: the text to speech engine should be able to correctly interpret this text as a date. It will look at the sentence and know that, at least in English, “on” is a preposition that typically comes before a date, so that’s a very good clue, and it’s English, so the format is month, day, year. This gives you an idea of the type of rules that are built into TTS software. Some of the fun research being done in the field of computational linguistics is on how to apply more artificial intelligence to this process rather than strict rule-based decision trees.
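A rule of the kind described here (the preposition “on” as a clue, plus a locale-dependent field order) might look roughly like the following. It is a deliberately small sketch under stated assumptions: it only handles full numeric dates after “on”, the `day_first` flag stands in for whatever locale logic a real engine uses, and the function name is hypothetical.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# The preposition "on" is used as the contextual clue for a date.
DATE_AFTER_ON = re.compile(r"\bon (\d{1,2})/(\d{1,2})/(\d{4})\b")

def expand_dates(text, day_first):
    """Rewrite 'on 10/07/2015' in a form that is friendlier to the ear."""
    def spoken(match):
        first, second, year = (int(group) for group in match.groups())
        day, month = (first, second) if day_first else (second, first)
        date(year, month, day)  # raises ValueError for impossible dates
        return f"on {MONTHS[month - 1]} {day}, {year}"
    return DATE_AFTER_ON.sub(spoken, text)

message = "an appointment with Dr. Steel on 10/07/2015 at 10:30 am."
us_reading = expand_dates(message, day_first=False)
co_reading = expand_dates(message, day_first=True)
```

The same characters come out as October 7 in the US reading and July 10 in the day-first reading, which is exactly the ambiguity the 7/10 slides point at.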
  44. (Same message.) And we’re not done yet. At the end of our sentence there’s again something ambiguous: we have the time of the appointment. If the TTS engine reads this as “10 colon 30 ammmm”, it will cause confusion.
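The “Dr.” and “10:30 am” cases can be sketched with two small rewrite rules. Both rules are illustrative assumptions rather than the engine’s real ones: “Dr.” is expanded only when a capitalised word follows (which would still misread a street name such as “Elm Dr. North”), and times are handled only in 12-hour form with an am/pm marker.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

def words_under_60(n):
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

TIME = re.compile(r"\b(\d{1,2}):(\d{2}) ?(am|pm)\b", re.IGNORECASE)
DOCTOR = re.compile(r"\bDr\.(?= [A-Z])")

def expand_time(match):
    hour, minute = int(match.group(1)), int(match.group(2))
    if not (1 <= hour <= 12 and minute <= 59):
        return match.group(0)  # leave anything odd untouched
    period = " ".join(match.group(3).upper())  # "am" -> "A M"
    if minute == 0:
        spoken = words_under_60(hour)
    elif minute < 10:
        spoken = f"{words_under_60(hour)} oh {ONES[minute]}"
    else:
        spoken = f"{words_under_60(hour)} {words_under_60(minute)}"
    return f"{spoken} {period}"

def normalize(text):
    text = DOCTOR.sub("Doctor", text)
    return TIME.sub(expand_time, text)

result = normalize("Dr. Steel on 10/07/2015 at 10:30 am.")
```

Spelling the period out as separate letters is one way to keep a synthesizer from pronouncing “am” as a word.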
  45. Heteronym: Another issue is this lovely thing. Does anyone know what a heteronym is? It’s an evil part of the English language where a single spelling can mean two different things and be pronounced differently! I don’t believe Spanish has heteronyms, but if it does I’d love to find out more.
  46. @Cepstral_LLC: This is where that Twitter handle becomes relevant.
  47. Bass: The word bass can be a fish,
  48. Bass: or it can be pronounced “base” and in music mean the deep low end, as in the bass clef versus the treble clef.
  49. Object: This is a fun word that can be either a verb or a noun.
  50. “Me opongo a ese objeto.” In Spanish this sentence may make sense. I’m really hoping. I used Google Translate. But you have two different words for the verb and the noun.
  51. “I object to that object.” In English the sentence is “I object to that object.” Do you hear the two different pronunciations of the exact same word? “ob-JECT” versus “OB-ject”. If you say “I OB-ject to that ob-JECT” to an English speaker it doesn’t make sense.
  52. “I object (verb) to that object.” The engine can figure out the part of speech to help determine the pronunciation. Here object is a verb,
  53. “I object to that object (noun).” and here a noun. This functionality is called a part-of-speech tagger and it’s very helpful in a text to speech engine.
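As a toy illustration of how a part-of-speech guess can pick between heteronym pronunciations, the sketch below looks only at the preceding word. Real taggers are far more sophisticated; the word lists, the respellings, and the function names here are all assumptions made for the example.

```python
# Pronunciation depends on (word, part of speech).
HETERONYMS = {
    ("object", "verb"): "ob-JECT",
    ("object", "noun"): "OB-ject",
}
PRONOUNS = {"i", "we", "you", "they"}
DETERMINERS = {"a", "an", "the", "this", "that"}

def guess_pos(previous_word):
    """Very rough rule: pronoun before -> verb, determiner before -> noun."""
    if previous_word in PRONOUNS:
        return "verb"
    if previous_word in DETERMINERS:
        return "noun"
    return None

def respell(sentence):
    words = sentence.lower().rstrip(".").split()
    output = []
    for i, word in enumerate(words):
        pos = guess_pos(words[i - 1]) if i > 0 else None
        # Words without a heteronym entry are passed through unchanged.
        output.append(HETERONYMS.get((word, pos), word))
    return output

reading = respell("I object to that object.")
```

The first “object” follows a pronoun and is read as the verb, while the second follows “that” and is read as the noun.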
  54. $10 per day: Currency interpretation is also important and is something we see all of the time in outbound phone call campaigns. Maybe this is a phone call to remind someone of a fine they will have to pay, or a utility bill. We’d read this as “10 dollars” or “10 pesos” per day.
  55. $10.5 million per day: We’d read this as “ten point five million dollars per day”, not “ten period five dollars million per day”. So the engine has to look at the entire sentence, both backwards and forwards, in order to understand not only what the text is but how all of the words interact. Text Normalization occurs first in order to determine what the text as a whole means. Periods and commas are interpreted to add pauses; numbers and dates are converted into formats that are more friendly to the ear. Abbreviations and so much more are all figured out so that the human-computer interaction can occur.
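The currency rule described here, where the unit word has to move to after the scale word, can be sketched with one regular expression. This is an assumption-laden toy: digits are left as digits for a later number-to-words step, the `unit` parameter is a stand-in for locale handling, and a plain amount such as $10.50 would come out digit by digit rather than as dollars and cents.

```python
import re

CURRENCY = re.compile(
    r"\$(\d+)(?:\.(\d+))?(?: (thousand|million|billion))?"
)

def expand_currency(text, unit="dollars"):
    """Move the currency word after the amount and any scale word."""
    def spoken(match):
        whole, fraction, scale = match.groups()
        parts = [whole]
        if fraction is not None:
            # Read the fractional digits one by one after "point".
            parts += ["point", " ".join(fraction)]
        if scale is not None:
            parts.append(scale)
        parts.append(unit)
        return " ".join(parts)
    return CURRENCY.sub(spoken, text)

small = expand_currency("$10 per day")
large = expand_currency("$10.5 million per day")
pesos = expand_currency("$10 per day", unit="pesos")
```

Looking ahead for “million” before emitting the unit is the small-scale version of reading the sentence “both backwards and forwards”.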
  56. Step 1: To put all those pieces together, we have our text to speech software here. The text is sent to it.
  57. Step 2: Text Normalization occurs, trying to figure out all of those dates and currencies, the parts of speech, and heteronym information. User lexicons are checked to see if there are custom pronunciations.
  58. Step 3: The best possible units are selected from hundreds of thousands of examples, based on all of those acoustic parameters like pitch, duration, and position relative to other phonemes.
  59. Step 4: The units are all joined together to generate a waveform.
  60. Step 5: And the audio is output to the user. Magic! In the world of telephones and Unified Communications, Text To Speech isn’t just magic though. It’s an incredibly powerful tool, giving you the ability to deliver information to your callers. Let’s take a look at an example of this.
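The five steps above can be pictured as a chain of functions. The sketch below wires together deliberately trivial stand-ins for each stage so the data flow is visible; every stage, name, and number in it is made up, and the “waveform” is just a list of integers.

```python
def synthesize(text, normalize, to_phones, select_units, join):
    """Run the processing stages in order and return the joined samples."""
    normalized = normalize(text)    # step 2: text normalization
    phones = to_phones(normalized)  # step 2: lexicon lookup
    units = select_units(phones)    # step 3: unit selection
    return join(units)              # step 4: concatenation

# Toy data: one lexicon entry and one recorded snippet per phoneme.
LEXICON = {"agua": ["a", "g", "xu", "a"]}
SAMPLES = {"a": [1, 2], "g": [3], "xu": [4, 5]}

def toy_normalize(text):
    return text.lower().strip(".!? ")

def toy_to_phones(text):
    return [phone for word in text.split() for phone in LEXICON[word]]

def toy_select(phones):
    return [SAMPLES[phone] for phone in phones]

def toy_join(units):
    return [sample for unit in units for sample in unit]

# Step 1 is the call itself; step 5 would hand the samples to an audio device.
waveform = synthesize("Agua.", toy_normalize, toy_to_phones,
                      toy_select, toy_join)
```

Passing the stages in as arguments keeps the outline of the pipeline separate from any particular implementation of a stage.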
  61. Since Elastix is built on top of Asterisk, we can use the existing tools, like ODBC database connections, for grabbing variable information.
  62. And the open source module app_swift for linking Cepstral into Asterisk.
  63. app_swift: We’ll be looking at app_swift, which is specific to Cepstral text to speech.
  64. MRCP: But there’s also a protocol called MRCP if you want to use other TTS engines, and MRCP is also used to add in speech recognition. It’s great for larger installs as well, where you may have multiple Elastix servers that all need to share TTS resources. Again, use Twitter or see me after the talk with any questions on this.
  65. exten=>n,Swift("Hello! Thank you for calling Cepstral."|4000|3) followed by exten=>n,Set(CALL_TRANSFER=${FILTER(0-9,${SWIFT_DTMF})}): Here’s an app_swift example. Swift is the name of Cepstral’s TTS engine, so the module adds the Swift command to your dialplan. It says a simple greeting and then uses Asterisk functionality to listen for a DTMF tone. We’ve had customers with hundreds of menu options in their IVR that have the entire thing read in real time by TTS voices, allowing them to make menu updates and changes on the fly. Want to add a new IVR option? Simply reload extensions.conf and the menu changes. No recording of prompts, no uploading wav files. Very easy to maintain.
  66. exten => 123,8,Set(BALANCE=${ODBC_BALANCE(${ACCOUNT})}) followed by exten => 123,9,Swift(${BALANCE}): Like I said before, with ODBC you can also create very complex database-driven systems right in the dialplan. Or you can use AGI to do this. Here’s a very simple example of querying a database for an account balance,
  67. (Same dialplan.) and then using the Swift command right in the dialplan to read that back to the caller. And because you have the $ symbol in there, Cepstral will perform the text normalization we discussed to read this off as a person would say it. You can really expand what’s possible to automate for inbound or outbound calls. Do you have a technical support line? Have the caller identify themselves with a ticket number and read back the last note that a customer service rep left for them, or a status update on a known system outage.
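For the AGI route mentioned above, a script talks to Asterisk over standard input and output: Asterisk first sends `agi_*: value` lines ending with a blank line, and the script then writes commands such as `EXEC` and reads one reply per command. The sketch below shows that exchange in a testable form. It is a rough outline, assuming the `Swift` application from app_swift is loaded as in the talk; the helper names are hypothetical and the quoting of the text argument is an assumption that should be checked against your Asterisk version.

```python
def read_agi_environment(stream):
    """Collect the 'agi_*: value' lines Asterisk sends at start-up."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break  # a blank line ends the environment block
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env

def agi_command(command, out, inp):
    """Send one AGI command and return Asterisk's reply line."""
    out.write(command + "\n")
    out.flush()
    return inp.readline().strip()

def speak(text, out, inp):
    # Assumption: backslash-escaping quotes is enough for this argument.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return agi_command(f'EXEC Swift "{escaped}"', out, inp)

# In a real script the streams would be sys.stdin and sys.stdout:
#   read_agi_environment(sys.stdin)
#   speak("Your balance is $10.", sys.stdout, sys.stdin)
```

Keeping the streams as parameters makes it possible to exercise the script with in-memory buffers before attaching it to a live dialplan.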
  68. https://vimeo.com/84233208: There’s a fantastic video available from Elastix training that goes into detail on how to install and configure TTS on Elastix and how to set up an AGI script that makes use of the TTS software like this. The video shows you a demo app for employees to find out more information about outstanding loans. I’ll tweet a link to the video, and I do recommend that you watch it. It’s in Spanish, and you can pause and rewatch it over and over. And now that you know how the text to speech software works, it will make more sense if you’re adding TTS to your Elastix systems.
  69. Text To Speech automates the delivery of information; grow IVR usage without adding call center employees. To end: Text To Speech is a powerful tool that can read any information that you have in your systems. It is useful not only for traditional IVR systems; if you have a call center agent or customer service rep reading information to a caller, then Text To Speech can automate that, allowing you to grow call volumes without adding agents. It also frees agents up to handle more difficult tasks that can’t be automated.
  70. ¡Gracias! Thank you very much for the opportunity to speak to all of you today.
