SSML - Why Bother?

SSML - Making your skills sound right
@RichMerrett815
@VeniLoqui
#alexadevscamb

What is it?
● Speech Synthesis Markup Language
● XML Based
● W3C
● Gives you extra control over the speech in your skills
● Uses tags like HTML
● Ever heard Alexa pronounce something wrong? The skill probably isn't using SSML!
● One of our brands requires it
○ <phoneme alphabet="ipa" ph="vɒks'ɛl'əɹeɪt">VoxLR8</phoneme>
● Disclaimer: Amazon have a great site guiding you through SSML
https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html

What can it be used for?
● Pauses
● Emphasis
● Phonemes
● Prosody
● Saying as
● Tense/Word Type
● Amazon Effects
● Adding short audio files
● Language - New
● Amazon Polly - New
● Speechcons

<speak>
● Need to include all SSML within the <speak></speak> tags
● In your index.js file, set the output speech type to SSML (the default is plain text)
○ Workaround for having to add the <speak> tags to every bit of speech.
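
A minimal sketch of that idea, assuming the standard Alexa response JSON in which `outputSpeech.type` is `"SSML"` and the markup goes in the `ssml` field. The helper name `ssmlOutputSpeech` is hypothetical and not part of any SDK; it adds the <speak> wrapper once so individual strings don't have to.

```javascript
// Hypothetical helper: wraps a fragment in <speak> tags (unless it is
// already wrapped) and returns an outputSpeech object of type SSML.
function ssmlOutputSpeech(fragment) {
  const trimmed = fragment.trim();
  const ssml = trimmed.startsWith("<speak>") ? trimmed : `<speak>${trimmed}</speak>`;
  return { type: "SSML", ssml };
}

const speech = ssmlOutputSpeech('This pause <break time="300ms"/> is short');
console.log(JSON.stringify(speech));
```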

<break>
● The break tag adds pauses.
● Good to use in place of commas, full stops and <p> if your skill is multimodal (including cards)
● Attributes
○ strength
■ medium (equivalent of a comma)
■ strong (equivalent of a full stop or <s>)
■ x-strong (equivalent of a <p>)
This pause <break strength="medium"/> is a medium pause
○ time
■ Seconds (s)
■ Milliseconds (ms)
This pause <break time="2s"/> is a two second pause
This pause <break time="300ms"/> is a three hundred millisecond pause
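
If you build speech strings in code, a small helper can catch malformed pauses early. This is only a sketch: `pause` and `STRENGTHS` are hypothetical names, and the strength list is the full set from the SSML reference rather than just the three values above.

```javascript
// Strength values defined for <break> in the SSML reference.
const STRENGTHS = ["none", "x-weak", "weak", "medium", "strong", "x-strong"];

// Hypothetical helper: builds a <break> tag from a time ("2s", "300ms")
// or, if no time is given, from a strength value.
function pause({ time, strength } = {}) {
  if (time !== undefined) {
    if (!/^\d+(ms|s)$/.test(time)) throw new Error(`Bad time: ${time}`);
    return `<break time="${time}"/>`;
  }
  if (!STRENGTHS.includes(strength)) throw new Error(`Bad strength: ${strength}`);
  return `<break strength="${strength}"/>`;
}

const sentence = `This pause ${pause({ time: "2s" })} is a two second pause`;
console.log(sentence);
```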

<emphasis>
● Changes the rate and volume of the speech
● The more emphasis, the more like you are trying to order a cheeseburger in France (louder and slower)
● Attribute
○ level
■ strong - increase volume and slow down speaking rate
■ moderate - increase volume and slow down speaking rate (but not as much as strong)
■ reduced - decrease volume and speed up speaking rate
I said I would <emphasis level="strong">really like a cheeseburger please</emphasis>
I said I would <emphasis level="reduced">really like a cheeseburger please</emphasis>
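
The same examples can be generated from code. A rough sketch, where `emphasise` is a hypothetical helper and the default of "moderate" is just a choice made for this example:

```javascript
const LEVELS = ["strong", "moderate", "reduced"];

// Hypothetical helper: wraps text in an <emphasis> tag with a checked level.
function emphasise(text, level = "moderate") {
  if (!LEVELS.includes(level)) throw new Error(`Unknown level: ${level}`);
  return `<emphasis level="${level}">${text}</emphasis>`;
}

const order = `I said I would ${emphasise("really like a cheeseburger please", "strong")}`;
console.log(order);
```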

<phoneme>
● Lets you tell Alexa exactly how to pronounce a word or phrase.
● Attributes
○ alphabet (phonetic alphabet to use)
■ ipa (International Phonetic Alphabet)
■ x-sampa (Extended Speech Assessment Methods Phonetic Alphabet)
○ ph (the phonetic pronunciation to speak, written in symbols)
● Symbols have slight variations across countries.
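
A sketch of a phoneme helper, using the VoxLR8 pronunciation from earlier in the deck. `phoneme` and `escapeAttr` are hypothetical names. The escaping matters mainly for X-SAMPA, which uses a double quote as its primary stress mark and would otherwise break the attribute.

```javascript
// Escape characters that would break an XML attribute value.
function escapeAttr(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Hypothetical helper: wraps text in a <phoneme> tag.
function phoneme(text, ph, alphabet = "ipa") {
  if (!["ipa", "x-sampa"].includes(alphabet)) throw new Error(`Unknown alphabet: ${alphabet}`);
  return `<phoneme alphabet="${alphabet}" ph="${escapeAttr(ph)}">${text}</phoneme>`;
}

const brandPh = "vɒks'ɛl'əɹeɪt";
const brand = phoneme("VoxLR8", brandPh);
console.log(brand);
```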

<prosody>
● Changes the volume, pitch and rate of speech.
● Attributes
○ rate - change the rate of speech
■ x-slow, slow, medium, fast, x-fast
■ 100% - no change
■ > 100% increase rate
■ < 100% decrease rate (min 20%)
○ pitch - raise or lower the tone of the speech
■ x-low, low, medium, high, x-high
■ +1% to +50% increase pitch
■ -1% to -33.3% decrease pitch
○ volume - change the volume of the speech relative to the current volume level.
■ silent, x-soft, soft, medium, loud, x-loud
■ +0.01dB to +4.08dB increase volume
■ -0.01dB to -6dB decrease volume
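
Because all three attributes are optional, a helper that only emits the ones you set keeps the markup tidy. A minimal sketch; `prosody` is a hypothetical helper and does no range checking on the values.

```javascript
// Hypothetical helper: wraps text in <prosody>, emitting only the
// attributes that were supplied (in the order rate, pitch, volume).
function prosody(text, { rate, pitch, volume } = {}) {
  const attrs = Object.entries({ rate, pitch, volume })
    .filter(([, value]) => value !== undefined)
    .map(([name, value]) => `${name}="${value}"`)
    .join(" ");
  return attrs ? `<prosody ${attrs}>${text}</prosody>` : text;
}

const quiet = prosody("slowly and quietly", { rate: "slow", volume: "soft" });
console.log(quiet);
```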

<say-as>
● Determines how text should be interpreted.
● Attribute:
○ interpret-as:
■ characters, spell-out
■ cardinal, number
■ ordinal
■ digits
■ fraction
■ unit
■ date - can specify the format, using the ‘format’ attribute
■ time
■ telephone
■ address
■ interjection
■ expletive
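
A sketch of how these values might be applied from code. `sayAs` is a hypothetical helper, and the "dmy" format value is an assumption here; check the date formats listed in the Amazon reference before relying on it.

```javascript
// Hypothetical helper: wraps text in <say-as>, with an optional format
// attribute (only meaningful for interpret-as="date").
function sayAs(text, interpretAs, format) {
  const formatAttr = format ? ` format="${format}"` : "";
  return `<say-as interpret-as="${interpretAs}"${formatAttr}>${text}</say-as>`;
}

const spelled = sayAs("SSML", "spell-out");
const day = sayAs("14/03/2018", "date", "dmy"); // "dmy" is an assumed format value
console.log(spelled);
console.log(day);
```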

<w>
● Customises the pronunciation of words by specifying the word’s part of speech.
● Attribute
○ role
■ amazon:VB - verb
■ amazon:VBD - past participle
■ amazon:NN - noun
■ amazon:SENSE_1 - where there are different meanings (homographs), e.g. Bass and Bass, Bow and Bow, Wind and Wind.
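
A short sketch using the roles listed above. `word` is a hypothetical helper and the sentence is only an illustration; which pronunciation each role actually selects is best checked in the voice simulator.

```javascript
const ROLES = ["amazon:VB", "amazon:VBD", "amazon:NN", "amazon:SENSE_1"];

// Hypothetical helper: tags a single word with a part-of-speech role.
function word(text, role) {
  if (!ROLES.includes(role)) throw new Error(`Unknown role: ${role}`);
  return `<w role="${role}">${text}</w>`;
}

const line = `I ${word("read", "amazon:VBD")} about a ${word("bass", "amazon:SENSE_1")}.`;
console.log(line);
```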

<amazon:effect>
● Specialist Amazon effects. Cannot be used elsewhere.
● Attribute:
○ name
■ whispered
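
For completeness, a tiny sketch of the whisper effect; `whisper` is a hypothetical helper name.

```javascript
// Hypothetical helper: wraps text in the Amazon whispered effect.
function whisper(text) {
  return `<amazon:effect name="whispered">${text}</amazon:effect>`;
}

const secret = `I want to tell you a secret. ${whisper("I am not a real human.")}`;
console.log(secret);
```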

<audio>
● Allows you to insert MP3 files into the speech
○ HTTPS endpoint
○ No customer specific information
○ Valid MP3 file (MPEG v2)
○ No longer than 90 seconds
○ Bit rate = 48kbps
○ Sample rate = 16000Hz
● Attribute
○ src
● Alexa Sound Library - NEW!
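
Only the HTTPS requirement can be checked cheaply from the URL itself; the file format rules above need inspecting the MP3. A sketch, with `audio` as a hypothetical helper and example.com as a placeholder host:

```javascript
// Hypothetical helper: builds an <audio> tag, rejecting non-HTTPS sources.
// The built-in URL class throws on malformed input.
function audio(src) {
  const url = new URL(src);
  if (url.protocol !== "https:") throw new Error("audio src must use HTTPS");
  return `<audio src="${src}"/>`;
}

const intro = audio("https://example.com/intro.mp3"); // placeholder URL
console.log(intro);
```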

<lang> NEW!
● The language tag has text spoken with the pronunciation of the language it is written in
● Attribute
○ xml:lang
■ Supports all Amazon Polly languages

<voice>
● Lets you use an Amazon Polly voice in your skill's responses
○ 50+ voices in 25+ languages
● Which tags you can use depends on the locale.
● <lang> can be used with it.
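
A sketch of nesting the two tags. `voice` and `lang` are hypothetical helpers, and the voice name "Brian" and locale "fr-FR" are assumptions for illustration; check the Amazon reference for what your skill's locale supports.

```javascript
// Hypothetical helpers: wrap text in <voice> and <lang> tags.
function voice(text, name) {
  return `<voice name="${name}">${text}</voice>`;
}

function lang(text, locale) {
  return `<lang xml:lang="${locale}">${text}</lang>`;
}

// Voice name and locale are assumed values for this example.
const greeting = voice(`In Paris they say ${lang("bonjour", "fr-FR")}`, "Brian");
console.log(greeting);
```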

Speechcons
● Speech emojis
● Used through the <say-as> tag with interpret-as="interjection".
○ <say-as interpret-as="interjection">Wow.</say-as>
● Supported in the languages Alexa is available for
https://developer.amazon.com/docs/custom-skills/speechcon-reference-interjections-english-uk.html

SSML Top Tips...from the front line
● Use the voice simulator to test every bit of speech
○ At the very least the key parts.
● Design for multi-modal (whether you are using it or not)
● Sound each word out aloud so you can hear what each element sounds like
● Trial and error with phonemes
● Go back to school
● Utilise other voices
● Use audio files
● When creating other types of media e.g. webpages, run them through the voice simulator before publishing. It's a great way to pick up mistakes!

<phoneme alphabet="ipa" ph="θ'ænk ju">Thank You</phoneme>
@RichMerrett815
@VeniLoqui
#alexadevscamb
