Amazon Web Services released its new AI services at the last re:Invent. Here is a short introduction to them and a simple integration with a Lego robot.
18. Polly
Lifelike Text To Speech
- 47 voices across 24 languages
- Low latency
- Free to reuse
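A minimal sketch of what a Polly text-to-speech call can look like. The talk uses the Java SDK; Python with boto3 is swapped in here for brevity. The helper names `build_polly_request` and `synthesize_to_file`, the default `Joanna` voice and the output path are illustrative assumptions, and the network call needs AWS credentials to run.

```python
# Sketch only: helper names, default voice and output path are assumptions.
SUPPORTED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_polly_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's SynthesizeSpeech operation."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported output format: %s" % output_format)
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path, **options):
    """Call Polly and write the returned audio stream to `path`."""
    import boto3  # imported lazily so the request builder works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("Hello from the robot", "hello.mp3")` would write an MP3 that can then be played on the Bluetooth speaker.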
19. Polly
Quality
- Natural sounding speech
A subjective measure of how close TTS output is to human speech.
- Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc.
- Today in Las Vegas, NV it's 54°F.
- "We live for the music", live from the Madison Square Garden.
- Highly intelligible
A measure of how comprehensible speech is.
- "Peter Piper picked a peck of pickled peppers."
20. Polly
Quality
- Lexicons
Enable developers to customize the pronunciation of words or phrases
<lexeme>
<grapheme>Kaja</grapheme>
<phoneme>kaI.@</phoneme>
</lexeme>
- Homographs
Words written identically that have different pronunciations
- Proper Names
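The lexeme above is only a fragment; Polly expects it wrapped in a complete W3C PLS lexicon document. The following is a minimal sketch of building one, assuming the phoneme on the slide is X-SAMPA and the language is `en-US`. The function name `build_lexicon` is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="en-US", alphabet="x-sampa"):
    """Return a PLS document for (graphemes, phoneme) pairs.

    `lang` and `alphabet` are assumptions; adjust them to the voice in use.
    """
    ET.register_namespace("", PLS_NS)  # emit PLS as the default namespace
    root = ET.Element(
        "{%s}lexicon" % PLS_NS,
        {"version": "1.0", "alphabet": alphabet, "{%s}lang" % XML_NS: lang},
    )
    for graphemes, phoneme in entries:
        lexeme = ET.SubElement(root, "{%s}lexeme" % PLS_NS)
        for grapheme in graphemes:
            ET.SubElement(lexeme, "{%s}grapheme" % PLS_NS).text = grapheme
        ET.SubElement(lexeme, "{%s}phoneme" % PLS_NS).text = phoneme
    return ET.tostring(root, encoding="unicode")


lexicon_xml = build_lexicon([(["Kaja"], "kaI.@")])
```

The resulting string could then be uploaded with the Polly `PutLexicon` operation (`put_lexicon(Name=..., Content=lexicon_xml)` in boto3) and referenced by name when synthesizing.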
21. Polly
Quality
- Text Normalization
Disambiguation of abbreviations, acronyms, units (St. -> street/saint, KM -> kilometers)
- Foreign Words
Use of the correct pronunciation for words from another language (C’est la vie, déjà vu)
- Slang
Support for commonly used expressions and abbreviations (ASAP, LOL, ROTFL)
- Prosody / Intonation contour
Prediction of changes in volume, rate, and pitch
22. Polly
SSML
Speech Synthesis Markup Language is a W3C recommendation: an XML-based markup language for speech synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate="x-slow">
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
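A minimal sketch of generating the SSML above from code, so that user-supplied text is XML-escaped before it reaches Polly. The function name `spell_out_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def spell_out_ssml(sentence, word, rate="x-slow"):
    """Say `sentence`, then spell `word` character by character at `rate`."""
    return (
        "<speak>"
        + escape(sentence)
        + " <prosody rate=" + quoteattr(rate) + ">"
        + '<say-as interpret-as="characters">' + escape(word) + "</say-as>"
        + "</prosody>"
        + "</speak>"
    )


ssml = spell_out_ssml("My name is Kuklinski. It is spelled", "Kuklinski")
```

To have Polly interpret the markup instead of reading the tags aloud, the string is sent to SynthesizeSpeech with the text type set to `ssml` (`TextType="ssml"` in boto3).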
24. Raspberry Pi 3
- AWS services with Java SDK
- Bluetooth speaker
- USB Webcam for image and audio input
- Send Python scripts to EV3Dev
- Send audio but receive text, to handle the different cases
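One way the "send Python scripts to EV3Dev" step could work over SSH is to pipe the script to the remote interpreter's standard input. This is only a sketch: the slides do not show the actual mechanism, and the `robot` user, the `python3 -` invocation and the function names are assumptions.

```python
import subprocess


def build_ssh_command(host, user="robot", interpreter="python3"):
    """Command that runs a script read from stdin on the remote brick."""
    # "python3 -" tells the interpreter to read the program from stdin.
    return ["ssh", "%s@%s" % (user, host), interpreter, "-"]


def run_script_on_ev3(script, host, timeout=60):
    """Send `script` (a str) to the brick; return (exit code, stdout)."""
    result = subprocess.run(
        build_ssh_command(host),
        input=script.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode, result.stdout.decode("utf-8")
```

A call such as `run_script_on_ev3("print('hello')", "ev3dev.local")` assumes key-based SSH login is already set up, since no password prompt is handled here.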
25. EV3Dev
- Receive and execute Python scripts
- Interact with motors and sensors
- It’s only an ARM9 at 300 MHz with 64 MB of RAM
- EV3Dev is a Debian-based distro with kernel modules for Lego Mindstorms EV3 hardware
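A sketch of the kind of script that could run on the brick to interact with a motor and a sensor. It assumes the `ev3dev2` Python library, a large motor on output A and a touch sensor on any input port; the talk may have used a different library version or wiring, and the function names are hypothetical.

```python
def clamp_speed(percent):
    """Keep a requested speed within the -100..100 percent range."""
    return max(-100, min(100, percent))


def drive_until_pressed(speed_percent=30, poll_seconds=0.05):
    """Run the motor until the touch sensor is pressed, then stop."""
    # Lazy imports: ev3dev2 is normally installed only on the brick itself.
    from time import sleep
    from ev3dev2.motor import LargeMotor, OUTPUT_A, SpeedPercent
    from ev3dev2.sensor.lego import TouchSensor

    motor = LargeMotor(OUTPUT_A)  # assumed wiring: motor on port A
    touch = TouchSensor()         # auto-detects the connected touch sensor
    motor.on(SpeedPercent(clamp_speed(speed_percent)))
    try:
        while not touch.is_pressed:
            sleep(poll_seconds)
    finally:
        motor.off()  # always stop the motor, even on error
```

Keeping the script this small matters on a 300 MHz CPU, where importing the library alone can take noticeable time.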
26. What was hard
- javax.sound API
- Choice of mic (success)
- Play wav file (fail)
- Bluetooth pairing from the CLI
- Debugging Lex
27. TODO
- Use an AWS IoT button to start/stop audio recording (no GUI needed)
- Alexa Voice Service integration
- Avoid the SSH connection between Raspberry Pi and EV3Dev (what?)
- Replace EV3 with BrickPi?