4. Orchestration and virtualization
Docker is an open platform for developers and
sysadmins to build, ship, and run distributed
applications, whether on laptops, data center VMs,
or the cloud.
6. DBMS
MongoDB is a document database with the
scalability and flexibility that you want with the
querying and indexing that you need.
7. Speech Recognizer
Google Cloud Speech API enables developers
to convert audio to text by applying powerful
neural network models in an easy-to-use API.
8. NLP
Dialogflow is an end-to-end development suite
for building conversational interfaces for
websites, mobile applications, popular messaging
platforms, and IoT devices.
9. TTS
Amazon Polly is a cloud service that converts text
into lifelike speech. You can use Amazon Polly to
develop applications that increase engagement
and accessibility.
10. Hotword detector
Snowboy is a highly customizable hotword
detection engine that is embedded, real-time, and
always listening (even when offline). It is
compatible with Raspberry Pi, (Ubuntu) Linux, and
Mac OS X.
15. I/O Drivers
I/O drivers are how the AI handles input and output.
Every I/O module knows how to handle user input and
output to the user.
[Diagram: User, I/O driver, and App, linked by Input, startInput, and Output arrows.]
16. I/O Drivers
Examples of I/O drivers are:
- IO.Telegram: handle I/O for a Telegram bot
- IO.Messenger: handle I/O for a Facebook Messenger bot
- IO.Test: handle I/O using the CLI (used for test purposes)
- IO.Rest: handle I/O via HTTP REST API
- IO.Kid: handle input using microphone and speech
recognizer and output using a TTS via a speaker
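The driver contract described on these slides can be sketched as a small class. This is a minimal sketch, not the library's actual code: the class name `EchoDriver`, the `'input'` event name, the event payload shape, and the `receive` helper are assumptions made for illustration. Only `startInput` and the idea of an output method come from the slides.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory driver, loosely modelled on how IO.Test is described.
class EchoDriver extends EventEmitter {
  constructor() {
    super();
    this.started = false;
    this.sent = []; // outputs delivered to the "user"
  }

  // Begin accepting user input (name taken from the slide diagram).
  startInput() {
    this.started = true;
  }

  // Simulate the user typing something; ignored until startInput() is called.
  receive(text) {
    if (!this.started) return false;
    this.emit('input', { session: 'demo', params: { text } });
    return true;
  }

  // Deliver a fulfillment back to the user.
  output(fulfillment) {
    this.sent.push(fulfillment.speech);
  }
}

// The app wires input events to some logic and answers through the same driver.
const driver = new EchoDriver();
driver.on('input', ({ params }) => {
  driver.output({ speech: `You said: ${params.text}` });
});
driver.startInput();
driver.receive('What time is it?');
console.log(driver.sent);
```

Keeping input as an event and output as a method means the app never needs to know whether it is talking to a chat bot, a CLI, or a microphone.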
17. IO.Kid
It uses your microphone to capture your voice; once it detects a hot
word (example: Hey BOT), it sends the audio stream to an online
speech recognizer.
When you finish talking, it sends the recognized speech to the AI, which
returns a fulfillment.
The fulfillment is sent to an online TTS to get an audio file that is
played through the speaker.
https://github.com/kopiro/otto-ai/blob/master/src/io/kid.js
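The flow described for IO.Kid can be sketched as a chain of stages. This is a simplified illustration under stated assumptions, not the code in kid.js: the function name `handleUtterance`, the stage names, and the use of a plain string in place of an audio stream are all hypothetical, and each stage is a stub standing in for the hotword detector, speech recognizer, AI, and TTS.

```javascript
// Each stage is injected so the flow can be exercised without a microphone or network.
async function handleUtterance(audio, { detectHotword, recognize, ask, synthesize, play }) {
  if (!detectHotword(audio)) return null;            // no hot word: keep listening
  const text = await recognize(audio);                // online speech recognizer
  const fulfillment = await ask(text);                // the AI returns a fulfillment
  const file = await synthesize(fulfillment.speech);  // online TTS produces an audio file
  await play(file);                                   // played through the speaker
  return fulfillment;
}

// Stub stages; a string stands in for the microphone stream (assumption).
const played = [];
const stages = {
  detectHotword: (audio) => audio.startsWith('Hey BOT'),
  recognize: async (audio) => audio.replace(/^Hey BOT,?\s*/, ''),
  ask: async (text) => ({ speech: `You asked: ${text}` }),
  synthesize: async (speech) => `${speech}.mp3`,
  play: async (file) => { played.push(file); },
};

const done = handleUtterance('Hey BOT, what time is it?', stages);
done.then((fulfillment) => console.log(fulfillment.speech, played));
```

Injecting the stages keeps the sketch independent of any particular speech or TTS provider.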
18. IO.Kid: HW to SR
[Diagram, client side: the User says "Hey, Otto"; IO.Kid redirects the microphone stream to the SR.]
19. IO.Kid: SR to NLP
[Diagram, client side: the User says "What time is it?"; the SR returns the text "What time is it?", which IO.Kid passes to the NLP.]
20. IO.Kid: NLP to Fulfillment to TTS
[Diagram: on the Client, IO.Kid receives { "action": "date.now" } from the NLP; the Server webhook for action resolution produces "It's 18.15", which is passed to the TTS.]
21. IO.Kid: TTS to Speaker
[Diagram, client side: IO.Kid receives audio.mp3 from the TTS and plays it as Output on the Speaker.]
22. IO.Telegram
It listens to the chat events of your Telegram bot via webhook (or via polling)
and sends the text to the AI, which returns an output.
The output is used to respond to the user's request via Telegram.
https://github.com/kopiro/otto-ai/blob/master/src/io/telegram.js
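A chat-event handler of this kind might look like the sketch below. It is an illustration only, not the code in telegram.js: `createTelegramHandler`, `ask`, `sendMessage`, and the `chatId` field are hypothetical names, and no real Telegram API call is made. The `{ type, text }` event shape follows the payload shown on the next slide.

```javascript
// Build a handler for chat events; `ask` and `sendMessage` are injected stand-ins.
function createTelegramHandler({ ask, sendMessage }) {
  return async function onChatEvent(event) {
    if (event.type !== 'text') return false; // this sketch only handles text messages
    const fulfillment = await ask(event.text);             // send the text to the AI
    await sendMessage(event.chatId, fulfillment.speech);   // reply in the same chat
    return true;
  };
}

// Stubs: a canned AI answer and an in-memory outbox instead of the network.
const outbox = [];
const onChatEvent = createTelegramHandler({
  ask: async (text) => ({ speech: text === 'What time is it?' ? "It's 18.15" : 'Sorry?' }),
  sendMessage: async (chatId, text) => { outbox.push({ chatId, text }); },
});

const handled = onChatEvent({ type: 'text', chatId: 42, text: 'What time is it?' });
handled.then(() => console.log(outbox));
```

The same handler works whether events arrive by webhook or by polling, since it only sees the already-parsed event.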
23. IO.Telegram: Hotword to NLP
[Diagram, server side: the User's message reaches IO.Telegram through Telegram as { "type": "text", "text": "What time is it?" }; IO.Telegram passes the text "What time is it?" to the NLP.]
24. IO.Telegram: NLP to Fulfillment to Telegram
[Diagram, server side: the NLP returns { "action": "date.now" }; the Server webhook for action resolution produces "It's 18.15", which IO.Telegram sends to the User through Telegram.]
25. I/O Accessories
I/O Accessories are similar to drivers, but don't handle input and output
directly.
They can be attached to an I/O driver to perform additional tasks.
Examples of I/O accessories are:
- Chromecast
- GPIO_Button
- Leds
- Mopidy
26. I/O Accessories
Accessories listen for I/O driver events and, when an output to a driver is
requested, this output can be forwarded to accessories.
Each accessory has a method called canHandleOutput that should return one of:
- YES_AND_BREAK
- YES_AND_CONTINUE
- NO
Depending on this return value, the IOManager forwards the output to the next
configured driver or stops the chain.
Example: https://github.com/kopiro/otto-ai/blob/master/src/io_accessories/chromecast.js
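One possible reading of this chain is sketched below. It is not the IOManager's actual implementation: `dispatchOutput`, the string values of the three constants, and the `name` and `output` members of each accessory are assumptions. Only `canHandleOutput` and the three return values come from the slide.

```javascript
// Return values named on the slide; the string values are assumptions.
const YES_AND_BREAK = 'YES_AND_BREAK';
const YES_AND_CONTINUE = 'YES_AND_CONTINUE';
const NO = 'NO';

// Offer the fulfillment to each accessory, then to the driver unless the chain was stopped.
function dispatchOutput(fulfillment, accessories, driver) {
  const handledBy = [];
  for (const accessory of accessories) {
    const answer = accessory.canHandleOutput(fulfillment);
    if (answer === NO) continue;
    accessory.output(fulfillment);
    handledBy.push(accessory.name);
    if (answer === YES_AND_BREAK) return handledBy; // stop the chain here
  }
  driver.output(fulfillment);
  handledBy.push(driver.name);
  return handledBy;
}

// Hypothetical accessories: LEDs react to everything, Chromecast takes over videos.
const noop = () => {};
const leds = { name: 'leds', canHandleOutput: () => YES_AND_CONTINUE, output: noop };
const chromecast = {
  name: 'chromecast',
  canHandleOutput: (f) => (f.data && f.data.video ? YES_AND_BREAK : NO),
  output: noop,
};
const kid = { name: 'kid', output: noop };

console.log(dispatchOutput({ speech: 'Hi' }, [leds, chromecast], kid));
// [ 'leds', 'kid' ]
console.log(dispatchOutput({ data: { video: 'clip' } }, [leds, chromecast], kid));
// [ 'leds', 'chromecast' ]
```

With three return values an accessory can either decorate an output (LEDs blinking while the speaker talks) or replace it entirely (a video going to the Chromecast instead of the speaker).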
31. Actions
An action corresponds to the step your application will take when a
specific intent has been triggered by a user’s input.
In the library, an action is a responder for an intent that contains logic.
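// Responder for the 'hello.name' intent.
module.exports = async function({ sessionId, result }, session) {
  let { parameters: p, fulfillment } = result;
  if (p.name == null) throw 'Invalid parameters';
  return {
    speech: `Hello ${p.name}!`
  };
};
// Set the id after module.exports is assigned; setting exports.id first
// would be lost when module.exports is replaced by the function.
module.exports.id = 'hello.name';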
32. Actions: local vs remote
Each action can potentially run on the server or on the client.
This is possible because both platforms use the same
language (Node.js).
In the intent, you can specify whether the action should preferably run on
the server or on a client.
For example, a very computationally intensive action (such as an algorithm to
compute the next move in a chess game) should run on a powerful server and
return only the output.
33. Actions: trust boundary
[Diagram: a Local Action and a Remote Action on either side of the internal network trusted boundary, with access marked Denied and OK.]
Conversely, if you have to control your home lights, you should run the action locally on the
client to take advantage of the client being on the same Wi-Fi network as your lights,
which avoids exposing your IoT devices to the Internet.
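The choice between running an action locally and forwarding it can be sketched as follows. This is a hypothetical illustration: the `runOn` field, the `resolveAction` function, and the action names are not taken from the library, and the slides do not say how the preference is actually stored in the intent.

```javascript
// Decide where an action runs. `here` is the platform this code runs on
// ('client' or 'server'); `runOn` is a hypothetical per-intent preference.
function resolveAction(intent, here, { runLocal, forward }) {
  const target = intent.runOn || here; // no preference: run where we are
  return target === here
    ? runLocal(intent.action)
    : forward(target, intent.action);
}

// Stub handlers that only report what would happen.
const handlers = {
  runLocal: (action) => `ran ${action} here`,
  forward: (target, action) => `sent ${action} to ${target}`,
};

// Seen from the client: chess goes to the server, lights stay on the local network.
console.log(resolveAction({ action: 'chess.move', runOn: 'server' }, 'client', handlers));
// sent chess.move to server
console.log(resolveAction({ action: 'lights.on', runOn: 'client' }, 'client', handlers));
// ran lights.on here
```

Because the same function can run on either side, an action file does not need to know in advance where it will be executed.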
34. Action (run in server mode)
[Diagram: Client, NLP, Server, and Action connected by arrows numbered 1 to 4.]
37. Fulfillment
A fulfillment is the output of an intent, whether it was performed by
an action or a simple output string.
38. Fulfillment transformer
Every fulfillment passes through a transformer, where it can be filtered or
altered in some way.
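async function fulfillmentTransformer(fulfillment, session) {
  // Sanitize the fulfillment before any other step.
  fulfillment = fulfillmentSanitizer(fulfillment);
  // Copy payload fields into data wherever data does not define them yet.
  _.defaults(fulfillment.data, fulfillment.payload);
  // Translate the speech into the session's target language, if there is any.
  if (!_.isEmpty(fulfillment.speech)) {
    fulfillment.speech = await Translator.translate(
      fulfillment.speech,
      session.getTranslateTo()
    );
  }
  return fulfillment;
}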
39. Fulfillment types
speech | String that could be spoken or written
data.error | Error object to send.
data.language | Language override for speech.
data.replies[] | List of choices that the user can select.
data.url | URL to send or to open
data.music | Music to send or to play.
data.feedback | Boolean value indicating that this is temporary feedback until the real response is sent.
data.game | Game that can be handled via Telegram.
data.video | Video to send or to show.
data.audio | Audio to send or to show.
data.image | Image to send or to show.
data.lyrics | Lyrics object of a song.
data.voice | Audio file to send or play via voice middlewares.
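To show how a driver might consume these fields, here is a minimal sketch for a text-only driver. It is illustrative only: `renderForText` is a hypothetical helper, and the assumptions that `data.replies` holds plain strings and that `data.error` has a `message` property are not confirmed by the slides.

```javascript
// Turn a fulfillment into a list of outgoing text messages.
function renderForText(fulfillment) {
  const data = fulfillment.data || {};
  const messages = [];
  if (data.error) messages.push(`Error: ${data.error.message}`);
  if (fulfillment.speech) messages.push(fulfillment.speech);
  if (data.url) messages.push(data.url);
  if (Array.isArray(data.replies) && data.replies.length > 0) {
    messages.push(`Choices: ${data.replies.join(', ')}`);
  }
  return messages;
}

const fulfillment = {
  speech: 'Heads or tails?',
  data: { replies: ['Heads', 'Tails'], language: 'en' },
};
console.log(renderForText(fulfillment));
// [ 'Heads or tails?', 'Choices: Heads, Tails' ]
```

A voice driver would read the same object differently, for example by speaking `speech` and ignoring `data.replies`, which is why the fulfillment carries structured fields rather than a single string.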
43. Intents implemented right now
Akinator | Uses machine learning to guess a celebrity
Alarm | Set alarms, meetings, timers
Chess | Plays chess with a minimax algorithm
Coinflip | Flip a coin
Date.now | Tell the current date
Gocrazy | Say random words
Lyrics.Search | Search for the lyrics of a track
Lyrics.Track | Find a track from its lyrics
Metronome | Act as a metronome
SCF | Play Sasso-Carta-Forbice (rock-paper-scissors)
44. Intents implemented right now
Torrent.Download | Search and download torrents
Translate.Text | Translate a text into various languages
Weather.Search | Get information about the weather
Music.* | Search music on Spotify and play it over a speaker or Chromecast
Youtube.* | Search videos on YouTube and play them over a Chromecast
Lights.* | Power Xiaomi lights on/off, change color or intensity
Draw | Search for an image
Knowledge.Get | Uses WolframAlpha to get all kinds of universal knowledge
Camera.Spy | Record a video and upload it to the cloud
SmallTalk.* | All kinds of dialogues. Thanks @ValentinaCiav
45. How to write and test an action
git clone https://github.com/kopiro/otto-ai/
... configure ...
cp ./src/actions/__example.js ./src/actions/namespace/newaction.js
... develop ...
node main.js
48. Additional hardware
- Re-Speaker 2-Mics Pi HAT
- PowerBoost 500 Charger
- LiPo Battery 3.5V
- Push Button On/Off Switch Button
- Speaker
49. Re-Speaker 2-Mics Pi HAT
The board is developed based on WM8960, a low power stereo codec.
There are 2 microphones on both sides of the board for collecting sounds and it also provides
3 APA102 RGB LEDs, 1 User Button and 2 on-board Grove interfaces for expanding your
applications.
50. PowerBoost 500 Charger
With a built-in battery charger circuit, you'll be able to keep your project running even while
recharging the battery!
This little DC/DC boost converter module can be powered by any 3.7V LiIon/LiPoly battery, and
convert the battery output to 5.2V DC for running your 5V projects.