Otto AI

Otto was my plush monkey; now it
is my voice assistant.

Orchestration and virtualization
Docker is an open platform for developers and
sysadmins to build, ship, and run distributed
applications, whether on laptops, data center VMs,
or the cloud.

Runtime
Node.js® is a JavaScript runtime built on Chrome's
V8 JavaScript engine.

DBMS
MongoDB is a document database with the
scalability and flexibility that you want with the
querying and indexing that you need.

Speech Recognizer
Google Cloud Speech API enables developers
to convert audio to text by applying powerful
neural network models in an easy to use API.

NLP
Dialogflow is an end-to-end development suite
for building conversational interfaces for
websites, mobile applications, popular messaging
platforms, and IoT devices.

TTS
Amazon Polly is a cloud service that converts text
into lifelike speech. You can use Amazon Polly to
develop applications that increase engagement
and accessibility.

Hotword detector
Snowboy is a highly customizable hotword
detection engine that is embedded, real-time, and
always listening (even when offline). It is
compatible with Raspberry Pi, (Ubuntu) Linux, and
Mac OS X.

Architecture for client mode
[Diagram: Client, Server, Database, SR, NLP, and TTS components]

Architecture for messaging bots
The server listens for incoming requests from messaging platforms.

I/O Drivers
I/O drivers are the way the AI handles input and output.
Every I/O module knows how to receive input from the user and
how to send output to the user.
[Diagram: User, I/O driver, and App exchanging Input, startInput, and Output]

I/O Drivers
Examples of I/O drivers are:
- IO.Telegram: handles I/O for a Telegram bot
- IO.Messenger: handles I/O for a Facebook Messenger bot
- IO.Test: handles I/O using the CLI (used for testing)
- IO.Rest: handles I/O via an HTTP REST API
- IO.Kid: handles input using a microphone and a speech
recognizer, and output using a TTS played through a speaker
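
As a rough illustration of the driver idea, here is a minimal in-memory driver sketch. Only startInput and the notions of input and output come from the slides; the class name MemoryDriver, the receive method, the 'input' event name, and the session value are assumptions made for this example and are not taken from the otto-ai code.

```javascript
const { EventEmitter } = require('events');

// Hypothetical driver: everything except startInput/output is an assumption.
class MemoryDriver extends EventEmitter {
  constructor() {
    super();
    this.started = false;
    this.sent = []; // outputs delivered to the "user"
  }

  // Begin accepting user input; a real driver would open a mic, a socket, etc.
  startInput() {
    this.started = true;
  }

  // Simulate the user typing something; emits an 'input' event for the app.
  receive(text) {
    if (!this.started) throw new Error('startInput() was not called');
    this.emit('input', { session: 'demo', params: { text } });
  }

  // Deliver a fulfillment back to the user.
  output(fulfillment, session) {
    this.sent.push({ session, speech: fulfillment.speech });
  }
}

// App side: echo every input back through the same driver.
const driver = new MemoryDriver();
driver.on('input', ({ session, params }) => {
  driver.output({ speech: `You said: ${params.text}` }, session);
});
driver.startInput();
driver.receive('What time is it?');
console.log(driver.sent);
```

The app only depends on the 'input' event and the output method, which is what would let drivers such as IO.Telegram and IO.Kid be swapped without touching the action logic.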

IO.Kid
It uses your microphone to record your voice; once it detects a
hotword (example: Hey BOT), it sends the stream to an online
speech recognizer.
When you finish talking, it sends the recognized speech to the AI, which
returns a fulfillment.

The fulfillment is sent to an online TTS to get an audio file that is
played over the speaker.
https://github.com/kopiro/otto-ai/blob/master/src/io/kid.js
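
The flow described above (hotword, speech recognition, AI, TTS, speaker) can be sketched as a single asynchronous function. This is only a sketch under stated assumptions: runVoiceTurn and the five service functions are hypothetical names, and the stubs below stand in for the real online services; see kid.js for the actual implementation.

```javascript
// One voice turn: wait for the hotword, recognize what follows, resolve it, speak it.
// Every function in `services` is a hypothetical stand-in for a real service.
async function runVoiceTurn(audioChunks, services) {
  const { isHotword, recognize, resolve, synthesize, play } = services;
  const start = audioChunks.findIndex(isHotword);
  if (start === -1) return null; // no hotword heard, nothing leaves the device
  const speech = await recognize(audioChunks.slice(start + 1)); // SR
  const fulfillment = await resolve(speech); // NLP + action
  const audio = await synthesize(fulfillment.speech); // TTS
  await play(audio); // speaker
  return fulfillment;
}

// Stubs: audio chunks are plain strings so the example runs without hardware.
const played = [];
const stubs = {
  isHotword: (chunk) => chunk === 'hey otto',
  recognize: async (chunks) => chunks.join(' '),
  resolve: async (text) => ({ speech: `You asked: ${text}` }),
  synthesize: async (text) => `audio(${text})`,
  play: async (audio) => { played.push(audio); },
};

const turn = runVoiceTurn(['noise', 'hey otto', 'what time', 'is it?'], stubs);
turn.then((fulfillment) => console.log(fulfillment.speech, played));
```

Only the audio that follows the hotword is passed to the recognizer, which mirrors the idea that the stream is redirected to the SR after detection.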

IO.Kid: HW to SR
[Diagram: the User says "Hey, Otto"; IO.Kid (Client) redirects the microphone stream to the SR]

IO.Kid: SR to NLP
[Diagram: the User says "What time is it?"; IO.Kid (Client) receives the text "What time is it?" from the SR and passes it to the NLP]

IO.Kid: NLP to Fulfillment to TTS
[Diagram: the NLP returns { "action": "date.now" }; the Server exposes a webhook for action resolution; the fulfillment "It's 18.15" goes through IO.Kid (Client) to the TTS]

IO.Kid: TTS to Speaker
[Diagram: the TTS returns audio.mp3; IO.Kid (Client) plays it as output on the Speaker]

IO.Telegram
It listens, via webhook or polling, for the chat events of your Telegram
bot and sends the text to the AI, which returns an output.
The output is used to respond to the user's request via Telegram.
https://github.com/kopiro/otto-ai/blob/master/src/io/telegram.js
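
A minimal sketch of the two conversions such a driver needs: turning an incoming Telegram update into text for the AI, and turning a fulfillment into sendMessage parameters. The message.chat.id, message.text, chat_id, and text fields follow the Telegram Bot API; the function names parseUpdate and buildReply are hypothetical and not taken from telegram.js.

```javascript
// Extract what the AI needs from a Telegram update; returns null for
// updates without a text message (stickers, edits, and so on).
function parseUpdate(update) {
  const message = update.message;
  if (!message || typeof message.text !== 'string') return null;
  return { chatId: message.chat.id, text: message.text };
}

// Build the parameters for the Bot API sendMessage method from a fulfillment.
function buildReply(chatId, fulfillment) {
  return { chat_id: chatId, text: fulfillment.speech };
}

const input = parseUpdate({
  update_id: 1,
  message: { chat: { id: 42 }, text: 'What time is it?' },
});
const reply = buildReply(input.chatId, { speech: "It's 18.15" });
console.log(input, reply);
```

Keeping both steps as pure functions means the same code path can serve webhook and polling modes, since both deliver the same update objects.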

IO.Telegram: Hotword to NLP
[Diagram: the Telegram User sends { "type": "text", "text": "What time is it?" }; IO.Telegram (Server) passes "What time is it?" to the NLP]

IO.Telegram: NLP to Fulfillment to Telegram
[Diagram: the NLP returns { "action": "date.now" }; the Server exposes a webhook for action resolution; the fulfillment "It's 18.15" goes through IO.Telegram (Server) to the Telegram User]

I/O Accessories
I/O accessories are similar to drivers, but they don't handle input and output
directly.
They can be attached to an I/O driver to perform additional tasks.
Examples of I/O accessories are:
- Chromecast
- GPIO_Button
- Leds
- Mopidy

I/O Accessories
Accessories listen for I/O driver events and, when an output to a driver is
requested, this output can be forwarded to the accessories.
Each accessory has a method called canHandleOutput that should return:
- YES_AND_BREAK
- YES_AND_CONTINUE
- NO

Depending on this return value, the IOManager forwards the output to the next
configured driver or stops the chain.
Example: https://github.com/kopiro/otto-ai/blob/master/src/io_accessories/chromecast.js
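
The chain behaviour can be sketched as follows. The three return values and the method name canHandleOutput come from the slides; the makeAccessory helper, the output method, and the forwardOutput function are hypothetical, and the exact semantics inside the real IOManager may differ.

```javascript
// Return values named in the slides.
const YES_AND_BREAK = 'YES_AND_BREAK';
const YES_AND_CONTINUE = 'YES_AND_CONTINUE';
const NO = 'NO';

// Hypothetical accessory factory; `verdictFor` decides what the accessory answers.
function makeAccessory(name, verdictFor) {
  return {
    name,
    received: [],
    canHandleOutput: (fulfillment) => verdictFor(fulfillment),
    output(fulfillment) { this.received.push(fulfillment); },
  };
}

// Walk the chain in order; stop as soon as an accessory answers YES_AND_BREAK.
function forwardOutput(accessories, fulfillment) {
  const handledBy = [];
  for (const accessory of accessories) {
    const verdict = accessory.canHandleOutput(fulfillment);
    if (verdict === NO) continue;
    accessory.output(fulfillment);
    handledBy.push(accessory.name);
    if (verdict === YES_AND_BREAK) break;
  }
  return handledBy;
}

const leds = makeAccessory('leds', () => YES_AND_CONTINUE);
const chromecast = makeAccessory('chromecast', (f) =>
  (f.data && f.data.video ? YES_AND_BREAK : NO));
const mopidy = makeAccessory('mopidy', () => YES_AND_CONTINUE);
const chain = [leds, chromecast, mopidy];

const videoRoute = forwardOutput(chain, { data: { video: 'clip' } });
const speechRoute = forwardOutput(chain, { speech: 'Hello' });
console.log(videoRoute, speechRoute);
```

In this sketch a video fulfillment stops at the Chromecast accessory, while a plain speech fulfillment skips it and continues down the chain.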

Intents
An intent represents a mapping between what a user says
and what action should be taken by your software.

Entities
Entities are tools used for extracting parameters.

Actions
An action corresponds to the step your application will take when a
specific intent has been triggered by a user's input.
In the library, an action is a responder for an intent that contains logic.
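// Action for the 'hello.name' intent: greets the user by name.
module.exports = async function({ sessionId, result }, session) {
  const { parameters: p, fulfillment } = result;
  if (p.name == null) throw 'Invalid parameters';
  return {
    speech: `Hello ${p.name}!`
  };
};
// Set the id after module.exports is assigned: a property set on `exports`
// beforehand is lost once module.exports is replaced.
module.exports.id = 'hello.name';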
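
To show how an action id such as hello.name could be matched to its responder, here is a small dispatcher sketch. The registry, the register and resolveAction functions, and the fallback to the NLP-provided fulfillment are assumptions made for illustration; only the shape of the action itself follows the example above.

```javascript
// Hypothetical registry mapping action ids to responder functions.
const actions = new Map();
function register(id, handler) {
  actions.set(id, handler);
}

register('hello.name', async ({ result }) => {
  const { parameters: p } = result;
  if (p.name == null) throw 'Invalid parameters';
  return { speech: `Hello ${p.name}!` };
});

// Run the responder for the action named by the NLP result; if none is
// registered, fall back to the fulfillment the NLP already provided.
async function resolveAction(body, session) {
  const handler = actions.get(body.result.action);
  if (!handler) return body.result.fulfillment;
  return handler(body, session);
}

const greeting = resolveAction({
  sessionId: 's1',
  result: { action: 'hello.name', parameters: { name: 'Otto' }, fulfillment: { speech: '' } },
}, {});
greeting.then((fulfillment) => console.log(fulfillment.speech));
```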

Actions: local vs remote
Each action can potentially run on the server or on the client.
This is possible because both platforms are built on the same
runtime (Node.js).
In the intent, you can specify whether the action should preferably run on
the server or on a client.

For example, a very computationally intensive action (such as an algorithm to
find the next move in a chess game) should run on a powerful server and
only return the output.

Actions: trust boundary
[Diagram: a Local Action and a Remote Action relative to the internal network trusted boundary; labels: Denied, OK]
If, instead, you have to control your home lights, you should run the action locally on the
client: the client is on the same Wi-Fi network as your lights, so you
avoid exposing your IoT devices to the Internet.
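
The placement decision can be sketched as a small function. The intent fields preferred and localOnly, the clientAvailable flag, and the function name chooseRunner are all hypothetical; the slides only say that the intent can state a preference, not how it is encoded.

```javascript
// Decide where an action should run. `preferred` and `localOnly` are
// hypothetical intent fields used only for this illustration.
function chooseRunner(intent, { clientAvailable }) {
  // Actions that must stay inside the trusted network never run remotely.
  if (intent.localOnly) return clientAvailable ? 'client' : null;
  if (intent.preferred === 'client' && clientAvailable) return 'client';
  return 'server';
}

const chess = { action: 'chess.move', preferred: 'server' };
const lights = { action: 'lights.on', preferred: 'client', localOnly: true };

console.log(chooseRunner(chess, { clientAvailable: true }));
console.log(chooseRunner(lights, { clientAvailable: true }));
console.log(chooseRunner(lights, { clientAvailable: false }));
```

Returning null for a local-only action when no client is available makes the refusal explicit instead of silently falling back to the server.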

Action (run in server mode)
[Diagram: numbered request flow (steps 0 to 5) between Client, NLP, Server, and Action]

Action (run in client mode)
[Diagram: numbered request flow (steps 0 to 3) between Client, NLP, and Action]

Fulfillment
A fulfillment is the output of an intent, whether it was produced by
an action or is a simple output string.

Fulfillment transformer
Every fulfillment passes through a transformer, where it can be filtered or
altered in some way.
async function fulfillmentTransformer(fulfillment, session) {
  fulfillment = fulfillmentSanitizer(fulfillment);
  // Fill any data fields that are still undefined from the payload
  _.defaults(fulfillment.data, fulfillment.payload);
  // Translate the speech into the session's target language
  if (!_.isEmpty(fulfillment.speech)) {
    fulfillment.speech = await Translator.translate(
      fulfillment.speech,
      session.getTranslateTo()
    );
  }
  return fulfillment;
}

Fulfillment types
speech | String that can be spoken or written.
data.error | Error object to send.
data.language | Language override for speech.
data.replies[] | List of choices that the user can select.
data.url | URL to send or to open.
data.music | Music to send or to play.
data.feedback | Boolean value indicating that this is temporary feedback until the real response is sent.
data.game | Game that can be handled via Telegram.
data.video | Video to send or to show.
data.audio | Audio to send or to show.
data.image | Image to send or to show.
data.lyrics | Lyrics object of a song.
data.voice | Audio file to send or play via voice middlewares.

[Diagram: User, Input, NLP, Intent, Action, Fulfillment, Output, User]
Intents implemented right now
- Akinator: uses machine learning to guess a celebrity
- Alarm: sets alarms, meetings, and timers
- Chess: plays chess with a minimax algorithm
- Coinflip: flips a coin
- Date.now: tells the current date
- Gocrazy: says random words
- Lyrics.Search: searches for the lyrics of a track
- Lyrics.Track: searches for a track from its lyrics
- Metronome: acts as a metronome
- SCF: plays Sasso-Carta-Forbice (Rock-Paper-Scissors)
Intents implemented right now
- Torrent.Download: searches for and downloads torrents
- Translate.Text: translates text into various languages
- Weather.Search: gets information about the weather
- Music.*: searches music on Spotify and plays it over a speaker or Chromecast
- Youtube.*: searches videos on YouTube and plays them over a Chromecast
- Lights.*: powers Xiaomi lights on/off and changes their color or intensity
- Draw: searches for an image
- Knowledge.Get: uses WolframAlpha to answer general knowledge questions
- Camera.Spy: records a video and uploads it to the cloud
- SmallTalk.*: all kinds of dialogue. Thanks @ValentinaCiav

How to write and test an action
git clone https://github.com/kopiro/otto-ai/ 
 
... configure ...
cp ./src/actions/__example.js ./src/actions/namespace/newaction.js
 
... develop ... 
 
node main.js

Base hardware
- Raspberry Pi Zero W
- Re-Speaker 2-Mics Pi HAT
- PowerBoost 500 Charger

Additional hardware
- LiPo Battery 3.5V
- Push Button On/Off Switch Button
- Speaker

Re-Speaker 2-Mics Pi HAT
The board is developed based on WM8960, a low power stereo codec. 
There are 2 microphones on both sides of the board for collecting sounds and it also provides
3 APA102 RGB LEDs, 1 User Button and 2 on-board Grove interfaces for expanding your
applications.

PowerBoost 500 Charger
With a built-in battery charger circuit, you'll be able to keep your project running even while
recharging the battery! 
This little DC/DC boost converter module can be powered by any 3.7V LiIon/LiPoly battery, and
convert the battery output to 5.2V DC for running your 5V projects.

[Wiring diagram labels: EN, GND, USB 5V, 5V, GND, GPIO pins, GPIO8, GND]