This document provides an introduction to building conversational applications for Google Assistant using Actions on Google and API.AI. It covers the key interfaces and development tools: Actions on Google, which lets you build apps that users invoke by voice; API.AI for natural language understanding; and the response types and capabilities available depending on whether the application is voice-only or also has a visual interface. It also briefly mentions related Google Cloud APIs for artificial intelligence capabilities such as vision, speech, translation, and natural language processing.
2. @Rafael_Casuso
ABOUT ME_
•CTO @Stayapp, CEO @SnowStormIO
•Organizer @BotDevMad, @VueJSMadrid
•Software Engineer with 10+ years of experience leading teams and developing.
•Software Architect looking for revolutionary ways to change the world.
•Specialties: JavaScript, NodeJS, Conversational Intelligences.
4. AN INTRODUCTION_
‣ ACTIONS ON GOOGLE ALLOWS BUILDING APPS FOR GOOGLE ASSISTANT
‣ GOOGLE HOME AND SOME ANDROID DEVICES
‣ IN THE FUTURE, UP TO 80% OF THE WORLD MOBILE MARKET
‣ INTEGRATION WITH SEVERAL PLATFORMS, INCLUDING API.AI
5. ADVANTAGES_
‣ BOTH VOICE AND GRAPHIC INTERFACES
‣ A SOLID ECOSYSTEM
‣ API.AI
‣ CLOUD APIS (https://cloud.google.com/apis)
‣ FIREBASE
‣ ONLINE SIMULATOR FOR BOTH GOOGLE HOME AND MOBILE
‣ TRANSACTIONS IN DEVELOPER PREVIEW WITH GOOGLE ACCOUNT
8. CONVERSATIONS BASICS_
‣ Turn-taking
‣ Threading
‣ Leveraging inherent efficiency of language
‣ Anticipating variable user behaviour
‣ Understanding cooperative behaviour
‣ Cooperative principle
‣ Paul Grice’s Maxims
‣ Use everyday language
‣ Instilling user confidence
9. GRICE’S MAXIMS_
‣ The maxim of quantity, where one tries to be as informative as one
possibly can, and gives as much information as is needed, and no
more.
‣ The maxim of quality, where one tries to be truthful, and does not
give information that is false or that is not supported by evidence.
‣ The maxim of relation, where one tries to be relevant, and says things
that are pertinent to the discussion.
‣ The maxim of manner, when one tries to be as clear, as brief, and as
orderly as one can in what one says, and where one avoids obscurity
and ambiguity.
12. ACTIONS ON GOOGLE_
‣ Platform to build actions invoked by users to fulfill some need
‣ Easy way with API.AI integration
‣ Custom way with ACTIONS SDK
‣ How it works:
‣ User requests an action “Talk to my Hotel Concierge”
‣ Assistant asks Actions on Google to invoke the particular app
‣ The conversation between the user and the app begins
‣ Subsequent user input is sent directly to app until the app
fulfills the intent and ends
13. ACTIONS_
‣ Actions are entry points into your app that define the invocation
and discovery model for your app. You declare actions in a JSON
file called an action package, which you eventually upload to
your developer project when you want to test or submit your app
for approval
‣ Every app must define one and only one default action that
declares support for the actions.intent.MAIN intent. This intent is
triggered whenever users invoke your app by its name, such as
"Ok Google, talk to Sekai”
15. GACTIONS CLI_
‣ Executable CLI (Command Line Interface) to link your source
actions with your Actions On Google project
‣ Main commands:
‣ Init: Creates a default action.json file for your project
‣ Test: Pushes an action package to Assistant platform for test
‣ Update: Updates an action package related to a project
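Typical invocations look roughly like this (PROJECT_ID is a placeholder for your Actions on Google developer project; flags follow the legacy gactions CLI):

  # Scaffold a default action.json in the current directory
  gactions init

  # Push the action package to the Assistant platform for testing
  gactions test --action_package action.json --project PROJECT_ID

  # Update the action package tied to the project
  gactions update --action_package action.json --project PROJECT_ID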
16. FULFILLMENT: ACTIONS SDK_
‣ Fulfillment defines the conversational interface for your app to
obtain user input and the logic to process the input and
eventually fulfill the action
‣ Overview:
‣ Initialize the ActionsSDK
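A minimal initialization sketch with the actions-on-google Node.js client library (the Express server and endpoint path are assumptions):

  'use strict';

  const express = require('express');
  const bodyParser = require('body-parser');
  const { ActionsSdkApp } = require('actions-on-google');

  const server = express();
  server.use(bodyParser.json());

  server.post('/', (request, response) => {
    // One ActionsSdkApp instance per incoming Assistant request
    const app = new ActionsSdkApp({ request, response });
    // ... dispatch to intent handlers (see the action map below)
  });

  server.listen(8080);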
18. FULFILLMENT: ACTIONS SDK_
‣ Initialize an action map that maps intents to functions. When
your endpoint receives a request, the client library checks the
intent associated with the request and automatically calls the
appropriate function
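Continuing the sketch above (this goes inside the request handler; prompt texts are made up):

  // Map intents to handler functions; handleRequest dispatches
  // to the right one based on the incoming request's intent
  function mainIntent(app) {
    app.ask('Welcome to my Hotel Concierge! How can I help you?');
  }

  function rawInput(app) {
    // Echo the user's words back, then end the conversation
    app.tell('You said: ' + app.getRawInput());
  }

  const actionMap = new Map();
  actionMap.set(app.StandardIntents.MAIN, mainIntent);
  actionMap.set(app.StandardIntents.TEXT, rawInput);
  app.handleRequest(actionMap);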
19. SURFACE CAPABILITIES_
‣ Surface capabilities describe the surface the user is experiencing your app on. Surfaces can have audio support, screen support, or both. Actions on Google returns the capabilities of the surface with every request to your fulfillment, so you can use this information to deliver the right UI
20. SURFACE CAPABILITIES_
‣ Response Branching
‣ Every time your fulfillment receives a request from the Google Assistant, you can query the capabilities of the current surface (see the sketch after this list)
‣ Conversation Branching
‣ You can set up API.AI intents to only trigger on certain capabilities
with pre-defined API.AI contexts
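A response-branching sketch with the Node.js client library (prompt texts are made up):

  // Tailor the reply to what the surface can render
  if (app.hasSurfaceCapability(app.SurfaceCapabilities.SCREEN_OUTPUT)) {
    // Screen available: rich responses with cards are possible
    app.ask(app.buildRichResponse()
      .addSimpleResponse('Here is what I found.')
      .addBasicCard(app.buildBasicCard('Details rendered on screen.')));
  } else {
    // Audio-only surface (e.g. Google Home): keep everything spoken
    app.ask('Here is what I found. Want to hear the details?');
  }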
21. SIMPLE RESPONSES_
‣ Simple Responses:
‣ Supported for both audio-only and screen devices
‣ 640 character limit, 300 recommended
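A minimal sketch (the speech and display texts are made up):

  // A simple response: spoken aloud on voice surfaces,
  // shown as a chat bubble on screens
  app.ask({
    speech: 'Sure! Your room is booked for Friday at 8pm.',
    displayText: 'Room booked: Friday, 8pm.'
  });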
22. RESPONSES: SSML_
‣ When returning a response to the Google Assistant, you can use a subset
of the Speech Synthesis Markup Language (SSML) in your responses
‣ https://www.w3.org/TR/speech-synthesis
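For example, a response body using supported SSML elements might look like this (the content is made up):

  <speak>
    Your reservation is confirmed.
    <break time="500ms"/>
    Your confirmation number is
    <say-as interpret-as="characters">AB123</say-as>.
    <audio src="https://example.com/ding.mp3">Ding!</audio>
  </speak>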
23. RICH RESPONSES_
‣ Rich Responses:
‣ Supported for screen or screen/audio
devices
‣ Can contain:
‣ One or two simple responses
‣ Optional basic card
‣ Optional suggestion chips
‣ An Option Interface:
‣ List of items
‣ Carousel of cards
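Sketched with the client library's response builders (texts and chips are made up):

  // Rich response: simple response + basic card + suggestion chips
  app.ask(app.buildRichResponse()
    .addSimpleResponse('Here are tonight\'s dining options.')
    .addBasicCard(app.buildBasicCard('Dinner is served until 11pm.')
      .setTitle('Hotel Restaurant'))
    .addSuggestions(['Book a table', 'Room service']));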
24. BASIC CARD_
‣ Basic Card:
‣ Supported for screen or screen/audio
devices
‣ Requires an image or formatted text:
‣ Text: 500 character limit with image, no links, minor markdown allowed
‣ Image: sourced from a URL, animated GIFs allowed, gray side bars appear if the aspect ratio doesn't match
‣ Optional:
‣ Title
‣ Subtitle
‣ Link button
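A basic card sketch (all field values are made up):

  app.ask(app.buildRichResponse()
    .addSimpleResponse('Here is our spa.')
    .addBasicCard(app.buildBasicCard('Open daily from 9am to 9pm.')
      .setTitle('Hotel Spa')
      .setSubtitle('Relax and unwind')
      .setImage('https://example.com/spa.jpg', 'Photo of the spa')
      .addButton('Book now', 'https://example.com/spa')));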
26. LIST SELECTOR_
‣ List Selector:
‣ Supported for screen or screen/audio
devices
‣ Optional List Title, max 1 line
‣ Each List Item:
‣ Title, max 1 line
‣ Body text, optional, max 2 lines
‣ Image, optional, 48x48px
‣ Pagination appears with more than 5 simple items, or more than 3 items with an image or body text
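A list selector sketch (keys, synonyms and texts are made up; the user's selection comes back as an option result):

  app.askWithList('Which room type would you like?',
    app.buildList('Room types')
      .addItems([
        app.buildOptionItem('STANDARD', ['standard', 'basic'])
          .setTitle('Standard Room')
          .setDescription('Cozy room with a queen bed'),
        app.buildOptionItem('SUITE', ['suite', 'deluxe'])
          .setTitle('Suite')
          .setDescription('Separate living area and sea view')
      ]));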
28. CAROUSEL SELECTOR_
‣ Carousel Selector:
‣ Supported for screen or screen/audio
devices
‣ Min 2 tiles, max 10 tiles
‣ Each Tile:
‣ Title, max 1 line, unique for voice
selection
‣ Image, optional, 128x232dp
‣ Body text, optional, max four lines
‣ Interaction allows swiping and tapping
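A carousel is built the same way (tiles are made up; note the unique titles for voice selection):

  app.askWithCarousel('Here are our restaurants.',
    app.buildCarousel()
      .addItems([
        app.buildOptionItem('GRILL', ['grill', 'steak'])
          .setTitle('The Grill')
          .setDescription('Steaks and classics'),
        app.buildOptionItem('SUSHI', ['sushi', 'japanese'])
          .setTitle('Sushi Bar')
          .setDescription('Fresh fish daily')
      ]));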
31. HELPERS_
‣ Helpers tell the Assistant to momentarily take over the conversation to obtain common data such as a user's full name, a date and time, or a delivery address. They present a standard, consistent UI to users to obtain this information, so you don't have to design your own.
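For example, the permission helper can fetch the user's name (the reason string is made up):

  // Hand the conversation to the Assistant to ask for permission
  app.askForPermission('To address you by name',
    app.SupportedPermissions.NAME);

  // Later, in the handler for the permission result intent:
  function permissionResult(app) {
    if (app.isPermissionGranted()) {
      app.tell('Thanks, ' + app.getUserName().displayName + '!');
    } else {
      app.tell('No problem, see you around!');
    }
  }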
33. ONE MORE THING_
‣ IDENTITY
‣ Either through Helpers or Account Linking you can retrieve User
Name, Location, Id and Language
‣ TRANSACTIONS DEVELOPER PREVIEW
‣ Purchase or Reservation
‣ Requires some standard Info (Delivery Address, Cart Assembly,
Checkout, …) with predefined intents
‣ Sign in/Account creation with user’s Google Account and your
user system through Account Linking
‣ SMART HOME
‣ Control IoT devices through the Google Assistant. Connect and
control devices through your existing cloud infrastructure.
35. API.AI BASICS_
‣ NLU platform that receives requests and converts them into intents and parameters
36. AGENTS_
‣ NLU (Natural Language Understanding) modules. These can be included in your app, product, or service and transform natural user requests into actionable data.
‣ This transformation occurs when a user input matches one of the
intents inside your agent. Intents are the predefined or
developer-defined components of agents that process a user’s
request.
‣ Agents can also be designed to manage a conversation flow in a specific way. This can be done with the help of contexts, intent priorities, slot filling, responses, and fulfillment via webhook.
37. INTENTS_
‣ Represent a mapping between what a user says and what action
should be taken by your software.
‣ User Says (Expressions)
‣ Natural language expressions annotated with parameters that
are linked to entities
‣ Actions
‣ Trigger-name with associated parameters to perform an action
on the app
‣ Response
‣ You can add Simple Text or Rich Response depending on platform
‣ Contexts
‣ Passing info from other intents or external sources; input contexts are prerequisites for matching
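A webhook fulfillment sketch for API.AI intents, reusing the Express server from the earlier Actions SDK sketch (the 'book.room' action name and 'room-type' parameter are made up):

  const { ApiAiApp } = require('actions-on-google');

  server.post('/webhook', (request, response) => {
    const app = new ApiAiApp({ request, response });

    // 'book.room' is the action name defined on the API.AI intent
    function bookRoom(app) {
      // Parameter extracted from the user's expression via an entity
      const roomType = app.getArgument('room-type');
      app.tell('Your ' + roomType + ' is booked!');
    }

    const actionMap = new Map();
    actionMap.set('book.room', bookRoom);
    app.handleRequest(actionMap);
  });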
38. ENTITIES_
‣ Significant data extracted from user input in the form of parameter values
‣ Entities are associated with particular actions
‣ There are three types:
‣ System
‣ Pre-built entities provided by API.AI in order to facilitate
handling common concepts (colors, locations,…)
‣ Developer
‣ Custom entities created with Reference Value plus Synonyms
‣ User Entities
‣ Defined for the session, specific playlists for instance
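For instance, a developer entity uploaded through the API.AI API might look roughly like this (name, values and synonyms are made up):

  {
    "name": "room-type",
    "entries": [
      { "value": "standard", "synonyms": ["standard", "basic", "classic"] },
      { "value": "suite", "synonyms": ["suite", "deluxe", "penthouse"] }
    ]
  }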
39. CONTEXTS_
‣ Persisted information that can be used across intents
‣ It can be internal like a particular movie the user is asking for
‣ Or external like the user data retrieved from a user system
‣ Lifespan:
‣ By default they last for 5 requests or 10 minutes
‣ Input Context:
‣ Limit intents to be matched only when certain contexts are set
‣ For example when you need specific info to perform action
‣ Output Context:
‣ Tied to the user session and shared across intents
‣ Automatically added to follow-up intents
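Continuing the API.AI webhook sketch, contexts can be set and read like this (context name, lifespan and parameters are made up; the movie example echoes the one above):

  // Output context: remember the chosen movie for follow-up intents
  app.setContext('chosen-movie', 5, { title: 'Blade Runner' });

  // Later, in an intent that requires 'chosen-movie' as input context:
  const title = app.getContextArgument('chosen-movie', 'title').value;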
40. EVENTS & DIALOGS_
‣ Events are a feature that allows you to invoke intents by an event name instead of a user query
‣ Dialogs
‣ Linear
‣ With Slot Filling you define required parameters with prompts and order them. The agent will keep asking for them until it has all the info.
‣ Non-linear
‣ Complex dialogs are built from context routing: removing an output context in an intent's response and adding a new output context to be matched by the next question
41. MACHINE LEARNING_
‣ Machine Learning is the tool that allows your agent to understand a user's
interactions as natural language and convert them into structured data. In
API.AI terminology, your agent uses machine learning algorithms to match
user requests to specific intents and uses entities to extract relevant data
from them.
‣ An agent “learns” both from the examples you provide in the User Says
section and the language models developed by API.AI. Based on this data,
it builds a model (algorithm) for making decisions on which intent should
be triggered by a user input and what data needs to be extracted. This
algorithm is unique to your agent.
‣ The algorithm adjusts dynamically according to the changes made in your
agent and in the API.AI platform. To make sure that the algorithm is
improving, your agent needs to constantly be trained using real
conversation logs.
43. GOOGLE CLOUD API_
‣ CLOUD VISION
‣ Integrates Google Vision features, including image labeling, face, logo,
and landmark detection, optical character recognition (OCR), and
detection of explicit content, into applications.
‣ https://cloud.google.com/vision/docs/reference/rest/
‣ CLOUD SPEECH
‣ Converts audio to text, synchronously and asynchronously in 80+
different languages with a high degree of accuracy
‣ https://cloud.google.com/speech/docs
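As a taste of these REST APIs, a label-detection call to Cloud Vision might look roughly like this (the image URL and API key are placeholders):

  const https = require('https');

  const body = JSON.stringify({
    requests: [{
      image: { source: { imageUri: 'https://example.com/photo.jpg' } },
      features: [{ type: 'LABEL_DETECTION', maxResults: 5 }]
    }]
  });

  const req = https.request({
    hostname: 'vision.googleapis.com',
    path: '/v1/images:annotate?key=YOUR_API_KEY',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }
  }, (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => {
      // Print the labels detected for the image
      console.log(JSON.parse(data).responses[0].labelAnnotations);
    });
  });

  req.write(body);
  req.end();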
44. GOOGLE CLOUD API_
‣ NATURAL LANGUAGE
‣ Provides natural language understanding technologies to developers.
Examples include sentiment analysis, entity recognition, entity
sentiment analysis, and text annotations.
‣ https://cloud.google.com/natural-language/docs/reference/rest
‣ TRANSLATION
‣ Translates text between 80+ languages and detects the language of the source text.
‣ https://cloud.google.com/translate/docs/reference/rest