This document provides an introduction to building conversational applications for Google Assistant using Actions on Google and API.AI. It covers the key interfaces and development tools: Actions on Google, which lets you build apps that users invoke by voice; API.AI for natural language understanding; and the response types and capabilities available depending on whether the application is voice-only or also has a visual interface. It also briefly mentions related Google Cloud APIs for artificial intelligence capabilities such as vision, speech, translation, and natural language processing.
2. @Rafael_Casuso
ABOUT ME_
•CTO @Stayapp, CEO @SnowStormIO
•Organizer @BotDevMad, @VueJSMadrid
•Software Engineer with 10+ years of experience leading teams and developing.
•Software Architect looking for revolutionary ways to change the world.
•Specialties: JavaScript, NodeJS, Conversational Intelligences.
4. AN INTRODUCTION_
‣ ACTIONS ON GOOGLE ALLOWS BUILDING APPS FOR GOOGLE ASSISTANT
‣ GOOGLE HOME AND SOME ANDROID DEVICES
‣ IN THE FUTURE, UP TO 80% OF THE WORLD MOBILE MARKET
‣ INTEGRATION WITH SEVERAL PLATFORMS, INCLUDING API.AI
5. ADVANTAGES_
‣ BOTH VOICE AND GRAPHIC INTERFACES
‣ A SOLID ECOSYSTEM
‣ API.AI
‣ CLOUD APIS (https://cloud.google.com/apis)
‣ FIREBASE
‣ ONLINE SIMULATOR FOR BOTH GOOGLE HOME AND MOBILE
‣ TRANSACTIONS IN DEVELOPER PREVIEW WITH GOOGLE ACCOUNT
8. CONVERSATIONS BASICS_
‣ Turn-taking
‣ Threading
‣ Leveraging inherent efficiency of language
‣ Anticipating variable user behaviour
‣ Understanding cooperative behaviour
‣ Cooperative principle
‣ Paul Grice’s Maxims
‣ Use everyday language
‣ Instilling user confidence
9. GRICE’S MAXIMS_
‣ The maxim of quantity, where one tries to be as informative as one
possibly can, and gives as much information as is needed, and no
more.
‣ The maxim of quality, where one tries to be truthful, and does not
give information that is false or that is not supported by evidence.
‣ The maxim of relation, where one tries to be relevant, and says things
that are pertinent to the discussion.
‣ The maxim of manner, when one tries to be as clear, as brief, and as
orderly as one can in what one says, and where one avoids obscurity
and ambiguity.
12. ACTIONS ON GOOGLE_
‣ Platform to build actions invoked by users to fulfill some need
‣ Easy way with API.AI integration
‣ Custom way with ACTIONS SDK
‣ How it works:
‣ User requests an action “Talk to my Hotel Concierge”
‣ Assistant asks Actions on Google to invoke the particular app
‣ The conversation between the user and the app begins
‣ Subsequent user input is sent directly to app until the app
fulfills the intent and ends
13. ACTIONS_
‣ Actions are entry points into your app that define the invocation
and discovery model for your app. You declare actions in a JSON
file called an action package, which you eventually upload to
your developer project when you want to test or submit your app
for approval
‣ Every app must define one and only one default action that
declares support for the actions.intent.MAIN intent. This intent is
triggered whenever users invoke your app by its name, such as
"Ok Google, talk to Sekai”
15. GACTIONS CLI_
‣ Executable CLI (Command Line Interface) to link your source
actions with your Actions On Google project
‣ Main commands:
‣ Init: Creates a default action.json file for your project
‣ Test: Pushes an action package to Assistant platform for test
‣ Update: Updates an action package related to a project
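Typical invocations look roughly like this (PROJECT_ID is a placeholder for your Actions on Google developer project; flags follow the legacy gactions CLI):

  # Scaffold a default action.json in the current directory
  gactions init

  # Push the action package to the Assistant platform for testing
  gactions test --action_package action.json --project PROJECT_ID

  # Update the action package tied to the project
  gactions update --action_package action.json --project PROJECT_ID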
16. FULFILLMENT: ACTIONS SDK_
‣ Fulfillment defines the conversational interface for your app to
obtain user input and the logic to process the input and
eventually fulfill the action
‣ Overview:
‣ Initialize the ActionsSDK
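A minimal initialization sketch with the actions-on-google Node.js client library (the Express server and endpoint path are assumptions):

  'use strict';

  const express = require('express');
  const bodyParser = require('body-parser');
  const { ActionsSdkApp } = require('actions-on-google');

  const server = express();
  server.use(bodyParser.json());

  server.post('/', (request, response) => {
    // One ActionsSdkApp instance per incoming Assistant request
    const app = new ActionsSdkApp({ request, response });
    // ... dispatch to intent handlers (see the action map below)
  });

  server.listen(8080);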
18. FULFILLMENT: ACTIONS SDK_
‣ Initialize an action map that maps intents to functions. When
your endpoint receives a request, the client library checks the
intent associated with the request and automatically calls the
appropriate function
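Continuing the sketch above (this goes inside the request handler; prompt texts are made up):

  // Map intents to handler functions; handleRequest dispatches
  // to the right one based on the incoming request's intent
  function mainIntent(app) {
    app.ask('Welcome to my Hotel Concierge! How can I help you?');
  }

  function rawInput(app) {
    // Echo the user's words back, then end the conversation
    app.tell('You said: ' + app.getRawInput());
  }

  const actionMap = new Map();
  actionMap.set(app.StandardIntents.MAIN, mainIntent);
  actionMap.set(app.StandardIntents.TEXT, rawInput);
  app.handleRequest(actionMap);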
19. SURFACE CAPABILITIES_
‣ Surface capabilities describe the surface the user is experiencing your app on. Surfaces can have audio support, screen support, or both. Actions on Google returns the capabilities of the surface with every request to your fulfillment, so you can use this information to deliver the right UI
20. SURFACE CAPABILITIES_
‣ Response Branching
‣ Every time your fulfillment receives a request from the Google Assistant, you can query the capabilities of the current surface (see the sketch after this list)
‣ Conversation Branching
‣ You can set up API.AI intents to only trigger on certain capabilities
with pre-defined API.AI contexts
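A response-branching sketch with the Node.js client library (prompt texts are made up):

  // Tailor the reply to what the surface can render
  if (app.hasSurfaceCapability(app.SurfaceCapabilities.SCREEN_OUTPUT)) {
    // Screen available: rich responses with cards are possible
    app.ask(app.buildRichResponse()
      .addSimpleResponse('Here is what I found.')
      .addBasicCard(app.buildBasicCard('Details rendered on screen.')));
  } else {
    // Audio-only surface (e.g. Google Home): keep everything spoken
    app.ask('Here is what I found. Want to hear the details?');
  }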
21. SIMPLE RESPONSES_
‣ Simple Responses:
‣ Supported for both audio-only and screen devices
‣ 640 character limit, 300 recommended
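A minimal sketch (the speech and display texts are made up):

  // A simple response: spoken aloud on voice surfaces,
  // shown as a chat bubble on screens
  app.ask({
    speech: 'Sure! Your room is booked for Friday at 8pm.',
    displayText: 'Room booked: Friday, 8pm.'
  });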
22. RESPONSES: SSML_
‣ When returning a response to the Google Assistant, you can use a subset
of the Speech Synthesis Markup Language (SSML) in your responses
‣ https://www.w3.org/TR/speech-synthesis
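For example, a response body using supported SSML elements might look like this (the content is made up):

  <speak>
    Your reservation is confirmed.
    <break time="500ms"/>
    Your confirmation number is
    <say-as interpret-as="characters">AB123</say-as>.
    <audio src="https://example.com/ding.mp3">Ding!</audio>
  </speak>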
23. RICH RESPONSES_
‣ Rich Responses:
‣ Supported for screen or screen/audio
devices
‣ Can contain:
‣ One or two simple responses
‣ Optional basic card
‣ Optional suggestion chips
‣ An Option Interface:
‣ List of items
‣ Carousel of cards
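Sketched with the client library's response builders (texts and chips are made up):

  // Rich response: simple response + basic card + suggestion chips
  app.ask(app.buildRichResponse()
    .addSimpleResponse('Here are tonight\'s dining options.')
    .addBasicCard(app.buildBasicCard('Dinner is served until 11pm.')
      .setTitle('Hotel Restaurant'))
    .addSuggestions(['Book a table', 'Room service']));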
24. BASIC CARD_
‣ Basic Card:
‣ Supported for screen or screen/audio
devices
‣ Requires an image or formatted text:
‣ Text: 500 character limit with image, no links, minor markdown allowed
‣ Image: sourced from a URL, animated GIFs allowed, gray side bars appear if the aspect ratio doesn't match
‣ Optional:
‣ Title
‣ Subtitle
‣ Link button
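A basic card sketch (all field values are made up):

  app.ask(app.buildRichResponse()
    .addSimpleResponse('Here is our spa.')
    .addBasicCard(app.buildBasicCard('Open daily from 9am to 9pm.')
      .setTitle('Hotel Spa')
      .setSubtitle('Relax and unwind')
      .setImage('https://example.com/spa.jpg', 'Photo of the spa')
      .addButton('Book now', 'https://example.com/spa')));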
26. LIST SELECTOR_
‣ List Selector:
‣ Supported for screen or screen/audio
devices
‣ Optional List Title, max 1 line
‣ Each List Item:
‣ Title, max 1 line
‣ Body text, optional, max 2 lines
‣ Image, optional, 48x48px
‣ Pagination appears with more than 5 simple items, or more than 3 items with an image or body text
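A list selector sketch (keys, synonyms and texts are made up; the user's selection comes back as an option result):

  app.askWithList('Which room type would you like?',
    app.buildList('Room types')
      .addItems([
        app.buildOptionItem('STANDARD', ['standard', 'basic'])
          .setTitle('Standard Room')
          .setDescription('Cozy room with a queen bed'),
        app.buildOptionItem('SUITE', ['suite', 'deluxe'])
          .setTitle('Suite')
          .setDescription('Separate living area and sea view')
      ]));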
28. CAROUSEL SELECTOR_
‣ Carousel Selector:
‣ Supported for screen or screen/audio
devices
‣ Min 2 tiles, max 10 tiles
‣ Each Tile:
‣ Title, max 1 line, unique for voice
selection
‣ Image, optional, 128x232dp
‣ Body text, optional, max four lines
‣ Interaction allows swiping and tapping
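A carousel is built the same way (tiles are made up; note the unique titles for voice selection):

  app.askWithCarousel('Here are our restaurants.',
    app.buildCarousel()
      .addItems([
        app.buildOptionItem('GRILL', ['grill', 'steak'])
          .setTitle('The Grill')
          .setDescription('Steaks and classics'),
        app.buildOptionItem('SUSHI', ['sushi', 'japanese'])
          .setTitle('Sushi Bar')
          .setDescription('Fresh fish daily')
      ]));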
31. HELPERS_
‣ Helpers tell the Assistant to momentarily take over the conversation to obtain common data such as a user's full name, a date and time, or a delivery address. They present a standard, consistent UI to users to obtain this information, so you don't have to design your own.
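For example, the permission helper can fetch the user's name (the reason string is made up):

  // Hand the conversation to the Assistant to ask for permission
  app.askForPermission('To address you by name',
    app.SupportedPermissions.NAME);

  // Later, in the handler for the permission result intent:
  function permissionResult(app) {
    if (app.isPermissionGranted()) {
      app.tell('Thanks, ' + app.getUserName().displayName + '!');
    } else {
      app.tell('No problem, see you around!');
    }
  }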
33. ONE MORE THING_
‣ IDENTITY
‣ Either through Helpers or Account Linking you can retrieve User
Name, Location, Id and Language
‣ TRANSACTIONS DEVELOPER PREVIEW
‣ Purchase or Reservation
‣ Requires some standard Info (Delivery Address, Cart Assembly,
Checkout, …) with predefined intents
‣ Sign in/Account creation with user’s Google Account and your
user system through Account Linking
‣ SMART HOME
‣ Control IoT devices through the Google Assistant. Connect and
control devices through your existing cloud infrastructure.
35. API.AI BASICS_
‣ NLU platform that receives requests and converts them into intents and parameters
36. AGENTS_
‣ NLU (Natural Language Understanding) modules. These can be included in your app, product, or service and transform natural user requests into actionable data.
‣ This transformation occurs when a user input matches one of the
intents inside your agent. Intents are the predefined or
developer-defined components of agents that process a user’s
request.
‣ Agents can also be designed to manage a conversation flow in a specific way. This can be done with the help of contexts, intent priorities, slot filling, responses, and fulfillment via webhook.
37. INTENTS_
‣ Represent a mapping between what a user says and what action
should be taken by your software.
‣ User Says (Expressions)
‣ Natural language expressions annotated with parameters that
are linked to entities
‣ Actions
‣ Trigger-name with associated parameters to perform an action
on the app
‣ Response
‣ You can add Simple Text or Rich Response depending on platform
‣ Contexts
‣ Passing info from other intents or external sources; input contexts are prerequisites for matching
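A webhook fulfillment sketch for API.AI intents, reusing the Express server from the earlier Actions SDK sketch (the 'book.room' action name and 'room-type' parameter are made up):

  const { ApiAiApp } = require('actions-on-google');

  server.post('/webhook', (request, response) => {
    const app = new ApiAiApp({ request, response });

    // 'book.room' is the action name defined on the API.AI intent
    function bookRoom(app) {
      // Parameter extracted from the user's expression via an entity
      const roomType = app.getArgument('room-type');
      app.tell('Your ' + roomType + ' is booked!');
    }

    const actionMap = new Map();
    actionMap.set('book.room', bookRoom);
    app.handleRequest(actionMap);
  });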
38. ENTITIES_
‣ Significant data extracted from user input in the form of parameter values
‣ Entities are associated with particular actions
‣ There are three types:
‣ System
‣ Pre-built entities provided by API.AI in order to facilitate
handling common concepts (colors, locations,…)
‣ Developer
‣ Custom entities created with Reference Value plus Synonyms
‣ User Entities
‣ Defined for the session, specific playlists for instance
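For instance, a developer entity uploaded through the API.AI API might look roughly like this (name, values and synonyms are made up):

  {
    "name": "room-type",
    "entries": [
      { "value": "standard", "synonyms": ["standard", "basic", "classic"] },
      { "value": "suite", "synonyms": ["suite", "deluxe", "penthouse"] }
    ]
  }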
39. CONTEXTS_
‣ Persisted information that can be used across intents
‣ It can be internal like a particular movie the user is asking for
‣ Or external like the user data retrieved from a user system
‣ Lifespan:
‣ By default they last for 5 requests or 10 minutes
‣ Input Context:
‣ Limit intents to be matched only when certain contexts are set
‣ For example when you need specific info to perform action
‣ Output Context:
‣ Tied to the user session and shared across intents
‣ Automatically added to follow-up intents
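Continuing the API.AI webhook sketch, contexts can be set and read like this (context name, lifespan and parameters are made up; the movie example echoes the one above):

  // Output context: remember the chosen movie for follow-up intents
  app.setContext('chosen-movie', 5, { title: 'Blade Runner' });

  // Later, in an intent that requires 'chosen-movie' as input context:
  const title = app.getContextArgument('chosen-movie', 'title').value;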
40. EVENTS & DIALOGS_
‣ Events are a feature that allows you to invoke intents by an event name instead of a user query
‣ Dialogs
‣ Linear
‣ With Slot Filling you define required parameters with prompts and order them. The agent will keep asking for them until it has all the info.
‣ Non-linear
‣ Complex dialogs are built from context routing: removing an output context in an intent's response and adding a new output context to be matched by the next question
41. MACHINE LEARNING_
‣ Machine Learning is the tool that allows your agent to understand a user's
interactions as natural language and convert them into structured data. In
API.AI terminology, your agent uses machine learning algorithms to match
user requests to specific intents and uses entities to extract relevant data
from them.
‣ An agent “learns” both from the examples you provide in the User Says
section and the language models developed by API.AI. Based on this data,
it builds a model (algorithm) for making decisions on which intent should
be triggered by a user input and what data needs to be extracted. This
algorithm is unique to your agent.
‣ The algorithm adjusts dynamically according to the changes made in your
agent and in the API.AI platform. To make sure that the algorithm is
improving, your agent needs to constantly be trained using real
conversation logs.
43. GOOGLE CLOUD API_
‣ CLOUD VISION
‣ Integrates Google Vision features, including image labeling, face, logo,
and landmark detection, optical character recognition (OCR), and
detection of explicit content, into applications.
‣ https://cloud.google.com/vision/docs/reference/rest/
‣ CLOUD SPEECH
‣ Converts audio to text, synchronously and asynchronously in 80+
different languages with a high degree of accuracy
‣ https://cloud.google.com/speech/docs
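As a taste of these REST APIs, a label-detection call to Cloud Vision might look roughly like this (the image URL and API key are placeholders):

  const https = require('https');

  const body = JSON.stringify({
    requests: [{
      image: { source: { imageUri: 'https://example.com/photo.jpg' } },
      features: [{ type: 'LABEL_DETECTION', maxResults: 5 }]
    }]
  });

  const req = https.request({
    hostname: 'vision.googleapis.com',
    path: '/v1/images:annotate?key=YOUR_API_KEY',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }
  }, (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => {
      // Print the labels detected for the image
      console.log(JSON.parse(data).responses[0].labelAnnotations);
    });
  });

  req.write(body);
  req.end();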
44. GOOGLE CLOUD API_
‣ NATURAL LANGUAGE
‣ Provides natural language understanding technologies to developers.
Examples include sentiment analysis, entity recognition, entity
sentiment analysis, and text annotations.
‣ https://cloud.google.com/natural-language/docs/reference/rest
‣ TRANSLATION
‣ Translates text between 80+ languages and detects the language of the source text.
‣ https://cloud.google.com/translate/docs/reference/rest