Artificial intelligence and conversational search are having their big moment right now, and it’s easy to see why. It’s relevant to all of us. Its potential to enhance our daily lives is the foundation of its widespread adoption…but measuring our satisfaction correctly *in the moment* is the key to its ultimate success – or failure. The future of search lies in the ability of conversational UI to make human-to-computer interactions correct, relevant and useful. The satisfaction metric is the key to moving search ahead, making personal AI assistants essential sidekicks in everyday life. The more our personal AIs “get” us, the more we want to talk with them.
In this session, Ozlo’s Principal Engineering Lead Heidi Young, who has lived and breathed search for more than 10 years, will discuss how the success of AI-driven assistants – and their ability to enhance our lives – depends on a specific satisfaction metric. Breaking down human-to-computer interactions and focusing on immediate feedback loops (micro-level metrics), instead of solely on lagging indicators (macro-level metrics), measures satisfaction more effectively. This type of engagement is a game-changer in measuring satisfaction and indicating happiness. It should be a top priority of AI architects, since users whose conversations end positively come back. This is the key to the new personal AI assistant revolution.
Heidi Young - The Future of Search: How Measuring Satisfaction Will Enhance Our Personal AIs and Our Lives - Seattle Interactive 2016
1. The Future of Search:
How Measuring Satisfaction Will
Enhance Our Personal AIs and
Our Lives
Heidi Young
VP of Engineering
Ozlo
2. Who am I?
Search Junkie, Data Scientist,
Engineer
Currently building Ozlo!!!
3. What is Ozlo?
Next generation assistant
Ozlo is leveraging artificial intelligence, machine
learning and natural language processing to
power the next generation of search
Ozlo is in the early stages of learning to
understand a wide range of human goals and
activities, and the words and ideas that connect
those things to help users find what they
actually need
4. AI Assistant and Chatbot Landscape
Siri
Alexa Skills Store
Bot Store
Skype Bot Store
Assistants
Platforms for
exposing
chatbots
Building a chatbot or
assistant
5. AI Assistant and Chatbot Landscape
https://twitter.com/ashevat/status/786690547733889024/photo/1
6. AI Assistant and Chatbot Landscape
https://twitter.com/davidjbland/status/725119174368976897
7. Why all the hype then?
We’ve moved to mobile where
messaging is the natural
method of communication
We’re moving to connected
smart devices and expect our
interactions to be natural to our
surroundings
8. Why all the hype then?
There’s a good chunk of
information seeking tasks
that search engines don’t
handle well in their
current form
Say wha?
And they aren’t the really
hard ones that you’re
thinking of
(e.g. research travel, buy a
house)
10. Why is conversational a better experience?
It isn’t for a lot of things
Alexa, buy me some pants
I can’t buy pants. So I’ve added it
to your shopping list.
😒
I want to order a pizza
Great! What kind of
toppings would you like?
Pepperoni and sausage
with extra cheese
And what kind of crust?
Thin crust
What size pizza would you
like?
…
😒
On average 73 taps with a
conversational UI vs
16 taps with a conventional
filtering UI
11. Why is conversational a better experience?
Rich,
robust
filtering
Highly visual experience
A lot of variety
It isn’t for a lot of things
12. Answer? The most natural interaction
The bar should be:
What kind of response would you
expect from a really
knowledgeable friend?
Are there any good movies playing?
Here’s some:
…
Anything more kid friendly?
How about these?
…
Which of these is playing around 9pm?
This is the only one playing close to
9pm, near you
…
Great! Can you get me a ticket?
Here’s a link to buy it on Fandango
13. Information Task Modes
Remember
• Simple Facts
• Simple 1-2
sentence answers
• Clean, cut, dried
Understand
• Obtaining
knowledge from a
multitude of
sources
• Constructing
meaning from
different content
sources
Analyze
• Breaking material
into constituent
parts
• Determine
relationships
• Make decisions
https://www.microsoft.com/en-us/research/wp-content/uploads/2015/08/fp286-bailey.pdf
15. Back to that hype thing…
https://www.microsoft.com/en-us/research/wp-content/uploads/2015/08/fp286-bailey.pdf
Chatbots and AI of today are primarily focused on stuff that’s pretty easy to get with an
existing app or search engine
X X X X
But our expectation is that they can do these
16. Understand or Analyze Type of Task
What’s a good place to watch the game nearby?
Point of interest
That is
rated highly or is popular or is known for
this type of task
Implies sports bar or point of
interest that has a television
with sports typically available
Close to your current
location
Depending on where
you’re located, could mean
within walking distance or
could mean 20 mins
driving distance,
depending on density of
POIs and sparsity of
available content
VERY IMPORTANT!!!
There is not ONE right answer to this question
It is a subjective question. Depending on your content sources, results can vary widely.
It requires a lot of synthesis across multiple sources, and likely presenting multiple
sources, not a definitive answer.
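One way to read the "nearby" point above is as a search radius that widens until enough candidates turn up. The sketch below illustrates that idea only; the function name, the radius steps and the minimum result count are assumptions made for illustration, not anything described in the talk.

```python
def nearby_candidates(pois, min_results=3, radii_km=(0.5, 2.0, 10.0, 25.0)):
    """Widen the search radius until enough points of interest are found.

    pois: list of (name, distance_km) pairs.
    min_results and radii_km are hypothetical tuning values.
    Returns (radius_used_km, matching pois sorted by distance).
    """
    for radius in radii_km:
        hits = [p for p in pois if p[1] <= radius]
        if len(hits) >= min_results:
            return radius, sorted(hits, key=lambda p: p[1])
    # Sparse area: fall back to whatever the widest radius contains.
    widest = radii_km[-1]
    hits = [p for p in pois if p[1] <= widest]
    return widest, sorted(hits, key=lambda p: p[1])
```

In a dense area the first small radius already yields enough places, while in a sparse area the same call ends up at a driving-distance radius.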
17. What you really want
Place A:
Great sports
bar nearby
Place B:
Romantic
restaurant
nearby
Place C:
Coffeeshop
nearby
X
X
Place A:
Great sports
bar nearby
Place D:
Restaurant
known for
sports and tvs
Place D:
Restaurant
known for
sports and tvs
19. What might a good experience look like?
Present evidence as to why those are
good options
Present multiple options, but not so
many that it’s overwhelming
Establish that you were heard and that
the assistant understood what you actually
meant (i.e. sports bars, nearby)
Offer most likely refinements and
follow-on prompts
21. To measure, we must understand
National
Communication
Association publishes a
rating scale to assess
skills in interpersonal
settings during
conversation
Scale from 1 to 5:
1 = Inadequate: awkward, disruptive, leaving
a negative impression
5 = Excellent: smooth, controlled, leaving a
positive impression
Attentiveness
Attention to, concern for
conversational partner
Composure
Confidence, assertiveness
Expressiveness
Articulation, animation,
variation
Coordination
Non-disruptive negotiation of
speaking turns
22. What do REAL messaging conversations look like?
New vs Continuing
Conversations
Identifying satisfaction
of each sub-
conversation
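A minimal sketch of telling new conversations from continuing ones, assuming a simple inactivity rule: a message that arrives after a long enough silence starts a new sub-conversation. The `Message` type and the 30-minute `max_gap` are hypothetical, chosen for illustration rather than taken from Ozlo.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    sender: str        # "user" or "assistant"
    text: str
    sent_at: datetime


def split_conversations(messages, max_gap=timedelta(minutes=30)):
    """Group a time-ordered message stream into sub-conversations.

    A new conversation starts whenever the silence before a message
    exceeds max_gap (an assumed threshold).
    """
    conversations = []
    for msg in messages:
        if conversations and msg.sent_at - conversations[-1][-1].sent_at <= max_gap:
            conversations[-1].append(msg)   # continuing conversation
        else:
            conversations.append([msg])     # new conversation
    return conversations
```

A real system would likely also look at topic shifts, since a user can change subject without pausing; a time gap is only the simplest boundary signal.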
23. How we think about things at Ozlo
Negative conversations
Bottom Line: How did the conversation end?
Negative indicators,
implicit AND explicit
We:
1. Identify conversation boundaries
2. Assign positive or negative assessment of
each interaction
3. Mark as negative if it “ended” negatively
24. What’s a negative ending conversation?
Conversations that contain one
of the following in the last N
messages in the interaction:
1. Explicit negative feedback
2. Highly latent
3. Not well understood
4. No follow on
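The four signals above can be sketched as a check over the last N interactions of a conversation. The `Interaction` fields, the confidence threshold and the default `last_n` are assumptions for illustration; only the one-second latency limit comes from the deck.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One user message plus the assistant's reply (fields are illustrative)."""
    thumbs_down: bool = False        # explicit negative feedback
    latency_seconds: float = 0.0     # time the assistant took to respond
    understood: bool = True          # False if the assistant said it didn't understand
    confidence: float = 1.0          # hypothetical understanding score in [0, 1]
    offered_follow_on: bool = True   # were follow-on prompts displayed?


def negative_signals(turn, max_latency=1.0, min_confidence=0.5):
    """Return the names of the negative signals present in one interaction."""
    signals = []
    if turn.thumbs_down:
        signals.append("explicit_negative_feedback")
    if turn.latency_seconds > max_latency:
        signals.append("highly_latent")
    if not turn.understood or turn.confidence < min_confidence:
        signals.append("not_well_understood")
    if not turn.offered_follow_on:
        signals.append("no_follow_on")
    return signals


def ended_negatively(conversation, last_n=2, **thresholds):
    """True if any of the last_n interactions carries a negative signal."""
    return any(negative_signals(t, **thresholds) for t in conversation[-last_n:])
```

Only the tail of the conversation is inspected, so an early thumbs down that the assistant later recovers from does not mark the conversation as negative.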
25. What’s a negative ending conversation?
Negative Ending | Specific Signal | Roughly maps to NCA ratings for…
Explicit negative feedback | Thumbs down | Composure (e.g. Didn’t understand, Results could be better); Attentiveness (e.g. Oddly worded response, Didn’t understand); Expressiveness (e.g. Oddly worded response)
Highly latent | >1 second | Coordination (e.g. Controlling the flow of conversation, “Never leave me hanging”)
Not well understood | Didn’t understand, low confidence scores | Composure; Expressiveness
No follow on | Lack of prompts displayed, lack of engagement for non-QnA questions | Coordination; Attentiveness
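The mapping on this slide can be held as a small lookup table, which makes it possible to count which NCA dimensions are implicated most often across negatively ending conversations. This is a sketch under assumptions: the signal key names and the `dimension_counts` helper are hypothetical, while the signal-to-dimension pairs follow the slide.

```python
from collections import Counter

# Signal -> NCA dimensions, following the slide's rough mapping.
NCA_DIMENSIONS = {
    "explicit_negative_feedback": ["Composure", "Attentiveness", "Expressiveness"],
    "highly_latent": ["Coordination"],
    "not_well_understood": ["Composure", "Expressiveness"],
    "no_follow_on": ["Coordination", "Attentiveness"],
}


def dimension_counts(signals_per_conversation):
    """Count, per NCA dimension, how many conversations implicated it.

    signals_per_conversation: one list of signal names per conversation.
    A dimension is counted at most once per conversation.
    """
    counts = Counter()
    for signals in signals_per_conversation:
        implicated = {dim for s in signals for dim in NCA_DIMENSIONS[s]}
        counts.update(implicated)
    return counts
```

A tally like this points at where to invest: many Coordination hits suggest latency or prompting problems rather than understanding problems.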
26. Why this over DAUs?
It’s not one over the other
DAUs/MAUs are lagging indicators
We must optimize for in-the-moment
interactions
Tracking negatively ending conversations
allows us to react in the moment, and to
aggregate and set targets
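As an illustration of "aggregate and set targets", the per-conversation outcome can be rolled up into a daily negative-ending rate and compared with a target. The record shape and the 20% `target_rate` are assumptions made for this sketch, not figures from the talk.

```python
from collections import defaultdict


def daily_negative_rate(records):
    """records: iterable of (day, ended_negatively) pairs.

    Returns {day: share of that day's conversations that ended negatively}.
    """
    totals = defaultdict(lambda: [0, 0])   # day -> [negative count, total count]
    for day, negative in records:
        totals[day][0] += int(negative)
        totals[day][1] += 1
    return {day: neg / total for day, (neg, total) in totals.items()}


def days_missing_target(records, target_rate=0.2):
    """Days whose negative-ending rate is above the (hypothetical) target."""
    rates = daily_negative_rate(records)
    return sorted(day for day, rate in rates.items() if rate > target_rate)
```

Unlike DAUs or MAUs, this number is available as soon as a conversation ends, so it can sit next to those lagging indicators rather than replace them.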
27. Will this result in better AI experiences?
Still early
This is how we learn, reinforce good behavior
Once we successfully measure, we can optimize