Chatbots
Present, Past and Future
Paul Houle
Ontology2
Chatbots Present
Facebook Messenger: interact with chatbots using the same UI you use to
interact with people
Amazon Echo (Alexa): conversational voice interface oriented around tasks;
voice is the primary, if not the only, interface
Facebook Messenger Platform
Message content:
1. Text
2. Video, Audio and Image Media Objects
3. Hyperlinks, Buttons, “Call to Action”
Application Server
Back-End Business
Systems
Real World
Human Supervision
Facebook Messenger: Air Travel Application
Air Travel Application
• Large Commercial Value (3 tickets, $4000)
• Complex Interaction that takes place over many days
• Takes place in moderately large, but finite world
• Mixed-initiative, changes can occur on either end:
• Passenger: change flight, seats, upgrade, ...
• Airline: cancellation, delay, standby, ...
Your IT Systems
Alexa Skills: System architecture
Alexa Skills: Examples
The trouble with mobile applications
Complex Business
Complex
Applications
Client
Communication Networks
App Stores
A complex business means a complex application: hard to distribute as a mobile app if every screen has to ship with the client
A real business application can have 200-2000 screens!
Asynchronous/Mixed Initiative User Interface
Chatbots past
From telegraph to teletype...
20 words/minute ≈ 2 characters/sec TELEX
15 characters/sec ≈ 110 bits/sec (bps)
1770: The Turk
Chess playing “robot” controlled by a very
small chess player
1942 Alan Turing: War Hero
Alan Turing: Computer Scientist
1952 First Computer Chess Program
Fundamentals of Computation
1950: Alan Turing’s “Imitation Game”
Interrogator tries to answer: “who is the male and who
is the female?” Both players are pretending to be the
other
Can a computer understand gender?
Can a computer pretend to be gendered? Can it pretend
to be the other gender?
What social skills are necessary?
Can a computer pass as a human at all?
1956 1957
1957 Client-Centered Therapy
Psychotherapist
Carl Rogers
Client-centered therapy
mirrored
not mirrored
1960 SABRE Airline Booking System
1958 SAGE Air Defense System
Two response strategies are sufficient to elicit empathy: user fills in imagined motives for simulated computer therapist
(is that different from experiencing fiction?)
1966 The Eliza Effect
Active Listening
Changing the Subject
ELIZA: Maybe your plans have something to do with this.
ELIZA: Don't you think computers can help people?
ELIZA: I've told you that apologies are not required.
ELIZA: Can You Elaborate On That?
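The two strategies named on this slide, active listening and changing the subject, can be sketched in a few lines of Python. This is a minimal illustration and not Weizenbaum's original script: the patterns, the REFLECTIONS table and the function names are all assumptions made for the example.

```python
import re

# Hypothetical pronoun swaps used to mirror the user's words back.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# Hypothetical "active listening" patterns: (regex, reply template).
PATTERNS = [
    (re.compile(r"\bi feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
]

# Fallback used to change the subject when nothing matches.
CHANGE_SUBJECT = "Can you elaborate on that?"


def reflect(text):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in text.split())


def respond(utterance):
    """Mirror the statement if a pattern matches, else change the subject."""
    cleaned = utterance.strip().rstrip(".!?")
    for pattern, template in PATTERNS:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return CHANGE_SUBJECT
```

Calling respond("I feel sad about my job.") mirrors the statement back as a question, while an input that matches no pattern gets the subject-changing fallback.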
Science Fiction Robots
1970 SHRDLU
> Pick up a big red block.
OK.
> Grasp the pyramid
I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
> Find a block which is taller than the one you are holding and put it into the box.
BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
OK.
> What does the box contain?
THE BLUE PYRAMID AND THE BLUE BLOCK.
> What is the pyramid supported by?
THE BOX.
SHRDLU lets the user converse with the computer about an
imaginary “blocks world” and order the computer to make
changes to the blocks world within some constraints.
id  type     color  picked_up  above  inside  height  width  ...
1   block    green  0          0      0       2.0     2.0    ...
2   block    red    1          0      0       3.0     1.5    ...
3   pyramid  red    0          1      0       3       0.5    ...
4   block    blue   0          0      0       4.0     2.5    ...
“The Blocks World”
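To make the "conversation about a database" idea concrete, here is a minimal Python sketch that answers one of the SHRDLU-style requests ("find a block which is taller than the one you are holding") against the rows of the table above. Only a subset of the columns is used, and the function names are assumptions for illustration; SHRDLU itself was not implemented this way.

```python
# Rows copied from the blocks-world table (subset of columns).
BLOCKS = [
    {"id": 1, "type": "block", "color": "green", "picked_up": 0, "height": 2.0},
    {"id": 2, "type": "block", "color": "red", "picked_up": 1, "height": 3.0},
    {"id": 3, "type": "pyramid", "color": "red", "picked_up": 0, "height": 3.0},
    {"id": 4, "type": "block", "color": "blue", "picked_up": 0, "height": 4.0},
]


def held_object(world):
    """Return the object currently picked up, or None if the hand is empty."""
    return next((obj for obj in world if obj["picked_up"]), None)


def blocks_taller_than_held(world):
    """Find blocks strictly taller than the object being held."""
    held = held_object(world)
    if held is None:
        return []
    return [
        obj for obj in world
        if obj["type"] == "block" and obj["height"] > held["height"]
    ]
```

With the table as given, the held object is the red block (id 2), and the only block taller than it is the blue block (id 4).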
SHRDLU capabilities
“Have a conversation about a database”
English-like syntax
Remembers context and...
... asks questions to uncover context
ability to reason and plan
Teleprinter Video Display Terminal
1975 Microsoft Basic
1) Using BASIC is like having a conversation, and...
2) ... A conversation-like interface is easy to implement in BASIC
... but the conversation context is encoded in the state of the program
Not Flexible
Can’t handle Mixed Initiative
1975 MYCIN
Medical Diagnostic Expert System
Rules
encode knowledge & procedures
Facts
describe the problem
Production Rules / RETE Engine
Ideal for mixed initiative:
• System accepts facts from both the user and the world (react to multiple inputs)
• Firing rules can (i) cause actions and (ii) cause more rules to fire
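The way firing rules can cause more rules to fire can be sketched with naive forward chaining. This is a minimal Python illustration, not the RETE algorithm (which avoids re-checking every rule on every pass), and the fact names and rules are hypothetical, borrowed from the air travel example earlier in the talk.

```python
# Each rule is (set of required facts, fact to assert). Names are hypothetical.
RULES = [
    ({"passenger_booked", "flight_cancelled"}, "needs_rebooking"),
    ({"needs_rebooking"}, "notify_passenger"),
]


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known
```

Facts can arrive from the user ("passenger_booked") or from the world ("flight_cancelled"); once both are present, the first rule fires and its conclusion triggers the second.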
1976 Colossal Cave
database describing game world
has a lot in common with SHRDLU....
1983 Infocom
1984 Apple Macintosh
“AI Winter”
SHRDLU and MYCIN were not scalable!
Algorithms and tools did not scale
Labor Cost to Create Knowledge Base too High
1991-present Adobe Premiere
1994
Netscape
Navigator
2001 Metal Gear Solid 2
2016 Consumer VR
REPL (of many kinds) is alive and well
but not as smart as SHRDLU!
1988 Mathematica Notebooks
1985 GSM and SMS
A PUBL1C SERVICE ANN0UNCEMENT!
IRC, ICQ, AIM, Skype...
2007 iPhone
2009-2014 WhatsApp
• 1-layer neural network can learn a separating line between two categories
• The input space could have thousands or millions of dimensions (ex. an image)
1957 Perceptrons
This 1969 book by Minsky and Papert demolished Perceptrons by demonstrating many
things Perceptrons could not do
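A minimal Python sketch of the perceptron learning rule shows both sides of this slide: it finds a separating line for AND, which is linearly separable, and cannot do so for XOR, the classic example of what a single-layer network cannot represent. The function names, learning rate and epoch count are assumptions for illustration.

```python
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(weights, bias, x):
    """Output 1 if the point lies on the positive side of the line."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, rate=1.0):
    """Classic perceptron rule: nudge the line toward misclassified points."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias
```

Training on AND_DATA converges to a line that classifies all four points; training on XOR_DATA never does, no matter how many epochs are allowed, because no single line separates the two classes.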
1971 full-text search
tf-idf: term frequency/inverse document frequency
Karen Spärck Jones and Gerard Salton @ Cornell
• is computing a dot product in high-dimensional space just like the Perceptron
• is solving a “subjective” problem, isn’t expected to get 100% right answers
• still the dominant algorithm for full-text search in 2016
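The "dot product in high-dimensional space" can be sketched directly: each document and the query become sparse tf-idf vectors with one dimension per term, and the score is their dot product. This is a minimal Python illustration; the three documents are made up, and real search engines add refinements (length normalization, smoothing) that are left out here.

```python
import math
from collections import Counter

# Hypothetical mini-corpus for illustration only.
DOCS = [
    "book a flight to ithaca",
    "cancel my flight",
    "weather in ithaca today",
]


def tokenize(text):
    return text.lower().split()


def idf_table(docs):
    """Inverse document frequency: rare terms get larger weights."""
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse vector: term -> term frequency * inverse document frequency."""
    counts = Counter(tokenize(text))
    return {term: n * idf[term] for term, n in counts.items() if term in idf}


def dot(a, b):
    return sum(weight * b[term] for term, weight in a.items() if term in b)


def rank(query, docs):
    """Return document indexes ordered from best to worst match."""
    idf = idf_table(docs)
    query_vec = tfidf_vector(query, idf)
    scores = [dot(query_vec, tfidf_vector(doc, idf)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

For the query "cancel flight", the second document ranks first because "cancel" is rare in the corpus and so carries more weight than "flight", which appears in two documents.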
1992 US Post Office: Handwritten Digits
Backpropagation makes it practical to train shallow neural
networks.
NIST developed training data for this project
that eventually became the famous MNIST
digits
predictive analytics
Problem: given four measurements for an Iris flower, determine species
data from 1936 Ronald Fisher and Edgar Anderson
methods based on vector spaces: linear discriminant, support vector machine,
neural networks, k-nearest neighbors, etc.
methods based on rules: C4.5, random forests, inductive logic programming
data driven competitions
TREC, 1992-present: yearly competitions and conference to improve accuracy of search
engines and similar systems. Supported by the US National Institute of Standards and Technology (NIST).
ImageNet, 2010-present: set of images annotated with noun concepts from WordNet;
yearly competitions for classification tasks have led to large breakthroughs in
convolutional neural networks & image recognition
Kaggle, 2010-present: venture-backed company solves data science problems for
customers by holding public competitions
2006 deep learning
Two kinds of training data:
1. A large number of unlabeled examples
2. A small number of classified examples
Two phases:
1. Deep Belief Network (DBN) learns statistical regularities in
unlabeled data
2. Backpropagation fine tunes the network for a specific task
based on labeled and/or unlabeled data
information bottleneck
forces network to
generalize rather than
memorize
present: Explosion of Neural Network Architectures
2011 IBM Watson wins at Jeopardy
Watson considers several possible answers to a question and
computes a probability score for each one.
Watson weighs the risk of getting a wrong answer against the risk of
an opponent answering first and takes action at the optimal time. It
calculates bets to ensure a win, if possible.
Watson Precision/Recall Curve
Throwaway prototype based on commercial off-the-shelf (COTS)
full-text search engine
Point cloud is estimated performance of
human Jeopardy players; the goal is to
get into this region
Progressive improvement of:
• knowledge base
• question answering strategies
• result merging
Watson achieves hyperprecision because it can choose whether or
not to answer a question.
Watson reasons about uncertainty in order to maximize a utility
function; it acts in its own “self interest”
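The idea of choosing whether to answer by maximizing a utility function can be sketched as an expected-value calculation. This is a minimal Python illustration under simple assumptions (a fixed reward for a right answer and a fixed penalty for a wrong one); it is not IBM's actual strategy, and the function names and candidate data are hypothetical.

```python
def best_candidate(candidates):
    """Pick the (answer, confidence) pair with the highest confidence."""
    return max(candidates, key=lambda pair: pair[1])


def expected_utility(confidence, reward, penalty):
    """Expected gain from answering: win `reward` if right, lose `penalty` if wrong."""
    return confidence * reward - (1.0 - confidence) * penalty


def should_answer(candidates, reward, penalty):
    """Answer only when the best candidate has positive expected utility."""
    answer, confidence = best_candidate(candidates)
    if expected_utility(confidence, reward, penalty) > 0:
        return answer
    return None  # stay silent: declining to answer costs nothing
```

Because silence is always an option, the system only commits when the odds favor it, which is how precision on the questions it does answer can be pushed very high.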
Chatbots future
Overwhelmed?
Intelligent systems use complexity to
cope with the complexity of the world
they inhabit.
Outsource subtasks to agents...
present:
also on
a
company
wit.ai: rapidly generalizes examples of specific patterns of text and links these to “intents”
business rules: revenge of the expert system!
Business Processing Modelling Language
• Production rules engines scale 1000x larger
• ... are ideal for managing processes which happen over an extended time
• ... that are driven by events that happen in “the real world”
Complex Event Processing
Greatly improved RETE algorithms do this efficiently!
Rules can put together a story about a set of related
events, by creating new events when the existing
events meet some condition
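A rule that creates a new event when existing events meet a condition can be sketched in plain Python. The example below is hypothetical: it derives a "repeated_delay" event when the same flight is delayed three times within a 60-minute window. The event shape, names and thresholds are assumptions; a real engine such as Drools expresses this declaratively.

```python
from collections import defaultdict


def derive_events(events, window=60, threshold=3):
    """Scan (minute, kind, flight) events and emit derived events.

    A 'repeated_delay' event is created the moment a flight accumulates
    `threshold` delay events within `window` minutes.
    """
    recent_delays = defaultdict(list)
    derived = []
    for minute, kind, flight in sorted(events):
        if kind != "delay":
            continue
        # Keep only delays that are still inside the time window.
        recent = [t for t in recent_delays[flight] if minute - t <= window]
        recent.append(minute)
        recent_delays[flight] = recent
        if len(recent) == threshold:
            derived.append((minute, "repeated_delay", flight))
    return derived
```

The derived event can itself be fed to other rules, which is how a rule base assembles a "story" out of low-level events.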
constraint solving & optimization
Route Optimization
SAT/SMT Solver
Travel Planning
Box Packing
There are tools such as Drools OptaPlanner and IBM CPLEX Optimizer that
marry constraint solving and optimization with rules-based systems.
However, many people who work in this space code everything in C++
because they want to try the largest number of possibilities per second
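The "try as many possibilities as you can" approach can be illustrated with a brute-force box packer. This is a minimal Python sketch, not how OptaPlanner or CPLEX work: it enumerates every assignment of items to boxes, which is only feasible for tiny inputs, and the function name and data are hypothetical.

```python
from itertools import product


def min_boxes(sizes, capacity):
    """Find the fewest boxes of `capacity` that hold all items.

    Returns (box_count, assignment) where assignment[i] is the box index
    of item i, or None if some item cannot fit in any box.
    """
    item_count = len(sizes)
    for box_count in range(1, item_count + 1):
        # Exhaustively try every way to place each item in one of the boxes.
        for assignment in product(range(box_count), repeat=item_count):
            loads = [0] * box_count
            for size, box in zip(sizes, assignment):
                loads[box] += size
            # Constraint: no box may exceed its capacity.
            if all(load <= capacity for load in loads):
                return box_count, assignment
    return None
```

The search space grows exponentially with the number of items, which is why raw evaluation speed matters so much to people working on these problems.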
2XL
“toy of the year”
-- Disney's Family Fun Magazine
Speak and Spell, 1978
1980s Interactive Voice Response:
2001 <VoiceXML>
VoiceXML (from TellMe) supports text to
speech in voice prompts and lets the
script author write a grammar for things
that the telephone caller is supposed to
say.
Performance is dependent on the system
modelling what the user might say: it can
resolve addresses in the US because it
has a list of all the street names!
Can call out to “Web Services” in order to
implement business tasks
2000 Voice Improvement Program
Today: A chance of rain after 4pm. Increasing clouds, with a high near 41.
Southeast wind 6 to 8 mph. Chance of ...
TODAY: A CHANCE OF RAIN AFTER 4PM. INCREASING CLOUDS, WITH A HIGH NEAR 41.
SOUTHEAST WIND 6 TO 8 MPH. CHANCE OF PRECIPITATION IS 50%.
NEW PRECIPITATION AMOUNTS OF LESS THAN ...
• Six voices: male/female and English/Spanish
• Voices vary speed and pitch to create feeling of urgency
• Requires full attention to listen to
ITHACA WXN59 162.5 MHZ
Video games engage players with dialog
that supports the story.
Dialog depends heavily on
writing & voice acting and
is not very interactive.
Text-to-speech can’t keep
users engaged
SSML
All-in
Amazon Alexa is a new platform where you
can’t fall back to the keyboard, mouse or touch-screen.
Voice function has to be good
Others
Bolted onto full-powered phones, computers, and game
consoles; vendors don’t have to face the hard cases
for voice control because fallback to traditional
controllers is always available.
seeing the world that humans live in
specialized cameras and sensors let robots see the world directly in 3-d
varieties of depth camera
multiocular structured light
laser scanner
Kinect/A Sensor for a Sessile Robot
Inform 7 tricks and ideas
Controlled English facts and rules
Pre-existing Ontologies and Theories
Rules Override Other Rules
Parsing number words as numbers
Rule Precedence Managed with “Rulebooks”
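Parsing number words as numbers is one of these tricks that is easy to sketch outside Inform 7. The Python below is a minimal illustration of the idea, not Inform 7's own implementation; the function name is hypothetical, and it handles only whole numbers below one million written in English.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def parse_number_words(text):
    """Turn a phrase such as 'forty-one' into the integer 41."""
    total = 0      # value of completed thousands groups
    current = 0    # value of the group being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current
```

A chatbot can run a parser like this over user input so that "three tickets" and "3 tickets" reach the rest of the system in the same form.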
Conclusion:
Chatbots
• Popular today because of mobile
application limitations
• Possible “third platform” for
applications
• Use a wide range of tactics to
accomplish goals
• Chatbots in 2017 will depend on
data-rich services
• Deeply interdisciplinary, involving
art as much as science

Chatbots in 2017 -- Ithaca Talk Dec 6
