2. Anticipatory computing is transforming the way we find information
Set reminder
- Launch calendar app
- Create new reminder
- Enter flight details
Check flight status
- Launch web browser
- Go to airline site
- Enter flight number
Check traffic
- Launch web browser
- Go to map / traffic site
- Enter current location
- Enter airport address
Today, we find information. Tomorrow, information finds us.
9. Recent technology advances
Speech Recognition
- Deep learning (deep / recurrent ANNs)
- Ultra large language models
- Dynamic speaker adaptation
- Massive datasets (10^8s of users)
Computer Vision
- Deep learning
- Massive datasets
Language Understanding
- Deep learning
- Knowledge graphs
Source: Facebook Source: Stanford University
10. Knowledge graphs
From disembodied strings to grounded entities
• Yahoo! 10 M entities, 30 M properties, 10 M connections
• Microsoft 300 M entities, 800 M connections
• Google 570 M entities, 18 B properties and connections
• Wikipedia 4 M entities
• Freebase 40 M topics, 2 B facts
• Factual 66 M local businesses and POIs in 50 countries
• LinkedIn 225 M people
• Facebook 1.15 B people
Cf.
• Cyc 239 K concepts, 2 M facts
• OpenCyc 6 K concepts, 60 K facts
Source: Yahoo
11. Dynamic activation of the knowledge graph
Continuous user context (over time)
Hayes Valley, Palo Alto, North Beach, Cow Hollow
• I really want to see that new movie with Ben Affleck
• It is the one about the Iran Hostage Crisis
• You have to see that video of the Today Show doing the Harlem Shake
• I am going to meet Raymond at Goat Hill Pizza at noon
• It is near the Comstock Saloon
• I am planning to go whitewater rafting in the Grand Canyon
• The Black Keys were on the Colbert Report last night
Rolling Context Window
Dynamic entity graph (~10M entities)
• things I recently wrote or said
• restaurants near North Beach
• places in the Bay Area
• topics related to things I recently read
• current events
• my friends, colleagues and recent contacts
• links that my friends have recently shared
Human Knowledge (~50B entities)
• 5B people
• 1B places
• 1B products
• 100M interests
• 100M events
• 1B media
• 5B domain-specific
2008 (1M entities)
2010 (10M entities)
2014 (500M entities)
2016 (10B entities)
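The rolling context window on this slide can be illustrated with a small sketch. It is not the platform's implementation: the entity list, the class name `RollingContext`, and the substring matching are all assumptions made for illustration. The idea shown is only that recent utterances activate a small subset of a much larger entity graph, and that older mentions fall out of the window.

```python
from collections import Counter, deque

# Hypothetical toy entity list standing in for a real knowledge graph.
KNOWN_ENTITIES = [
    "Ben Affleck",
    "Goat Hill Pizza",
    "Comstock Saloon",
    "Grand Canyon",
    "The Black Keys",
]


class RollingContext:
    """Keep the most recent utterances and the entities they mention."""

    def __init__(self, max_utterances=3):
        # deque with maxlen silently discards the oldest item when full.
        self.window = deque(maxlen=max_utterances)

    def add(self, utterance):
        # Naive matching: case-insensitive substring search (an assumption;
        # a real system would use entity extraction and disambiguation).
        text = utterance.lower()
        found = [e for e in KNOWN_ENTITIES if e.lower() in text]
        self.window.append(found)

    def active_entities(self):
        # Entities mentioned inside the window, most frequent first.
        counts = Counter(e for found in self.window for e in found)
        return [entity for entity, _ in counts.most_common()]


ctx = RollingContext(max_utterances=3)
for line in [
    "I really want to see that new movie with Ben Affleck",
    "I am going to meet Raymond at Goat Hill Pizza at noon",
    "It is near the Comstock Saloon",
    "I am planning to go whitewater rafting in the Grand Canyon",
]:
    ctx.add(line)

# The first utterance has left the window, so "Ben Affleck" is no longer active.
print(ctx.active_entities())
```

A fixed-length window is the simplest choice; weighting entries by recency instead of dropping them outright would be an equally plausible reading of the slide.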
12. The knowledge graph enables anchored NLP
“I saw the man on the hill with the telescope”
Source: Deniz Yuret
13. Voice
“10% of Baidu search queries are done with voice today. In five years, it’ll be 50%.”
Andrew Ng
14. Types of voice-driven applications
Question & Answer
• "What is the capital of California?"
• "Who directed Citizen Kane?"
Command & Control
• "Call Jenny's work phone."
• "Turn up the heat to 72 degrees."
Content Discovery
• "Is there a good Japanese restaurant near Union Square?"
• "Show me all the James Bond movies with Roger Moore."
Performing Tasks
• "Make a reservation for two at Kama tomorrow at 8pm."
• "Book me on a flight to JFK on Saturday afternoon."
Dictation
• "Send a text to Jenny saying…"
• "Send the following email to Joe…"
Passive Listening
• "…have you seen that video of the Russian meteor…"
• "…I’m thinking of getting a pair of red Kobe 9 sneakers…"
15. Anatomy of a voice interaction
1. Speech recognition
“It’d be nice to find an inexpensive Italian restaurant in San Francisco that is good for kids.”
2. Natural language understanding
type: restaurant
category: Italian
location: San Francisco
cost: $, $$
filter: good for kids
3. Search ranking & filtering
Candidate 1: Buca di Beppo [confidence: 0.91]
Candidate 2: La Traviata [confidence: 0.82]
Candidate 3: Ragazza [confidence: 0.80]
Candidate 4: Sotto Mare [confidence: 0.76]
…
4. Real-time visualization of results
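The ranking and filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method behind the confidence values on the slide: the catalogue entries are invented placeholders, and the score is simply the fraction of query constraints a candidate satisfies. The query fields mirror the structured output shown for the natural language understanding step.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    category: str
    location: str
    cost: str
    good_for_kids: bool


# Structured query as the NLU step might emit it (field names follow the slide).
query = {
    "type": "restaurant",
    "category": "Italian",
    "location": "San Francisco",
    "cost": {"$", "$$"},
    "good_for_kids": True,
}

# Hypothetical catalogue; names and attributes are made up for illustration.
catalogue = [
    Restaurant("Trattoria A", "Italian", "San Francisco", "$$", True),
    Restaurant("Trattoria B", "Italian", "San Francisco", "$$$", True),
    Restaurant("Sushi C", "Japanese", "San Francisco", "$", False),
    Restaurant("Bistro D", "French", "Oakland", "$$$", False),
]


def score(restaurant, q):
    """Fraction of query constraints satisfied (a placeholder confidence)."""
    checks = [
        restaurant.category == q["category"],
        restaurant.location == q["location"],
        restaurant.cost in q["cost"],
        restaurant.good_for_kids == q["good_for_kids"],
    ]
    return sum(checks) / len(checks)


def rank(candidates, q, min_score=0.5):
    """Drop weak matches, then sort the rest by descending score."""
    scored = [(score(r, q), r.name) for r in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


for confidence, name in rank(catalogue, query):
    print(f"{name} [confidence: {confidence:.2f}]")
```

A production ranker would typically learn its weights from data rather than count matched constraints equally; the equal weighting here only keeps the example short.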
16. The MindMeld platform
1. Passively analyze multiple concurrent data streams for each user in real-time: voice, gps, video, updates, …
2. Generate a continuously changing model of user intent based on long-running context
3. Proactively find, correlate and rank relevant information; display to user as appropriate
18. CONFIDENTIAL
Step 1
We will automatically index any document collection.
20. CONFIDENTIAL
Step 2
Use our API to continuously track contextual signals for your users.
21.
curl -X POST \
  -H "X-MindMeld-Access-Token: mindmeld-access-token" \
  -H "Content-Type: application/json" \
  -d '{"text": "I was thinking we could go to Muir Woods or Stinson Beach", "type": "speech", "weight": 0.5}' \
  https://mindmeld.expectlabs.com/session/:sessionid/textentries
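The same request can be built from Python with only the standard library. This is a sketch under assumptions: the URL, header names, and JSON fields are copied from the slide, while the function name `build_text_entry_request` and the placeholder session ID are invented here. The request is constructed but not sent, so the example runs offline and makes no claim about what the service returns.

```python
import json
import urllib.request

# Base URL as shown on the slide.
BASE_URL = "https://mindmeld.expectlabs.com"


def build_text_entry_request(session_id, token, text, entry_type="speech", weight=0.5):
    """Build (but do not send) a POST that adds a text entry to a session."""
    url = f"{BASE_URL}/session/{session_id}/textentries"
    body = json.dumps({"text": text, "type": entry_type, "weight": weight})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "X-MindMeld-Access-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_text_entry_request(
    session_id="SESSION_ID",          # placeholder, replace with a real session
    token="mindmeld-access-token",    # placeholder token from the slide
    text="I was thinking we could go to Muir Woods or Stinson Beach",
)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```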
22. CONFIDENTIAL
Step 3
Display context-driven search results and recommendations.
25. Technology choice: Speech recognition
Google
• Google’s server-side implementation of HTML5’s webkitSpeechRecognition for Chrome and Android
• Fast, interim results, free
• 79 languages
• No SLA, no iOS support
Nuance
• Nuance NDEV Mobile speech-to-text service
• Custom vocabularies
• Embedded engine, cloud-based API, and combination
• 40 languages
• SLA
AT&T
• AT&T Speech API
• iOS, Android SDK clients to cloud-based API
• 19 languages
• SLA
Build own
• Start with open-source speech recognition engine such as Sphinx or Kaldi
• Be ready to invest $$$
• Full control
26. Technology choice: Natural language understanding
Stanford NLP
• PoS tagging, parsing, entity extraction, co-reference resolution
• Java-based SDK
• Free (GNU General Public License v2)
NLTK
• PoS tagging, parsing, entity extraction
• Python-based SDK
• Free (Apache License v2)
AlchemyAPI
• Sentiment analysis, entity / keyword extraction, language detection
• Cloud-based API
TextRazor
• Entity recognition, topic tagging, dependency parsing
• Cloud-based API
OpenCalais
• Extraction of named entities, facts, and events
• Cloud-based API
Build own
• Combine, extend functionality
• Full control
• Requires maintenance
27. Technology choice: Machine learning
Scikit-learn
• Classification, regression, clustering
• SVM, logistic regression, random forests, …
• Python-based, open-source library
Weka
• Data analysis, predictive modeling
• Wide array of machine learning classifiers
• Java-based, free library (GNU GPL)
Google
• Google Prediction API
• Pattern matching, classifiers, recommender systems
• Freemium pricing
GraphLab
• Topic modeling, graph analytics, clustering, collaborative filtering, computer vision
• Parallel programming
• C++ core, Python interface
PredictionIO
• Predictive modeling, recommender systems
• Open-source service
Build own
• Combine, extend functionality
• Optimize runtime for each model
28. Technology choice: Development operations
Amazon Web Services
• Cloud-based servers, storage, load balancers
Nitrous.IO
• Dev box in seconds with browser-based IDE
GitHub
• Code repos management
Chef
• Server provisioning
Nginx
• Web server
Nagios
• Operations monitoring & alerting
Circle CI
• Test & build
Pivotal Tracker
• Project management
29. Challenges
Functionality
Scalability
• Users
• Applications
• Domains
Latency
• ASR
• NLU
• IR
• Visualization
Accuracy
ASR
• Word error rate
• Accented speech
• Noisy environment
• Distant speaker
NLU / IR
• Precision & recall
• Word sense disambiguation
• Anaphora resolution
• Conversation modeling
• Interruptibility
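Two of the accuracy measures named above have standard definitions that fit in a few lines. The sketch below computes word error rate as word-level edit distance divided by the number of reference words, and precision and recall over sets of retrieved and relevant items. The function names and the sample inputs are chosen here for illustration and are not taken from the deck.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# One substituted word out of four reference words.
print(word_error_rate("call jenny's work phone", "call jenny work phone"))

# Two of four retrieved items are relevant; two of three relevant items found.
print(precision_recall(["a", "b", "c", "d"], ["a", "b", "e"]))
```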
30. MindMeld API: Powerful yet easy to use
developer.expectlabs.com
real-time location
entity extraction
speech recognition on any device
on any device
Android SDK
iOS SDK
JavaScript SDK
sample code
turnkey HTML5 widgets
push events
open graph support
customizable ranking
keyphrase detection
topic detection
natural language processing
on-demand web crawling
proactive suggestions
instant answers
extensive online documentation
real-time analytics console
complete API explorer tool
crawl manager dashboard
ranking dashboard
32. MindMeld API: Powering a wide range of applications
1. Voice-Driven Intelligent Assistant
2. Location-Based Proactive Assistant
3. Voice and Video Conference Assistant
• online commerce
• media & entertainment
• mobile apps & devices
• wearables
• location-based services
• local & travel apps
• smart cars
• mobile workforce
• customer support & help desk
• call center solutions
• voice & video calling
• telepresence & collaboration