No single approach to knowledge classification and access is best for every application.
This webinar will help participants choose the right approach(es) to support their own cognitive computing application.
The science and engineering of data management for computational efficiency is well-understood. We have algorithms and heuristics to pre-fetch data and instructions and distribute them based on properties of the algorithms, data sets, applications, and system software and hardware. We have decades of experience fine-tuning hardware, networks, operating systems, compilers and applications based on physics. Now we need to start thinking in terms of biology.
Fortunately, we don’t have to actually model the 100B neurons or 100-500 trillion synapses in the human brain in hardware or software. We do need a well-specified knowledge model to organize refined data based on how we expect to query and further refine it. What we store constrains which questions a cognitive system may be able to answer. How we organize this knowledge may determine whether our system can answer questions or generate hypotheses efficiently or effectively.
Smart Data Webinar: Organizing Data and Knowledge - The Role of Taxonomies and Ontologies
1. Organizing Data & Knowledge
The Role of Taxonomies & Ontologies
Adrian Bowles, PhD
Founder, STORM Insights, Inc.
Lead Analyst, AI, Aragon Research
info@storminsights.com
Copyright (c) 2017 by STORM Insights Inc. All Rights Reserved.
AUGUST 10, 2017
2. AGENDA - ORGANIZING DATA AND KNOWLEDGE: 4 QUESTIONS
What Are We Trying to Accomplish?
Why Is It SO Important?
How Do We Do It Today?
How Will It Be Done In The Future?
3. THE GOAL IS TO BUILD SYSTEMS THAT UNDERSTAND AND LEARN
[Diagram labels: Learn, Plan, Reason, Understand, Model]
Plan (v): Identify a goal/desired state and a set of steps/activities to reach that state.
Reason (v): An evidence-based process for determining the truth or probability of a conclusion.
Deductive - Top-down reduction; results are certain
Inductive - Bottom-up generalizations, creating hypotheses with confidence levels/probability
Abductive - Bottom-up, probabilistic development of theories from observations
Understand: Awareness of the meaning of data.
Learn: To acquire understanding of data.
4. COGNITIVE COMPUTING FUNDAMENTALS
Corpus: The complete machine-readable record of a domain.
Distance: A metric for the similarity of two items based on their relative locations in n-space according to an algorithm.
Assumptions: Implicit or explicit data or relationships held to be valid.
Hypothesis: An evidence-based testable assertion that explains a phenomenon or relationship.
Model: The corpus, assumptions, and algorithms used to generate and score hypotheses or calculate the strength of a relationship.
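To make the Distance definition concrete, here is a minimal Python sketch of two common metrics over points in n-space. The item names and feature values are hypothetical, chosen only for illustration; the deck does not prescribe a particular metric.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points in n-space."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for the same direction, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical 3-dimensional feature vectors (illustrative values only).
items = {
    "sparrow": [1.0, 0.9, 0.1],
    "robin": [0.9, 1.0, 0.2],
    "truck": [0.1, 0.0, 1.0],
}

def nearest(name):
    """Return the other item closest to `name` by Euclidean distance."""
    others = (k for k in items if k != name)
    return min(others, key=lambda k: euclidean_distance(items[name], items[k]))
```

Which metric is appropriate depends on the algorithm and the purpose, which is why the definition ties distance to "an algorithm" rather than to a single formula.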
5. COGNITIVE COMPUTING FUNDAMENTALS: SAMPLE ASSUMPTIONS
Model: The corpus, assumptions, and algorithms used to generate and score hypotheses or calculate the strength of a relationship.
Sample assumption: Principles that control the development and representation of natural intelligence in the neocortex provide a guide to the implementation of machine intelligence. (Numenta Hierarchical Temporal Memory)
Sample assumption: A function applied to a string representing data or a concept results in a value or vector meaningful for comparison, e.g., using Kolmogorov complexity to measure the strength of relationships in Memory-Based Reasoning.
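Kolmogorov complexity itself is not computable, so practical systems approximate it. One well-known stand-in is the normalized compression distance (NCD), which uses a real compressor in place of the ideal one. The sketch below uses zlib; treating NCD as the function behind this sample assumption is my assumption, not something the deck states.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a rough proxy for its complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Values near 0 suggest the strings share most of their structure;
    values near 1 suggest they share little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Compressor overhead makes NCD noisy on very short strings, so it is best applied to longer passages.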
6. PAST AND (RE)PRESENT
No taxation without representation.
No Machine Intelligence without (knowledge) representation.
7. FUNDAMENTAL PRINCIPLES
“When the map and the terrain disagree, believe the terrain.”
Gause and Weinberg (Exploring Requirements)
“It is the pervading law of all things organic, and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. That is the law.”
Louis Sullivan, The Tall Office Building Artistically Considered, 1896
8. WHAT IS KNOWLEDGE? (BEYOND DATA)
Knowledge may include facts or beliefs and general information with context.
9. HOW DO PEOPLE COPE WITH IMPRECISION?
Under ideal conditions, people are good, but not perfect, at communicating in natural languages. We…
• understand in context (environment and our own frame of reference)
• attempt to resolve ambiguity
• have to deal with competing signals and noise
• fill in words and meaning, and may not hear or understand what was said or meant…
[Diagram labels: X, X’, Y]
10. THINK VS REPRESENT
11. CHOICES HAVE CONSEQUENCES
How You Think About a Domain…
…influences your choice of maps and models…
rules and representations…and required operations.
12. COMMON SENSE VS COMMON KNOWLEDGE
[Image: In the News]
13. WHAT DO YOU SEE?
Classic AI
14. WHAT DO YOU SEE?
Classic AI
83 Birds
1 Flock
A Carwash in Your Future
15. WHAT DO YOU SEE, AND WHAT DO YOU FEEL?
Motorcycles
A Race
Noises
Smells
Pollution
What you capture depends on your context and experiences.
18. ONE SIZE FITS ONE
“To solve really hard problems, we'll have to use several different representations. This is because each particular kind of data structure has its own virtues and deficiencies, and none by itself would seem adequate for all the different functions involved with what we call common sense.”
Marvin Minsky
19. TWO THINGS NOBODY TELLS YOU ABOUT DATA…
• All data is structured
Google used a neural network with 16,000 processors to search 10,000,000 images from YouTube to identify…cats.
• Beliefs change, truth doesn’t
Representing belief as fact will eventually trip up any system.
“Facts change in regular and mathematically understandable ways.”
Samuel Arbesman, The Half-life of Facts, 2012, Penguin Books.
20. DEEP STRUCTURE REQUIRES STRONGER METHODS FOR ANALYSIS TO FIND CONCEPTS
Perception: obvious structure is easy to process… but most of the interesting stuff isn’t obvious to a computer.
26. START WITH A TAXONOMY
A taxonomy represents the formal structure of classes or types of objects within a domain.
• Taxonomies are generally hierarchical and provide names for each class in the domain.
• They may also capture the membership properties of each object in relation to the other objects.
• The rules of a specific taxonomy are used to classify or categorize any object in the domain, so they must be complete, consistent, and unambiguous. This rigor in specification should ensure that any newly discovered object fits into one, and only one, category or object class.
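The "one, and only one, category" requirement can be checked mechanically. Below is a minimal Python sketch of a hierarchical taxonomy with membership rules. The vehicle classes, attribute names, and thresholds are hypothetical and chosen only to show the check.

```python
# Hypothetical mini-taxonomy: each class name maps to its parent class.
TAXONOMY = {
    "vehicle": None,
    "motorcycle": "vehicle",
    "car": "vehicle",
    "truck": "vehicle",
}

# Hypothetical membership rules for the leaf classes.
RULES = {
    "motorcycle": lambda o: o["wheels"] == 2,
    "car": lambda o: o["wheels"] == 4 and o["payload_kg"] <= 1000,
    "truck": lambda o: o["wheels"] >= 4 and o["payload_kg"] > 1000,
}

def classify(obj):
    """Return the single class whose rule matches, or raise if the rules
    are incomplete (no match) or ambiguous (several matches) for obj."""
    matches = [name for name, rule in RULES.items() if rule(obj)]
    if len(matches) != 1:
        raise ValueError(f"rules are incomplete or ambiguous: {matches}")
    return matches[0]

def lineage(name):
    """Path from the root of the hierarchy down to the named class."""
    path = []
    while name is not None:
        path.append(name)
        name = TAXONOMY[name]
    return path[::-1]
```

A three-wheeled object matches no rule here, so `classify` raises. That is the signal that the taxonomy needs a new class or a revised rule.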
27. ONTOLOGIES
An ontology formalizes and specifies the names, definitions, and attributes of entities within a domain. For practical purposes, an accepted ontology defines the domain.
28. ONTOLOGIES EVOLVE - SYSTEMS MUST BE FLEXIBLE
29. DESIGN CHOICE: FINDING SYMBOLS VS USING STATISTICS
Symbolic Logic
Representations
Reasoning
Concepts
Statistical Models
Mechanical Theorem Proving
30. BASICS OF THE W3C SEMANTIC WEB ONTOLOGY STACK
THE SEMANTIC WEB - ALL DATA SHOULD BE ASSOCIATED WITH SEMANTIC ATTRIBUTES (MEANING)
RDF - Resource Description Framework - A data model whose statements form a directed, labeled graph.
RDFS - RDF Schema - A language for representing RDF vocabularies.
SPARQL - SPARQL Protocol and RDF Query Language - A query language and protocol for RDF data.
OWL - The Web Ontology Language is a Semantic Web language designed to represent knowledge about things and relationships between things on the Web. An OWL document is an ontology.
https://www.w3.org/2013/data/
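The idea behind RDF can be sketched without any Semantic Web library. Each statement is a (subject, predicate, object) triple, which is one directed, labeled edge, and a query is a pattern with wildcards. The plain-Python sketch below is not RDF or SPARQL itself. The `ex:` names and facts are hypothetical, and only `rdf:type` and `rdfs:subClassOf` are borrowed from the real vocabularies.

```python
# Each statement is a (subject, predicate, object) triple: a directed,
# labeled edge from subject to object. The ex: names are hypothetical.
TRIPLES = {
    ("ex:Sparrow", "rdf:type", "ex:Bird"),
    ("ex:Robin", "rdf:type", "ex:Bird"),
    ("ex:Bird", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Sparrow", "ex:eats", "ex:Seeds"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def instances_of(cls):
    """Instances of cls or of any subclass, following rdfs:subClassOf edges.

    Assumes the subclass hierarchy has no cycles.
    """
    found = {s for s, _, _ in match(p="rdf:type", o=cls)}
    for sub, _, _ in match(p="rdfs:subClassOf", o=cls):
        found |= instances_of(sub)
    return found
```

Here `instances_of("ex:Animal")` finds both birds even though no triple says so directly. That small inference step is what the schema and ontology layers add on top of raw triples.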
31. COMMERCIAL SOLUTIONS
[Image: Copyright (c) Digital Reasoning.]
32. COMMERCIAL SOLUTIONS
[Image: Copyright (c) Digital Reasoning.]
33. RECOGNIZING CONCEPTS VS UNDERSTANDING
Courtesy of LoopAI Labs.
34. RECOGNITION IS NOT UNDERSTANDING.
https://arxiv.org/abs/1112.6209
35. PROXIMITY/DISTANCE ALGORITHMS
Mapped with vectors, proximity algorithm based on purpose.
Mapping for autocorrect/complete vs mapping for meaning
[Two diagrams placing the same words by different proximity measures: Boy, Bay, Buy, Map, Mop, Mope, Man, Nay, May, Hop, Hope]
Similar structure -> similar meaning in vision, not always in language.
Memory-Based Reasoning
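A mapping for autocorrect/complete can be sketched with Levenshtein edit distance, which measures spelling structure only. The word list comes from the slide. The use of edit distance and the `max_distance` threshold are my assumptions about what such a mapping might use.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

WORDS = ["boy", "bay", "buy", "map", "mop", "mope",
         "man", "nay", "may", "hop", "hope"]

def neighbors(word, max_distance=1):
    """Words from the slide within max_distance edits of `word`."""
    return [w for w in WORDS
            if w != word and edit_distance(word, w) <= max_distance]
```

By this measure "boy" sits next to "bay" and "buy", which are unrelated in meaning, while "man" is three edits away. A mapping for meaning would need a different distance, which is the slide's point about choosing the algorithm by purpose.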
36. GETTING STARTED: CLASSIFICATION NEEDS VALIDATION & VERIFICATION
[Diagram labels: Model, Classify, Hypothesize, Analyze or Synthesize, Perceive]
37. USE PRE-BUILT KNOWLEDGE RESOURCES
39. OFF THE SHELF KNOWLEDGE - NEED TO ASSOCIATE/RECOGNIZE/UNDERSTAND TO ORGANIZE/REPRESENT
WordNet(R): Princeton University, "About WordNet." Princeton University. 2010. <http://wordnet.princeton.edu>
41. GRAPHS SHOULD BE PART OF YOUR TOOLKIT
A graph is a structure with vertices and edges.
Vertices: a, b, c, d, e
Edge labels: Old Post Road, Cross Highway, Compo, Shinbone Alley, Elk Road
Edge properties:
Old Post Road: Paved, 11 miles
Elk Road: Dirt, 2 miles
Cross Highway: toll road, 250 miles
Main Street: 1 mile
Shinbone Alley: .5 miles
Vertex properties:
a: bus stop
b: gas station, Shell
c: Elementary school
d: House
e: Office building
Vertices and edges may be labeled, edges may be directed, and all may be stored/processed by properties represented as key/value pairs.
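The road example can be sketched as a small property graph in Python, with vertex and edge properties held as key/value pairs. The property values come from the slide. Which vertices each road connects is not recoverable from the text, so the endpoints below are assumptions for illustration only.

```python
import heapq

# Vertex properties from the slide, stored as key/value pairs.
VERTICES = {
    "a": {"kind": "bus stop"},
    "b": {"kind": "gas station", "brand": "Shell"},
    "c": {"kind": "elementary school"},
    "d": {"kind": "house"},
    "e": {"kind": "office building"},
}

# (from, to, properties). The endpoints are ASSUMED; only the road
# names and their properties appear in the source.
EDGES = [
    ("a", "b", {"name": "Old Post Road", "surface": "paved", "miles": 11.0}),
    ("b", "c", {"name": "Elk Road", "surface": "dirt", "miles": 2.0}),
    ("c", "d", {"name": "Shinbone Alley", "miles": 0.5}),
    ("d", "e", {"name": "Main Street", "miles": 1.0}),
    ("a", "e", {"name": "Cross Highway", "toll": True, "miles": 250.0}),
]

def edges_where(**conditions):
    """Names of edges whose properties equal all the given key/value pairs."""
    return [p["name"] for _, _, p in EDGES
            if all(p.get(k) == v for k, v in conditions.items())]

def shortest_miles(start, goal):
    """Dijkstra's algorithm, treating edges as undirected and weighted by miles."""
    adjacency = {v: [] for v in VERTICES}
    for u, v, props in EDGES:
        adjacency[u].append((v, props["miles"]))
        adjacency[v].append((u, props["miles"]))
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, miles in adjacency[node]:
            cand = dist + miles
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return None
```

Because properties are plain key/value pairs, the same structure supports both lookups such as `edges_where(surface="dirt")` and traversals such as `shortest_miles("a", "e")`.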
42. GRAPHS 101. SORT OF
You Probably Already Think In Graphs if…
You watch detective shows
You know trivia about movies
You remember relationships between people
You took a biology class
43. GRAPHS 101. SORT OF
Wikipedia contributors. "Taxonomy (biology)." Wikipedia,
The Free Encyclopedia. Wikipedia, The Free Encyclopedia,
11 May. 2016. Web. 12 May. 2016.
44. GRAPHS 101. SORT OF
45. GRAPHS 101. SORT OF
Family Tree
LinkedIn Tree
46. GRAPHS 101. SORT OF
Typical crazy wall whiteboard - from Fargo.
A screen from IBM I2 Coplink
47. BUY OR BUILD IT YOURSELF WITH…
Commercial tools
Open Source tools
Prebuilt data
Graphs…
48. IF THE INTELLIGENCE IS ARTIFICIAL, WHY OBSESS ABOUT UNDERSTANDING?
Sentiment/Emotion/Theme/Concept Analysis
Don’t let the search for perfection interfere with the path to progress.
49. adrian@storminsights.com
adrian@aragonresearch.com
Twitter @ajbowles
Skype ajbowles
If you would like to connect on LinkedIn, please let me know that you registered for the Smart Data webinar series.
Upcoming SmartData Webinar Dates & Topics
Sept. 14: Advances in Natural Language Processing II: NL Generation
Oct. 12: Choosing the Right Data Management Architecture for Cognitive Computing
Nov. 9: See Me Feel Me, Touch Me, Heal Me: The Rise of the Cognitive Interface
KEEP IN TOUCH
New Content from Aragon Research
AragonResearch.com
50. #MODERNAI
END OF DECK
51. NATURAL LANGUAGE PROCESSING (NLP)
Natural Language Understanding (NLU)
Natural Language Generation (NLG)
?
52. DEEP NLU - DEEP QA WITH IBM WATSON
Question Analysis
What is being asked?
Question classification: any words with double meanings? Puzzle question, factoid…?
Detect: focus, LAT (lexical answer type), relations
53. NLU COMPONENTS
Google Cloud NLP
Focus: Extract meaning
54. BUILD IT YOURSELF WITH…
IBM Watson Conversation
55. AGI MINIMUM REQUIREMENTS
Big Knowledge + Modest Processing (Reasoning, KM…)
Starting with sufficient knowledge (includes the model with assumptions) makes processing requirements relatively modest to accommodate incremental activities.
or
Big Processing + Big Data (Reasoning, KM…)
With sufficient processing power and access to enough clean, validated data, knowledge can be acquired just in time.
56. Content Acquisition
Building the corpus
For Jeopardy! this had to be completed before the game commenced.
Ingested encyclopedias, dictionaries, thesauri, newswire articles, literary works, databases, taxonomies, ontologies…
IRL, we can identify and use new resources based on the problem at hand.
57. CASE STUDY: DEEP QA WITH IBM WATSON
Question Analysis
What is being asked?
Question classification: any words with double meanings? Puzzle question, factoid…?
Detect: focus, LAT (lexical answer type), relations
58. CASE STUDY: DEEP QA WITH IBM WATSON
Relation-detection
“They’re the two states you could be reentering if you’re crossing Florida’s northern border.”
Category: Head North
borders(Florida, ?x, north)
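Once the relation has been detected, answering it amounts to matching a pattern with a variable against stored facts. The Python sketch below illustrates that idea only and is not how Watson is implemented. The fact list is a small hand-written sample, and the `query` helper is hypothetical.

```python
# A few hand-written facts in the form (relation, subject, object, direction).
FACTS = [
    ("borders", "Florida", "Georgia", "north"),
    ("borders", "Florida", "Alabama", "north"),
    ("borders", "Georgia", "Tennessee", "north"),
    ("borders", "Georgia", "Florida", "south"),
]

def is_variable(term):
    """Variables are written with a leading '?', as in ?x."""
    return isinstance(term, str) and term.startswith("?")

def query(pattern):
    """Return one {variable: value} binding per fact that matches the pattern."""
    results = []
    for fact in FACTS:
        if len(fact) != len(pattern):
            continue
        bindings = {}
        for want, have in zip(pattern, fact):
            if is_variable(want):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(bindings)
    return results

# The clue's relation, borders(Florida, ?x, north), as a pattern.
answers = sorted(b["?x"] for b in query(("borders", "Florida", "?x", "north")))
```

With these facts, `answers` contains Alabama and Georgia, the two states the clue asks for.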
59. CASE STUDY: DEEP QA WITH IBM WATSON
60. CASE STUDY: DEEP QA WITH IBM WATSON
61. CASE STUDY: DEEP QA WITH IBM WATSON
Hypothesis Generation & Scoring
Use a candidate answer with the question, and try to prove it correct with a degree of confidence supported by the evidence.
Scoring may use a variety of relationships:
temporal
spatial
geospatial
taxonomic classification
correlation between candidate and question…
62. CASE STUDY: DEEP QA WITH IBM WATSON
“Chile shares its longest land border with this country.”
63. CASE STUDY: DEEP QA WITH IBM WATSON
Evaluating Potential Answers
Watson scores evidence in multiple dimensions.
What works for a factoid question may not work for a puzzle question.
“Chile shares its longest land border with this country.”
64. CASE STUDY: DEEP QA WITH IBM WATSON
65. CASE STUDY: DEEP QA WITH IBM WATSON
Merging & Ranking
Identifying the most likely answer based on confidence scores.
Answer scores are merged before ranking and confidence estimation.
Uses an ML approach to compare with training set data when confidence scores in different categories result in “too close to call” results.
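As a rough illustration of merging per-dimension evidence scores before ranking, the Python sketch below combines them with a weighted logistic function. The dimension names, weights, bias, and scores are all hypothetical. In a trained system the weights would be learned from training data, and the deck does not describe Watson's actual models.

```python
import math

# Hypothetical weights per evidence dimension; a real system would learn these.
WEIGHTS = {"geospatial": 2.0, "taxonomic": 1.0, "passage_support": 1.5}
BIAS = -2.0

def confidence(scores):
    """Merge per-dimension evidence scores (each 0..1) into a single
    confidence between 0 and 1 using a logistic function."""
    z = BIAS + sum(WEIGHTS[d] * scores.get(d, 0.0) for d in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates):
    """Return (confidence, answer) pairs, most confident first."""
    return sorted(((confidence(s), name) for name, s in candidates.items()),
                  reverse=True)

# Illustrative scores for the clue about Chile's longest land border.
CANDIDATES = {
    "Argentina": {"geospatial": 0.9, "taxonomic": 1.0, "passage_support": 0.8},
    "Bolivia": {"geospatial": 0.4, "taxonomic": 1.0, "passage_support": 0.5},
    "Peru": {"geospatial": 0.2, "taxonomic": 1.0, "passage_support": 0.3},
}
```

All three candidates are countries, so the taxonomic score cannot separate them. The ranking is decided by the other dimensions, which is why evidence is scored along several dimensions before merging.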
66. GRAPHS 101. SORT OF
Wikipedia contributors. "Graph database." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 11
67. LEXICAL ANSWER TYPE DISTRIBUTION
Predicting lexical answer types in open domain question and answering (qa) systems US 20130035931 A1 2013, Ferrucci, Gliozzo, Kalyanpur
68. DEEP QA VS SEARCH