Principal Data Scientist
Booz Allen Hamilton
http://www.boozallen.com/datascience
Kirk Borne
@KirkDBorne
Semantic AI: Smart Data for
Smarter Discovery & Actions
Six Core Aspects of Semantic AI
https://bit.ly/2Kxw8H5
• Hybrid Approach
• Data Quality
• Data as a Service
• Structured Data Meets Text
• No Black-box
• Towards Self-optimizing Machines
Ever since we first explored our world…
http://www.livescience.com/27663-seven-seas.html
…We have asked questions about everything around us.
https://atillakingthehun.wordpress.com/2014/08/07/atlantis-not-lost/
So, we have collected evidence (data) to answer our questions,
which leads to more questions, which leads to more data collection,
which leads to more questions, which leads to… BIG DATA!
https://www.linkedin.com/pulse/exponential-growth-isnt-cool-combinatorial-tor-bair
y ~ 2 * x (linear growth)
y ~ 2 ^ x (exponential growth)
y ~ x! ≈ x ^ x
→ Combinatorial Growth!
(all possible interconnections,
linkages, and interactions)
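To make the three growth regimes concrete, here is a minimal Python sketch that tabulates them side by side. The function name `growth_table` and the sample values of x are illustrative choices, not part of the original slides. Note that x! ≈ x ^ x is a loose shorthand: x ^ x is an upper bound on x!, and both eventually outgrow 2 ^ x.

```python
import math


def growth_table(xs):
    """Return one row (x, linear, exponential, factorial, x**x) per input x."""
    rows = []
    for x in xs:
        rows.append((x, 2 * x, 2 ** x, math.factorial(x), x ** x))
    return rows


# Print the table for a few small, arbitrarily chosen values of x.
for row in growth_table([1, 2, 5, 10]):
    print(row)
```

Even at x = 10 the factorial column is already in the millions while the linear column has only reached 20, which is the point of the "combinatorial growth" label.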
3+1 V’s of Big Data:
Volume = most annoying V
Velocity = most challenging V
Variety = most rich V for discovery
Value = the most important V
“All the World is a Graph” – Shakespeare?
(Graphic by Cray, for Cray Graph Engine CGE)
http://www.cray.com/products/analytics/cray-graph-engine
Semantic, Meaning-filled Data:
• Ontologies (formal)
• Taxonomies (class hierarchies)
• Folksonomies (informal)
• Tagging / Annotation
– Automated (Machine Learning)
– Crowdsourced
– “Breadcrumbs” (user trails)
Broad, Enriched Data:
• Linked Data (RDF)
– All of those combinations!
• Graph Databases
• Machine Learning
• Cognitive Analytics
• Context
• The 360° view
Making Sense of the World with Smart Data
The Human Connectome Project:
mapping and linking the major
pathways in the brain.
http://www.humanconnectomeproject.org/
Semantic AI in the Internet of Things (IoT):
Internet of Everything
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
The Internet of Things (IoT) will be an interconnected network of Sensors and
Dynamic Data-Driven Application Systems (dddas.org) =>
Leading to a Combinatorial Explosive Growth of Smart Data!
IoT will power an “Internet of Context” – empowering smarter
actionable intelligence from contextual data everywhere!
1) Class Discovery: Find the categories of objects
(population segments), events, and behaviors in your
data. + Learn the rules that constrain the class
boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power)
Discovery: Finding trends, patterns, dependencies in
data, which reveal the governing principles or behavioral
patterns (the object’s “DNA”).
3) Novelty (Surprise!) Discovery:
Finding new, rare, one-in-a-[million / billion / trillion]
objects, events, or behaviors.
4) Association (or Link) Discovery: (Graph and Network
Analytics) – Find the unusual (interesting) co-occurring
Make your data smarter with Machine Learning = generate semantic tags that describe discoveries
(Graphic by S. G. Djorgovski, Caltech)
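As one possible illustration of "generating semantic tags that describe discoveries", the sketch below tags each value in a series as "novelty" or "typical" using a simple z-score rule. This is only a stand-in for Novelty Discovery: the function name `tag_novelties`, the threshold of 3 standard deviations, and the sample readings are all assumptions made for the example, not a method given in the slides.

```python
from statistics import mean, stdev


def tag_novelties(values, threshold=3.0):
    """Attach a semantic tag to each value based on its z-score."""
    mu = mean(values)
    sigma = stdev(values)
    tags = []
    for v in values:
        # Guard against a constant series, where the z-score is undefined.
        z = (v - mu) / sigma if sigma > 0 else 0.0
        tags.append("novelty" if abs(z) > threshold else "typical")
    return tags


# Hypothetical sensor readings: twenty ordinary values and one surprise.
readings = [10] * 20 + [100]
tagged = list(zip(readings, tag_novelties(readings)))
print([pair for pair in tagged if pair[1] == "novelty"])
```

The resulting tags can be stored alongside the data, so that later searches can ask for "novelty" records directly instead of recomputing the statistics.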
SEMANTIC AI USE CASE IN ENVIRONMENTAL SCIENCE:
From Data to Information to Knowledge to Understanding
Semantic AI tags new discoveries for search, re-use, & building the knowledge graph!
Semantic AI creates a Smarter Data Narrative
• It is best when we understand our data’s context and meaning…
• … the Semantics! This is based on Ontologies.
• My students memorized the definition of an Ontology…
–“is_a formal, explicit specification of a shared conceptualization.”
from Tom Gruber (Stanford)
• Semantic “facts” can be expressed in a database as RDF triples:
{subject, predicate, object} = {noun, verb, noun}
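A minimal sketch of the {subject, predicate, object} idea in plain Python follows. It does not use a real RDF library; the `Triple` type, the `match` helper, and the identifiers in `facts` are hypothetical names chosen for illustration. Passing `None` for a position acts as a wildcard, loosely mirroring a variable in a graph query.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Illustrative facts, each stored as {noun, verb, noun}.
facts = [
    Triple("Tim_Berners-Lee", "invented", "World_Wide_Web"),
    Triple("Perth", "is_located_in", "Australia"),
]

# "Who invented the World Wide Web?" becomes a pattern with an open subject.
for t in match(facts, predicate="invented", obj="World_Wide_Web"):
    print(t.subject)
```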
Get Smart (Data)!
• Collect, Create, Connect smart data across your repositories.
• Build Actionable Knowledge with Semantic AI, not databases!
… then Explore and Exploit Your Knowledge Graph.
http://ghostednotes.com/category/semantic-web
(Graphic labels: Chapters, Indexes, Covers, Tables of Contents)
https://www.quora.com/What-is-the-main-goal-of-semantic-web
Query your data for Patterns & Knowledge
(Action)(Discovery)
Andreas Blumauer
CEO & Managing Partner
Semantic Web Company /
PoolParty Semantic Suite
Semantic AI
Bringing Machine Learning, NLP
and Knowledge Graphs together
Agenda
Semantic AI
▸ A Quick Introduction to the Semantic Web
▹ Semantic Web in Use
▹ Reasoning
▹ The Linked Data Lifecycle
▸ Six Core Aspects of Semantic AI
▹ Data Quality
▹ Data as a Service
▹ No black-box
▹ Hybrid approach
▹ Structured data meets text
▹ Towards self-optimizing machines
A Quick Introduction to the Semantic Web
Benefiting from Knowledge Graphs and Semantic Web Standards
The Semantic Web
A standards-based graph of knowledge graphs
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Semantic Web in Use
Knowledge Graphs to support Search and Q&A engines
Knowledge Graphs (KG) can cover general knowledge (often also called cross-domain or encyclopedic knowledge), or provide knowledge about special domains such as biomedicine.
In most cases KGs are based on Semantic Web standards, and have been generated by a mixture of automatic extraction from text or structured data, and manual curation work.
Examples:
▸ DBpedia
▸ Google Knowledge Graph
▸ YAGO
▸ OpenCyc
▸ Wikidata
Who is the inventor of the World Wide Web?
Reasoning
Knowledge Graphs & Knowledge Extraction
Perth: Perth is one of the most isolated major cities in the world, with a population of 2,022,044 living in Greater Perth.
Australia: Australia is a member of the OECD, United Nations, G20, ANZUS, and the World Trade Organisation.
Knowledge graph relations (from the diagram):
● Perth → is a → City
● Australia → is a → Country
● Perth → is located in → Australia
● Australia → is part of → Commonwealth of Nations
● Commonwealth of Nations → is a → International Organisation
Avoid illogical answers: distance between …
Support complex Q&A: Which cities located in the Commonwealth of Nations have a population of …
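The Perth and Australia example can be sketched as a small reasoning step in Python. The edge list restates the relations from the diagram; the function names `containers` and `cities_in`, and the decision to treat "is located in" and "is part of" as one chainable containment relation, are assumptions made for this illustration rather than something the slide specifies.

```python
# Relations read off the slide's diagram, written as (subject, predicate, object).
EDGES = [
    ("Perth", "is_a", "City"),
    ("Australia", "is_a", "Country"),
    ("Perth", "is_located_in", "Australia"),
    ("Australia", "is_part_of", "Commonwealth of Nations"),
    ("Commonwealth of Nations", "is_a", "International Organisation"),
]

# Assumption: both predicates express containment and may be chained.
CONTAINMENT = {"is_located_in", "is_part_of"}


def containers(entity, edges):
    """Return every entity that directly or transitively contains `entity`."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in edges:
            if s == current and p in CONTAINMENT and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def cities_in(region, edges):
    """Answer 'which cities are located in <region>?' by following the chain."""
    cities = [s for s, p, o in edges if p == "is_a" and o == "City"]
    return sorted(c for c in cities if region in containers(c, edges))


print(cities_in("Commonwealth of Nations", EDGES))
```

No edge states directly that Perth lies in the Commonwealth of Nations; the answer is inferred by chaining two stored facts, which is the kind of reasoning the slide attributes to knowledge graphs.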
The Linked Data Life Cycle
Creating Semantic Data along the Data Life Cycle
Auer, S. et al. (2012). Managing the life-cycle of linked data with the LOD2 stack. In International Semantic Web Conference (pp. 1-16). Springer, Berlin, Heidelberg. https://link.springer.com/content/pdf/10.1007/978-3-642-35173-0_1.pdf
Six Core Aspects of Semantic AI
#SemanticAI: Bringing Machine Learning, NLP and Knowledge Graphs together
1. Data Quality: Semantically enriched data serves as a basis for better data
quality and provides more options for feature extraction.
2. Data as a Service: Linked data based on W3C Standards can serve as an
enterprise-wide data platform and helps to provide training data for machine
learning in a more cost-efficient way.
3. No black-box: Semantic AI ultimately leads to AI governance that works on
three layers: technical, ethical, and legal.
4. Hybrid approach: Semantic AI is the combination of methods derived from
symbolic AI and statistical AI.
5. Structured data meets text: Most machine learning algorithms work well
either with text or with structured data.
6. Towards self-optimizing machines: Machine learning can help to extend
knowledge graphs, and in return, knowledge graphs can help to improve ML
algorithms.
https://www.datasciencecentral.com/profiles/blogs/six-core-aspects-of-semantic-ai
1. Data Quality
Benchmarking the PoolParty Semantic Classifier
Reegle thesaurus
A comprehensive SKOS taxonomy
for the clean energy sector
(http://data.reeep.org/thesaurus/guide)
● 3,420 concepts
● 7,280 labels (English version)
● 9,183 relations (broader/narrower + related)
Document Training Set
1,800 documents in 7 classes
Renewable Energy, District Heating Systems,
Cogeneration, Energy Efficiency, Energy (general),
Climate Protection, Rural Electrification
▸ Improvement of 5.2% (F1 score) compared to
traditional (term-based) SVM
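For readers unfamiliar with the metric, the sketch below computes a macro-averaged F1 score from scratch. The slide does not say whether the benchmark used macro or micro averaging, so macro averaging here is an assumption, and the labels in the usage lines are toy values, not the benchmark data.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels for two of the seven classes (hypothetical predictions).
y_true = ["Cogeneration", "Cogeneration", "Energy Efficiency", "Energy Efficiency"]
y_pred = ["Cogeneration", "Energy Efficiency", "Energy Efficiency", "Energy Efficiency"]
print(round(macro_f1(y_true, y_pred), 4))
```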
1. Data Quality
PoolParty Semantic Classifier in a Nutshell
PoolParty Semantic Classifier combines machine learning algorithms
(SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
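The general idea of combining term features with knowledge-graph features can be sketched as follows. This is not PoolParty's implementation: the mini-thesaurus (`LABEL_TO_CONCEPT`, `BROADER`), the feature-name prefixes, and the function names are all hypothetical, and a real system would use a full SKOS taxonomy and a proper classifier on top of these features.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: surface labels map to concepts,
# and concepts map to their broader concept (as in a SKOS hierarchy).
LABEL_TO_CONCEPT = {
    "solar": "SolarEnergy",
    "photovoltaic": "SolarEnergy",
    "wind": "WindEnergy",
    "turbine": "WindEnergy",
}
BROADER = {
    "SolarEnergy": "RenewableEnergy",
    "WindEnergy": "RenewableEnergy",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def semantic_features(text):
    """Count plain terms plus the concepts (and broader concepts) they evoke."""
    feats = Counter()
    for tok in tokenize(text):
        feats["term:" + tok] += 1
        concept = LABEL_TO_CONCEPT.get(tok)
        while concept is not None:
            feats["concept:" + concept] += 1
            concept = BROADER.get(concept)
    return feats


print(semantic_features("Photovoltaic panels and wind turbine"))
```

Two documents that share no terms ("photovoltaic" versus "turbine") still share the feature `concept:RenewableEnergy`, which is one way semantically enriched data can offer a classifier more to work with than term counts alone.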
2. Data as a Service
(Diagram labels: Unstructured Data, Structured Data, Knowledge Graphs, Machine Learning, Cognitive Applications)
Knowledge Graphs as a Data Model for Machine Learning
Wilcke X, Bloem P, De Boer V. The Knowledge Graph as the Default Data Model for Machine Learning.
Data Science. 2017 Oct 17;1-19. Available from, DOI: 10.3233/DS-170007
“Traditionally, when faced with heterogeneous knowledge in a machine learning context, data scientists preprocess the data and engineer feature vectors so they can be used as input for learning algorithms (e.g., for classification).”
3. No Black Box
Infrastructure to overcome information asymmetries between the developers of AI systems and other stakeholders
3. No Black Box
Explainable AI
Classifiers based on ML algorithms such as Deep Learning perform better when training data is
semantically enhanced. Additional features are derived from a controlled vocabulary, which also
makes the features used more transparent to the Data Scientist.
4. Hybrid Approach
(Diagram labels: Artificial Intelligence; Symbolic AI; Sub-Symbolic AI; Statistical AI; ANN; KR & reasoning; NLP; Machine Learning; Word Embedding; Deep Learning; Natural Language Understanding; Entity Recognition & Linking; Knowledge Extraction; Semantic enhanced Text Classification)
In Semantic AI, various methods from Symbolic AI are combined with machine learning methods and/or neural networks.
Examples:
● Semantic enrichment of text corpora to enhance word embeddings
● Extraction of semantic features from text to improve ML-based classification tasks
● Combine ML-based with Graph-based entity extraction
● Knowledge Graphs as a Data Model for Machine Learning
● …
5. Structured Data meets Text
(Diagram labels: Purchase History; Social Media; Recommender; Personal Assistant; Prediction; Customer Retention; Classification; Intent Detection)
Examples for use cases based on (Semantic) AI
6. Towards self-optimizing machines
▸ Semantic AI is the next-generation Artificial Intelligence
▸ Machine learning can help to extend knowledge graphs (e.g., through ‘corpus-based ontology learning’ or through graph mapping based on ‘spreading activation’), and in return, knowledge graphs can help to improve ML algorithms (e.g., through ‘distant supervision’).
▸ This integrated approach ultimately leads to systems that work like self-optimizing machines after an initial setup phase, while being transparent to the underlying knowledge models.
▸ Graph Convolutional Networks (in progress) promise new insights
Mike Bergman: Knowledge-based Artificial Intelligence (2014) http://www.mkbergman.com/1816/knowledge-based-artificial-intelligence/
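The ‘distant supervision’ idea mentioned above can be sketched in a few lines: facts already in a knowledge graph are used to label raw sentences automatically, producing training examples without manual annotation. The function name `distant_labels`, the single-fact knowledge graph, and the sample sentences are hypothetical, and the plain substring matching is a deliberate simplification.

```python
# A tiny knowledge graph with one known fact (subject, relation, object).
KG = [("Perth", "is_located_in", "Australia")]


def distant_labels(sentences, triples):
    """Label a sentence with a relation if it mentions both entities of a known fact."""
    examples = []
    for sent in sentences:
        for s, p, o in triples:
            # Simplification: plain substring matching instead of entity linking.
            if s in sent and o in sent:
                examples.append((sent, s, o, p))
    return examples


corpus = [
    "Perth is a city in Western Australia.",
    "Perth has warm summers.",
]
for example in distant_labels(corpus, KG):
    print(example)
```

The labels are noisy by construction, since a sentence can mention both entities without expressing the relation, so in practice they serve as weak training signal for a relation extractor rather than as ground truth.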
Why Data Scientists need Semantic Models
▸ To understand
▹ Content aboutness in a defined framework
▹ Data relationships and context within a unified organizational model
▹ Connections across disparate datasets
▸ To increase precision
▹ Hierarchical or other mapped relationships allow for recommending similar content when exact matches are not found
▹ Granularity allows for more specific recommendations
▹ Consistency across structure results in more precise analysis and predictions
Source: Suzanne Carroll, Data Science Product Director at XO Group
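The point about hierarchical relationships enabling recommendations when no exact match exists can be illustrated with a short sketch. The taxonomy (`BROADER`), the tagged content (`CONTENT`), and the fallback strategy (try the broader concept and its direct children, then move further up) are assumptions chosen for the example, not the approach used at XO Group.

```python
# Hypothetical taxonomy: each concept points to its broader concept.
BROADER = {
    "Solar Energy": "Renewable Energy",
    "Wind Energy": "Renewable Energy",
    "Renewable Energy": "Energy",
}

# Hypothetical content tagged with concepts from the taxonomy.
CONTENT = {
    "Wind Energy": ["Wind farm siting guide"],
    "Energy": ["Energy policy overview"],
}


def recommend(concept):
    """Return exact matches, or fall back to content from related concepts."""
    if concept in CONTENT:
        return CONTENT[concept]
    parent = BROADER.get(concept)
    while parent is not None:
        # Collect items tagged with the parent itself or with its direct children.
        related = [
            item
            for tagged, items in CONTENT.items()
            if tagged == parent or BROADER.get(tagged) == parent
            for item in items
        ]
        if related:
            return related
        parent = BROADER.get(parent)
    return []


print(recommend("Solar Energy"))
```

Nothing is tagged with "Solar Energy", yet the hierarchy identifies "Wind Energy" as a sibling under the same broader concept, so a similar item is returned instead of an empty result.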
Next steps
▸ Mail: andreas.blumauer@semantic-web.com
▸ LinkedIn: https://www.linkedin.com/in/andreasblumauer
▸ Download: White Paper ‘Introducing Semantic AI’
▸ Visit: SEMANTiCS Conference
▸ E-Learn: PoolParty Academy
© Semantic Web Company - http://www.semantic-web.com and http://www.poolparty.biz/

BrightTALK - Semantic AI
