1.
Knowledge Graphs: Smart Big Data
Dieter Fensel & Umutcan Şimşek
STI Innsbruck, University of Innsbruck, Austria
May 5, 2021
2.
Knowledge Graphs
1. What are Knowledge Graphs in a Nutshell
2. Why they are so desperately needed!
1. World Wide Web
2. Virtual intelligent agents
3. Physical intelligent agents
3. How to build them effectively and efficiently?
1. Knowledge Graph Methodology
2. Knowledge Graph Generation
3. Knowledge Graph Hosting
4. Knowledge Graph Curation
5. Knowledge Graph Deployment
4. The Proof Of The Pudding Is In The Eating
1. Open Touristic Knowledge Graph
2. Chatbots
5. Conclusions
2
3.
1. Knowledge Graphs in a Nutshell
Google Knowledge Graph
• “A huge knowledge graph of interconnected entities and their attributes”.
Amit Singhal, Senior Vice President at Google
• “A knowledge base used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources”
http://en.wikipedia.org/wiki/Knowledge_Graph
• Based on information derived from
many sources including Freebase,
CIA World Factbook, Wikipedia
• Contains many billion facts and “Objects”
3
4.
1. Knowledge Graphs in a Nutshell
very large semantic nets that integrate various and heterogeneous information sources to represent knowledge about certain domains.
4
• Heterogeneous: data from different sources can be easily integrated.
• Large-scale: Knowledge Graphs can get really big really fast.
• No Schema: at least not in the sense of relational databases.
5.
1. Knowledge Graphs in a Nutshell
Name Instances Facts Types Relations
DBpedia (English) 4,806,150 176,043,129 735 2,813
YAGO 4,595,906 25,946,870 488,469 77
Freebase 49,947,845 3,041,722,635 26,507 37,781
Wikidata 15,602,060 65,993,797 23,157 1,673
NELL 2,006,896 432,845 285 425
OpenCyc 118,499 2,413,894 45,153 18,526
Google's Knowledge Graph 570,000,000 18,000,000,000 1,500 35,000
Google's Knowledge Vault 45,000,000 271,000,000 1,100 4,469
Yahoo! Knowledge Graph 3,443,743 1,391,054,990 250 800
Knowledge Graphs in the wild: Large ABoxes in comparison to smaller TBoxes
6.
2. Why are Knowledge Graphs needed
1. World Wide Web
2. Virtual intelligent agents, e.g. bots
3. Physical intelligent agents, e.g. autonomous cars
6
7.
World Wide Web (WWW)
Technology for eMarketing and eCommerce
The Web
Search
7
• Has no central content repository (only DNS)
• Early '90s: bookmark lists (lists of magic keys à la "Open Sesame")
• Mid '90s: various competing search engines based on different paradigms
• Google won the game
8.
• Helping the web to scale infinitely
World Wide Web (WWW)
<html> <body>
<a onto="page:Researcher">
<h2>Welcome on my homepage</h2>
My name is <a onto="[name=body]"> Richard Benjamins</a>.
</body></html>    HTML-A, 1996
9.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
The Web
Search
9
• Has no central content repository (only DNS)
• Early '90s: bookmark lists (lists of magic keys à la "Open Sesame")
• Mid '90s: various competing search engines based on different paradigms
• Google won the game.
• Semantic approaches were pushed aside by
matrix multiplication of word vectors.
Semantic Web:
NO !
10.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
The Web
Search
10
11.
Google as a Query Answering Engine
[2012]
World Wide Web: Google 2.0
Semantic Web:
YES !
12.
2. Semantic Web: Google 2.0
12
19.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
The Web
Search
Semantic Web
Query Answering
19
20.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
The Web
Search
Semantic Web
Query Answering
20
21.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
The Web
Search
Semantic Web
Query Answering
Knowledge Graph
Goal and Service
Oriented Dialog
21
22.
World Wide Web (WWW)
Evolving Technologies for eMarketing and eCommerce
Knowledge Graph
Goal and Service Oriented Dialog
22
24.
Bots
Evolving Technologies for eMarketing and eCommerce
The Web
Search
Semantic Web
Query Answering
Knowledge Graph
Goal and Service
Oriented Dialogue
24
25.
Bots
• Misunderstandings due to lack of shared common sense, i.e., world knowledge.
25
28.
Bots
• The quality of Intelligent Assistants depends directly on the quality of the
Knowledge Graph.
• Problem: “Garbage in Garbage out”
• Requirements for the Knowledge Graph:
• well structured (using an ontology - schema.org)
• accurate information (correctness)
• large and detailed coverage (completeness)
• Timeliness of knowledge
==> Method- and Tool-supported Knowledge Graph Lifecycle
28
29.
Bots
User
1. understand
Intent
+
Parameters
2. map Query
3. query
Knowledge
Graph
4. Natural
Language
Generation
29
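To make the four steps above concrete, here is a minimal, hypothetical sketch of such a pipeline in Python. The function names, the toy Knowledge Graph, and the keyword-based intent matching are illustrative assumptions, not the architecture of a real assistant.

# Toy sketch of the pipeline: understand intent + parameters -> map query ->
# query Knowledge Graph -> Natural Language Generation. All names are illustrative.
def understand(utterance):
    # Step 1: classify the intent and extract parameters (toy keyword matching).
    if "opening hours" in utterance.lower():
        place = utterance.rsplit(" of ", 1)[-1].strip(" ?")
        return "AskOpeningHours", {"place": place}
    return "Unknown", {}

KG = {("Kölner Haus", "openingHours"): "09:00-17:00"}   # toy Knowledge Graph

def map_query(intent, params):
    # Step 2: map the intent to a query over the Knowledge Graph.
    return (params["place"], "openingHours") if intent == "AskOpeningHours" else None

def query_kg(query):
    # Step 3: look the answer up in the Knowledge Graph.
    return KG.get(query) if query else None

def generate(params, answer):
    # Step 4: Natural Language Generation from the retrieved value.
    return f"{params['place']} is open {answer}." if answer else "Sorry, I do not know that."

intent, params = understand("What are the opening hours of Kölner Haus?")
print(generate(params, query_kg(map_query(intent, params))))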
30.
"Armed" physical agent
How Knowledge Graphs can prevent AI
from killing people
30
31.
The brave New World of AI
• Autonomous Driving
31
32.
Failures of AI technology
• In May 2016 Joshua Brown was killed by his car because its autopilot
mixed up a very long car (large wheelbase) with a traffic sign.
32
33.
Failures of AI technology
• In May 2016 Joshua Brown was killed by his car because its auto pilot
mixed up a very long car (large wheelbase) with a traffic sign.
This is what the autopilot "saw"
33
34.
Failures of AI technology
• In May 2016 Joshua Brown was killed by his car because its auto pilot
mixed up a very long car (large wheelbase) with a traffic sign.
This is what the autopilot "saw"
34
Why did none of the 10,000+ engineers involved have the trivial idea to connect the car to a Knowledge Graph containing traffic data that simply knows there is no traffic sign there?
35.
Failures of AI technology
• In March 2018 Elaine Herzberg was the first victim of a fully autonomously driving car.
35
36.
Failures of AI technology
• In March 2018 Elaine Herzberg was the first victim of a fully autonomously driving car.
• Besides many software bugs by Uber (à la Boeing), a core issue was that the car assumed that pedestrians cross streets only at crosswalks.
• Make assumptions explicit and confirm them with a knowledge graph.
• In this case she would still be alive!
36
53.
3.2. Knowledge Generation
• https://www.schema.org/
• Started in 2011 by Bing, Google, Yahoo!, and Yandex to annotate websites.
• Has become de facto standard.
• We use it for the website channel as well as for all other channels as a reference model for our semantic annotations.
• We define domain-specific extensions (that also restrict the genericity of entire
schema.org).
53
55.
3.2. Knowledge Generation
• The use of semantic annotations has experienced a tremendous surge in activity since the
introduction of schema.org.
• Schema.org was introduced with 297 classes and 187 properties,
• which have since grown to 779 types, 1,390 properties, 15 datatypes, 81 enumerations, and 437 enumeration members.*
• The provided corpus of
• types (e.g. LocalBusiness, SkiResort, Restaurant),
• properties (e.g. name, description, address),
• range restrictions (e.g. Text, URL, PostalAddress),
• and enumeration values (e.g. DayOfWeek, EventStatusType, ItemAvailability)
covers large numbers of different domains.
55
* on 01.04.2021
56.
3.2. Knowledge Generation
• We define domain-specific extensions (that also restrict the genericity of
entire schema.org).
• Domain Specifications:
• restrict genericity and
• extend domain-specificity
of schema.org.
• Are based on SHACL
• https://schema-tourism.sti2.org/
• We use value restrictions not as an inference mechanism but as integrity constraints.
Schema.org
Domain
Domain Specification
56
57.
3.2. Knowledge Generation
Our Methodology:
• the bottom-up part,
which describes the steps of
the initial annotation process;
• the domain specification
modeling; and
• the top-down part, which
applies the constructed
models.
57
58.
3.2. Knowledge Generation
Semantify.it1:
A platform for creating, hosting, validating, verifying, and publishing
schema.org annotated data
• annotation of static data based on schema.org templates
→ Domain Specifications2
• annotation of different schemata and dynamic data based on RML3 mappings → RocketRML4
1 https://semantify.it
2 http://ds.sti2.org
3 https://rml.io
4 https://github.com/semantifyit/RocketRML
58
60.
3.2. Knowledge Generation
• Semi-automatic
• Annotation Editor suggests mappings/extracted information
• e.g. extract information from web pages (by HTML tags).
• Use partial NLU to find similarities of the content and schema.org vocabulary.
• Manual adaptations are needed to define and to evaluate them.
• Instance of the general issues of wrapper generation.
60
61.
3.2. Knowledge Generation
• Mapping (more than 95% of the story)
• integrate large and fast changing data sets,
• map different formats to the ontology used in our Knowledge Graph,
• Various frameworks: XLWrap, Mapping Master (M2), a generic XMLtoRDF tool
providing a mapping document (XML document) that has a link between an XML
Schema and an OWL ontology, Tripliser, GRDDL, R2RML, RML, ...
• We developed an efficient mapping engine for the RDF Mapping
Language RML, called RocketRML. It is a rule-based engine that
efficiently processes RML mappings and creates RDF data.
• The semantify.it platform features a wrapper API where these
mappings can be stored and applied to corresponding data.
61
62.
RML:
● Easier to learn RML than a programming language
● Easy sharing
● Mapping can be visualized
● Mapfiles can be faster to write than code
● Easily change mappings
● RocketRML pre-compiles joins to improve performance by several orders of magnitude.
RML YARRRML Matey
62
3.2. Knowledge Generation
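To illustrate the idea of declarative mappings, here is a small conceptual sketch in Python: the mapping rules are data, not code, and a generic engine applies them to a source record to produce a schema.org annotation. This is not the RML/RocketRML API or syntax; the toy mapping format, field names, and values are assumptions made for illustration only.

# Conceptual sketch of an RML-style declarative mapping (NOT the RML/RocketRML API).
source_record = {"hotelName": "Hotel Alpenhof", "street": "Dorfstrasse 1", "town": "Mayrhofen"}

# Mapping rules: schema.org property -> source field (a toy, assumed rule format).
mapping = {
    "name": "hotelName",
    "address": {"streetAddress": "street", "addressLocality": "town"},
}

def apply_mapping(record, rules):
    # Recursively build a schema.org object from the declarative rules.
    out = {}
    for prop, rule in rules.items():
        out[prop] = apply_mapping(record, rule) if isinstance(rule, dict) else record[rule]
    return out

annotation = {"@context": "http://schema.org", "@type": "Hotel",
              **apply_mapping(source_record, mapping)}
print(annotation)

The point of the sketch is the design choice the slide names: changing the mapping means editing a small rule structure that can be shared and visualized, not rewriting code.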
63.
3.2. Knowledge Generation
Automatic extraction of knowledge from text
representations and web pages
• Tasks
• named entity recognition,
• concept mining, text mining,
• relation detection, …
• Methods
• Information Extraction,
• Natural Language Processing (NLP),
• Machine Learning (ML)
63
• Systems:
• GATE (text analysis & language
processing)
• OpenNLP (supports most common
NLP tasks)
• RapidMiner (data preparation, machine
learning, deep learning, text mining,
predictive analysis)
• Ontotext / Sirma
• Important when large amounts of semi-structured information are crawled (from the Web) through semantify.it.
64.
3.2. Knowledge Generation
Evaluation of semantic annotations:
• The semantify.it evaluator is a web-tool that offers the possibility to
validate schema.org annotations that are scraped from websites.
• Verification: The annotations are checked against plain schema.org and
against domain specifications.
• Validation: The annotations are checked as to whether they accurately describe the content of the website.
• https://semantif.it/evaluate
64
65.
3.2. Knowledge Generation
Evaluation of semantic annotations:
• Notice that we take the content of the website as the gold standard.
• We do NOT evaluate the accuracy of that content with regard to the "real" world.
• We check whether a phone number conforms to the formal constraints.
• We do not make robocalls to hotels to check whether the "right" hotel picks up the phone.
65
67.
3.2. Knowledge Generation
• Annotation of dynamic and active data with WASA (earlier called
WSMO).
• Dynamic: Actions to obtain dynamic data (e.g. weather forecast).
• Active: Actions that can be taken on entities in a Knowledge Graph (e.g. a
room offering of a Hotel can have BuyAction attached to it).
• An action is an instance of schema.org/Action type.
• Describe the invocation mechanism (e.g. endpoint, HTTP method, encoding
type).
• Describe input and output parameters with SHACL (another implementation
of domain specifications).
• Grounding and lifting for existing Web APIs.
67
69.
3.2 Knowledge Generation
• Semantify.actions provides an interface for annotating a Web API as a
collection of schema:Action annotations.
• A basic interface for annotating the essential constraints on input and
output (e.g. range restrictions, minimum cardinality).
• A more advanced interface is available for each property to define
further constraints (e.g. relationships between properties).
• Web APIs are wrapped for WASA clients:
○ Request grounding with XQuery to XML and JSON
○ Response lifting from XML and JSON to schema.org with RML
○ Utility functions can be defined in JavaScript for lifting and grounding (e.g., for data transformation, conditional mappings).
69
70.
3.2 Knowledge Generation
EventSearch a schema:SearchAction ;
    schema:name "Event Search" ;
    schema:description "Search different types of events based on name location or date" ;
    schema:actionStatus schema:PotentialActionStatus ;
    schema:target [
        a schema:EntryPoint ;
        schema:urlTemplate :query ;
        schema:encodingFormat "application/ld+json" ;
        schema:contentType "application/ld+json" ;
        schema:httpMethod "POST"
    ] ;
    wasa:actionShape [
        a sh:NodeShape ;
        sh:property [
            sh:path schema:object ;
            sh:group wasa:Input ;
            sh:class schema:Event ;
            sh:node [
                sh:property [
                    sh:path schema:name ;
                    sh:datatype xsd:string ;
                    sh:maxCount 1
                ] ;
                sh:property [
                    sh:path schema:startDate ;
                    sh:datatype xsd:date ;
                    sh:maxCount 1
                ] ;
                sh:property [
                    sh:path schema:endDate ;
                    sh:datatype xsd:date ;
                    sh:maxCount 1
                ] ;
                ...
An excerpt from a schema.org action with potential action status. The value of schema:target describes the invocation mechanism. The value of wasa:actionShape contains a SHACL shape that describes the input, the output, and their relationship.
See the full annotation at: https://bit.ly/32jiMZJ
72.
3.3. Knowledge Hosting
• Semantically annotated data can be serialized to JSON-LD
• storage in document store MongoDB
• native JSON storage;
• well-integrated in current state of the art software with NodeJS
• performant search, through indexing;
• Allows efficient publication of annotations on webpages;
• not hardware intensive
• but: no native RDF querying with SPARQL.
72
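As a concrete illustration of the hosting option above, here is a minimal sketch of storing and retrieving a schema.org JSON-LD annotation in MongoDB. It uses pymongo rather than the NodeJS stack mentioned on the slide; the connection URL, database and collection names, and the sample annotation are assumptions for illustration.

# Minimal sketch: JSON-LD annotations stored natively as documents in MongoDB.
# Connection string, database/collection names and the sample data are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
annotations = client["semantify_demo"]["annotations"]

doc = {
    "@context": "http://schema.org",
    "@type": "Hotel",
    "name": "Hotel Alpenhof",
    "address": {"@type": "PostalAddress", "addressLocality": "Mayrhofen"},
}
annotations.insert_one(dict(doc))

# Indexing enables the performant search mentioned above.
annotations.create_index("name")
print(annotations.find_one({"name": "Hotel Alpenhof"}, {"_id": 0}))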
73.
3.3. Knowledge Hosting
Native storage of semantically annotated data versus relational database
• Relational databases have rigid schemas:
• good for data quality and optimization (query and storage)
• bad for integrating heterogeneous and dynamic sources
• Storing RDF data in RDBMS:
• when a new type or property comes, in a graph database it is just a new node or
edge; in relational databases schema needs to be rebuilt.
• Triple tables may make some optimizations ineffective
• complex queries over connected data can cause performance problems: too many
joins (possibly self joins)
• Virtualization as an alternative:
• Query optimization is problematic (things can go wrong while query rewriting)
• no native reasoning
73
74.
3.3. Knowledge Hosting
• RDF store: GraphDB
• very powerful CRUD operations
• named graphs for versioning
• full implementation of SPARQL
• powerful reasoning over big data sets
• very hardware intensive
• Developed by Ontotext: https://www.ontotext.com/
74
76.
3.4. Knowledge Curation
76
Knowledge Representation Formalism
The formalism should support:
● Type hierarchy
● Property definitions
● Property value assertions
● aligned with the data model of schema.org
77.
3.4. Knowledge Curation
77
Knowledge Representation Formalism - Application Scope
Web:
● Open environment
● Predominantly incomplete information
● should be usable and extendable by everyone
(Enterprise) Knowledge Graph:
● Closed, controlled environment
● Complete information for restricted context
● usable and extendable with permission only
79.
3.4. Knowledge Curation
• We defined a simple KR formalism formalizing the essentials of schema.org
• TBox: isA statements of types, domain and range definitions for properties (using them globally or locally)
• ABox: isElementOf(i,t) statements, propertyValue statements p(i1,i2), and sameAs(i1,i2) statements
• Enables a formal definition of the knowledge curation task:
• assessment,
• cleaning, and
• enrichment.
79
80.
3.4. Knowledge Curation
80
Resource Description Framework Schema (RDFS) as the basis for MSKR
Restrictions:
● Closed-World Assumption
○ Verification
● Separation of TBox and ABox
○ Simplification
○ Independent tasks
Extensions:
● Support for disjunctive ranges
● Local properties
81.
3.4. Knowledge Curation
Informal definition of MSKR
TBox:
● Two disjoint and finite sets of type names (T) and property names (P)
● Finite number of type definitions isA(t1, t2)
○ t1 and t2 ∈ T
○ isA is reflexive & transitive
● Finite number of property definitions:
○ hasDomain(p, t) with p ∈ P and t ∈ T
○ Range definition for property p (p ∈ P), t1 and t2 ∈ T
■ Simple definition: global property definition: hasRange(p, t2)
■ Refined definition: local property: hasRange(p, t2) for domain t1 (hasLocalRange(p, t1, t2))
ABox:
● Countable set of instance identifiers (I)
○ i, i1, i2 ∈ I
● Instance assertions: isElementOf(i, t)
○ Semantics: If isA(t1, t2) & isElementOf(i, t1) THEN isElementOf(i, t2)
● Property value assertions: p(i1, i2)
● Equality assertions: isSameAs(i1, i2)
○ symmetric, reflexive, and transitive
81
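A minimal sketch of the core MSKR semantics just defined, namely how instance assertions propagate along the (transitive) isA hierarchy. The type names and the instance identifier are illustrative assumptions; the rule itself is the one stated above.

# Sketch of MSKR semantics: isA(t1, t2) & isElementOf(i, t1) => isElementOf(i, t2),
# applied until a fixpoint is reached. Example types/instance are illustrative.
isA = {("SkiResort", "SportsActivityLocation"), ("SportsActivityLocation", "LocalBusiness")}
isElementOf = {("ex:Penken", "SkiResort")}

def types_of(instance):
    # All directly asserted and inferred types of an instance.
    inferred = {t for i, t in isElementOf if i == instance}
    changed = True
    while changed:
        changed = False
        for t1, t2 in isA:
            if t1 in inferred and t2 not in inferred:
                inferred.add(t2)
                changed = True
    return inferred

print(types_of("ex:Penken"))
# -> {'SkiResort', 'SportsActivityLocation', 'LocalBusiness'}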
82.
3.4.1. Knowledge Assessment
• First step to improve the quality of a KG: Assess the situation
• Knowledge Assessment describes and defines the process of assessing the
quality of a Knowledge Graph.
• The goal is to measure the usefulness of a Knowledge Graph.
• Evaluation
• Overall process to determine the quality of a Knowledge Graph.
• Select quality dimensions, metrics, evaluation functions, and weights for
metrics and dimensions.
• Evaluate representative subsets accordingly.
82
83.
3.4.1. Knowledge Assessment
[Paulheim et al., 2019] identify the following subtasks:
• specifying datasets and Knowledge Graphs,
• specifying the evaluation protocol,
• specifying the evaluation metrics,
• specifying the task for task-specific evaluation,
• and defining the setting in terms of intrinsic vs. task-based, and automatic versus human-centric evaluation,
• as well as the need to keep the results reproducible.
H. Paulheim, M. Sabon, M. Choches, and W. Beck: Evaluation of Knowledge Graphs. In P. A. Bonatti, S. Decker, A. Polleres, and V. Presutti:
Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, Dagstuhl Reports, 8(9):29-111, 2019.
83
84.
84
Dimensions:
1. accessibility
2. accuracy (veracity)
3. completeness
4. concise representation
5. consistent representation
6. cost-effectiveness
7. flexibility
8. interoperability
9. relevancy
10. timeliness (velocity)
11. trustworthiness
12. understandability
13. variety
an extended list can be found in [Fensel et al., 2020]
3.4.1. Knowledge Assessment
85.
85
Each dimension has a set of metrics. Each metric has a calculation function:
Example metric calculation from Understandability dimension:
3.4.1. Knowledge Assessment
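The slide's example formula is in the figure; here is a hypothetical metric calculation of that kind for the Understandability dimension, e.g., the fraction of instances that carry a human-readable label. The exact metric definition and the sample data are assumptions for illustration.

# Hypothetical Understandability metric: share of instances with a human-readable label.
def labelled_fraction(instances):
    # instances: list of dicts with property values of each instance.
    with_label = sum(1 for i in instances if i.get("schema:name"))
    return with_label / len(instances) if instances else 0.0

sample = [{"schema:name": "Eiffel Tower"}, {"schema:name": None}, {"schema:name": "Kölner Haus"}]
print(labelled_fraction(sample))  # 0.66..., the metric score for this sample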
86.
86
Some dimensions are more contextual, i.e., they need external information alongside the Knowledge Graph.
Example metric calculation from Relevancy dimension:
3.4.1. Knowledge Assessment
87.
87
Decide on Dimension Weights: Each dimension may have a different level of importance for different domains or tasks.
Decide on Metric Weights: Each metric may have a different impact on the calculation of the dimension to which it belongs.
Calculate the assessment score: Calculate a weighted aggregate score for the Knowledge Graph for each domain or task.
3.4.1. Knowledge Assessment
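A minimal sketch of this weighted aggregation: each dimension score is the weighted sum of its metric scores, and the overall assessment score is the weighted sum of the dimension scores. The dimensions, weights, and metric scores below are illustrative assumptions; weights on each level are assumed to sum to 1 so that all scores stay between 0 and 1.

# Sketch of the weighted aggregation described above; all numbers are illustrative.
dimensions = {
    "accuracy":     {"weight": 0.5, "metrics": {"syntactic_validity": (0.4, 0.9),
                                                "semantic_validity":  (0.6, 0.7)}},
    "completeness": {"weight": 0.3, "metrics": {"property_coverage":  (1.0, 0.6)}},
    "timeliness":   {"weight": 0.2, "metrics": {"freshness":          (1.0, 0.8)}},
}

def dimension_score(metrics):
    # Weighted sum of (metric weight, metric score) pairs.
    return sum(w * score for w, score in metrics.values())

overall = sum(d["weight"] * dimension_score(d["metrics"]) for d in dimensions.values())
print(round(overall, 3))  # aggregated assessment score between 0 and 1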
88.
88
A Running Example for Knowledge Cleaning and Enrichment
3.4.1. Knowledge Assessment
89.
Domain | Property | Range
s:LandmarksOrHistoricalBuildings | s:address | s:PostalAddress
s:LandmarksOrHistoricalBuildings | s:containedInPlace | s:Place
s:PostalAddress | s:streetAddress | s:Text
s:PostalAddress | s:addressLocality | s:Text
s:PostalAddress | s:addressCountry | s:Country
s:PostalAddress | s:postalCode | s:Text
s:TouristAttraction | s:availableLanguage | s:Text
A subset of schema.org for the running example
89
3.4.1. Knowledge Assessment
90.
Instance Assertion: i is an instance of the type t (i and t are URIs).
Property Value Assertion: the value of property p on instance i1 is i2 (i1 is a URI, i2 is a URI or a "Literal").
Equality Assertion: sameAs(i1, i2), i.e., i1 is the same instance as i2 (both URIs).
3.4.1. Knowledge Assessment
92.
3.4.1. Knowledge Assessment
Methodologies
• Total Data Quality Management (TDQM) and Data Quality Assessment allow identifying important quality dimensions and their requirements from various perspectives.
• Other methodologies have already defined quality metrics that allow a semi-automatic assessment based on data integrity constraints, for example User-driven and Test-driven assessment, and manual assessment by expert crowds (Crowdsourcing-driven assessment).
• Besides that, there are quality assessment approaches which use statistical distribution for
measuring the correctness of statements, SPARQL queries for the identification of functional
dependency violations and missing values.
92
93.
3.4.1. Knowledge Assessment
Tools and Methods:
• LINK-QA
• using network metrics
• Luzzu (Linked Open Datasets)
• thirty data quality metrics based on Dataset Quality Ontology
• Sieve
• Flexible in expressing various quality assessment methods
• SWIQA (Semantic Web Information Quality Assessment Framework)
• data quality rules & quality scores for identifying wrong data
• Validata
• online tool for testing/validating RDF data against ShEx-schemas
93
94.
3.4.1. Knowledge Assessment
Sieve:
• Sieve for Data Quality Assessment is a framework which consist of two modules:
• a Quality Assessment module and
• a Data Fusion module
• The Quality Assessment Module involves four steps:
1. Data Quality Indicators allow defining an aspect of a data set that may demonstrate its suitability for the intended use. For example, meta-information about the creation of a data set, information about the provider, or ratings provided by the consumers.
2. Scoring Functions define the assessment of the quality indicator based on its quality dimension. Scoring
functions range from simple comparisons, over set functions, aggregation functions, to more complex
statistical functions, text-analysis, or network analysis methods.
3. Assessment Metric calculates the assessment score based on indicators and scoring functions.
4. Aggregate Metric allows users to aggregate new metrics that can generate new assessment values.
94
96.
3.4.1. Knowledge Assessment: Our Approach
96
Process model
● Definition of weights for
different metrics and
different dimensions
● Domain-specific
configuration
● Score between 0 and 1
○ for each dimension
○ adding up its
weighted metrics
scores
● some metrics not
automatable
● Aggregated score
between 0 and 1
○ adding up weighted
dimension scores
97.
3.4.1. Knowledge Assessment: Our Approach
97
Quality Assessment Tool (QAT)
● Provided as Software as a Service (SaaS)
● periodically fetches information from
configured data sources
○ automatically
○ on-demand planned
● User defines weights for dimensions and
metrics
● Overall score accessible via
○ API
○ User Interface
98.
3.4.2. Knowledge Cleaning
• The goal of knowledge cleaning is to improve the correctness of a Knowledge
Graph
• Major objectives are
• error detection and
• error correction of
● wrong instance assertions
● wrong property value assertions
● wrong equality assertions
98
102.
3.4.2. Knowledge Cleaning
What | Verification | Validation
Semantic Annotations | check schema conformance and integrity constraints | compare with the web resource
Knowledge Graphs | check schema conformance and integrity constraints | compare with the "real" world
102
103.
Actions taken to improve the accuracy of Knowledge Graphs:
Error Detection
Identify errors from
different error sources
Error Correction
Correct the identified
errors manually or
semi-automatically
103
3.4.2. Knowledge Cleaning
104.
Error sources and types
Instance Assertions:
● Syntactic errors in the instance identifiers
● Type does not exist in the vocabulary
● Assertion is semantically wrong
Property Value Assertions:
● Syntactic errors in i1, i2 or p
● p does not exist in the vocabulary
● Domain and range violations
● Assertion is semantically wrong
Equality Assertions:
● Syntactic errors in i1 or i2
● Assertion is semantically wrong
3.4.2. Knowledge Cleaning
104
106.
Error Correction:
• Wrong Instance assertions
• There can be syntactic errors in instance identifiers
Example: "ex:Eiffel Tower" is not valid without encoding; fix it to ex:Eiffel_Tower.
106
3.4.2. Knowledge Cleaning
107.
Error Correction:
• Wrong Instance assertions
• The type may not exist in the vocabulary
Example: ex:Eiffel_Tower is typed as s:Landmark, but no such type exists in schema.org; use s:LandmarksOrHistoricalBuildings instead.
107
3.4.2. Knowledge Cleaning
108.
Error Correction:
• Wrong Instance assertions
• The assertion may be semantically wrong
Example: ex:Eiffel_Tower is typed as s:Event, but the Eiffel Tower is not an Event (at least in this example); delete the instance assertion.
108
3.4.2. Knowledge Cleaning
109.
Error Correction:
• Wrong property value assertions
• There may be syntactic errors in instance or property identifier in an assertion.
Example: "ex:Eiffel Tower" (the subject of an assertion with value ex:Champ_de_Mars) is not valid without encoding; fix it to ex:Eiffel_Tower.
3.4.2. Knowledge Cleaning
110.
Error Correction:
• Wrong property value assertions
• There is no property p in the vocabulary
Example: ex:Eiffel_Tower is linked to ex:Champ_de_Mars by a property that does not exist in schema.org.
110
3.4.2. Knowledge Cleaning
111.
Error Correction:
• Wrong property value assertions
• The type (t) of i1 is not in the domain of property (p)
Example: ex:Eiffel_Tower s:availableLanguage "fr": the type s:LandmarksOrHistoricalBuildings is not in the domain of s:availableLanguage; add the new instance assertion ex:Eiffel_Tower s:TouristAttraction.
111
3.4.2. Knowledge Cleaning
112.
Error Correction:
• Wrong property value assertions
• The type (t) of i2 is not in the range of p for any of the types in its domains.
Example: ex:Eiffel_Tower s:containedInPlace ex:Champ_de_Mars, where ex:Champ_de_Mars is typed as s:VisualArtWork, which is not in the range of s:containedInPlace; delete the wrong instance assertion and add the new instance assertion ex:Champ_de_Mars s:Place.
112
3.4.2. Knowledge Cleaning
113.
Error Correction:
• Wrong property value assertions
• The property value assertion is semantically wrong.
Example: ex:Eiffel_Tower s:address has the postal code "75001", which is the wrong ZIP code; fix the value to "75007".
113
3.4.2. Knowledge Cleaning
114.
Error Correction:
• Wrong equality assertions
• Either i1 or i2 or both may be syntactically wrong
• Fix the issue in a manner similar to previous error types.
114
3.4.2. Knowledge Cleaning
115.
Error Correction:
• Wrong equality assertions
• The equality assertion may be semantically wrong.
Example: ex:Eiffel_Tower sameAs dbpedia:Paris_Las_Vegas links two related, but different things; delete the assertion or create a "weaker" link.
115
3.4.2. Knowledge Cleaning
116.
Tools:
• The existing tools mainly focus on detection of errors. Common approaches:
• Statistical distribution of instance and property value assertions
• Integrity constraints with SPARQL and shapes.
• Correction approaches typically use certain heuristics for syntactical errors and
external trusted Knowledge Graphs for other error types.
116
3.4.2. Knowledge Cleaning
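To illustrate the integrity-constraint approach named above, here is a minimal sketch (not a specific tool's implementation) that detects a domain violation from the running example with a constraint-style SPARQL query over rdflib. The toy data and the constraint itself are illustrative assumptions.

# Sketch: integrity constraint as a SPARQL query over rdflib (illustrative data).
from rdflib import Graph

data = """
@prefix s:  <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:Eiffel_Tower a s:LandmarksOrHistoricalBuildings ;
    s:availableLanguage "fr" .
"""

# Assumed constraint: subjects of s:availableLanguage must be typed s:TouristAttraction.
violations_query = """
PREFIX s: <http://schema.org/>
SELECT ?entity WHERE {
    ?entity s:availableLanguage ?lang .
    FILTER NOT EXISTS { ?entity a s:TouristAttraction . }
}
"""

g = Graph()
g.parse(data=data, format="turtle")
for row in g.query(violations_query):
    print(f"Domain violation: {row.entity} uses s:availableLanguage but is not a s:TouristAttraction")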
117.
Tools:
• Automating detection of semantically wrong assertions is tricky. How do we
touch the “real world”?
• Take an existing, trustworthy Knowledge Graph as an oracle
• See the websites from where annotations are collected as the source of
truth.
• Similar to Semantify.it Validator approach
117
3.4.2. Knowledge Cleaning
118.
Methods & Tools:
• HoloClean
● Use of integrity constraints,
● external data,
● quantitative statistics.
● Steps
• separate entry datasets into noisy and clean dataset
• assign uncertainty score over the value of noisy datasets
• compute marginal probability for each value to be repaired.
118
3.4.2. Knowledge Cleaning
119.
3.4.2. Knowledge Cleaning
Methods & Tools:
• HoloClean
● use of integrity constraints,
● external data, and
● quantitative statistics.
● Steps
• separate entry datasets into noisy and clean dataset
• assign uncertainty score over the value of noisy datasets
• compute marginal probability for each value to be repaired
• SDValidate
● uses statistical distribution functions
● three steps:
• compute relative predicate frequency for each statement
• each statement selected in first step -> assign score of confidence
• apply threshold of confidence.
• Similar steps for instance assertions.
119
120.
3.4.2. Knowledge Cleaning
Methods & Tools:
• The LOD Laundromat
● cleans Linked Open Data
● takes SPARQL endpoint/archived dataset as entry dataset
● guesses the serialisation format
● identifies syntax errors using a library while parsing RDF
● saves RDF data in canonical format
120
121.
3.4.2. Knowledge Cleaning
Methods & Tools:
• The LOD Laundromat
● cleans Linked Open Data
● takes SPARQL endpoint/archived dataset as entry dataset
● guesses the serialisation format
● identifies syntax errors using a library while parsing RDF
● saves RDF data in canonical format
• KATARA
● identifies correct & incorrect data
● generates possible corrections for wrong data
121
122.
3.4.2. Knowledge Cleaning
Methods & Tools:
• The LOD Laundromat
● cleans Linked Open Data
● takes SPARQL endpoint/archived dataset as entry dataset
● guesses the serialisation format
● identifies syntax errors using a library while parsing RDF
● saves RDF data in canonical format
• KATARA
● identifies correct & incorrect data
● generates possible corrections for wrong data
• SPIN
● SPARQL Constraint Language
● generates SPARQL Query templates based on data quality problems
• inconsistency
• lack of comprehensibility
• heterogeneity
• Redundancy
• Nowadays, SPIN has turned into SHACL, a language for validating RDF graphs.
122
123.
3.4.2. Knowledge Cleaning: System survey
• Verification
• Quality Assessment Frameworks such as Luzzu (A Quality Assessment Framework for Linked Open
Datasets), Sieve (Linked Data Quality Assessment and Fusion), SWIQA (Semantic Web Information
Quality Assessment Framework), and WIQA (Web Information Quality Assessment Framework).
• Approaches that check the conformance of RDF graphs against specifications: AllegroGraph, RDFUnit, SHACL (Shapes Constraint Language) and ShEx (Shape Expressions), Stardog ICV, TopBraid, and Validata.
• Tools that use statistical distributions to predict the types of instances (e.g., SDType) and to detect
erroneous relationships that connect two resources (e.g., HoloClean, SDValidate).
• More approaches: KATARA, LOD Laundromat, …
• Validation
• Fact validation frameworks: COPAAL (Corroborative Fact Validation), DeFacto (Deep Fact
Validation), FactCheck, FacTify, Leopard, Surface, S3K, and TISCO.
123
124.
3.4.2. Knowledge Cleaning: Our approach
• VeriGraph: Verification framework for large Knowledge Graphs. It detects errors by verifying the instances in a Knowledge Graph against a set of given domain-specific patterns (expressed in SHACL).
• Validation report shows the inconsistencies found (including a human readable
path to the error).
124
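For context, here is a minimal sketch of SHACL-based verification with the pyshacl library. This is not VeriGraph itself; the shape (a domain-specification-style pattern requiring an address on landmarks) and the sample instance are assumptions chosen to match the running example.

# Minimal SHACL verification sketch with pyshacl (not VeriGraph); data is illustrative.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse(data="""
@prefix s:  <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:Eiffel_Tower a s:LandmarksOrHistoricalBuildings .   # missing s:address
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix s:  <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:LandmarkShape a sh:NodeShape ;
    sh:targetClass s:LandmarksOrHistoricalBuildings ;
    sh:property [ sh:path s:address ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: the landmark has no address
print(report_text)   # human-readable report, pointing to the violated shape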
125.
3.4.2. Knowledge Cleaning:
Our approach
Verification process: Only the necessary subset of the KG is loaded into memory per Domain Specification (i.e., a SHACL shape). The constraints are checked in memory, instead of issuing one SPARQL query per constraint component.
125
126.
3.4.2. Knowledge Cleaning: Our approach
Implemented in the semantify.it platform
126
Several improvements are ongoing (e.g. better handling of cyclic data)
https://github.com/semantifyit/VeriGraph
https://semantify.it/verigraph
127.
3.4.3 Knowledge Enrichment
• The goal of knowledge enrichment is to improve the completeness of a
Knowledge Graph by adding new statements.
• The process of Knowledge Enrichment has the following phases:
• New Knowledge Source detection
• New Knowledge Source integration (URI normalization)
• Integrate the instances:
• Duplicate detection and alignment
• Property-Value-Statements correction.
127
128.
128
Identify New Sources: This process can be automated to some extent for Open Knowledge Graphs. Identifying proprietary sources automatically is tricky.
Integrate the Schema: The relevant parts of the schemas of new sources are mapped to schema.org.
Integrate the Instances: Two major issues: (1) identifying and resolving duplicates, and (2) resolving conflicting property value assertions.
3.4.3 Knowledge Enrichment
Process Model
129.
3.4.3 Knowledge Enrichment
• Knowledge Source detection: search for additional sources of assertions for the
Knowledge Graph
• Open sources
• Closed sources
• Knowledge Source integration
• Tbox: define mappings
• Abox: integrate new assertions into the Knowledge Graph
• Identifying and resolving duplicates
• Invalid property statements such as domain/range violations and having multiple
values for a unique property, also known in the data quality literature as
contradicting or uncertain attribute value resolution.
129
130.
3.4.3 Knowledge Enrichment
• New Knowledge Source integration (URI normalization).
• URIs are critical for identifying instances and integrating
them in a Knowledge Graph.
• Find canonical URIs for instances in a Knowledge Graph
from the best external sources.
130
132.
3.4.3 Knowledge Enrichment
• Synthetic URI generation
• Generate a URI for every incoming instance
• Advantage: really efficient
• Disadvantage: No integration of external sources “for free”
• Bottom-up URI search
• Start from a seed source and follow the sameAs links
• Advantage: No pre-selected sources, potentially larger number of sources
can be discovered
• Disadvantage: Leaves non-RDF sources out.
132
Alternative Approaches to URI Normalization
133.
3.4.3 Knowledge Enrichment
• Identification by Description
• No URIs - Instances are identified with their property values
• Advantage: No effort for creating/finding URIs
• Disadvantage: Not web friendly. Very challenging to refer to individual
entities.
• Google Search
• Use certain property values as part of a Google search query and use the
search results as URIs
• Advantage: URIs are already ranked
• Disadvantage: No control over the ranking algorithm. No guarantee that URIs are cool*
133
Alternative Approaches to URI Normalization
* “Cool URIs don’t change”: https://www.w3.org/Provider/Style/URI
135.
3.4.3. Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• Silk is a framework for achieving entity linking.
• It tackles three tasks:
1. link discovery that defines similarity metrics to calculate a total similarity
value for a pair of entities.
2. evaluation of the correctness and completeness of generated links, and
3. a protocol for maintaining data that allows source dataset and target
dataset to exchange generated link sets.
135
136.
3.4.3. Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• Legato is a linking tool based on indexing techniques.
• It implements the following steps:
1. data cleaning that filters properties from two input datasets. For example, properties that
do not help the comparison.
2. Instance profiling that creates instance profiles based on Concise Bounded Description for
the source.
3. Pre-matching that applies indexing techniques (it takes TF-IDF values), filters such as
tokenization and stop-words removal, and cosine similarity to preselect the entity links.
4. Link repairing that validates each link produced by searching for a link to a target source.
136
137.
3.4.3. Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• SERIMI tries to match instances between two datasets.
• It has three steps:
• property selection, allows users to select relevant properties from source
dataset,
• the selection of candidates from a target dataset, uses string matching of
properties to select a set of candidates, and
• the disambiguation of candidates, measures the similarity for each candidate
applying a contrast model, which returns a degree of confidence.
• ADEL, Duke, Dedupe, LIMES, ...
137
138.
3.4.3. Knowledge Enrichment
Property-Value-Statements correction:
• KnoFuss allows data fusion using different methods.
• The workflow of KnoFuss is as follows:
1. It receives a dataset to be integrated into the target dataset,
2. It performs co-referencing using a similarity method, detects conflicts
utilizing ontological constraints, and resolve inconsistencies
3. It produces a dataset to be integrated into the target dataset.
138
139.
3.4.3. Knowledge Enrichment
Property-Value-Statements correction:
• ODCleanStore is a framework for cleaning, linking, quality assessment, and fusing RDF data.
• The fusion module allows users to configure conflict resolution strategies based on
provenance and quality metadata. e.g. :
1. an arbitrary value, ANY, MIN, MAX, SHORTEST or LONGEST is selected from the
conflicting values,
2. computes AVG, MEDIAN, CONCAT of conflicting values,
3. the value with the highest (BEST) aggregate quality is selected,
4. the value with the newest (LATEST) time is selected, and
5. ALL input values are preserved.
139
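A minimal sketch of conflict-resolution policies of this kind, using the conflicting street addresses from the running example. This is not ODCleanStore's API; the policy names mirror the list above and the implementation is an illustrative assumption.

# Sketch: fusion policies that turn conflicting values into one fused result.
conflicting = ["5 Avenue Anatole France", "avenue Anatole-France", "Champ de Mars"]

policies = {
    "ANY":      lambda vs: vs[0],              # an arbitrary value is selected
    "SHORTEST": lambda vs: min(vs, key=len),
    "LONGEST":  lambda vs: max(vs, key=len),
    "CONCAT":   lambda vs: "; ".join(vs),
    "ALL":      lambda vs: list(vs),           # all input values are preserved
}

for name, policy in policies.items():
    print(name, "->", policy(conflicting))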
140.
3.4.3. Knowledge Enrichment
Property-Value-Statements correction:
• Sieve is a framework that consists of two modules: a Quality Assessment module and a Data Fusion module.
• The Data Fusion module describes various fusion policies that are applied for fusing conflicting values.
• FAG, FuSem, MumMer, …
140
142.
[Figure] An excerpt from the Wikidata entity of the Eiffel Tower (wd:Q243), showing linked entities (wd:Q2873520, wd:Q142, wd:Q217925, wd:Q778243), the labels "Eiffel Tower", "Champ de Mars", "France", "Stephen Sauvestre", "avenue Anatole-France", and the date 31.03.1889.
142
3.4.3. Knowledge Enrichment
143.
Assume we want to enrich the landmarks in our Knowledge Graph.
Integrate the Instances
Identify New Sources
Integrate the
Schema
143
3.4.3. Knowledge Enrichment
assumption: highest ranked external source
144.
Integrate the Instances
Identify New Sources
Integrate the
Schema
Type mapping: LandmarksOrHistoricalBuildings (schema.org) ↔ landmark (Wikidata)
Property mappings (schema.org* ↔ Wikidata):
• address/streetAddress ↔ located on street.label
• address/addressCountry ↔ country
• ex:architect ↔ architect
• ex:openingDate ↔ date of official opening
*Includes properties from an extension
144
3.4.3. Knowledge Enrichment
assumption: highest ranked external source
145.
Integrate the Instances
Identify New Sources
Integrate the
Schema
Duplicates
wd:Q243
“Eiffel Tower”
ex:Eiffel_Tower
We found a duplicate instance after integrating landmark instances from Wikidata. Identification of
duplicates is typically done by applying similarity metrics to a set of property values on both instances.
145
3.4.3. Knowledge Enrichment
assumption: highest ranked external source
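A minimal sketch of this duplicate identification step: a string similarity metric is applied to a set of property values on both instances, and a threshold decides whether to link them. The metric (difflib's sequence ratio), the chosen properties, the threshold, and the sample values are assumptions for illustration.

# Sketch: similarity-based duplicate detection over shared property values.
from difflib import SequenceMatcher

kg_instance = {"name": "Eiffel Tower", "addressLocality": "Paris"}
wd_instance = {"name": "Eiffel Tower", "addressLocality": "Paris 7e"}

def similarity(a, b, props=("name", "addressLocality")):
    # Average per-property string similarity in [0, 1].
    scores = [SequenceMatcher(None, a[p].lower(), b[p].lower()).ratio() for p in props]
    return sum(scores) / len(scores)

score = similarity(kg_instance, wd_instance)
print(round(score, 2))
if score > 0.8:   # assumed threshold
    print("Likely duplicates: link ex:Eiffel_Tower sameAs wd:Q243")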
146.
Integrate the Instances
Identify New Sources
Integrate the
Schema
Duplicates: Conflicting Property Values
ex:Eiffel_Tower s:address has three conflicting values after integration: "5 Avenue Anatole France", "avenue Anatole-France", and "Champ de Mars" (not even a street).
Too many street addresses! Delete two property value assertions.
146
3.4.3. Knowledge Enrichment
147.
3.4.3. Knowledge Enrichment: Our approach
• Enrichment Framework: Identifies duplicates in Knowledge Graphs and resolves
conflicting property values.
• Workflow:
• Input: a Knowledge Graph.
• Duplicate Detection Process: semi-automatic feature selection, data
normalization, setup (e.g. similarity metrics), run, and duplicate entities
viewer.
• Resolving Conflicting Property Values: define fusion strategies (e.g., decide what to do based on similarity values), run, and monitor the fusion process (work in progress).
• Output: Report of duplicate entities found and fused.
147
148.
3.4.3. Knowledge Enrichment: Our approach
Duplicate Detection as a Service (DDaaS) is a service-oriented framework that allows
linking duplicate instances within a Knowledge Graph or among Knowledge Graphs.
• Input: Knowledge Graph(s)
• Workflow: First, the dataset(s) are indexed; second, DDaaS performs a blocking procedure that filters obvious non-duplicates; afterwards, it compares all remaining instances; finally, DDaaS computes a similarity score for all compared instances.
• Output: Report of instances together with a similarity value between 0 and 1.
148
149.
3.4.3. Knowledge Enrichment: Our approach
DDaaS consists of multiple services for different tasks:
• Indexing and blocking allows the indexing of datasets to perform very fast search
operations and the filtering of obvious non-matching pairs to reduce the number
of comparisons.
• Record matching calculates an aggregated score between two instances based on
the similarity of all their common attributes.
• Gold standard selects a representative subset as a training model to help improve
the configuration learning service.
• Configuration learning generates a tuned configuration that efficiently maximizes
the identification of duplicates.
149
150.
3.4.3. Knowledge Enrichment: Our approach
• DDaaS development was influenced by three other tools: Duke, LIMES, and Silk.
• We evaluated DDaaS and the three tools over two datasets: Restaurants and SPIMBENCH.
• Results of this first iteration show great potential and leave much space for possible
future improvements, such as blocking enhancements, less network overhead,
orchestration by using an Apache Kafka instance, and many more. 150
https://www.cs.utexas.edu/users/ml/riddle/
https://project-hobbit.eu/challenges/om2020/
151.
3.4.4. Knowledge Deployment
• Building, implementing, and curating Knowledge Graphs is a time-
consuming and costly activity.
• Integrating large amounts of facts from heterogeneous information
sources does not come for free.
• [Paulheim, 2018] estimates the average cost for one fact in a Knowledge Graph between $0.1 and $6, depending on the amount of mechanization.
151
153.
3.4.4. Knowledge Deployment
• We build a knowledge access layer on top of the Knowledge Graph helping to
connect this resource to applications.
• Knowledge management technology:
• graph-based repositories host the Knowledge Graph (as a semantic data lake).
• The knowledge management layer is responsible for storing, managing, and providing semantic descriptions of resources.
• Inference engines based on deductive reasoning:
• implement agents that define views on this graph together with context data on user requests.
• They access the graph to gain data for their reasoning, which provides input to the dialogue engine interacting with the human user.
153
154.
3.4.4. Knowledge Deployment
What are the reasons?
• Scalability issues (trillions of triples)
• Context refinement (to support different points of view)
• introduce rich constraints (Knowledge Cleaning)
• additional knowledge derivation (Knowledge Enrichment)
• Provide a reusable application layer / middleware on top of a knowledge graph
• access rights
• integrates additional information sources from the application
• context, personalization, task, etc.
154
160.
3.4.4. Knowledge Deployment
160
Knowledge can be deployed to power various applications. Examples
are:
• semantic annotations on the web (the main development reason in
the beginning),
• data analytics, and
• conversational agents.
161.
3.4.4. Knowledge Deployment
161
{
"@context": "http://schema.org",
"@type": "Recipe",
"image": "https://img.chefkoch-
cdn.de/rezepte/1031841208350942/bilder/1324326/crop-
960x540/kaiserschmarrn-tiroler-landgasthofrezept.jpg",
"name": "Kaiserschmarrn - Tiroler Landgasthofrezept",
"recipeIngredient": [
"100 g Rosinen",
"5 EL Rum oder Cognac oder Wasser",
"6 Eigelb",
"1 Pck. Bourbon-Vanillezucker",
"1 EL, gehäuft Zucker",
"1 Prise(n) Salz",
"250 g Mehl",
"500 ml Milch",
"50 g Butter , zerlassen",
"6 Eiweiß",
"4 TL Puderzucker",
"n. B. Butter zum Braten, ca. 15 - 25 g je Pfanne"
],
….
162.
3.4.4. Knowledge Deployment
162
Data analytics based on Tyrolean Tourism Knowledge Graph
• Historical data stored in Tyrolean Tourism
Knowledge Graph in different Named
Graphs.
• Each Named Graph has provenance
information attached.
• A SPARQL query calculates average minimum and maximum prices across different named graphs that were imported from TVB Mayrhofen data sources for accommodation (i.e., feratel).
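A sketch of what such an analytics query could look like when issued from Python against a GraphDB SPARQL endpoint. The endpoint URL and the exact price properties are assumptions, not the production setup of the Tyrolean Tourism Knowledge Graph.

# Sketch: aggregate price statistics per named graph via a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://graphdb.example.org/repositories/tirol-kg")  # assumed endpoint
sparql.setQuery("""
PREFIX s: <http://schema.org/>
SELECT ?g (AVG(?min) AS ?avgMin) (AVG(?max) AS ?avgMax) WHERE {
  GRAPH ?g {
    ?offer a s:Offer ;
           s:priceSpecification [ s:minPrice ?min ; s:maxPrice ?max ] .
  }
}
GROUP BY ?g
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["g"]["value"], row["avgMin"]["value"], row["avgMax"]["value"])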
163.
3.4.4. Knowledge Deployment:
Service-driven Dialog
163
• Scalable dialog system development
• Knowledge Graph containing semantic
web service annotations must decouple
dialog systems from web services they
consume.
• An example of decoupling knowledge
from the communication channel.
• Example: A dialog system generated
from feratel API, later extended with
ticketmaster API for events, without
re-programming the backend.
• https://dialsws.xyz
164.
3.4.4. Deployment: Service-driven Dialog
164
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
“need a room for 2 in
mayrhofen from Friday
to Sunday”
165.
3.4.4. Deployment: Service-driven Dialog
165
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
166.
3.4.4. Deployment: Service-driven Dialog
166
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
167.
3.4.4. Deployment: Service-driven Dialog
167
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
check constraints
defined by the relevant
action shapes based on
the values extracted
from user utterances
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
168.
3.4.4. Deployment: Service-driven Dialog
168
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
check constraints
defined by the relevant
action shapes based on
the values extracted
from user utterances
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
Suitable actions for the
intent are selected and
further input parameters
collected through dialog if
necessary
169.
3.4.4. Deployment: Service-driven Dialog
169
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
check constraints
defined by the relevant
action shapes based on
the values extracted
from user utterances
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
Suitable actions for the
intent are selected and
further input parameters
collected through dialog if
necessary
Selected action is invoked
with a schema.org action
annotation that contains
the provided values
{ “@type”: “SearchAction”
“object”: “LodgingReservation”,
“checkInDate”: “23.04.2021”
“checkOutDate”: “25.04.2021”
…
170.
3.4.4. Deployment: Service-driven Dialog
170
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
check constraints
defined by the relevant
action shapes based on
the values extracted
from user utterances
schema:LodgingReservation
instances with various offers
are returned with potential
actions (e.g. BuyAction)
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
Suitable actions for the
intent are selected and
further input parameters
collected through dialog if
necessary
Selected action is invoked
with a schema.org action
annotation that contains
the provided values
{ “@type”: “SearchAction”
“object”: “LodgingReservation”,
“checkInDate”: “23.04.2021”
“checkOutDate”: “25.04.2021”
…
171.
3.4.4. Deployment: Service-driven Dialog
171
User
1. understand
Intent
+
Parameters
2. select web
service
schema:Action
instances
3. invoke
Result with
potential
actions
4. Natural
Language
Generation
Intent:
SearchLodgingReservation
checkinDate: 23.04.2021
checkoutDate : 25.04.2021
numAdults: 2
location: Mayrhofen
check constraints
defined by the relevant
action shapes based on
the values extracted
from user utterances
schema:LodgingReservation
instances with various offers
are returned with potential
actions (e.g. BuyAction)
“need a room for 2 in
mayrhofen from Friday
to Sunday”
Classifies to an
intent and extracts
slot values
Suitable actions for the
intent are selected and
further input parameters
collected through dialog if
necessary
Selected action is invoked
with a schema.org action
annotation that contains
the provided values
{ “@type”: “SearchAction”
“object”: “LodgingReservation”,
“checkInDate”: “23.04.2021”
“checkOutDate”: “25.04.2021”
…
The results are turned into NL statements. Potential actions are presented to the user to continue with the conversation: "Double room for 220 EUR. Do you want to buy this offer?"
172.
4. The Proof Of The Pudding Is In The Eating
• Open Touristic Knowledge Graphs
• Tyrolean Tourism Knowledge Graph: integrates heterogeneous data from 10+ Destination Management Organizations (DMOs) - 12B+ triples. University prototype and showcase.
• German Tourism Knowledge Graph: integrates heterogeneous data from the Regional Marketing Organizations (LMO*) in Germany. Real-world application.
• Dialog systems
• Conversational agents that help users to achieve their goals with the help of
Knowledge Graphs:
• Knowledge Graphs as a source of domain knowledge (about static, dynamic and active
data)
• Knowledge Graphs for training Natural Language Understanding models
172
* Landesmarketing Organization
173.
4.1 Open Touristic Knowledge Graph
What is an Open Knowledge Graph?
In essence: a graph database that complies with the
5* Open Data Principles:
173
https://5stardata.info
★ open license
★★ machine readable
★★★ open format
★★★★ complies with RDF
★★★★★ linking to other data sets
174.
4.1 Open Touristic Knowledge Graph
Prominent examples of Open Knowledge Graphs?
The Linked Open Data Cloud:
Contains (as of May 2020):
● 1,255 data sets
● 16,174 links
174
https://lod-cloud.net/
175.
4.1 Open Touristic Knowledge Graph
Linked Open Data (LOD)
Use LOD to integrate and lookup data about
● hiking trails
● places and routes
● points-of-interest
● ski slopes
● time-tables for public transport
175
176.
4.1.1 Tyrolean Tourism Knowledge Graph
We build the Tyrol Knowledge Graph (TKG) as a nucleus for this initiative
• It is a five star linked open data set published in GraphDB providing a SPARQL endpoint for the
provisioning of touristic data of Tyrol, Austria.
• The TKG currently contains data about touristic infrastructure like accommodation businesses, restaurants, points of interest, events, recipes, etc.
• The data in the TKG fall into three categories:
• Static data is information which is rarely changing like the address of a hotel.
• Dynamic data is fast changing information, like availabilities and prices.
• Active data describe actions that can be executed.
• Currently the TKG contains 12B+ statements; 55% are explicit and 45% are inferred.
• https://tirol.kg/ 176
177.
4.1.1 Tyrolean Tourism Knowledge Graph
• The Tyrol Knowledge Graph (TKG) integrates and connects data from 11 DMOs and several sources
including:
• touristic data sources:
• open data sources:
• It includes entities of the following (schema.org) types:
• LocalBusiness
• POIs, Infrastructure
• SportsActivityLocations (e.g. Trails, SkiResorts)
• Events
• Offers
• WebCams
• Mobility and Transport
177
179.
4.1.1 Tyrolean Tourism Knowledge Graph
The Tyrol Knowledge Graph
is used to answer questions
such as:
• “Where can I have
traditional Tyrolean food
when going cross country
skiing?”
• “Show me WebCams near
Kölner Haus”
• “How many people are living
in Serfaus?”
179
180.
4.1.2 German Tourism Knowledge Graph
• The German Tourism Knowledge Graph will integrate semantically
annotated tourism-related data from 16 LMOs/Magic Cities.
• Contracted by the Deutsche Zentrale für Tourismus (DZT), planned
to be finished in 2022.
• An ecosystem for Knowledge Creation, Hosting and Deployment is
being built.
• Read more at:
https://open-data-germany.org/open-data-germany/
180
181.
4.1.2 German Tourism Knowledge Graph
• Knowledge Creation
• Scalable Import API based on schema.org and domain-specific patterns
• Provenance tracking
• Machine-readable licenses for the data at different levels (e.g. organization and instance)
• Knowledge Hosting
• GraphDB with reasoning support
• Interfaces for exporting subgraphs of the Knowledge Graph (e.g. per LMO)
• Knowledge Curation
• Assessment: on-demand and periodic
• Cleaning: Integrity constraint-based detection and semi-automated correction
• Enrichment: Duplicate detection and linking of external sources
• Knowledge Deployment
• SPARQL endpoint with GeoSPARQL support
• Various visualization options: graph, tabular and map
• Access via a RESTful API
181
182.
4.1.2 German Tourism Knowledge Graph
182
https://open-data-germany.org/knowledge-graph-kuenstliche-intelligenz/
The Knowledge Graph will foster the development of intelligent applications for German tourism
183.
4.1.2 German Tourism Knowledge Graph
• Compliant with the standards developed by The Open Data
Tourism Alliance (ODTA).
• The consortium consists of
183
184.
4.1.2 German Tourism Knowledge Graph
The Open Data Tourism Alliance (ODTA)
• develops a de-facto standard for semantic annotation of touristic content, data, and services in
the D-A-CH area and Italy (the group was formerly known as DACH-KG)
• based on schema.org and its adaptation by Domain Specifications
• it should become the backbone of a 5* Open Data Knowledge Graph for touristic data in D-A-CH
and Italy
*) The dataset gets awarded one star if the data are provided under an open license.
**) Two stars, if the data are available as structured data.
***) Three stars, if the data are also available in a non-proprietary format.
****) Four stars, if URIs are used so that the data can be referenced, and
*****) five stars, if the data set is linked to other data sets that can provide context.
https://www.tourismuszukunft.de/2019/05/dach-kg-neue-ergebnisse-naechste-schritte-beim-thema-open-data/
184
185.
4.1.2 German Tourism Knowledge Graph
Members of ODTA
• Touristic experts from the DACH-region (Germany (D), Austria (A), Switzerland (CH)) and Italy
(South-Tyrol)
• the Austrian and German touristic associations,
• LTOs (Tirol, Vorarlberg, Wien, Brandenburg, Thüringen, …)
• Associated: DMOs (Mayrhofen, Seefeld, …)
• STI Innsbruck and STI International
• Planned is an extension by technology providers
(Datacycle, Feratel, Hubermedia, infomax, LandinSicht, Onlim, Outdooractive, TSO, ...)
185
186.
4.2. Virtual Agents
• The global conversational AI market is expected to grow from USD 4.8 billion in 2020 to USD
13.9 billion by 2025, at a Compound Annual Growth Rate (CAGR) of 21.9% during the
forecast period.
• Google Trends shows, that interest in chatbots has increased almost 5 times from 2015-
2020. In fact, 1.4 billion people use messaging apps and are willing to talk to chatbots.
• For businesses, chatbots are the brand communication with the largest growth - with
chatbot usage seeing a 92% use increase since 2019.
• 67% of worldwide consumers interacted with a chatbot to get customer support between 2019 and 2020. Their willingness to engage with chatbots in a variety of ways, with usage for purchases, meeting scheduling, and mailing list sign-ups, more than doubled from 2019 to 2020.
186
https://acquire.io/blog/chatbots-trends/
https://www.drift.com/blog/state-of-conversational-marketing/
https://www.invespcro.com/blog/chatbots-customer-service/
https://www.marketsandmarkets.com/Market-Reports/conversational-ai-market-49043506.html
https://research.aimultiple.com/chatbot-stats/
187.
• Voice has become more users' first choice for mobile search. 65% of consumers ages 25-49 years
old talk to their voice-enabled devices daily.
• Artificial intelligence-based voice assistants (AI-voice) are becoming a primary user interface for all digital devices – including smartphones, smart speakers, personal computers, automobiles, and home appliances.
• In 2020, more than 1 billion devices worldwide were equipped with Google's AI-voice Assistant, and more than one hundred million Amazon Alexa devices had been sold up to then – and neither number accounts for devices equipped with voice assistants from Apple, Microsoft, Samsung, or across the digital worlds of China and Asia.
• Around 4.2 billion digital voice assistants are currently being used in devices around the world. By
2024, the number of digital voice assistants will reach 8.4 billion units – a number higher than the
world’s population.
4.2. Virtual Agents
187
https://blog.google/products/assistant/ces-2020-google-assistant/
https://www.pwc.com/us/en/services/consulting/library/consumer-intelligence-series/voice-assistants.html
https://searchengineland.com/voice-gaining-on-mobile-browser-as-top-choice-for-smartphone-based-search-313433
https://www.statista.com/statistics/973815/worldwide-digital-voice-assistant-in-use/
https://www.theverge.com/2019/1/4/18168565/amazon-alexa-devices-how-many-sold-number-100-million-dave-limp
188.
4.2. Virtual Agents
Onlim
• A pioneer in automating customer
communication via Conversational AI
• Enterprise solutions for making data and
knowledge available for conversational
interfaces.
• Team of 35+ highly experienced AI
experts, specialists in semantics and
data science; experienced management
team.
• Academic Spin-off of University of
Innsbruck.
• HQ in Austria.
188
Tourism
Retail
Finance
Education
Utilities
189.
4.2. Virtual Agents
Onlim
189
Knowledge Graph – Data platform: Integrates enterprise data and services, interlinks them and applies semantics for machine understanding of these data.
Conversational & Analytics platform: Integrates chatbots with data, delivers insights, and optimizes conversations & the knowledge graph.
Multi-channel platform: Connects the chatbot to website chat, voice assistants, messengers, phone systems and external APIs.
190.
4.2. Virtual Agents
190
Starting with tourism, we have expanded to various other customers and sectors:
Tourism
Retail
Finance
Education
Utilities
191.
4.2. Virtual Agents
Our aim:
• Establish a maximally automated knowledge lifecycle: NLU training, Query generation,
Querying and representing world knowledge, as well as Natural Language Generation.
• Automatically distribute knowledge into all available channels.
• Core are methodologies, methods, and tools to generate, host, curate, deploy, and access
Knowledge Graphs containing frillions of statements from heterogeneous, distributed, and
dynamic sources. 191
192.
5. Conclusions: What are Knowledge Graphs?
• A big mess of frillions of facts describing the world from multiple points of view.
• A Rosetta Stone allowing humans and machines to exchange meaning.
• Merging smart and big data into one new paradigm of explainable Artificial Intelligence and explicit representation of large-scale knowledge (not data).
• Capturing a large fraction of human knowledge explicitly, turning it into a brain for/of humankind.
• A thrilling area of research and engineering with large impact and deep issues
to be resolved which you may want to join.
192
193.
5. Conclusions: Why KGs are important?
• Knowledge Graphs are enabling technology for:
• Virtual agents (Information search, eMarketing, and eCommerce)
• Cyber-physical Systems (Internet of Things, Smart Meters, etc.)
• Physical Agents (drones, cars, satellites, androids, etc.)
• Because knowledge is power, and without it you look foolish.
• Statistical analysis or matrix multiplication of large data volumes can bring you a long way but lacks the integration of world knowledge and the understandable interaction (explainable AI) with humans.
193
194.
5. Conclusions
• Turning them into a useful resource for problem-solving is not a
trivial task as there are various challenges:
• Size / Scalability
• Heterogeneity
• Active and Dynamic data
• Quality: Correctness and Completeness
194
195.
5. Conclusions
• We developed a methodology for developing and maintaining
(large) KGs:
• Task and process model
• A workbench (semantify.it)
• A large number of commercial applications (Onlim)
• Large and open knowledge graphs
• Knowledge Graph enabled chat bot solutions
195
197.
5. Conclusions: Workbench
Semantify.it
A platform for
• creation
• evaluation and
• deployment
of web annotations and
knowledge graphs.
https://semantify.it
Research prototype: proceed with caution!
197
198.
5. Conclusions: Applications
198
• Knowledge graphs will become the future storage for companies' digital assets.
• Knowledge will be accessed and unlocked literally by asking questions in
text/voice.
• Access to knowledge will be through multi channels: Chatbots, Voice
Assistants, Telephone Assistants, Search Widgets, Augmented Reality, Virtual
Reality, etc.
• Applications will reach all verticals: utility, retail, tourism, banking, etc.