Talking knowledge-graphs

STI Innsbruck
Talking Knowledge Graphs
Dieter Fensel with the help of the entire MindLab team
STI Innsbruck, University of Innsbruck, Austria
May 17, 2019
Prerequisite
MindLab:
• MindLab is a self-funded cooperative research project with the
objective to develop methods and software tools for modeling and
implementing scalability for knowledge graphs.
• Partners
Talking Knowledge Graphs
1. Motivation
2. The Grand Challenges
3. The Crux Of The Matter
4. The Proof Of The Pudding Is In The Eating
5. Key Takeaway
1. Motivation
• Text/voice interfaces are becoming mainstream
• Use cases are still basic
• Knowledge is Power! Without knowledge
-> no understanding of users' needs and goals
User: "Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight."
Assistant (without knowledge): Restaurant in Mayrhofen? Has roast pork? Price?
"Sorry, I don't know how to help you!"
Image: ©amazon.com
1. Motivation
User: "Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight."
Image: ©amazon.com
Extracted Knowledge:
action: TableReservation
type: Restaurant
offers: Roast Pork
Location: Mayrhofen
Price: price_level
Generated query:
?- tableReservationAction(),
   type(Restaurant),
   offers(RoastPork).
Predefined rules:
● tableReservationAction: book a table in a given Restaurant
● type: return all elements of type <type>
● offers: return all elements that offer <offer>
● ...
Processing steps: Extracted Knowledge, Query Generation, NLG, Generated Language output
The Knowledge Graph contains deep, accurate, and up-to-date knowledge about leisure services in Tyrol.
2. The Grand Challenges
User
1. understand intent + parameters
2. map query
3. query Knowledge Graph
4. Natural Language Generation
2. The Grand Challenges: Understand
NLU
• Voice/text recognition is already quite good
• However, NLU still requires significant manual labor
Manual work
• Design intents based on the schema of the Knowledge Graph
• Define utterances (example questions) per intent
• Mark parameters that should be extracted from utterances
Automation
• Entity detection: push entities from the Knowledge Graph into the NLU
• Detect unanswered questions
• Use the Knowledge Graph to update/extend the NLU:
• create utterances
• supervised learning: extend utterances with unanswered questions
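The automation step "create utterances" can be sketched as filling utterance templates with entity names taken from the Knowledge Graph, so that every generated example already has its parameters marked. This is a minimal sketch under assumptions: the intent name, the templates, and the `entities` lists are hypothetical stand-ins for data pulled from the graph, and real NLU services each expect their own training-data format.

```python
from itertools import product

# Hypothetical entity lists; in practice they would be pulled from the Knowledge Graph.
entities = {
    "dish": ["roast pork", "Kaiserschmarrn"],
    "location": ["Mayrhofen", "Innsbruck"],
}

# Hypothetical utterance templates per intent; {slot} marks a parameter to extract.
templates = {
    "TableReservation": [
        "book a table in a restaurant with {dish} in {location}",
        "I want to eat {dish} in {location} tonight",
    ],
}


def generate_utterances(intent, templates, entities):
    """Fill each template of `intent` with every combination of entity values.

    Returns (utterance, parameters) pairs, so the parameters are already marked.
    """
    results = []
    for template in templates[intent]:
        slots = [name for name in entities if "{" + name + "}" in template]
        for values in product(*(entities[slot] for slot in slots)):
            params = dict(zip(slots, values))
            results.append((template.format(**params), params))
    return results


for text, params in generate_utterances("TableReservation", templates, entities):
    print(text, params)
```

Each template multiplies with the size of the entity lists, so in practice one would sample combinations rather than generate all of them.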
2. The Grand Challenges: Query Generation
• Basis: intent & parameters detected/extracted during NLU
• Map the extracted information (intent & parameters) onto predefined rules
• Query: combination of rules over SPARQL queries
• Additional restriction rules
• Define a view on a relevant subgraph of the Knowledge Graph
-> A chatbot may not have access to the whole Knowledge Graph
(to prevent frillions and inconsistencies, and to implement access right restrictions)
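The mapping of intent and parameters onto predefined rules, plus a restriction rule that confines the query to a view of the graph, might look as follows. This is a sketch under assumptions: `INTENT_RULES`, `RULES`, and the view graph IRI are hypothetical; only the `schema:` terms (`hasMenu`, `hasMenuItem`, `address`, `addressLocality`) come from schema.org, and the view is modelled as a SPARQL named graph.

```python
# Hypothetical mapping from intents to the parameters (rules) they may use.
INTENT_RULES = {"TableReservation": {"type", "offers", "location"}}


def literal(value):
    """Quote a parameter value as a SPARQL string literal (minimal escaping)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Hypothetical predefined rules: each turns one parameter value into triple patterns.
RULES = {
    "type": lambda v: [f"?x a schema:{v} ."],
    "offers": lambda v: ["?x schema:hasMenu ?menu .",
                         "?menu schema:hasMenuItem ?item .",
                         f"?item schema:name {literal(v)} ."],
    "location": lambda v: ["?x schema:address ?addr .",
                           f"?addr schema:addressLocality {literal(v)} ."],
}


def generate_query(intent, params, view_graph):
    """Combine the rules selected by the NLU parameters into one SPARQL query.

    The GRAPH clause is the restriction rule: it confines the query to a view of the KG.
    """
    allowed = INTENT_RULES[intent]
    patterns = []
    for name, value in params.items():
        if name not in allowed:
            raise ValueError(f"no predefined rule {name!r} for intent {intent!r}")
        patterns.extend(RULES[name](value))
    body = "\n    ".join(patterns)
    return ("PREFIX schema: <https://schema.org/>\n"
            "SELECT DISTINCT ?x WHERE {\n"
            f"  GRAPH <{view_graph}> {{\n    {body}\n  }}\n}}")


print(generate_query("TableReservation",
                     {"type": "Restaurant", "offers": "Roast Pork", "location": "Mayrhofen"},
                     "https://example.org/graphs/chatbot-view"))
```

Rejecting parameters without a predefined rule keeps the chatbot from composing queries outside what the rules were designed for.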
Figure: Intent (with parameters) -> Query generation (predefined rules) -> Generated query
2. The Grand Challenges
Querying the Knowledge Graph
• The query is a combination of predefined rules accessing the knowledge through
SPARQL
• The Knowledge Graph must provide:
• large volumes of data
• integration of heterogeneous resources
• access to distributed sources
• dynamic updates (temperature, etc.)
• definition of subgraphs
• curation with regard to inconsistencies and incompleteness
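At its core, a SPARQL engine answers such rule combinations by matching basic graph patterns (conjunctions of triple patterns) against the stored triples. The toy matcher below illustrates only that step; it is not a SPARQL engine and ignores indexing, distribution, and dynamic updates. The `ex:` identifiers and `ex:locatedIn` are made up for the example.

```python
# Toy knowledge graph as (subject, predicate, object) triples; identifiers are made up.
KG = [
    ("ex:Gasthof", "rdf:type", "schema:Restaurant"),
    ("ex:Gasthof", "ex:locatedIn", "ex:Mayrhofen"),
    ("ex:Gasthof", "schema:servesCuisine", "Tyrolean"),
    ("ex:Pizzeria", "rdf:type", "schema:Restaurant"),
    ("ex:Pizzeria", "ex:locatedIn", "ex:Innsbruck"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples, binding=None):
    """Yield every binding that satisfies all triple patterns (a join by backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from evaluate(rest, triples, extended)


# "Restaurants located in Mayrhofen", written as two triple patterns.
query = [("?r", "rdf:type", "schema:Restaurant"), ("?r", "ex:locatedIn", "ex:Mayrhofen")]
print(list(evaluate(query, KG)))  # [{'?r': 'ex:Gasthof'}]
```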
2. The Grand Challenges
Natural Language Generation
Manual work
• Define templates based on
• structure of data
• information that should be given to the user
Automation
• Generate
• templates out of the Knowledge Graph
• textual answers from the Knowledge Graph
• follow-up questions to run dialogs
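Template-based generation of an answer plus a follow-up question that keeps the dialog going can be sketched as below. The template wording, the result fields (`name`, `price`), and the restaurant name are hypothetical; in the automated setting the templates themselves would be derived from the Knowledge Graph.

```python
# Hypothetical answer template; in the automated setting it would be derived from the KG.
ANSWER = "{name} in {locality} offers {dish}; price level: {price}."


def generate_answer(results, dish, locality):
    """Verbalize query results and append a follow-up question to continue the dialog."""
    if not results:
        return f"Sorry, I found no restaurant in {locality} that offers {dish}."
    sentences = [ANSWER.format(dish=dish, locality=locality, **r) for r in results]
    if len(results) == 1:
        follow_up = f"Shall I book a table at {results[0]['name']} for tonight?"
    else:
        follow_up = "Which of these restaurants should I book for tonight?"
    return " ".join(sentences + [follow_up])


print(generate_answer([{"name": "Gasthof Alpenrose", "price": "moderate"}], "roast pork", "Mayrhofen"))
```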
3. The Crux Of The Matter
• The quality of Intelligent Assistants depends directly on the quality of the
Knowledge Graph
• Problem: "garbage in, garbage out"
• Requirements for the Knowledge Graph:
• well structured (using an ontology: schema.org)
• accurate information (correctness)
• large and detailed coverage (completeness)
• timely knowledge (timeliness)
==> Knowledge Graph Lifecycle
3. The Crux Of The Matter: Process Model
Figure: Knowledge Graph lifecycle with Knowledge Creation, Knowledge Hosting, Knowledge Curation (Knowledge Assessment, Knowledge Cleaning, Knowledge Enrichment), and Knowledge Deployment
3. The Crux Of The Matter: KG Task Model
Knowledge Graph Maintenance
• Knowledge Creation: Edit, Semi-automatic, Automatic, Mapping
• Knowledge Hosting
• Knowledge Curation
  • Knowledge Assessment: Evaluation, Correctness, Completeness
  • Knowledge Cleaning: Error Detection, Error Correction
  • Knowledge Enrichment: Knowledge Source detection, Knowledge Source integration, Duplicate detection, Property-Value-Statements correction
• Knowledge Deployment
The task model is shown again in several builds, highlighting the MindLab Status Year 1 and the MindLab Status Year 2 (our dreams).
3. The Crux Of The Matter
Knowledge Generation
Knowledge Graph: Knowledge Creation, Knowledge Hosting, Knowledge Curation, Knowledge Deployment
Knowledge Creation: Edit, Semi-automatic, Automatic, Mapping
3. The Crux Of The Matter
Knowledge Generation
• https://www.schema.org/
• Started in 2011 by Bing, Google, Yahoo!, and Yandex to annotate websites.
• Has become the de facto standard.
• We use it for the web site channel as well as for all other channels as a
reference model for our semantic annotations.
• However, we use value restrictions not as an inference mechanism but as integrity
constraints.
• We define domain-specific extensions (that also restrict the genericity of the entire
schema.org vocabulary).
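To illustrate the difference: under an RDFS-style reading, a range declaration lets a reasoner infer the type of a property value, whereas read as an integrity constraint, a value outside the range is reported as an error. The sketch below takes the second reading. The `RANGES` table is a small, simplified excerpt of schema.org's `rangeIncludes` declarations, and treating plain strings as `Text` is an assumption.

```python
# Simplified excerpt of schema.org range declarations (schema:rangeIncludes).
RANGES = {
    "address": {"PostalAddress", "Text"},
    "telephone": {"Text"},
    "openingHoursSpecification": {"OpeningHoursSpecification"},
}


def value_type(value):
    """Plain strings count as Text; nested JSON-LD nodes report their @type (None if missing)."""
    if isinstance(value, str):
        return "Text"
    if isinstance(value, dict):
        return value.get("@type")
    return None


def check_ranges(node):
    """Read ranges as integrity constraints and report values outside the range.

    Under rdfs:range semantics a reasoner would instead infer a type for the value;
    here a missing or mismatching type is reported as a violation.
    """
    violations = []
    for prop, value in node.items():
        if prop.startswith("@") or prop not in RANGES:
            continue
        if value_type(value) not in RANGES[prop]:
            violations.append((prop, value_type(value)))
    return violations


annotation = {
    "@type": "Restaurant",
    "telephone": "+43 5285 1234",
    "address": {"@type": "GeoCoordinates", "latitude": 47.16},
}
print(check_ranges(annotation))  # [('address', 'GeoCoordinates')]
```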
3. The Crux Of The Matter
Knowledge Generation
• The use of semantic annotations has surged tremendously since the
introduction of schema.org.
• Schema.org was introduced with 297 classes and 187 relations,
• which over time have grown to 598 types, 862 properties, and 114 enumeration values.
• The provided corpus of
• types (e.g. LocalBusiness, SkiResort, Restaurant),
• properties (e.g. name, description, address),
• range restrictions (e.g. Text, URL, PostalAddress),
• and enumeration values (e.g. DayOfWeek, EventStatusType, ItemAvailability)
covers a large number of different domains, including the tourism domain.
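A minimal schema.org annotation in JSON-LD shows these building blocks side by side: a type (`Restaurant`), properties (`name`, `address`), a range type (`PostalAddress`), and a member of an enumeration (`DayOfWeek`, here `Monday`). All values are hypothetical.

```python
import json

# Hypothetical schema.org annotation of a restaurant in JSON-LD.
annotation = {
    "@context": "https://schema.org/",
    "@type": "Restaurant",                   # type
    "name": "Gasthof Example",               # property with a Text value
    "address": {                             # property whose range includes PostalAddress
        "@type": "PostalAddress",
        "addressLocality": "Mayrhofen",
        "addressCountry": "AT",
    },
    "openingHoursSpecification": {
        "@type": "OpeningHoursSpecification",
        "dayOfWeek": "https://schema.org/Monday",  # member of the DayOfWeek enumeration
        "opens": "11:00",
        "closes": "22:00",
    },
}

print(json.dumps(annotation, indent=2))
```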
3. The Crux Of The Matter
Knowledge Generation
3. The Crux Of The Matter
Knowledge Generation
• Domain Specifications:
• restrict the generality and
• extend the domain-specificity
of schema.org
• Are based on SHACL
• https://schema-tourism.sti2.org/
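A domain specification can be thought of as a shape that both narrows schema.org (only listed properties, narrower ranges) and adds requirements (minimum counts). The checker below borrows a few SHACL ideas (`sh:minCount`, `sh:closed`); it is a sketch, not a SHACL processor, and its shape format is hypothetical, not the format used at schema-tourism.sti2.org.

```python
# Hypothetical domain specification for restaurants, loosely modelled on SHACL:
# "closed" forbids unlisted properties, "minCount" requires values, "types" narrows ranges.
RESTAURANT_DS = {
    "targetType": "Restaurant",
    "closed": True,
    "properties": {
        "name": {"minCount": 1, "types": {"Text"}},
        "address": {"minCount": 1, "types": {"PostalAddress"}},  # narrower than schema.org
        "priceRange": {"types": {"Text"}},
    },
}


def value_type(value):
    """Plain strings count as Text; nested JSON-LD nodes report their @type."""
    if isinstance(value, str):
        return "Text"
    return value.get("@type") if isinstance(value, dict) else None


def validate(node, shape):
    """Return a list of human-readable violations of `shape` by `node`."""
    if node.get("@type") != shape["targetType"]:
        return [f"not a {shape['targetType']}"]
    errors = []
    for name, rule in shape["properties"].items():
        values = node.get(name, [])
        values = values if isinstance(values, list) else [values]
        if len(values) < rule.get("minCount", 0):
            errors.append(f"{name}: expected at least {rule['minCount']} value(s)")
        for value in values:
            if value_type(value) not in rule["types"]:
                errors.append(f"{name}: value of type {value_type(value)} not allowed")
    if shape.get("closed"):
        for name in node:
            if not name.startswith("@") and name not in shape["properties"]:
                errors.append(f"{name}: property not allowed by the domain specification")
    return errors


node = {"@type": "Restaurant", "name": "Gasthof Example",
        "address": "Hauptstrasse 1, Mayrhofen", "award": "Best roast pork"}
for error in validate(node, RESTAURANT_DS):
    print(error)
```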
Figure: Schema.org, Domain, Domain Specification
3. The Crux Of The Matter
Knowledge Generation
Our Methodology:
• the bottom-up part,
which describes the steps of
the initial annotation process;
• the domain specification
modeling; and
• the top-down part, which
applies the constructed
models.
3. The Crux Of The Matter
Knowledge Generation
Manual Annotation Editor
3. The Crux Of The Matter
Knowledge Generation
• Semi-automatic
• The Annotation Editor suggests mappings/extracted information,
• e.g. it extracts information from web pages (by HTML tags).
• It uses partial NLU to find similarities between the content and the schema.org vocabulary.
• Manual adaptations are needed to define and to evaluate the suggestions.
• An instance of the general problem of wrapper generation.
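The semi-automatic approach can be sketched as two steps: extract labeled text from HTML tags, then suggest a schema.org property for each label. Plain string similarity (`difflib`) stands in here for the partial NLU mentioned above, and the class names, the property list, and the cutoff are assumptions; the suggestions still need manual review.

```python
from difflib import get_close_matches
from html.parser import HTMLParser

# Small, hypothetical selection of schema.org properties to suggest from.
SCHEMA_PROPERTIES = ["name", "telephone", "address", "priceRange", "servesCuisine", "email"]
_BY_LOWERCASE = {p.lower(): p for p in SCHEMA_PROPERTIES}


class LabeledTextExtractor(HTMLParser):
    """Collect the text of elements that carry a class attribute (flat markup only)."""

    def __init__(self):
        super().__init__()
        self.current_label = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        label = dict(attrs).get("class")
        if label:
            self.current_label = label

    def handle_endtag(self, tag):
        self.current_label = None

    def handle_data(self, data):
        if self.current_label and data.strip():
            self.fields[self.current_label] = data.strip()


def suggest_mappings(html, cutoff=0.6):
    """Map each extracted label to the most similar property name (or None)."""
    parser = LabeledTextExtractor()
    parser.feed(html)
    suggestions = {}
    for label, text in parser.fields.items():
        match = get_close_matches(label.lower(), list(_BY_LOWERCASE), n=1, cutoff=cutoff)
        suggestions[label] = (_BY_LOWERCASE[match[0]] if match else None, text)
    return suggestions


page = ('<div><span class="phone">+43 5285 1234</span>'
        '<span class="adress">Hauptstrasse 1, Mayrhofen</span>'
        '<span class="cuisine">Tyrolean</span></div>')
for label, (prop, text) in suggest_mappings(page).items():
    print(f"{label!r} -> {prop} ({text})")
```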
3. The Crux Of The Matter
Knowledge Generation
• Mapping (more than 95% of the story)
• integrate large and fast changing data sets
• map different formats to the ontology used in our Knowledge Graph
• Various frameworks: XLWrap, Mapping Master (M2), a generic XMLtoRDF tool providing a
mapping document (XML document) that has a link between an XML Schema and an OWL
ontology, Tripliser, GRDDL, R2RML, RML, ...
• We developed a customization of RML, called RocketRML.
• The semantify.it platform features a wrapper API where these
mappings can be stored and applied to corresponding data
sources.
• The wrapper translates the data according to the mappings and
stores it as JSON-LD in a MongoDB.
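The idea behind a declarative mapping (a document that states which source field feeds which ontology property) can be sketched without RML's syntax. The mapping format below is hypothetical and much simpler than RML or RocketRML; it turns flat source records into schema.org JSON-LD nodes, the kind of output the wrapper stores.

```python
import json

# Hypothetical declarative mapping: ontology property -> field of the source record.
MAPPING = {
    "type": "Restaurant",
    "id_template": "https://example.org/restaurant/{id}",
    "properties": {
        "name": "title",
        "telephone": "phone",
        "priceRange": "price_level",
    },
}


def apply_mapping(record, mapping):
    """Translate one source record into a schema.org JSON-LD node."""
    node = {
        "@context": "https://schema.org/",
        "@id": mapping["id_template"].format(**record),
        "@type": mapping["type"],
    }
    for prop, field in mapping["properties"].items():
        value = record.get(field)
        if value not in (None, ""):  # skip empty source fields
            node[prop] = value
    return node


source = [{"id": 7, "title": "Gasthof Example", "phone": "+43 5285 1234", "price_level": ""}]
print(json.dumps([apply_mapping(r, MAPPING) for r in source], indent=2))
```

Keeping the mapping as data rather than code is what lets it be stored on a platform and re-applied whenever the source changes.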
3. The Crux Of The Matter
Knowledge Generation
Automatic extraction of knowledge from text representations and web
pages
• Tasks
• named entity recognition,
• concept mining, text mining,
• relation detection, …
• Methods
• Information Extraction
• Natural Language Processing (NLP)
• Machine Learning (ML)
• Systems:
• GATE (text analysis & language processing)
• OpenNLP (supports most common NLP tasks)
• RapidMiner (data preparation, machine learning, deep learning, text mining, predictive analysis)
3. The Crux Of The Matter
Knowledge Generation
Evaluation of semantic annotations:
• The semantify.it validator is a web tool that offers the possibility to
validate schema.org annotations that are scraped from websites.
• Verification: the annotations are checked against plain schema.org
and against domain specifications.
• Validation: the annotations are checked as to whether they accurately
describe the content of the web site.
3. The Crux Of The Matter
Knowledge Generation
Evaluation of semantic annotations:
• Note that we take the content of the web site as the gold standard.
• We do NOT evaluate the accuracy of that content with regard to the
"real" world.
• We check whether a phone number conforms to the
formal constraints.
• We do not make robocalls to hotels
to check whether the "right" hotel picks up the phone.
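The phone-number case can be sketched as a purely formal check. The concrete constraint (a leading `+`, only digits, spaces, and `()/-`, and between 7 and 15 digits, the upper bound taken from E.164) is an assumption for illustration; passing it says nothing about whether the number reaches the right hotel.

```python
import re

# Assumed formal constraint: international format with a leading "+", digits and the
# separators space ( ) / -, and 7 to 15 digits in total (15 is the E.164 maximum).
PHONE_PATTERN = re.compile(r"\+[0-9 ()/-]+")


def conforms_to_phone_constraint(value):
    """Formal check only; it does NOT verify that the number reaches the right hotel."""
    if not PHONE_PATTERN.fullmatch(value):
        return False
    digits = sum(ch.isdigit() for ch in value)
    return 7 <= digits <= 15


for candidate in ["+43 5285 1234", "05285 1234", "+43 call us"]:
    print(candidate, conforms_to_phone_constraint(candidate))
```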
3. The Crux Of The Matter
Knowledge Generation
Evaluation
3. The Crux Of The Matter
Knowledge Hosting
Semantify.it (1):
A platform for creating, hosting, validating, verifying, and publishing
schema.org annotated data
• annotation of static data based on schema.org templates
-> Domain Specifications (2)
• annotation of dynamic data based on
RML mappings -> RocketRML (3)
(1) https://semantify.it
(2) http://ds.sti2.org
(3) https://github.com/semantifyit/RocketRML
3. The Crux Of The Matter
Knowledge Hosting
Figure: hosting options for Semantic Web annotations and Knowledge Graphs: annotation tool (e.g. semantify.it), document store (e.g. MongoDB), graph database (e.g. GraphDB), ...
3. The Crux Of The Matter
Knowledge Hosting
• Semantically annotated data can be serialized to JSON-LD
• Storage in the document store MongoDB
• native JSON storage
• well integrated in current state-of-the-art software with NodeJS
• performant search through indexing
• not hardware intensive
• Drawback: no native RDF querying with SPARQL
3. The Crux Of The Matter
Knowledge Hosting
• Native storage of semantically annotated data
• RDF store: GraphDB
• very powerful CRUD operations
• named graphs for versioning
• full implementation of SPARQL
• powerful reasoning over big data sets
• Drawback: no web frameworks available
• Drawback: very hardware intensive
3. The Crux Of The Matter
Knowledge Curation
• We defined a simple KR formalism formalizing
essentials of schema.org
• Tbox: isA statements of types, domain and range definitions for properties
(using them globally or locally)
• Abox: isElementOf(i,t) statements, property-value statements p(i1,i2), and
sameAs(i1,i2) statements
• Enables a formal definition of the knowledge curation task (assessment,
cleaning, and enrichment).
3. The Crux Of The Matter
Knowledge Assessment
• Knowledge Assessment describes and defines the process
of assessing the quality of a Knowledge Graph.
• The goal is to measure the usefulness of a Knowledge Graph.
• Evaluation
• Overall process to determine the quality of a
Knowledge Graph.
• Select quality dimensions and metrics (see literature on data quality).
• Evaluate representative subsets accordingly.
3. The Crux Of The Matter
Knowledge Assessment
• Correctness
• Identify the number of wrong assertions
• Completeness
• Identify missing sets of assertions
• Further dimensions:
accessibility, accuracy, appropriate amount, believability, completeness, concise
representation, consistent representation, cost-effectiveness, ease of
manipulation, ease of operation, ease of understanding, flexibility, free-of-error,
interpretability, objectivity, relevancy, reputation, security, timeliness, traceability,
understandability, value-added, and variety
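One common way to put correctness and completeness into numbers is sketched below: correctness as the share of sampled assertions judged correct, and completeness as the share of gold-standard assertions that the Knowledge Graph contains. The labels and the gold standard are assumed to come from assessors evaluating a representative subset; the function names and the example triples are hypothetical.

```python
def correctness(sample_labels):
    """Share of sampled assertions that assessors judged correct (True)."""
    if not sample_labels:
        raise ValueError("empty sample")
    return sum(sample_labels) / len(sample_labels)


def completeness(kg_assertions, gold_assertions):
    """Share of gold-standard assertions that are present in the Knowledge Graph."""
    gold = set(gold_assertions)
    if not gold:
        raise ValueError("empty gold standard")
    return len(gold & set(kg_assertions)) / len(gold)


def missing(kg_assertions, gold_assertions):
    """The gold-standard assertions the Knowledge Graph lacks (candidates for enrichment)."""
    return set(gold_assertions) - set(kg_assertions)


kg = {("ex:a", "schema:telephone", "+43 1"), ("ex:b", "schema:telephone", "+43 2")}
gold = kg | {("ex:c", "schema:telephone", "+43 3")}
print(correctness([True, True, False, True]), completeness(kg, gold), missing(kg, gold))
```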
3. The Crux Of The Matter
Knowledge Assessment
[Paulheim et al., 2019] identify the following subtasks:
• specifying datasets and Knowledge Graphs,
• specifying the evaluation protocol,
• specifying the evaluation metrics,
• specifying the task for task-specific evaluation,
• and defining the setting in terms of intrinsic vs. task-based and automatic vs. human-centric
evaluation,
• as well as the need to keep the results reproducible.
[Paulheim et al., 2019] H. Paulheim, M. Sabou, M. Cochez, and W. Beek: Evaluation of Knowledge Graphs. In P. A. Bonatti, S. Decker, A. Polleres, and V. Presutti:
Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, Dagstuhl Reports, 8(9):29-111, 2019.
3. The Crux Of The Matter
Knowledge Assessment
Methodologies
• Total Data Quality Management (TDQM) [Wang, 1998] and Data Quality Assessment [Pipino et
al., 2002] allow identifying important quality dimensions and their requirements from various
perspectives.
• Other methodologies already define quality metrics that allow a semi-automatic assessment
based on data integrity constraints, for example user-driven assessment [Zaveri et al.,
2013], test-driven assessment [Kontokostas et al., 2014], and a manual assessment by
crowdsourced experts (crowdsourcing-driven assessment [Acosta et al., 2013]).
• Besides that, there are quality assessment approaches that use statistical distributions for
measuring the correctness of statements [Paulheim & Bizer, 2014], and SPARQL queries for the
identification of functional dependency violations and missing values [Fürber & Hepp, 2010a]
[Fürber & Hepp, 2010b].
3. The Crux Of The Matter
Knowledge Assessment
Tools and Methods:
• LINK-QA
• using network metrics
• Luzzu (Linked Open Datasets)
• thirty data quality metrics based on the Dataset Quality Ontology.
• Sieve
• flexibly expressing quality assessment methods
• fusion methods
• SWIQA (Semantic Web Information Quality Assessment Framework)
• data quality rules & quality scores for identifying wrong data
• Validata
• online tool for testing/validating RDF data against ShEx-schemas
3. The Crux Of The Matter
Knowledge Assessment
Sieve:
• Sieve for Data Quality Assessment [Mendes et al., 2012] is a framework that
consists of two modules:
• a Quality Assessment module and
• a Data Fusion module
• The Quality Assessment module involves four steps:
1. A Data Quality Indicator defines an aspect of a data set that may demonstrate its suitability for the
intended use, for example meta-information about the creation of a data set, information about the
provider, or ratings provided by the consumers.
2. Scoring Functions define the assessment of the quality indicator based on its quality dimension. Scoring
functions range from simple comparisons, over set functions and aggregation functions, to more complex
statistical functions, text analysis, or network analysis methods.
3. The Assessment Metric calculates the assessment score based on indicators and scoring functions.
4. The Aggregate Metric allows users to combine metrics into new metrics that generate new assessment values.
• http://sieve.wbsg.de/
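The four steps of the Quality Assessment module can be mirrored in a few functions: indicators (last update date, provider), scoring functions that map indicators to [0, 1], and an aggregate metric that combines the scores. The linear recency decay, the preference ranking, the weights, and the source data are assumptions for illustration, not Sieve's actual configuration or implementation.

```python
from datetime import date


def recency_score(last_updated, today, max_age_days):
    """Scoring function: 1.0 if updated today, falling linearly to 0.0 at max_age_days."""
    age = (today - last_updated).days
    return max(0.0, 1.0 - age / max_age_days)


def preference_score(provider, ranking):
    """Scoring function: providers earlier in the preference list score higher."""
    if provider not in ranking:
        return 0.0
    return 1.0 - ranking.index(provider) / len(ranking)


def aggregate(scores, weights):
    """Aggregate metric: weighted average of individual assessment scores."""
    total = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total


# Quality indicators of one (hypothetical) source: provider and last update date.
source = {"provider": "tourism-board", "last_updated": date(2019, 5, 7)}
scores = {
    "recency": recency_score(source["last_updated"], date(2019, 5, 17), max_age_days=100),
    "reputation": preference_score(source["provider"], ["tourism-board", "web-crawl"]),
}
print(scores, aggregate(scores, {"recency": 2, "reputation": 1}))
```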
3. The Crux Of The Matter
Knowledge Cleaning
• The goal of knowledge cleaning is to improve the correctness of a knowledge
graph
• Major objectives
• error detection and
• error correction of
● wrong instance assertions
● wrong property value assertions
● wrong equality assertions
3. The Crux Of The Matter
Knowledge Cleaning
Figure: Tbox, Abox, Knowledge Curation
3. The Crux Of The Matter
Knowledge Cleaning
What | Verification | Validation
Semantic Annotations | check schema conformance and integrity constraints | compare with web resource
Knowledge Graphs | check schema conformance and integrity constraints | compare with "real" world
3. The Crux Of The Matter
Knowledge Cleaning
Error correction of wrong instance assertions isElementOf(i,t):
• i is not a proper instance identifier:
Delete assertion or correct i
• t is not an existing type name:
Delete assertion or correct t
• The instance assertion is (semantically) wrong:
• Delete assertion or find a proper t
• but do NOT try to find a proper i (this would neither scale nor make sense)
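The first two error cases are formal and can be detected automatically; the third (a semantically wrong assertion) needs outside evidence. A minimal detector for the formal cases, assuming that proper identifiers are HTTP(S) IRIs and using a small hypothetical list of known type names, might propose repairs like this:

```python
import re
from difflib import get_close_matches

# Hypothetical Tbox excerpt (known type names) and an assumed identifier convention.
KNOWN_TYPES = {"Event", "Hotel", "LocalBusiness", "Restaurant", "SkiResort"}
IDENTIFIER = re.compile(r"https?://\S+")


def check_instance_assertion(i, t):
    """Detect formal errors in isElementOf(i, t) and propose repairs.

    Semantically wrong but well-formed assertions pass unnoticed here.
    """
    problems = []
    if not IDENTIFIER.fullmatch(i):
        problems.append(("improper instance identifier", "delete assertion or correct i"))
    if t not in KNOWN_TYPES:
        close = get_close_matches(t, sorted(KNOWN_TYPES), n=1)
        repair = f"correct t to {close[0]}" if close else "delete assertion"
        problems.append(("unknown type name", repair))
    return problems


print(check_instance_assertion("https://example.org/gasthof", "Restaurnt"))
```

In line with the rule above, the sketch only ever proposes a different t, never a different i.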
3. The Crux Of The Matter
Knowledge Cleaning
Error correction of wrong property value assertions p(i1,i2):
• p is not a proper property name: Delete assertion or correct p
• i1 is not a proper instance identifier: Delete assertion or correct i1
• i1 is not in any domain of p: Delete assertion or add an assertion
isElementOf(i1,t) where t is a domain of p.
• i2 is not a proper instance identifier: Delete assertion or correct i2
• i2 is not in the range of p for any domain of p to which i1 belongs:
• Delete assertion or
• add a proper isElementOf assertion for i1 that adds a domain for which i2 is an instance of the range of the property
or
• add a proper isElementOf assertion for i2 that turns it into an instance of a range of the property applied to a domain
of p where i1 is an element.
• The property assertion is (semantically) wrong: Delete assertion or correct it. In the latter case, you
should most likely define a proper i2, or search for a better p, or search for a better i1.
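The detection side of these rules can be sketched on top of the simple formalism (isA statements, domains, and ranges defined locally per domain). The mini Tbox below is hypothetical and simplified (for example, `address` is given only `LocalBusiness` as domain); identifier checks are left out, and choosing a repair stays a separate decision.

```python
# Hypothetical, simplified Tbox: isA (single parent), domains, and ranges defined locally per domain.
PARENT = {"Restaurant": "FoodEstablishment", "FoodEstablishment": "LocalBusiness"}
DOMAINS = {"servesCuisine": {"FoodEstablishment"}, "address": {"LocalBusiness"}}
RANGES = {
    ("servesCuisine", "FoodEstablishment"): {"Text"},
    ("address", "LocalBusiness"): {"PostalAddress", "Text"},
}


def types_of(instance, abox_types):
    """All types of an instance from its isElementOf assertions, closed under isA."""
    result = set()
    for t in abox_types.get(instance, ()):
        while t is not None and t not in result:
            result.add(t)
            t = PARENT.get(t)
    return result


def check_property_assertion(p, i1, i2, abox_types):
    """Return the first formal problem of p(i1, i2), or None if none is found."""
    if p not in DOMAINS:
        return "p is not a proper property name"
    domains = DOMAINS[p] & types_of(i1, abox_types)
    if not domains:
        return "i1 is not in any domain of p"
    allowed = set().union(*(RANGES[(p, d)] for d in domains))
    if not types_of(i2, abox_types) & allowed:
        return "i2 is not in the range of p for any domain of i1"
    return None


abox = {"ex:gasthof": {"Restaurant"}, "ex:addr1": {"PostalAddress"}, "ex:geo1": {"GeoCoordinates"}}
print(check_property_assertion("address", "ex:gasthof", "ex:addr1", abox))  # None
print(check_property_assertion("address", "ex:gasthof", "ex:geo1", abox))
```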
3. The Crux Of The Matter
Knowledge Cleaning
Error correction of wrong equality assertions isSameAs(i1,i2):
• i1 is not a proper instance identifier: Delete assertion or correct i1
• i2 is not a proper instance identifier: Delete assertion or correct i2
• The identity assertion is (semantically) wrong: Delete assertion or
replace it by a SKOS operator (1).
(1) Which, however, does not come with operational semantics.
3. The Crux Of The Matter
Knowledge Cleaning
Methods & Tools:
• HoloClean
● use of integrity constraints,
● external data, and
● quantitative statistics.
● Steps
• separate the input dataset into noisy and clean datasets
• assign uncertainty scores to the values of the noisy dataset
• compute the marginal probability for each value to be repaired
• SDValidate
● uses statistical distribution functions
● three steps:
• compute the relative predicate frequency for each statement
• assign a confidence score to each statement selected in the first step
• apply a confidence threshold.
• Similar steps for instance assertions.
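The SDValidate idea can be approximated as follows: for each property, compute how often each object type occurs, score every statement by how typical its object's types are for that property, and flag statements below a threshold. This is a simplified sketch inspired by the steps above, not a faithful reimplementation of SDValidate; the data and the threshold are made up.

```python
from collections import Counter, defaultdict


def type_distribution(statements, types):
    """For each property, the relative frequency of each type among its objects."""
    type_counts = defaultdict(Counter)
    totals = Counter()
    for _, p, o in statements:
        totals[p] += 1
        for t in types.get(o, ()):
            type_counts[p][t] += 1
    return {p: {t: c / totals[p] for t, c in counts.items()} for p, counts in type_counts.items()}


def suspicious(statements, types, threshold):
    """Flag statements whose object types are untypical for the property.

    Confidence is the mean relative frequency of the object's types; untyped
    objects get confidence 0.0 and are therefore always flagged.
    """
    distribution = type_distribution(statements, types)
    flagged = []
    for s, p, o in statements:
        scores = [distribution.get(p, {}).get(t, 0.0) for t in types.get(o, ())]
        confidence = sum(scores) / len(scores) if scores else 0.0
        if confidence < threshold:
            flagged.append(((s, p, o), confidence))
    return flagged


statements = [(f"ex:r{n}", "location", "ex:Mayrhofen") for n in range(9)]
statements.append(("ex:r9", "location", "ex:RoastPork"))  # a probably wrong statement
types = {"ex:Mayrhofen": {"Place"}, "ex:RoastPork": {"MenuItem"}}
print(suspicious(statements, types, threshold=0.2))
```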
3. The Crux Of The Matter
Knowledge Cleaning
Methods & Tools:
• The LOD Laundromat [Beek et al., 2014]
● cleans Linked Open Data
● takes a SPARQL endpoint/archived dataset as input dataset
● guesses the serialisation format
● identifies syntax errors using a library while parsing RDF
● saves RDF data in a canonical format
• KATARA [Chu et al., 2015]
● identifies correct & incorrect data
● generates possible corrections for wrong data
• SPIN [Fürber & Hepp, 2010b]
● SPARQL Inferencing Notation, used as a SPARQL-based constraint language
● generates SPARQL query templates based on data quality problems:
• inconsistency
• lack of comprehensibility
• heterogeneity
• redundancy
• Nowadays, SPIN has largely been superseded by SHACL, a language for validating RDF graphs.
[Beek et al., 2014] W. Beek, L. Rietveld, H. R. Bazoobandi, J. Wielemaker, and S. Schlobach: LOD Laundromat: A Uniform Way of Publishing
Other People's Dirty Data. In Proceedings of the 13th International Semantic Web Conference (ISWC2014), Springer, LNCS 8796, Riva del
Garda, Italy, October 19-23, 2014.
[Chu et al., 2015] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye: KATARA: reliable data cleaning with knowledge bases
and crowdsourcing. In Proceedings of the 41st International Conference on Very Large Data Bases (PVLDB2015), VLDB Endowment, 8(12),
2015.
[Fürber & Hepp, 2010b] C. Fürber and M. Hepp: Using semantic web resources for data quality management. In Proceedings of the 17th
International Conference on Knowledge Engineering and Management by the Masses (EKAW2010), Springer, LNCS 6317, Lisbon, Portugal,
October 11-15, 2010.
3. The Crux Of The Matter
Knowledge Enrichment
• The goal of knowledge enrichment is to improve the completeness of a
knowledge graph by adding new statements
• The process of Knowledge Enrichment has four phases:
• New Knowledge Source detection
• New Knowledge Source integration
• Duplicate detection and alignment
• Property-Value-Statements correction
3. The Crux Of The Matter
Knowledge Enrichment
• Knowledge Source detection
• search for additional sources of assertions for the KG
• Open sources
• Closed sources
• Knowledge Source integration
• TBox: define mappings
• ABox: integrate new assertions into the KG
• Duplicate detection: identifying and resolving duplicates
• Property-value-statements correction: fixing invalid property statements, such as domain/range violations or multiple values for a unique property
• also known in the data quality literature as contradicting or uncertain attribute value resolution.
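The invalid property statements listed above can be detected with a simple closed-world check. The sketch below is an illustration under stated assumptions: domains and ranges are treated as constraints on asserted types (under RDFS semantics they would instead infer types), "unique property" is modelled as a set of single-valued properties, and the entity and property names are hypothetical.

```python
from collections import defaultdict


def find_violations(triples, types, domains, ranges, functional):
    """Closed-world check of property statements.

    triples:    iterable of (subject, property, object)
    types:      dict entity -> set of asserted classes
    domains:    dict property -> class the subject must have
    ranges:     dict property -> class the object must have
    functional: set of properties allowed at most one value per subject
    """
    violations = []
    values = defaultdict(set)
    for s, p, o in triples:
        if p in domains and domains[p] not in types.get(s, set()):
            violations.append(("domain", s, p, o))
        if p in ranges and ranges[p] not in types.get(o, set()):
            violations.append(("range", s, p, o))
        if p in functional:
            values[(s, p)].add(o)
    # A unique property with several distinct values is a conflict to resolve.
    for (s, p), objs in values.items():
        if len(objs) > 1:
            violations.append(("functional", s, p, tuple(sorted(objs))))
    return violations


if __name__ == "__main__":
    types = {"hotel1": {"Hotel"}, "mayrhofen": {"Place"}, "event1": {"Event"}}
    triples = [
        ("hotel1", "containedInPlace", "mayrhofen"),
        ("hotel1", "checkinTime", "14:00"),
        ("hotel1", "checkinTime", "15:00"),  # second value for a unique property
        ("event1", "checkinTime", "10:00"),  # subject violates the domain
        ("hotel1", "containedInPlace", "event1"),  # object violates the range
    ]
    for v in find_violations(triples, types, {"checkinTime": "Hotel"},
                             {"containedInPlace": "Place"}, {"checkinTime"}):
        print(v)
```

Detecting the conflict is only half the task; choosing which value to keep is what the fusion tools below address.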
47
3. The Crux Of The Matter
Knowledge Enrichment
Duplicate detection:
https://www.cs.umd.edu/~getoor/Tutorials/ER_VLDB2012.pdf
48
3. The Crux Of The Matter
Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• Silk is a link discovery framework for entity linking.
• It tackles three tasks:
1. link discovery, which uses similarity metrics to calculate an aggregated similarity
value for a pair of entities,
2. evaluation of the correctness and completeness of the generated links, and
3. a protocol for maintaining data that allows the source and target datasets to
exchange generated link sets.
http://silkframework.org/
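The first task, combining several similarity metrics into one total similarity value for a pair of entities, can be sketched as follows. This is not Silk's link specification language or API: the metrics (difflib string ratio, linear numeric distance), the weights, the 0.8 threshold, and the example entities are assumptions for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a: float, b: float, max_distance: float) -> float:
    """1.0 for equal values, falling linearly to 0.0 at max_distance."""
    return max(0.0, 1.0 - abs(a - b) / max_distance)


def weighted_average(scores_and_weights):
    """Aggregate (score, weight) pairs into one total similarity value."""
    total_weight = sum(w for _, w in scores_and_weights)
    return sum(s * w for s, w in scores_and_weights) / total_weight


def link_pair(src: dict, tgt: dict, threshold: float = 0.8):
    """Return (is_link, total_similarity) for two entity descriptions."""
    total = weighted_average([
        (string_similarity(src["name"], tgt["name"]), 2.0),  # name counts double
        (numeric_similarity(src["lat"], tgt["lat"], 0.01), 1.0),
        (numeric_similarity(src["lon"], tgt["lon"], 0.01), 1.0),
    ])
    return total >= threshold, total


if __name__ == "__main__":
    a = {"name": "Hotel Beispiel Mayrhofen", "lat": 47.1670, "lon": 11.8650}
    b = {"name": "Hotel Beispiel, Mayrhofen", "lat": 47.1672, "lon": 11.8651}
    print(link_pair(a, b))
```

A weighted average is only one possible aggregation; taking the minimum instead would require every metric to agree.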
49
3. The Crux Of The Matter
Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• Legato [Achichi et al., 2017] is a linking tool based on indexing techniques.
• It implements the following steps:
1. Data cleaning: filters out properties of the two input datasets that do not help the comparison.
2. Instance profiling: creates instance profiles for the source based on the Concise Bounded Description.
3. Pre-matching: applies indexing techniques (TF-IDF values), filters such as tokenization and stop-word
removal, and cosine similarity to preselect entity links.
4. Link repairing: validates each produced link by searching for a link to a target source.
[Achichi et al., 2017] M. Achichi, Z. Bellahsene, and K. Todorov: Legato results for OAEI 2017. In Proceedings of the 12th International Workshop
on Ontology Matching (OM2017) co-located with the 16th International Semantic Web Conference (ISWC2017). CEUR Workshop Proceedings, vol. 2032,
Vienna, Austria, October 21, 2017.
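The pre-matching step (TF-IDF weights, tokenization, stop-word removal, cosine similarity) can be sketched in plain Python. This is a minimal illustration, not Legato's implementation: the tiny stop-word list, the smoothed IDF formula, the 0.3 threshold, and the example profiles are assumptions, and each instance profile is simplified to one flattened string.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "in", "a", "an"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty profiles
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pre_match(source, target, threshold=0.3):
    """Return (source_id, target_id, score) candidate links above threshold.

    source/target map an instance id to its profile text, e.g. the literal
    values of its Concise Bounded Description flattened into one string.
    """
    src_ids, tgt_ids = list(source), list(target)
    vectors = tfidf_vectors([source[i] for i in src_ids] + [target[j] for j in tgt_ids])
    src_vecs, tgt_vecs = vectors[:len(src_ids)], vectors[len(src_ids):]
    candidates = []
    for i, sv in zip(src_ids, src_vecs):
        for j, tv in zip(tgt_ids, tgt_vecs):
            score = cosine(sv, tv)
            if score >= threshold:
                candidates.append((i, j, round(score, 3)))
    return candidates


if __name__ == "__main__":
    source = {"s1": "Hotel Alpenrose Mayrhofen", "s2": "Restaurant Kölner Haus Serfaus"}
    target = {"t1": "Alpenrose Hotel in Mayrhofen", "t2": "Kölner Haus restaurant",
              "t3": "Ski school Seefeld"}
    print(pre_match(source, target))
```

The example profiles are made up. Comparing every source with every target is quadratic; an inverted index over terms is what keeps real pre-matching tractable.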
50
3. The Crux Of The Matter
Knowledge Enrichment
Methods and tools for duplicate detection and resolution:
• SERIMI [Araujo et al., 2011] matches instances between two datasets.
• It has three steps:
• property selection: users select relevant properties from the source dataset,
• candidate selection: string matching of properties selects a set of candidates
from the target dataset, and
• candidate disambiguation: the similarity of each candidate is measured with a
contrast model, which returns a degree of confidence.
• ADEL, Duke, Dedupe, LIMES, ...
[Araujo et al., 2011] S. Araujo, J. Hidders, D. Schwabe, and A. P. de Vries: SERIMI - Resource Description Similarity, RDF Instance Matching and Interlinking.
In Proceedings of the 6th International Workshop on Ontology Matching (OM2011), CEUR Workshop, vol. 814, Bonn, Germany, October 24, 2011.
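The contrast model used in the disambiguation step goes back to Tversky: the similarity of feature sets A and B is θ·|A∩B| − α·|A−B| − β·|B−A|. The sketch below uses that textbook form with set cardinality and ranks candidates by the raw score. SERIMI's exact adaptation and its conversion of scores into a degree of confidence are not reproduced, and the weights and feature sets are assumptions.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model with set cardinality as the salience function:
    S(A, B) = theta*|A & B| - alpha*|A - B| - beta*|B - A|."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


def disambiguate(source_features, candidates):
    """Rank candidates (id -> feature set) by contrast similarity, best first."""
    scored = [(cid, contrast_similarity(source_features, feats))
              for cid, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    src = {"type:Hotel", "locality:Mayrhofen", "name:beispiel"}
    cands = {
        "c1": {"type:Hotel", "locality:Mayrhofen", "name:beispiel", "stars:4"},
        "c2": {"type:Hotel", "locality:Zell", "name:beispiel"},
    }
    print(disambiguate(src, cands))
```

Unlike a symmetric measure such as Jaccard, separate α and β let features missing from the candidate weigh differently from extra features the candidate has.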
51
3. The Crux Of The Matter
Knowledge Enrichment
Property-Value-Statements correction:
• KnoFuss allows data fusion using different methods.
• The workflow of KnoFuss is as follows:
1. It receives a dataset to be integrated into the target dataset,
2. it performs co-referencing using a similarity method, detects conflicts
using ontological constraints, and resolves inconsistencies, and
3. it produces the fused dataset to be integrated into the target dataset.
• http://technologies.kmi.open.ac.uk/knofuss/
52
3. The Crux Of The Matter
Knowledge Enrichment
Property-Value-Statements correction:
• ODCleanStore [Michelfeit & Necaský, 2012] is a framework for cleaning, linking, quality
assessment, and fusing RDF data.
• The fusion module allows users to configure conflict resolution strategies based on
provenance and quality metadata, e.g.:
1. an arbitrary value (ANY), or the MIN, MAX, SHORTEST, or LONGEST value is selected from the
conflicting values,
2. the AVG, MEDIAN, or CONCAT of the conflicting values is computed,
3. the value with the highest aggregate quality (BEST) is selected,
4. the value with the newest time (LATEST) is selected, or
5. ALL input values are preserved.
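The listed strategies can be expressed as a small dispatch table over conflicting candidate values. This is a sketch of the strategy semantics only, not ODCleanStore's configuration format or code: the `Candidate` record, the 0..1 quality score, the `updated` date standing in for provenance, and the example prices are assumptions.

```python
import statistics
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    value: object
    quality: float  # aggregate quality score from quality assessment (assumed 0..1)
    updated: date   # provenance: when the source last asserted the value


STRATEGIES = {
    "ANY": lambda cs: cs[0].value,  # arbitrary pick; here simply the first
    "MIN": lambda cs: min(c.value for c in cs),
    "MAX": lambda cs: max(c.value for c in cs),
    "SHORTEST": lambda cs: min((c.value for c in cs), key=lambda v: len(str(v))),
    "LONGEST": lambda cs: max((c.value for c in cs), key=lambda v: len(str(v))),
    "AVG": lambda cs: statistics.mean(c.value for c in cs),
    "MEDIAN": lambda cs: statistics.median(c.value for c in cs),
    "CONCAT": lambda cs: "; ".join(str(c.value) for c in cs),
    "BEST": lambda cs: max(cs, key=lambda c: c.quality).value,
    "LATEST": lambda cs: max(cs, key=lambda c: c.updated).value,
    "ALL": lambda cs: [c.value for c in cs],
}


def resolve(candidates, strategy):
    """Resolve conflicting values for one property of one entity."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("no candidate values")
    return STRATEGIES[strategy](candidates)


if __name__ == "__main__":
    prices = [Candidate(89, 0.6, date(2019, 1, 1)),
              Candidate(95, 0.9, date(2019, 3, 1)),
              Candidate(92, 0.7, date(2019, 5, 1))]
    for name in ("MIN", "AVG", "BEST", "LATEST", "ALL"):
        print(name, resolve(prices, name))
```

Note that BEST (95, from the highest-quality source) and LATEST (92, the most recent assertion) disagree here: which strategy fits depends on the property.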
53
3. The Crux Of The Matter
Knowledge Enrichment
Property-Value-Statements correction:
• Sieve [Mendes et al., 2012] is a framework consisting of two modules: a Quality Assessment module and
a Data Fusion module.
• The Data Fusion module provides various fusion policies for resolving conflicting values.
• FAG, FuSem, MumMer, …
54
3. The Crux Of The Matter
Knowledge Deployment
• Building, implementing, and curating Knowledge Graphs is a time-
consuming and costly activity.
• Integrating large amounts of facts from heterogeneous information
sources does not come for free.
• [Paulheim, 2018b] estimates the average cost of one fact in a
Knowledge Graph at between $0.1 and $6, depending on the degree
of mechanization.
[Paulheim, 2018b] H. Paulheim: How much is a Triple? Estimating the Cost of Knowledge Graph Creation. In ISWC-P&D-Industry-BlueSky 2018:
Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with the 17th International
Semantic Web Conference (ISWC 2018), Monterey, USA, October 8-12, 2018. http://www.heikopaulheim.com/docs/iswc_bluesky_cost2018.pdf
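As a back-of-the-envelope illustration of what the quoted per-fact range implies, the sketch below multiplies a fact count by the two bounds. The bounds come from the slide; the one-million-fact graph is hypothetical, and `Decimal` is used only to keep the currency arithmetic exact.

```python
from decimal import Decimal
from typing import Tuple

# Per-fact cost bounds in USD as quoted above; where a project lands in this
# range depends on how much of the creation process is mechanized.
LOW_COST_PER_FACT = Decimal("0.1")
HIGH_COST_PER_FACT = Decimal("6")


def cost_range(n_facts: int) -> Tuple[Decimal, Decimal]:
    """Rough lower and upper cost bounds for a KG with n_facts statements."""
    return n_facts * LOW_COST_PER_FACT, n_facts * HIGH_COST_PER_FACT


if __name__ == "__main__":
    low, high = cost_range(1_000_000)  # hypothetical graph of one million facts
    print(f"${low:,} to ${high:,}")
```

Even a modest graph of one million facts thus lands between $100,000 and $6,000,000, which is why the degree of mechanization matters so much.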
55
3. The Crux Of The Matter
Knowledge Deployment
Name | Instances | Facts | Types | Relations
DBpedia (English) | 4,806,150 | 176,043,129 | 735 | 2,813
YAGO | 4,595,906 | 25,946,870 | 488,469 | 77
Freebase | 49,947,845 | 3,041,722,635 | 26,507 | 37,781
Wikidata | 15,602,060 | 65,993,797 | 23,157 | 1,673
NELL | 2,006,896 | 432,845 | 285 | 425
OpenCyc | 118,499 | 2,413,894 | 45,153 | 18,526
Google's Knowledge Graph | 570,000,000 | 18,000,000,000 | 1,500 | 35,000
Google's Knowledge Vault | 45,000,000 | 271,000,000 | 1,100 | 4,469
Yahoo! Knowledge Graph | 3,443,743 | 1,391,054,990 | 250 | 800
56
3. The Crux Of The Matter
Knowledge Deployment
• We build a knowledge access layer on top of the Knowledge Graph that connects this resource to
applications.
• Knowledge management technology:
• graph-based repositories host the Knowledge Graph (as a semantic data lake).
• The knowledge management layer is responsible for storing, managing, and providing semantic
descriptions of resources.
• Inference engines (SemBase) based on deductive reasoning:
• implement agents that define views on this graph, combined with context data from user requests.
• Each agent accesses the graph to obtain data for its reasoning, which provides input to the dialogue engine
interacting with the human user.
• Reasons:
• help to implement access rights and to work around inconsistencies and the sheer size (frillions of statements) of the graph
• integrate additional information sources from the application (context, personalization, task, etc.)
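An agent's view can be pictured as a filter that keeps only statements about entities of types the agent may see and drops properties it has no access right to. This is a hypothetical sketch, not SemBase: the `Triple` type, the `rdf:type` string, the type whitelist, the hidden-property set, and the example data are assumptions, and merging context data from the user request is left out.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


RDF_TYPE = "rdf:type"


def build_view(triples: Iterable[Triple], allowed_types: FrozenSet[str],
               hidden_properties: FrozenSet[str] = frozenset()) -> List[Triple]:
    """Subgraph an agent may see: statements about entities of an allowed
    type, minus properties the agent has no access right to."""
    triples = list(triples)
    visible = {t.s for t in triples if t.p == RDF_TYPE and t.o in allowed_types}
    return [t for t in triples if t.s in visible and t.p not in hidden_properties]


if __name__ == "__main__":
    kg = [
        Triple("ex:hotel1", "rdf:type", "schema:Hotel"),
        Triple("ex:hotel1", "schema:name", "Hotel Beispiel"),
        Triple("ex:hotel1", "ex:internalRating", "3"),  # not for the chatbot
        Triple("ex:event1", "rdf:type", "schema:Event"),
        Triple("ex:event1", "schema:name", "Ski Opening"),
    ]
    for t in build_view(kg, frozenset({"schema:Hotel"}), frozenset({"ex:internalRating"})):
        print(t)
```

Because the agent only reasons over this view, inconsistencies and restricted data elsewhere in the graph never reach the dialogue engine.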
57
3. The Crux Of The Matter
Knowledge Deployment
[Figure: deployment architecture. Input: MongoDB and Semantify.it (editing, crawling, mapping); Storage: GraphDB hosting the Knowledge Graph; Output: views used by several reasoning agents.]
58
3. The Crux Of The Matter
Knowledge Deployment
[Figure: layered architecture: Knowledge Infrastructure, Generic Application Layer, Conversational Interfaces.]
59
4. The Proof Of The Pudding Is In The Eating
Onlim
• The pioneer in automating customer communication via AI chatbots and
conversational interfaces
• Enterprise solutions for making data and knowledge available for conversational
interfaces
• Team of 25+ highly experienced AI experts, specialists in semantics and data science
• Spin-off of the University of Innsbruck
• HQ in Europe (Vienna, Telfs)
Current focus verticals:
Utilities, Tourism, Retail, Education, Financial Services
60
4. The Proof Of The Pudding Is In The Eating
Onlim
61
4. The Proof Of The Pudding Is In The Eating
• The chatbot market is expected to grow from more than $250 million in 2018 to over $1.34
billion by 2024.
• The growth is attributed to the increasing use of chatbots for content marketing activities such as digital marketing and
advertising.
• With the rise of Artificial Intelligence (AI) and conversational user interfaces, we are more likely than ever before to
interact with a bot.
• Businesses are following customers onto messaging platforms: 90% of businesses use Facebook to respond to service
requests.
• The shift from social media towards conversational interfaces is also striking. Bots on Facebook Messenger can
help businesses considerably in handling these service requests.
• https://www.sdcexec.com/software-technology/news/21011880/chatbot-market-to-grow-at-31-percent-cagr-from-2018-to-2024
• https://www.gartner.com/smarterwithgartner/gartner-predicts-a-virtual-world-of-exponential-change/
• https://www.businessinsider.in/tech/data-a-massive-hidden-shift-is-driving-companies-to-use-a-i-bots-inside-facebook-messenger/slidelist/52240155.cms
62
4. The Proof Of The Pudding Is In The Eating
• In 2017, 20% of web searches were conducted via voice assistants.
• Artificial intelligence-based voice assistance (AI-voice) will soon be a primary user interface for all digital
devices, including smartphones, smart speakers, personal computers, automobiles, and home appliances.
• As of mid-January 2019, more than 1 billion devices worldwide were equipped with Google's AI-voice
Assistant, and about 100 million more devices were equipped with Amazon's Alexa; neither number accounts
for devices with voice assistants from Apple, Microsoft, or Samsung, or across the digital worlds of
China and the rest of Asia.
• Juniper Research forecasts the global market for voice assistants to grow at a 25.4 percent CAGR over the
next five years, with 8 billion active voice assistants (across all platforms and devices) by 2023.
https://voicebot.ai/2019/01/07/google-assistant-to-be-available-on-1-billion-devices-this-month-10x-more-than-alexa/
https://www.juniperresearch.com/press/press-releases/digital-voice-assistants-in-use-to-triple
63
4. The Proof Of The Pudding Is In The Eating
• Chatbots and Voice Assistants have started to play an increasing role in customer communication for many
businesses in various verticals.
• In tourism especially, they provide more and more benefits in terms of convenience, availability, and fast
access to information and customer support throughout the entire customer journey.
• In the dreaming and planning phase, hotels and Destination Management Organizations (DMOs) can
provide potential guests with information about the hotel and/or the region, the surroundings, and
weather conditions through Chatbots and Voice Assistants.
• In the booking phase, everything from booking the hotel and transport to buying connected services, e.g. ski
tickets, becomes much simpler and more efficient through natural language.
• Finally, in the experience phase, Chatbots and Voice Assistants can also announce special offers or
events. All requested information and processes are available instantly, 24/7/365. For hotel guests
in particular, the stay experience can be enriched by providing access to hotel services and
beyond.
64
4. The Proof Of The Pudding Is In The Eating
• A Touristic Knowledge Graph integrates and connects data from several sources, including:
• touristic data sources:
• open data sources:
• It includes entities of the following types:
• LocalBusiness
• POIs, Infrastructure
• SportsActivityLocations (e.g. Trails, SkiResorts)
• Events
• Offers
• WebCams
• Mobility and Transport
65
4. The Proof Of The Pudding Is In The Eating
[Figure: Touristic Knowledge Graph excerpt: SkiResort, Lifts, Slopes, WebCams (data visualisation based on GraphDB). Classes shown: SkiResort, Slope, SkiRoute, CableCar, ChairLift, TBar, SkiLift, WebCam, SnowReport; relations: containedInPlace, subClassOf.]
4. The Proof Of The Pudding Is In The Eating
The Touristic KG is used to
answer questions such as:
• “Where can I have
traditional Tyrolean food
when going cross-country
skiing?”
• “Show me WebCams near
Kölner Haus”
• “How many people are
living in Serfaus?”
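Answering such questions means turning an intent and its extracted parameters into a SPARQL query by combining predefined rules. The sketch below does this for "Show me WebCams near Kölner Haus". It is hypothetical throughout: the rule names, the `ex:` namespace and `ex:WebCam` class, and approximating "near" by `schema:containedInPlace` are assumptions; a real query would more likely use geo-coordinates and might need a language-tagged literal.

```python
def sparql_literal(text: str) -> str:
    """Quote a user-supplied string as a SPARQL literal (escape \\ and ")."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical predefined rules: each turns one extracted parameter into
# triple patterns about the answer variable ?x.
RULES = {
    "type": lambda v: f"?x a {v} .",
    "locatedIn": lambda v: (f"?x schema:containedInPlace ?place .\n"
                            f"  ?place schema:name {sparql_literal(v)} ."),
}


def generate_query(params):
    """Combine the rules for a list of (rule_name, value) pairs into one query."""
    body = "\n  ".join(RULES[name](value) for name, value in params)
    return ("PREFIX schema: <http://schema.org/>\n"
            "PREFIX ex: <http://example.org/tkg#>\n"
            f"SELECT DISTINCT ?x WHERE {{\n  {body}\n}}")


if __name__ == "__main__":
    # Intent "show webcams near <place>" with parameter place = "Kölner Haus".
    print(generate_query([("type", "ex:WebCam"), ("locatedIn", "Kölner Haus")]))
```

Escaping the literal matters because parameter values come straight from user utterances.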
67
4. The Proof Of The Pudding Is In The Eating
The Dach-KG working group
• develops a de facto standard for semantic annotation of touristic content, data, and services in
the DACH area
• based on schema.org and its adaptation by domain specifications
• it should become the backbone of an open 5-star Knowledge Graph for touristic data in DACH
*) The dataset gets awarded one star if the data are provided under an open license.
**) Two stars, if the data are available as structured data.
***) Three stars, if the data are also available in a non-proprietary format.
****) Four stars, if URIs are used so that the data can be referenced, and
*****) five stars, if the dataset is linked to other datasets that can provide context.
https://www.tourismuszukunft.de/2019/05/dach-kg-neue-ergebnisse-naechste-schritte-beim-thema-open-data/
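The star levels above are cumulative: each star presupposes all previous ones, so a dataset with URIs but a proprietary format still stops at two stars. A minimal sketch of that rule, with boolean inputs as a simplifying assumption:

```python
def open_data_stars(open_license: bool, structured: bool, non_proprietary: bool,
                    uses_uris: bool, linked: bool) -> int:
    """Cumulative 5-star rating: a star counts only if all previous ones do."""
    stars = 0
    for criterion_met in (open_license, structured, non_proprietary, uses_uris, linked):
        if not criterion_met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # Open, structured, proprietary format, but with URIs: still two stars.
    print(open_data_stars(True, True, False, True, False))
```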
68
4. The Proof Of The Pudding Is In The Eating
Members of the Dach-KG working group
• Touristic experts from the DACH region (Germany (D), Austria (A), Switzerland (CH)) and Italy
(South Tyrol)
• the Austrian and German touristic associations,
• LTOs (Tirol, Vorarlberg, Wien, Brandenburg, Thüringen, …)
• Associated: DMOs (Mayrhofen, Seefeld, …)
• STI Innsbruck and STI International
• An extension to technology providers is planned
(Datacycle, Feratel, Hubermedia, infomax, LandinSicht, Onlim, Outdooractive, TSO, ...)
69
4. The Proof Of The Pudding Is In The Eating
We build the Tyrol Knowledge Graph (TKG) as a nucleus for this initiative
• It is a five-star linked open dataset published in GraphDB, providing a SPARQL endpoint for the provisioning
of touristic data of Tyrol, Austria.
• The TKG currently contains data about touristic infrastructure such as accommodation businesses, restaurants,
points of interest, events, recipes, etc. The TKG's data fall into three categories:
• Static data is information that rarely changes, like the address of a hotel.
• Dynamic data is fast-changing information, like availabilities and prices.
• Active data describe actions that can be executed, for example, the description of a purchase or
reservation action.
• As of November 25, 2018, the TKG contained around 5 billion statements, of which 55% were explicit and 45%
inferred. Every day, the Knowledge Graph grows by around 8 million statements.
• http://graphdb.sti2.at:8080/
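Putting the quoted figures into absolute numbers: the sketch below splits the roughly 5 billion statements into explicit and inferred ones and computes how long the stated daily growth takes to add another billion. Using the rounded figures as exact values and assuming a constant growth rate are simplifications.

```python
TOTAL_STATEMENTS = 5_000_000_000  # "around 5 billion" (November 25, 2018)
EXPLICIT_SHARE_PCT = 55           # the remaining 45% are inferred
DAILY_GROWTH = 8_000_000          # "around 8 million" statements per day

explicit = TOTAL_STATEMENTS * EXPLICIT_SHARE_PCT // 100  # asserted statements
inferred = TOTAL_STATEMENTS - explicit                   # materialized by reasoning
days_per_billion = 1_000_000_000 // DAILY_GROWTH         # at a constant daily rate

print(f"explicit: {explicit:,}  inferred: {inferred:,}")
# explicit: 2,750,000,000  inferred: 2,250,000,000
print(f"days to add one billion statements: {days_per_billion}")
# days to add one billion statements: 125
```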
70
4. The Proof Of The Pudding Is In The Eating
There is a world beyond leisure:
71
Utilities, Tourism, Retail, Financial Services, Education
5. Key Takeaway
Our aim:
• Establish a maximally automated knowledge lifecycle: NLU training, query generation, querying
and representing world knowledge, and natural language generation
• Automatically distribute knowledge into all available channels
• At the core are methodologies, methods, and tools to generate, host, curate, deploy, and access
Knowledge Graphs containing frillions of statements from heterogeneous, distributed, and
dynamic sources.
[Figure: the Knowledge Graph distributing knowledge to channels such as amazon.com, ©google.com, ©slack.com, ©facebook.com, ...]
72
73
1 of 87

Recommended

Talking knowledge graphs ny by
Talking knowledge graphs nyTalking knowledge graphs ny
Talking knowledge graphs nySTI Innsbruck
854 views30 slides
Knowledge graphs by
Knowledge graphsKnowledge graphs
Knowledge graphsSTI Innsbruck
349 views67 slides
Knowledge Graphs: Smart Big Data by
Knowledge Graphs: Smart Big DataKnowledge Graphs: Smart Big Data
Knowledge Graphs: Smart Big DataSTI Innsbruck
125 views210 slides
Big data, Analytics and Beyond by
Big data, Analytics and BeyondBig data, Analytics and Beyond
Big data, Analytics and BeyondQuantUniversity
253 views25 slides
kaggle_meet_up by
kaggle_meet_upkaggle_meet_up
kaggle_meet_upMarios Michailidis
5.6K views34 slides
Building High Available and Scalable Machine Learning Applications by
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
1.6K views66 slides

More Related Content

Similar to Talking knowledge-graphs

PyData Global: Thrifty Machine Learning by
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningRebecca Bilbro
59 views37 slides
Multi-modal sources for predictive modeling using deep learning by
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
38 views40 slides
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 by
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
460 views18 slides
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013 by
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j
2.5K views59 slides
fINAL Lesson_1_Course_Introduction_v1.pptx by
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxdataKarthik
5 views31 slides
H2O World - Intro to Data Science with Erin Ledell by
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellSri Ambati
3K views30 slides

Similar to Talking knowledge-graphs(20)

PyData Global: Thrifty Machine Learning by Rebecca Bilbro
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine Learning
Rebecca Bilbro59 views
Multi-modal sources for predictive modeling using deep learning by Sanghamitra Deb
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
Sanghamitra Deb38 views
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 by Sri Ambati
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati460 views
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013 by Neo4j
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j2.5K views
fINAL Lesson_1_Course_Introduction_v1.pptx by dataKarthik
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptx
dataKarthik5 views
H2O World - Intro to Data Science with Erin Ledell by Sri Ambati
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati3K views
HPCAC - the state of bioinformatics in 2017 by philippbayer
HPCAC - the state of bioinformatics in 2017HPCAC - the state of bioinformatics in 2017
HPCAC - the state of bioinformatics in 2017
philippbayer370 views
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat... by Alok Singh
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Alok Singh105 views
Barga Data Science lecture 1 by Roger Barga
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1
Roger Barga263 views
Pemanfaatan Big Data Dalam Riset 2023.pptx by elisarosa29
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
elisarosa291 view
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z... by Maurice Nsabimana
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
ODSC West 2022 – Kitbashing in ML by Bryan Bischof
ODSC West 2022 – Kitbashing in MLODSC West 2022 – Kitbashing in ML
ODSC West 2022 – Kitbashing in ML
Bryan Bischof60 views
CIS 375 Focus Dreams/newtonhelp.com by bellflower87
CIS 375 Focus Dreams/newtonhelp.comCIS 375 Focus Dreams/newtonhelp.com
CIS 375 Focus Dreams/newtonhelp.com
bellflower8753 views
training_presentation by Yudi512144
training_presentationtraining_presentation
training_presentation
Yudi5121444 views
Machine Learning Foundations for Professional Managers by Albert Y. C. Chen
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
Albert Y. C. Chen1.2K views
Afternoons with Azure - Azure Machine Learning by CCG
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
CCG195 views
2014 manchester-reproducibility by c.titus.brown
2014 manchester-reproducibility2014 manchester-reproducibility
2014 manchester-reproducibility
c.titus.brown2.6K views
Machine Learning Vs. Deep Learning – An Example Implementation by Synerzip
Machine Learning Vs. Deep Learning – An Example ImplementationMachine Learning Vs. Deep Learning – An Example Implementation
Machine Learning Vs. Deep Learning – An Example Implementation
Synerzip77 views

More from STI Innsbruck

Lecture5a by
Lecture5aLecture5a
Lecture5aSTI Innsbruck
135 views49 slides
Lecture5 by
Lecture5Lecture5
Lecture5STI Innsbruck
72 views27 slides
Lecture4a by
Lecture4aLecture4a
Lecture4aSTI Innsbruck
87 views15 slides
Lecture4 by
Lecture4Lecture4
Lecture4STI Innsbruck
67 views21 slides
Lecture3 by
Lecture3Lecture3
Lecture3STI Innsbruck
71 views47 slides
Lecture2 by
Lecture2Lecture2
Lecture2STI Innsbruck
57 views26 slides

More from STI Innsbruck(20)

11 -web_application_development_process_and_project_management_ by STI Innsbruck
11  -web_application_development_process_and_project_management_11  -web_application_development_process_and_project_management_
11 -web_application_development_process_and_project_management_
STI Innsbruck101 views
06 testing and-usability_on_the_web by STI Innsbruck
06 testing and-usability_on_the_web06 testing and-usability_on_the_web
06 testing and-usability_on_the_web
STI Innsbruck68 views
04a developing applications-with_web_ml by STI Innsbruck
04a developing applications-with_web_ml04a developing applications-with_web_ml
04a developing applications-with_web_ml
STI Innsbruck56 views

Recently uploaded

Black and White Modern Science Presentation.pptx by
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptxmaryamkhalid2916
16 views21 slides
Perth MeetUp November 2023 by
Perth MeetUp November 2023 Perth MeetUp November 2023
Perth MeetUp November 2023 Michael Price
19 views44 slides
virtual reality.pptx by
virtual reality.pptxvirtual reality.pptx
virtual reality.pptxG036GaikwadSnehal
11 views15 slides
20231123_Camunda Meetup Vienna.pdf by
20231123_Camunda Meetup Vienna.pdf20231123_Camunda Meetup Vienna.pdf
20231123_Camunda Meetup Vienna.pdfPhactum Softwareentwicklung GmbH
33 views73 slides
SAP Automation Using Bar Code and FIORI.pdf by
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdfVirendra Rai, PMP
22 views38 slides
Report 2030 Digital Decade by
Report 2030 Digital DecadeReport 2030 Digital Decade
Report 2030 Digital DecadeMassimo Talia
15 views41 slides

Recently uploaded(20)

Black and White Modern Science Presentation.pptx by maryamkhalid2916
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptx
maryamkhalid291616 views
Perth MeetUp November 2023 by Michael Price
Perth MeetUp November 2023 Perth MeetUp November 2023
Perth MeetUp November 2023
Michael Price19 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Web Dev - 1 PPT.pdf by gdsczhcet
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet60 views
Attacking IoT Devices from a Web Perspective - Linux Day by Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri15 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson66 views
Unit 1_Lecture 2_Physical Design of IoT.pdf by StephenTec
Unit 1_Lecture 2_Physical Design of IoT.pdfUnit 1_Lecture 2_Physical Design of IoT.pdf
Unit 1_Lecture 2_Physical Design of IoT.pdf
StephenTec12 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker33 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software257 views
1st parposal presentation.pptx by i238212
1st parposal presentation.pptx1st parposal presentation.pptx
1st parposal presentation.pptx
i2382129 views
Lilypad @ Labweek, Istanbul, 2023.pdf by Ally339821
Lilypad @ Labweek, Istanbul, 2023.pdfLilypad @ Labweek, Istanbul, 2023.pdf
Lilypad @ Labweek, Istanbul, 2023.pdf
Ally3398219 views
Spesifikasi Lengkap ASUS Vivobook Go 14 by Dot Semarang
Spesifikasi Lengkap ASUS Vivobook Go 14Spesifikasi Lengkap ASUS Vivobook Go 14
Spesifikasi Lengkap ASUS Vivobook Go 14
Dot Semarang37 views

Talking knowledge-graphs

  • 1. Talking Knowledge Graphs Dieter Fensel with the help of the entire MindLab team STI Innsbruck, University of Innsbruck, Austria May 17, 2019
  • 2. Prerequisite MindLab: • MindLab is a self-funded cooperative research project with the objective to develop methods and software tools for modeling and implementing scalability for knowledge graphs. • Partners 2
  • 3. Talking Knowledge Graphs 1. Motivation 2. The Grand Challenges 3. The Crux Of The Matter 4. The Proof Of The Pudding Is In The Eating 5. Key Takeaway 3
  • 4. 1. Motivation • Text/Voice becomes mainstream • Use cases are still basic • Knowledge is Power!Without knowledge -> no understanding of users needs and goals Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight Restaurant in Mayrhofen? Has roast pork? price? Image: ©amazon.com Sorry, I don’t know how to help you! 4
  • 5. 1. Motivation Please, book a table in a restaurant with roast pork having reasonable prices in Mayrhofen for tonight Image: ©amazon.com KG action: TableReservation type: Restaurant, offers: Roast Pork Location: Mayrhofen Price: price_level generated query: ?- tableReservationAction(), type(Restaurant), offers(RoastPork). Predefined rules: ● tableReservationAction: book a table in a given Restaurant ● type: return all elements of type <type> ● offers: return all elements that offer <offer> ● ... Query Generation NLG Extracted Knowledge Generated Language output Knowledge Graph contains deep, accurate, and up-to- date knowledge about leasurement services in Tyrol. 5
  • 6. 2. The Grand Challenges User 1. understand Intent + Parameters 2. map Query 3. query Knowledge Graph 4. Natural Language Generation 6
  • 7. 2. The Grand Challenges: Unterstand NLU • Voice/Text recognition already quite good • However require significant manual labor Manual work • Design intents based on schema of Knowledge Graph • Define utterances (example questions) per intent • Mark parameters that should be extracted from utterances Automation • Entity detection: Push entities from Knowledge Graph • Detect unanswered questions • Use Knowledge Graph to update/extend NLU: • create utterances • supervised-learning: extend utterances with unanswered questions User understand Intent + Parameters map Query query Knowledge 1. 2. 3. NLG1 4. NLU Knowledge 7
  • 8. 2. The Grand Challenges: Query Generation • Basis: detected intent & extracted parameters during NLU • Map extracted information (intent & parameters) on predefined rules • Query: Combination of rules on SPARQL queries • Additional restriction rules • Define a view on a relevant subgraph of the Knowledge Graphs  A Chatbots may not have access to the whole Knowledge Graph (prevent frillions, inconsistencies, and implements access right restrictions) User understand Intent + Parameters map Query query Knowledge 1. 2. 3. NLG1 4. Generated query Intent (with parameters) Query generation Predefined rules 8
  • 9. 2. The Grand Challenges: Querying the Knowledge Graph • The query is a combination of predefined rules that access the knowledge through SPARQL • The Knowledge Graph must: • provide large volumes of data • integrate heterogeneous resources • access distributed sources • provide dynamic updates (temperature, etc.) • define subgraphs • be curated with regard to inconsistencies and incompleteness.
  • 10. 2. The Grand Challenges: Natural Language Generation. Manual work: • Define templates based on • the structure of the data • the information that should be given to the user. Automatic: • Generate • templates from the Knowledge Graph • textual answers from the Knowledge Graph • follow-up questions to run dialogs.
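Template-based NLG with a follow-up question can be sketched as choosing a template by the shape of the query result and filling it. The templates are hypothetical, and the results are assumed to be rows of a SPARQL SELECT query that binds a `name` variable.

```python
# Hypothetical answer templates keyed by the number of results.
TEMPLATES = {
    "none": "Sorry, I could not find a {kind} in {location}.",
    "one": "I found one {kind} in {location}: {names}. Shall I book a table there?",
    "many": "I found {count} {kind}s in {location}: {names}. Which one shall I book?",
}


def join_names(names: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B and C'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def render_answer(results: list[dict[str, str]], kind: str, location: str) -> str:
    """Fill the matching template with values taken from the query results."""
    names = [row["name"] for row in results]
    key = "none" if not names else "one" if len(names) == 1 else "many"
    # The follow-up question at the end of each template keeps the dialog going.
    return TEMPLATES[key].format(kind=kind, location=location,
                                 count=len(names), names=join_names(names))


print(render_answer([{"name": "Restaurant A"}, {"name": "Restaurant B"}],
                    "restaurant", "Mayrhofen"))
```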
  • 11. 3. The Crux Of The Matter • The quality of an intelligent assistant depends directly on the quality of its Knowledge Graph • Problem: "garbage in, garbage out" • Requirements for the Knowledge Graph: • well structured (using an ontology such as schema.org) • accurate information (correctness) • large and detailed coverage (completeness) • timeliness of knowledge ⇒ Knowledge Graph lifecycle
  • 12. 3. The Crux Of The Matter: Process Model. Knowledge Creation, Knowledge Hosting, Knowledge Curation (Knowledge Assessment, Knowledge Cleaning, Knowledge Enrichment), and Knowledge Deployment.
  • 13. 3. The Crux Of The Matter: KG Task Model. Knowledge Graph maintenance comprises: • Knowledge Creation (Edit, Semi-automatic, Automatic, Mapping) • Knowledge Hosting • Knowledge Curation: • Knowledge Assessment (Evaluation, Correctness, Completeness) • Knowledge Cleaning (Error Detection, Error Correction) • Knowledge Enrichment (Knowledge Source detection, Knowledge Source integration, Duplicate detection, Property-Value-Statements correction) • Knowledge Deployment
  • 14. 3. The Crux Of The Matter: KG Task Model, MindLab status Year 1 (the task model of the previous slide, annotated with MindLab's status in Year 1).
  • 24. 3. The Crux Of The Matter: KG Task Model, MindLab status Year 2 (our dreams) (the same task model, annotated with the status targeted for Year 2).
  • 25. 3. The Crux Of The Matter: Knowledge Generation. Overview: Knowledge Creation (Edit, Semi-automatic, Automatic, Mapping), Knowledge Hosting, Knowledge Curation, Knowledge Deployment.
  • 26. 3. The Crux Of The Matter: Knowledge Generation • https://www.schema.org/ • Started in 2011 by Bing, Google, Yahoo!, and Yandex to annotate websites. • Has become the de facto standard. • We use it as the reference model for our semantic annotations, for the website channel as well as for all other channels. • However, we use value restrictions not as an inference mechanism but as integrity constraints. • We define domain-specific extensions (which also restrict the genericity of schema.org as a whole).
  • 27. 3. The Crux Of The Matter: Knowledge Generation • The use of semantic annotations has surged tremendously since the introduction of schema.org. • Schema.org was introduced with 297 classes and 187 relations, • which have since grown to 598 types, 862 properties, and 114 enumeration values. • The provided corpus of • types (e.g. LocalBusiness, SkiResort, Restaurant), • properties (e.g. name, description, address), • range restrictions (e.g. Text, URL, PostalAddress), • and enumeration values (e.g. those of DayOfWeek, EventStatusType, ItemAvailability) covers a large number of domains, including tourism.
  • 28. 3. The Crux Of The Matter: Knowledge Generation (figure only).
  • 29. 3. The Crux Of The Matter: Knowledge Generation • Domain specifications: • restrict the generality and • extend the domain specificity of schema.org • are based on SHACL • https://schema-tourism.sti2.org/ Diagram: schema.org + domain → domain specification.
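Using value restrictions as integrity constraints, with a domain specification that both narrows and specializes schema.org, can be illustrated with a small checker. The real domain specifications are SHACL shapes; this sketch replaces SHACL with a hypothetical Python dictionary so it runs on the standard library alone, and the restaurant specification is invented for illustration.

```python
# Hypothetical domain specification for restaurants, expressed as a plain dict
# instead of SHACL: it restricts which schema.org properties may be used and
# makes some of them mandatory.
RESTAURANT_SPEC = {
    "type": "Restaurant",
    "properties": {
        "name": {"required": True, "value_type": str},
        "address": {"required": True, "value_type": dict},
        "telephone": {"required": False, "value_type": str},
        "servesCuisine": {"required": False, "value_type": str},
    },
}


def verify(annotation: dict, spec: dict) -> list[str]:
    """Check a JSON-LD annotation against a domain specification; return violations."""
    errors = []
    if annotation.get("@type") != spec["type"]:
        errors.append(f"expected @type {spec['type']}, got {annotation.get('@type')}")
    allowed = spec["properties"]
    for prop, value in annotation.items():
        if prop.startswith("@"):
            continue  # JSON-LD keywords such as @context and @type
        if prop not in allowed:
            errors.append(f"property not allowed by the domain specification: {prop}")
        elif not isinstance(value, allowed[prop]["value_type"]):
            errors.append(f"wrong value type for {prop}")
    for prop, rule in allowed.items():
        if rule["required"] and prop not in annotation:
            errors.append(f"missing required property: {prop}")
    return errors
```

A violation is reported instead of being used to infer new types, which is the integrity-constraint reading of value restrictions described above.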
  • 30. 3. The Crux Of The Matter: Knowledge Generation. Our methodology consists of: • the bottom-up part, which describes the steps of the initial annotation process; • domain specification modeling; and • the top-down part, which applies the constructed models.
  • 31. 3. The Crux Of The Matter: Knowledge Generation, Manual Annotation Editor.
  • 32. 3. The Crux Of The Matter: Knowledge Generation • Semi-automatic: • The annotation editor suggests mappings/extracted information, • e.g. by extracting information from web pages (based on HTML tags). • Partial NLU is used to find similarities between the content and the schema.org vocabulary. • Manual adaptations are still needed to define and evaluate the results. • This is an instance of the general problem of wrapper generation.
  • 33. 3. The Crux Of The Matter: Knowledge Generation • Mapping (more than 95% of the story): • integrate large and fast-changing data sets • map different formats to the ontology used in our Knowledge Graph • Various frameworks exist: XLWrap; Mapping Master (M2); a generic XML-to-RDF tool that provides a mapping document (an XML document) linking an XML Schema to an OWL ontology; Tripliser; GRDDL; R2RML; RML; ... • We developed a customization of RML, called RocketRML. • The semantify.it platform features a wrapper API where these mappings can be stored and applied to the corresponding data sources. • The wrapper translates the data according to the mappings and stores it as JSON-LD in MongoDB.
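The wrapper idea (declarative mappings applied to a data source, producing schema.org JSON-LD) can be sketched in a few lines. The mapping format below is hypothetical and much simpler than RML; it is not RML or RocketRML syntax, and the hotel record is invented for illustration.

```python
import json

# Hypothetical mapping in the spirit of RML (not RML or RocketRML syntax):
# each schema.org property points to a key path in the source record,
# and nested dicts describe nested nodes such as a PostalAddress.
HOTEL_MAPPING = {
    "@type": "Hotel",
    "properties": {
        "name": ["title"],
        "telephone": ["contact", "phone"],
        "address": {
            "@type": "PostalAddress",
            "properties": {
                "addressLocality": ["location", "town"],
                "postalCode": ["location", "zip"],
            },
        },
    },
}


def get_path(record: dict, path: list[str]):
    """Follow a key path through nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def apply_mapping(record: dict, mapping: dict) -> dict:
    """Translate one source record into a schema.org node according to the mapping."""
    node = {"@type": mapping["@type"]}
    for prop, spec in mapping["properties"].items():
        if isinstance(spec, dict):
            value = apply_mapping(record, spec)
            if len(value) > 1:  # skip nested nodes that only contain @type
                node[prop] = value
        else:
            value = get_path(record, spec)
            if value is not None:
                node[prop] = value
    return node


def to_jsonld(records: list[dict], mapping: dict) -> str:
    """Serialize mapped records as JSON-LD documents (e.g. for a document store)."""
    docs = [{"@context": "https://schema.org", **apply_mapping(r, mapping)} for r in records]
    return json.dumps(docs, indent=2, ensure_ascii=False)


source = [{"title": "Hotel A", "contact": {"phone": "+43 5285 0000"},
           "location": {"town": "Mayrhofen", "zip": "6290"}}]
print(to_jsonld(source, HOTEL_MAPPING))
```

Keeping the mapping as data rather than code is what lets a platform store mappings and re-apply them whenever the source data changes.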
  • 34. 3. The Crux Of The Matter: Knowledge Generation. Automatic extraction of knowledge from text and web pages • Tasks: • named entity recognition, • concept mining, text mining, • relation detection, … • Methods: • Information Extraction • Natural Language Processing (NLP) • Machine Learning (ML) • Systems: • GATE (text analysis & language processing) • OpenNLP (supports the most common NLP tasks) • RapidMiner (data preparation, machine learning, deep learning, text mining, predictive analytics)
  • 35. 3. The Crux Of The Matter: Knowledge Generation. Evaluation of semantic annotations: • The semantify.it validator is a web tool for validating schema.org annotations scraped from websites. • Verification: the annotations are checked against plain schema.org and against domain specifications. • Validation: the annotations are checked for whether they accurately describe the content of the website.
  • 36. 3. The Crux Of The Matter: Knowledge Generation. Evaluation of semantic annotations: • Note that we take the content of the website as the gold standard. • We do NOT evaluate the accuracy of that content with regard to the "real" world. • We check whether a phone number conforms to the formal constraints. • We do not make robocalls to hotels to check whether the "right" hotel picks up the phone.
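The difference between checking formal constraints and checking the real world can be made concrete with a telephone check. The constraint below is a hypothetical one chosen for illustration (optional leading "+", common separators, 7 to 15 digits, 15 being the E.164 maximum); it says nothing about whether the number actually reaches the hotel.

```python
import re

# Hypothetical formal constraint for schema.org telephone values: an optional
# leading "+", then digits and common separators, with 7 to 15 digits in total
# (15 is the maximum length of an E.164 number).
PHONE_CHARS = re.compile(r"\+?[0-9 ()/-]+")


def phone_conforms(value: str) -> bool:
    """Verification only: checks the form of the value, not whether the number is right."""
    value = value.strip()
    if not PHONE_CHARS.fullmatch(value):
        return False
    digit_count = sum(ch.isdigit() for ch in value)
    return 7 <= digit_count <= 15
```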
  • 37. 3. The Crux Of The Matter: Knowledge Generation, Evaluation (figure only).
  • 38. 3. The Crux Of The Matter: Knowledge Hosting. Semantify.it(1): a platform for creating, hosting, validating, verifying, and publishing schema.org-annotated data • annotation of static data based on schema.org templates → Domain Specifications(2) • annotation of dynamic data based on RML mappings → RocketRML(3). (1) https://semantify.it (2) http://ds.sti2.org (3) https://github.com/semantifyit/RocketRML
  • 39. 3. The Crux Of The Matter: Knowledge Hosting. Diagram: Semantic Web annotations; annotation tool (e.g. semantify.it); hosting in a document store (e.g. MongoDB), a graph database (e.g. GraphDB), ...; Knowledge Graphs.
  • 40. 3. The Crux Of The Matter: Knowledge Hosting • Semantically annotated data can be serialized as JSON-LD • and stored in the document store MongoDB. Pros: • native JSON storage • well integrated with current state-of-the-art software (Node.js) • performant search through indexing • not hardware intensive. Cons: • no native RDF querying with SPARQL
  • 41. 3. The Crux Of The Matter: Knowledge Hosting • Native storage of semantically annotated data in an RDF store: GraphDB. Pros: • very powerful CRUD operations • named graphs for versioning • full implementation of SPARQL • powerful reasoning over big data sets. Cons: • no web frameworks available • very hardware intensive
  • 42. 3. The Crux Of The Matter: Knowledge Curation • We defined a simple KR formalism that formalizes the essentials of schema.org: • TBox: isA statements between types; domain and range definitions for properties (used globally or locally) • ABox: isElementOf(i,t) statements, property-value statements p(i1,i2), and sameAs(i1,i2) statements • This enables a formal definition of the knowledge curation task (assessment, cleaning, and enrichment).
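The TBox/ABox split can be sketched as two small data structures. This is a minimal sketch assuming Python 3.9+; representing local ranges as a table keyed by (property, domain type), and deriving the domains of a property from that table, is our own modeling choice, not the authors' definition.

```python
from dataclasses import dataclass, field


@dataclass
class TBox:
    """Terminology: isA between types, and domains/ranges of properties."""
    is_a: dict[str, set[str]] = field(default_factory=dict)  # type -> direct supertypes
    # Local ranges: (property, domain type) -> allowed range types. A global
    # range is the same range set repeated for every domain of the property.
    ranges: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def supertypes(self, t: str) -> set[str]:
        """Return t and all types reachable from it via isA (transitive closure)."""
        seen, stack = {t}, [t]
        while stack:
            for parent in self.is_a.get(stack.pop(), set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def domains(self, p: str) -> set[str]:
        """Domain types for which a range of property p is defined."""
        return {d for (prop, d) in self.ranges if prop == p}


@dataclass
class ABox:
    """Assertions: isElementOf(i, t), property values p(i1, i2), and sameAs(i1, i2)."""
    element_of: set[tuple[str, str]] = field(default_factory=set)
    property_values: set[tuple[str, str, str]] = field(default_factory=set)  # (p, i1, i2)
    same_as: set[tuple[str, str]] = field(default_factory=set)

    def types_of(self, i: str, tbox: TBox) -> set[str]:
        """All types of instance i, including supertypes inherited via isA."""
        result: set[str] = set()
        for instance, t in self.element_of:
            if instance == i:
                result |= tbox.supertypes(t)
        return result
```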
  • 43. 3. The Crux Of The Matter: Knowledge Assessment • Knowledge assessment describes and defines the process of assessing the quality of a Knowledge Graph. • The goal is to measure the usefulness of a Knowledge Graph. • Evaluation: • the overall process of determining the quality of a Knowledge Graph • select quality dimensions and metrics (see the literature on data quality) • evaluate representative subsets accordingly.
  • 44. 3. The Crux Of The Matter: Knowledge Assessment • Correctness: identify the amount of wrong assertions • Completeness: identify missing sets of assertions • Further dimensions: accessibility, accuracy, appropriate amount, believability, completeness, concise representation, consistent representation, cost-effectiveness, ease of manipulation, ease of operation, ease of understanding, flexibility, free-of-error, interpretability, objectivity, relevancy, reputation, security, timeliness, traceability, understandability, value-added, and variety
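Correctness and completeness can be expressed as simple ratios over assertions. This sketch assumes that a reviewer has marked the wrong assertions in a representative sample and that a gold standard of expected assertions exists for completeness; both inputs are hypothetical.

```python
Triple = tuple[str, str, str]  # (subject, property, value)


def correctness(sample: set[Triple], judged_wrong: set[Triple]) -> float:
    """Share of sampled assertions that were not judged wrong."""
    if not sample:
        raise ValueError("empty sample")
    return 1 - len(sample & judged_wrong) / len(sample)


def completeness(kg: set[Triple], gold: set[Triple]) -> float:
    """Share of gold-standard assertions that the Knowledge Graph contains."""
    if not gold:
        raise ValueError("empty gold standard")
    return len(kg & gold) / len(gold)


# Hypothetical sample: one of four assertions is wrong, one expected assertion is missing.
kg = {("h1", "name", "Hotel A"), ("h1", "telephone", "+43 1"),
      ("h2", "name", "Hotel B"), ("h2", "telephone", "+43 2")}
judged_wrong = {("h2", "telephone", "+43 2")}
gold = {("h1", "name", "Hotel A"), ("h1", "telephone", "+43 1"),
        ("h2", "name", "Hotel B"), ("h2", "email", "info@example.org")}
print(correctness(kg, judged_wrong), completeness(kg, gold))
```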
  • 45. 3. The Crux Of The Matter: Knowledge Assessment. [Paulheim et al., 2019] identify the following subtasks: • specifying datasets and Knowledge Graphs, • specifying the evaluation protocol, • specifying the evaluation metrics, • specifying the task for task-specific evaluation, • defining the setting in terms of intrinsic vs. task-based and automatic vs. human-centric evaluation, • as well as keeping the results reproducible. H. Paulheim, M. Sabou, M. Cochez, and W. Beek: Evaluation of Knowledge Graphs. In P. A. Bonatti, S. Decker, A. Polleres, and V. Presutti: Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, Dagstuhl Reports, 8(9):29-111, 2019.
  • 46. 3. The Crux Of The Matter: Knowledge Assessment. Methodologies: • Total Data Quality Management (TDQM) [Wang, 1998] and Data Quality Assessment [Pipino et al., 2002] help identify important quality dimensions and their requirements from various perspectives. • Other methodologies define quality metrics that allow a semi-automatic assessment based on data integrity constraints, for example user-driven assessment [Zaveri et al., 2013] and test-driven assessment [Kontokostas et al., 2014], as well as manual assessment by crowds of experts (crowdsourcing-driven assessment [Acosta et al., 2013]). • In addition, some quality assessment approaches use statistical distributions to measure the correctness of statements [Paulheim & Bizer, 2014], or SPARQL queries to identify functional dependency violations and missing values [Fürber & Hepp, 2010a] [Fürber & Hepp, 2010b].
  • 47. 3. The Crux Of The Matter: Knowledge Assessment. Tools and methods: • LINK-QA: uses network metrics • Luzzu (quality assessment of Linked Open Datasets): thirty data quality metrics based on the Dataset Quality Ontology • Sieve: flexibly expresses quality assessment methods and fusion methods • SWIQA (Semantic Web Information Quality Assessment Framework): data quality rules & quality scores for identifying wrong data • Validata: online tool for testing/validating RDF data against ShEx schemas
  • 48. 3. The Crux Of The Matter: Knowledge Assessment. Sieve: • Sieve for Data Quality Assessment [Mendes et al., 2012] is a framework that consists of two modules: • a Quality Assessment module and • a Data Fusion module • The Quality Assessment module involves four steps: 1. Data Quality Indicators define an aspect of a data set that may demonstrate its suitability for the intended use, for example meta-information about the creation of the data set, information about the provider, or ratings provided by consumers. 2. Scoring Functions define the assessment of a quality indicator based on its quality dimension. They range from simple comparisons, over set functions and aggregation functions, to more complex statistical functions, text analysis, or network analysis methods. 3. The Assessment Metric calculates the assessment score based on indicators and scoring functions. 4. Aggregate Metrics allow users to combine metrics into new assessment values. • http://sieve.wbsg.de/
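The four steps (indicator, scoring function, assessment metric, aggregate metric) can be sketched with two scoring functions and a weighted average. This is not Sieve's configuration language or API; the indicators, scoring functions, horizon, preference list, and weights are all hypothetical.

```python
from datetime import date


# Scoring functions: map a quality indicator to a score in [0, 1].
def recency_score(last_updated: date, today: date, horizon_days: int = 365) -> float:
    """1.0 if updated today, falling linearly to 0.0 at horizon_days or older."""
    age = (today - last_updated).days
    return min(1.0, max(0.0, 1.0 - age / horizon_days))


def preference_score(source: str, preferred: list[str]) -> float:
    """Higher for sources earlier in the preference list; 0.0 for unknown sources."""
    if source not in preferred:
        return 0.0
    return 1.0 - preferred.index(source) / len(preferred)


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate metric: weighted average of individual assessment scores."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


# Hypothetical indicators taken from a data set's provenance metadata.
metadata = {"source": "tourism_board", "last_updated": date(2019, 5, 1)}
scores = {
    "recency": recency_score(metadata["last_updated"], date(2019, 5, 26), horizon_days=100),
    "reputation": preference_score(metadata["source"], ["official_site", "tourism_board"]),
}
print(scores, aggregate(scores, {"recency": 1.0, "reputation": 1.0}))
```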
  • 49. 3. The Crux Of The Matter: Knowledge Cleaning • The goal of knowledge cleaning is to improve the correctness of a Knowledge Graph. • Major objectives: error detection and error correction of ● wrong instance assertions ● wrong property-value assertions ● wrong equality assertions
  • 50. 3. The Crux Of The Matter: Knowledge Cleaning. Figure: TBox and ABox in knowledge curation.
  • 51. 3. The Crux Of The Matter: Knowledge Cleaning. Verification vs. validation: • Semantic annotations: verification = check schema conformance and integrity constraints; validation = compare with the web resource • Knowledge Graphs: verification = check schema conformance and integrity constraints; validation = compare with the "real" world
  • 52. 3. The Crux Of The Matter: Knowledge Cleaning. Error correction of wrong instance assertions isElementOf(i,t): • i is not a proper instance identifier: delete the assertion or correct i • t is not an existing type name: delete the assertion or correct t • The instance assertion is (semantically) wrong: • delete the assertion or find a proper t • but do NOT try to find a proper i (that would neither scale nor make sense)
  • 53. 3. The Crux Of The Matter: Knowledge Cleaning. Error correction of wrong property-value assertions p(i1,i2): • p is not a proper property name: delete the assertion or correct p • i1 is not a proper instance identifier: delete the assertion or correct i1 • i1 is not in any domain of p: delete the assertion or add an assertion isElementOf(i1,t) where t is a domain of p • i2 is not a proper instance identifier: delete the assertion or correct i2 • i2 is not in the range of p for any domain of p that i1 belongs to: • delete the assertion, or • add a proper isElementOf assertion for i1 that adds a domain for which i2 is an instance of the range of the property, or • add a proper isElementOf assertion for i2 that turns it into an instance of a range of the property applied to a domain of p of which i1 is an element • The property assertion is (semantically) wrong: delete the assertion or correct it. In this case, you should most likely define a proper i2, or search for a better p or a better i1.
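The detection side of these cases can be sketched as a function that reports the first applicable case for p(i1,i2); choosing the correction (delete, or add an isElementOf assertion) is left to the curator. Assumptions: a "proper instance identifier" is approximated as an instance with at least one isElementOf assertion, a "proper property name" as a property with at least one defined range, and the TBox/ABox fragment is hypothetical.

```python
def supertypes(t: str, is_a: dict[str, set[str]]) -> set[str]:
    """t plus all types reachable via isA."""
    seen, stack = {t}, [t]
    while stack:
        for parent in is_a.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(i: str, element_of: dict[str, set[str]], is_a: dict[str, set[str]]) -> set[str]:
    """All asserted and inherited types of instance i."""
    return set().union(*(supertypes(t, is_a) for t in element_of.get(i, set())))


def diagnose(p, i1, i2, element_of, is_a, ranges):
    """Return the first error case that applies to p(i1, i2), or None if none is found."""
    local_ranges = {d: r for (prop, d), r in ranges.items() if prop == p}
    if not local_ranges:
        return "p is not a proper property name"
    if i1 not in element_of:
        return "i1 is not a proper instance identifier"
    i1_domains = [d for d in local_ranges if d in types_of(i1, element_of, is_a)]
    if not i1_domains:
        return "i1 is not in any domain of p"
    if i2 not in element_of:
        return "i2 is not a proper instance identifier"
    i2_types = types_of(i2, element_of, is_a)
    if not any(local_ranges[d] & i2_types for d in i1_domains):
        return "i2 is not in the range of p for any domain of p that i1 belongs to"
    return None


# Hypothetical TBox/ABox fragment.
IS_A = {"Restaurant": {"FoodEstablishment"}, "FoodEstablishment": {"LocalBusiness"}}
RANGES = {("address", "LocalBusiness"): {"PostalAddress", "Text"},
          ("servesCuisine", "FoodEstablishment"): {"Text"}}
ELEMENT_OF = {"r1": {"Restaurant"}, "a1": {"PostalAddress"}, "person1": {"Person"}}
```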
  • 54. 3. The Crux Of The Matter: Knowledge Cleaning. Error correction of wrong equality assertions sameAs(i1,i2): • i1 is not a proper instance identifier: delete the assertion or correct i1 • i2 is not a proper instance identifier: delete the assertion or correct i2 • The identity assertion is (semantically) wrong: delete the assertion or replace it with a SKOS operator(1). (1) which, however, does not come with operational semantics.
  • 56. 3. The Crux Of The Matter: Knowledge Cleaning. Methods & Tools: • HoloClean ● uses integrity constraints, ● external data, and ● quantitative statistics ● Steps: • separate the input dataset into noisy and clean data • assign an uncertainty score over the values of the noisy data • compute the marginal probability of each value to be repaired • SDValidate ● uses statistical distribution functions ● three steps: • compute the relative predicate frequency for each statement • assign a confidence score to each statement selected in the first step • apply a confidence threshold • Similar steps are applied to instance assertions.
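The distribution-based idea can be illustrated by flagging statements whose object type is rare for a given predicate. This is a simplified sketch, not a faithful reimplementation of SDValidate: the scoring formula (average relative frequency of the object's types) and the threshold are our assumptions, and the example data is hypothetical.

```python
from collections import Counter, defaultdict

UNTYPED = "<untyped>"


def flag_unlikely_statements(statements, types, threshold=0.1):
    """Flag (s, p, o) statements whose object type is rare for predicate p.

    For every predicate, count how often each type occurs in the object position;
    a statement's score is the average relative frequency of its object's types.
    """
    type_counts: dict[str, Counter] = defaultdict(Counter)
    for _, p, o in statements:
        type_counts[p].update(types.get(o) or {UNTYPED})
    flagged = []
    for s, p, o in statements:
        counts = type_counts[p]
        total = sum(counts.values())
        object_types = types.get(o) or {UNTYPED}
        score = sum(counts[t] / total for t in object_types) / len(object_types)
        if score < threshold:
            flagged.append(((s, p, o), score))
    return flagged


# Hypothetical data: 19 addresses point to PostalAddress objects, one to a Person.
statements = [(f"h{k}", "address", f"a{k}") for k in range(19)] + [("h19", "address", "person1")]
types = {f"a{k}": {"PostalAddress"} for k in range(19)}
types["person1"] = {"Person"}
print(flag_unlikely_statements(statements, types))
```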
  • 59. 3. The Crux Of The Matter: Knowledge Cleaning. Methods & Tools: • The LOD Laundromat [Beek et al., 2014] ● cleans Linked Open Data ● takes a SPARQL endpoint or an archived dataset as input ● guesses the serialization format ● identifies syntax errors using a library while parsing RDF ● saves the RDF data in a canonical format • KATARA [Chu et al., 2015] ● identifies correct & incorrect data ● generates possible corrections for wrong data • SPIN [Fürber & Hepp, 2010b] ● SPARQL Inferencing Notation ● generates SPARQL query templates based on data quality problems: • inconsistency • lack of comprehensibility • heterogeneity • redundancy • Nowadays, SPIN has turned into SHACL, a language for validating RDF graphs. [Beek et al., 2014] W. Beek, L. Rietveld, H. R. Bazoobandi, J. Wielemaker, and S. Schlobach: LOD Laundromat: A Uniform Way of Publishing Other People's Dirty Data. In Proceedings of the 13th International Semantic Web Conference (ISWC2014), Springer, LNCS 8796, Riva del Garda, Italy, October 19-23, 2014. [Chu et al., 2015] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye: KATARA: reliable data cleaning with knowledge bases and crowdsourcing. In Proceedings of the 41st International Conference on Very Large Data Bases (PVLDB2015), VLDB Endowment, 8(12), 2015. [Fürber & Hepp, 2010b] C. Fürber and M. Hepp: Using semantic web resources for data quality management. In Proceedings of the 17th International Conference on Knowledge Engineering and Management by the Masses (EKAW2010), Springer, LNCS 6317, Lisbon, Portugal, October 11-15, 2010.
  • 60. 3. The Crux Of The Matter: Knowledge Enrichment • The goal of knowledge enrichment is to improve the completeness of a Knowledge Graph by adding new statements. • The process of knowledge enrichment has four phases: • new knowledge source detection • new knowledge source integration • duplicate detection and alignment • property-value-statement correction
  • 61. 3. The Crux Of The Matter: Knowledge Enrichment • Knowledge source detection: • search for additional sources of assertions for the KG • open sources • closed sources • Knowledge source integration: • TBox: define mappings • ABox: integrate new assertions into the KG • Identifying and resolving duplicates • Invalid property statements, such as domain/range violations and multiple values for a unique property, • also known in the data quality literature as contradicting or uncertain attribute value resolution.
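Multiple values for a unique property typically surface right after a new source is integrated. The check below groups values per (subject, property), much like a SPARQL query with GROUP BY and HAVING (COUNT(DISTINCT ?o) > 1). Schema.org itself does not declare properties as single-valued, so the set of unique properties is an assumption (for example taken from a domain specification), and the data is hypothetical.

```python
from collections import defaultdict


def functional_violations(statements, functional_properties):
    """Find (subject, property) pairs with more than one value for a unique property."""
    values = defaultdict(set)
    for s, p, o in statements:
        if p in functional_properties:
            values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Hypothetical existing KG content and a newly integrated source.
kg = [("h1", "telephone", "+43 5285 1111"), ("h1", "name", "Hotel A")]
new_source = [("h1", "telephone", "+43 5285 2222"), ("h1", "name", "Hotel A")]
print(functional_violations(kg + new_source, {"telephone", "postalCode"}))
```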
  • 62. 3. The Crux Of The Matter: Knowledge Enrichment. Duplicate detection: https://www.cs.umd.edu/~getoor/Tutorials/ER_VLDB2012.pdf
  • 63. 3. The Crux Of The Matter: Knowledge Enrichment. Methods and tools for duplicate detection and resolution: • Silk is a framework for entity linking. • It tackles three tasks: 1. link discovery, which defines similarity metrics to calculate a total similarity value for a pair of entities; 2. evaluation of the correctness and completeness of the generated links; and 3. a protocol for maintaining data that allows the source and target datasets to exchange generated link sets. http://silkframework.org/
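Link discovery by combining per-property similarity metrics into a total similarity value can be sketched as a weighted average with a threshold. This only illustrates the idea and is not Silk's link specification language or API; the string metric (difflib's `ratio()`), the weights, the threshold, and the entities are assumptions.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def total_similarity(e1: dict, e2: dict, weights: dict[str, float]) -> float:
    """Weighted average of per-property similarities over properties both entities have."""
    shared = [p for p in weights if p in e1 and p in e2]
    if not shared:
        return 0.0
    weight_sum = sum(weights[p] for p in shared)
    return sum(weights[p] * string_similarity(e1[p], e2[p]) for p in shared) / weight_sum


def discover_links(source: dict, target: dict, weights: dict[str, float], threshold: float = 0.8):
    """Compare every source entity with every target entity; keep pairs above the threshold."""
    links = []
    for sid, e1 in source.items():
        for tid, e2 in target.items():
            score = total_similarity(e1, e2, weights)
            if score >= threshold:
                links.append((sid, tid, score))
    return links


# Hypothetical entities from two data sets.
source = {"s1": {"name": "Hotel Alpenblick", "locality": "Mayrhofen"}}
target = {"t1": {"name": "Hotel Alpenblik", "locality": "Mayrhofen"},
          "t2": {"name": "Gasthof Post", "locality": "Innsbruck"}}
print(discover_links(source, target, {"name": 2.0, "locality": 1.0}))
```

Comparing all pairs is quadratic; real linking tools add blocking or indexing to avoid it.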
  • 64. 3. The Crux Of The Matter: Knowledge Enrichment. Methods and tools for duplicate detection and resolution: • Legato [Achichi et al., 2017] is a linking tool based on indexing techniques. • It implements the following steps: 1. data cleaning, which filters properties from the two input datasets, e.g. properties that do not help the comparison; 2. instance profiling, which creates instance profiles based on the Concise Bounded Description of the source; 3. pre-matching, which applies indexing techniques (TF-IDF values), filters such as tokenization and stop-word removal, and cosine similarity to preselect the entity links; 4. link repairing, which validates each produced link by searching for a link to a target source. [Achichi et al., 2017] M. Achichi, Z. Bellahsene, and K. Todorov: Legato results for OAEI 2017. In Proceedings of the 12th International Workshop on Ontology Matching (OM2017) co-located with the 16th International Semantic Web Conference (ISWC2017), CEUR Workshop Proceedings, vol. 2032, Vienna, Austria, October 21, 2017.
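The pre-matching step (tokenization, stop-word removal, TF-IDF weighting, cosine similarity) can be sketched with the standard library. Legato's actual implementation may differ: the smoothed IDF formula, the stop-word list, the threshold, and the profile strings (plain text standing in for instance profiles) are assumptions.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "of", "the"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens without stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs: dict) -> dict:
    """TF-IDF vectors with smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    df = Counter(term for c in counts.values() for term in c)
    n = len(docs)
    return {key: {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
            for key, c in counts.items()}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def prematch(source: dict[str, str], target: dict[str, str], threshold: float = 0.3):
    """Preselect candidate links between instance profiles by TF-IDF cosine similarity."""
    vectors = tfidf({**{("s", k): v for k, v in source.items()},
                     **{("t", k): v for k, v in target.items()}})
    candidates = []
    for s in source:
        for t in target:
            score = cosine(vectors[("s", s)], vectors[("t", t)])
            if score >= threshold:
                candidates.append((s, t, score))
    return candidates


# Hypothetical instance profiles built from property values.
source = {"s1": "Hotel Alpenblick Mayrhofen Zillertal", "s2": "Restaurant Zur Post Innsbruck"}
target = {"t1": "Alpenblick Hotel in Mayrhofen", "t2": "Gasthof Post Innsbruck Altstadt"}
print(prematch(source, target))
```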
  • 65. 3. The Crux Of The Matter Knowledge Enrichment Methods and tools for duplicate detection and resolution: • SERIMI [Araujo et al., 2011] matches instances between two datasets. • It has three steps: • property selection, which allows users to select relevant properties from the source dataset; • candidate selection, which uses string matching on these properties to select a set of candidates from the target dataset; and • candidate disambiguation, which measures the similarity of each candidate by applying a contrast model and returns a degree of confidence. • Other tools: ADEL, Duke, Dedupe, LIMES, ... [Araujo et al., 2011] S. Araujo, J. Hidders, D. Schwabe, and A. P. de Vries: SERIMI - Resource Description Similarity, RDF Instance Matching and Interlinking. In Proceedings of the 6th International Workshop on Ontology Matching (OM2011), CEUR Workshop Proceedings, vol. 814, Bonn, Germany, October 24, 2011.
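Candidate selection and disambiguation can be sketched with Tversky's contrast model, using set size as the salience function. It assumes, as a simplified reading of SERIMI's intuition, that the correct candidates for resources of the same class share features with one another. The labels, feature sets, and the values of `theta`, `alpha`, and `beta` are hypothetical; this is not SERIMI's implementation.

```python
from difflib import SequenceMatcher
from statistics import mean


def contrast(a: set, b: set, theta=1.0, alpha=0.5, beta=0.5) -> float:
    """Tversky's contrast model with f = set size:
    theta*|A & B| - alpha*|A - B| - beta*|B - A|."""
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


def select_candidates(label, target, min_ratio=0.8):
    """Candidate selection: target resources whose label string-matches `label`."""
    return [uri for uri, res in target.items()
            if SequenceMatcher(None, label.lower(), res["label"].lower()).ratio() >= min_ratio]


def support(candidate, others, target):
    """Mean contrast similarity between a candidate and the other candidates."""
    if not others:
        return 0.0
    return mean(contrast(target[candidate]["features"], target[o]["features"]) for o in others)


def disambiguate(source_labels, target):
    """Per source resource, keep the candidate best supported by the
    candidates of the other source resources."""
    candidates = {s: select_candidates(lbl, target) for s, lbl in source_labels.items()}
    result = {}
    for s, cands in candidates.items():
        if not cands:
            continue
        others = [c for o, cs in candidates.items() if o != s for c in cs]
        best = max(cands, key=lambda c: support(c, others, target))
        result[s] = (best, support(best, others, target))
    return result


target = {
    "tgt:1": {"label": "Innsbruck", "features": {("type", "City"), ("country", "Austria")}},
    "tgt:2": {"label": "Innsbruck", "features": {("type", "Ship"), ("homePort", "ex:Port1")}},
    "tgt:3": {"label": "Salzburg", "features": {("type", "City"), ("country", "Austria")}},
    "tgt:4": {"label": "Graz", "features": {("type", "City"), ("country", "Austria")}},
}
source_labels = {"src:ibk": "Innsbruck", "src:sbg": "Salzburg"}
result = disambiguate(source_labels, target)
print(result)  # {'src:ibk': ('tgt:1', 2.0), 'src:sbg': ('tgt:3', 0.0)}
```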
  • 66. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • KnoFuss supports data fusion using different methods. • The workflow of KnoFuss is as follows: 1. it receives a dataset to be integrated into the target dataset; 2. it performs co-referencing using a similarity method, detects conflicts utilizing ontological constraints, and resolves inconsistencies; and 3. it produces a dataset to be integrated into the target dataset. • http://technologies.kmi.open.ac.uk/knofuss/
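A minimal sketch of the co-referencing and conflict detection steps, assuming that the co-referencing result is already available as a `same_as` mapping and that the only ontological constraint is a set of functional properties (at most one value per subject). The resolution policy, which keeps the target's value, is a hypothetical choice, and all URIs and values are illustrative. This is not KnoFuss's code.

```python
from collections import defaultdict


def fuse(target, incoming, same_as, functional):
    """Integrate `incoming` triples into `target` (both sets of (s, p, o)).

    same_as maps incoming URIs to target URIs (the co-referencing result);
    functional properties serve as the ontological constraint for conflicts.
    """
    # Co-referencing: rewrite incoming subjects to their target URIs.
    merged = set(target) | {(same_as.get(s, s), p, o) for s, p, o in incoming}

    # Conflict detection: more than one value for a functional property.
    values = defaultdict(set)
    for s, p, o in merged:
        if p in functional:
            values[(s, p)].add(o)
    conflicts = {key: objs for key, objs in values.items() if len(objs) > 1}

    # Resolution (hypothetical policy): keep the target's value if it has one,
    # otherwise keep the smallest value as a deterministic arbitrary choice.
    for (s, p), objs in conflicts.items():
        trusted = {o for ts, tp, o in target if (ts, tp) == (s, p)} or {min(objs)}
        for o in objs - trusted:
            merged.discard((s, p, o))
    return merged, conflicts


target = {("tgt:post", "ex:postalCode", "6290"), ("tgt:post", "ex:name", "Gasthof Post")}
incoming = {("src:7", "ex:postalCode", "6292"), ("src:7", "ex:stars", "3")}
merged, conflicts = fuse(target, incoming, {"src:7": "tgt:post"}, {"ex:postalCode"})
print(sorted(merged))
```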
  • 67. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • ODCleanStore [Michelfeit & Necaský, 2012] is a framework for cleaning, linking, quality assessment, and fusing RDF data. • The fusion module allows users to configure conflict resolution strategies based on provenance and quality metadata, e.g.: 1. a single value is selected from the conflicting values (an arbitrary one with ANY, or with MIN, MAX, SHORTEST, or LONGEST); 2. the AVG, MEDIAN, or CONCAT of the conflicting values is computed; 3. the value with the highest (BEST) aggregate quality is selected; 4. the value with the newest (LATEST) time is selected; and 5. ALL input values are preserved.
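The listed strategies can be sketched as a dispatch table, assuming that each conflicting value arrives with hypothetical metadata: an aggregate quality score and an ISO date of last modification. The names mirror the slide, but the data layout is an assumption and not ODCleanStore's API. ANY simply returns the first value here.

```python
from statistics import mean, median

# Each conflicting value is (value, aggregate_quality, last_modified ISO date).
STRATEGIES = {
    "ANY": lambda vs: vs[0][0],  # arbitrary choice: simply the first value
    "MIN": lambda vs: min(v for v, _, _ in vs),
    "MAX": lambda vs: max(v for v, _, _ in vs),
    "SHORTEST": lambda vs: min((v for v, _, _ in vs), key=lambda v: len(str(v))),
    "LONGEST": lambda vs: max((v for v, _, _ in vs), key=lambda v: len(str(v))),
    "AVG": lambda vs: mean(v for v, _, _ in vs),
    "MEDIAN": lambda vs: median(v for v, _, _ in vs),
    "CONCAT": lambda vs: "; ".join(str(v) for v, _, _ in vs),
    "BEST": lambda vs: max(vs, key=lambda x: x[1])[0],    # highest quality
    "LATEST": lambda vs: max(vs, key=lambda x: x[2])[0],  # newest date
    "ALL": lambda vs: [v for v, _, _ in vs],               # keep every value
}


def resolve(values, strategy):
    """Resolve a conflict among `values` with the named strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    return STRATEGIES[strategy](values)


# Hypothetical conflicting altitude values for one resource.
altitude = [(2010, 0.9, "2018-03-01"), (2008, 0.6, "2019-01-15"), (2012, 0.7, "2017-07-30")]
for name in STRATEGIES:
    print(name, resolve(altitude, name))
```

ISO date strings compare correctly as plain strings, which keeps LATEST free of date parsing.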
  • 68. 3. The Crux Of The Matter Knowledge Enrichment Property-Value-Statements correction: • Sieve [Mendes et al., 2012] is a framework that consists of two modules: a Quality Assessment module and a Data Fusion module. • The Data Fusion module describes various fusion policies that are applied to fuse conflicting values. • FAG, FuSem, MumMer, …
  • 69. 3. The Crux Of The Matter Knowledge Deployment • Building, implementing, and curating Knowledge Graphs is a time-consuming and costly activity. • Integrating large amounts of facts from heterogeneous information sources does not come for free. • [Paulheim, 2018b] estimates the average cost of one fact in a Knowledge Graph at between $0.1 and $6, depending on the degree of mechanization. [Paulheim, 2018b] H. Paulheim: How much is a Triple? Estimating the Cost of Knowledge Graph Creation. In ISWC-P&D-Industry-BlueSky 2018: Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with the 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8-12, 2018. http://www.heikopaulheim.com/docs/iswc_bluesky_cost2018.pdf
  • 70. 3. The Crux Of The Matter Knowledge Deployment
Name | Instances | Facts | Types | Relations
--- | --- | --- | --- | ---
DBpedia (English) | 4,806,150 | 176,043,129 | 735 | 2,813
YAGO | 4,595,906 | 25,946,870 | 488,469 | 77
Freebase | 49,947,845 | 3,041,722,635 | 26,507 | 37,781
Wikidata | 15,602,060 | 65,993,797 | 23,157 | 1,673
NELL | 2,006,896 | 432,845 | 285 | 425
OpenCyc | 118,499 | 2,413,894 | 45,153 | 18,526
Google's Knowledge Graph | 570,000,000 | 18,000,000,000 | 1,500 | 35,000
Google's Knowledge Vault | 45,000,000 | 271,000,000 | 1,100 | 4,469
Yahoo! Knowledge Graph | 3,443,743 | 1,391,054,990 | 250 | 800
  • 71. 3. The Crux Of The Matter Knowledge Deployment • We build a knowledge access layer on top of the Knowledge Graph that helps to connect this resource to applications. • Knowledge management technology: • graph-based repositories host the Knowledge Graph (as a semantic data lake). • The knowledge management layer is responsible for storing, managing, and providing semantic descriptions of resources. • Inference engines (SemBase) based on deductive reasoning engines: • implement agents that define views on this graph together with context data on user requests. • They access the graph to gain data for their reasoning, which provides input to the dialogue engine interacting with the human user. • Reasons: • help to implement access rights and to bypass inconsistencies and frillions of statements • integrate additional information sources from the application (context, personalization, task, etc.)
  • 72. 3. The Crux Of The Matter Knowledge Deployment (architecture figure) • Input: MongoDB, Semantify.it (editing, crawling, mapping) • Storage: GraphDB, hosting the Knowledge Graph • Output: views computed by reasoning agents
  • 73. 3. The Crux Of The Matter Knowledge Deployment (layer figure) • Knowledge Infrastructure • Generic Application Layer • Conversational Interfaces
  • 74. 4. The Proof Of The Pudding Is In The Eating Onlim • The pioneer in automating customer communication via AI chatbots and conversational interfaces • Enterprise solutions for making data and knowledge available for conversational interfaces • Team of 25+ highly experienced AI experts, specialists in semantics and data science • Spin-off of the University of Innsbruck • HQ in Europe (Vienna, Telfs) • Current focus verticals: Utilities, Tourism, Retail, Education, Financial Services
  • 75. 4. The Proof Of The Pudding Is In The Eating Onlim
  • 76. 4. The Proof Of The Pudding Is In The Eating • The chatbot market is expected to grow from more than $250 million in 2018 to over $1.34 billion by 2024. • The growth is driven by the evolving use of chatbots for content marketing activities such as digital marketing and advertising. • With the rise of Artificial Intelligence (AI) and conversational user interfaces, we are more likely than ever before to interact with a bot. • Businesses are following customers onto messaging platforms: 90% of businesses use Facebook to respond to service requests. • The shift from social towards conversational interfaces is also impressive, and bots on Facebook Messenger can help businesses tremendously in dealing with it. • https://www.sdcexec.com/software-technology/news/21011880/chatbot-market-to-grow-at-31-percent-cagr-from-2018-to-2024 • https://www.gartner.com/smarterwithgartner/gartner-predicts-a-virtual-world-of-exponential-change/ • https://www.businessinsider.in/tech/data-a-massive-hidden-shift-is-driving-companies-to-use-a-i-bots-inside-facebook-messenger/slidelist/52240155.cms
  • 77. 4. The Proof Of The Pudding Is In The Eating • In 2017, 20% of web searches were conducted via voice assistants. • Artificial intelligence-based voice assistance (AI-voice) will soon be a primary user interface for all digital devices, including smartphones, smart speakers, personal computers, automobiles, and home appliances. • As of mid-January 2019, more than 1 billion devices worldwide were equipped with Google's AI-voice Assistant, and another hundred million devices spoke with Amazon's Alexa; neither number accounts for devices equipped with voice assistants from Apple, Microsoft, Samsung, or across the digital worlds of China and Asia. • Juniper Research forecasts the global market for voice assistants to grow at a 25.4 percent CAGR over the next five years, with 8 billion active voice assistants (across all platforms and devices) by 2023. https://voicebot.ai/2019/01/07/google-assistant-to-be-available-on-1-billion-devices-this-month-10x-more-than-alexa/ https://www.juniperresearch.com/press/press-releases/digital-voice-assistants-in-use-to-triple
  • 78. 4. The Proof Of The Pudding Is In The Eating • Chatbots and Voice Assistants have started to play an increasing role in customer communication for many businesses across various verticals. • Especially in tourism, they provide more and more benefits in terms of convenience, availability, and fast access to information and customer support throughout the entire customer journey. • In the dreaming and planning phase, hotels and Destination Management Organizations (DMOs) can use Chatbots and Voice Assistants to provide potential guests with information about the hotel and/or the region, the surroundings, and weather conditions. • In the booking phase, everything from booking the hotel and transport to buying connected services, e.g. ski tickets, becomes much simpler and more efficient through natural language. • Finally, in the experience phase, Chatbots and Voice Assistants can also announce special offers or events. All requested information and processes are available instantly, 24/7/365. For hotel guests in particular, the stay experience can be enriched by giving them access to hotel services and beyond.
  • 79. 4. The Proof Of The Pudding Is In The Eating • A Touristic Knowledge Graph integrates and connects data from several sources, including touristic data sources and open data sources. • It includes entities of the following types: • LocalBusiness • POIs, Infrastructure • SportsActivityLocations (e.g. Trails, SkiResorts) • Events • Offers • WebCams • Mobility and Transport
  • 80. 4. The Proof Of The Pudding Is In The Eating (Figure: Touristic Knowledge Graph excerpt: SkiResort, Lifts, Slopes, WebCams; data visualisation based on GraphDB. Nodes: SkiResort, SkiRoute, Slope, CableCar, ChairLift, SkiLift, TBar, WebCam, SnowReport; edges: containedInPlace, subClassOf.)
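The excerpt links lifts, slopes, and webcams to a ski resort via `containedInPlace` and classifies lift types via `subClassOf`. A minimal sketch of how such a graph can be traversed to answer a question, assuming a tiny in-memory triple set: the `ex:` names and the subclass hierarchy are hypothetical, while `schema:containedInPlace` is the schema.org property. The traversal corresponds roughly to the SPARQL 1.1 property paths `rdf:type/rdfs:subClassOf*` and `schema:containedInPlace+`.

```python
# Illustrative triples; ex: names and the class hierarchy are hypothetical.
TRIPLES = {
    ("ex:ChairLift", "rdfs:subClassOf", "ex:SkiLift"),
    ("ex:TBar", "rdfs:subClassOf", "ex:SkiLift"),
    ("ex:resort1", "rdf:type", "ex:SkiResort"),
    ("ex:area1", "schema:containedInPlace", "ex:resort1"),
    ("ex:lift1", "rdf:type", "ex:ChairLift"),
    ("ex:lift1", "schema:containedInPlace", "ex:area1"),
    ("ex:lift2", "rdf:type", "ex:TBar"),
    ("ex:lift2", "schema:containedInPlace", "ex:resort1"),
    ("ex:cam1", "rdf:type", "ex:WebCam"),
    ("ex:cam1", "schema:containedInPlace", "ex:area1"),
}


def subjects(p, o):
    return {s for s, p2, o2 in TRIPLES if (p2, o2) == (p, o)}


def objects(s, p):
    return {o for s2, p2, o in TRIPLES if (s2, p2) == (s, p)}


def reachable_backwards(node, p):
    """node plus every x with a path x -p-> ... -p-> node (like p*)."""
    seen, stack = {node}, [node]
    while stack:
        for x in subjects(p, stack.pop()):
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return seen


def instances_in(cls, place):
    """Instances of cls (or a subclass) transitively contained in place."""
    classes = reachable_backwards(cls, "rdfs:subClassOf")
    contained = reachable_backwards(place, "schema:containedInPlace") - {place}
    return sorted(x for x in contained if objects(x, "rdf:type") & classes)


print(instances_in("ex:SkiLift", "ex:resort1"))  # ['ex:lift1', 'ex:lift2']
print(instances_in("ex:WebCam", "ex:resort1"))   # ['ex:cam1']
```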
  • 81. 4. The Proof Of The Pudding Is In The Eating The Touristic KG is used to answer questions such as: • “Where can I have traditional Tyrolean food when going cross-country skiing?” • “Show me WebCams near Kölner Haus” • “How many people are living in Serfaus?”
  • 82. 4. The Proof Of The Pudding Is In The Eating The Dach-KG working group • develops a de facto standard for the semantic annotation of touristic content, data, and services in the DACH area • based on schema.org and its adaptation by domain specifications • which should become the backbone of an open 5* Knowledge Graph for touristic data in DACH *) A dataset is awarded one star if the data are provided under an open license, **) two stars if the data are available as structured data, ***) three stars if the data are also available in a non-proprietary format, ****) four stars if URIs are used so that the data can be referenced, and *****) five stars if the dataset is linked to other datasets that can provide context. https://www.tourismuszukunft.de/2019/05/dach-kg-neue-ergebnisse-naechste-schritte-beim-thema-open-data/
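The footnotes describe a cumulative rating: as in Tim Berners-Lee's five-star scheme, each star presupposes the previous ones. A small sketch encoding that rule, assuming each criterion has already been checked and is passed in as a hypothetical Boolean flag:

```python
def lod_stars(open_license: bool, structured: bool, non_proprietary: bool,
              uses_uris: bool, linked: bool) -> int:
    """Count stars cumulatively: a star counts only if all previous ones do."""
    stars = 0
    for criterion in (open_license, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


print(lod_stars(True, True, True, True, True))   # 5
print(lod_stars(True, True, False, True, True))  # 2
```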
  • 83. 4. The Proof Of The Pudding Is In The Eating Members of the Dach-KG working group • Touristic experts from the DACH region (Germany (D), Austria (A), Switzerland (CH)) and Italy (South Tyrol) • the Austrian and German touristic associations • LTOs (Tirol, Vorarlberg, Wien, Brandenburg, Thüringen, …) • Associated: DMOs (Mayrhofen, Seefeld, …) • STI Innsbruck and STI International • An extension by technology providers (Datacycle, Feratel, Hubermedia, infomax, LandinSicht, Onlim, Outdooractive, TSO, ...) is planned
  • 84. 4. The Proof Of The Pudding Is In The Eating We build the Tyrol Knowledge Graph (TKG) as a nucleus for this initiative. • It is a five-star linked open data set published in GraphDB, providing a SPARQL endpoint for the provisioning of touristic data of Tyrol, Austria. • The TKG currently contains data about touristic infrastructure like accommodation businesses, restaurants, points of interest, events, recipes, etc. The data of the TKG fall into three categories: • Static data is information that rarely changes, like the address of a hotel. • Dynamic data is fast-changing information, like availabilities and prices. • Active data describe actions that can be executed, for example, the description of a purchase or a reservation. • As of November 25, 2018, the TKG contained around 5 billion statements, of which 55% are explicit and 45% are inferred. Every day the Knowledge Graph grows by around 8 million statements. • http://graphdb.sti2.at:8080/
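The split between explicit and inferred statements comes from materialization: GraphDB applies rule-based forward chaining and stores the inferred statements alongside the explicit ones. A toy sketch of that idea implements only two RDFS rules and is not GraphDB's ruleset. The schema.org class hierarchy is real, while `ex:hotelPost` is a hypothetical instance.

```python
def materialize(explicit):
    """Forward-chain two RDFS rules to a fixpoint:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(explicit)
    while True:
        sub = {(a, b) for a, p, b in closure if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "rdf:type", b)
                for x, p, a in closure if p == "rdf:type"
                for a2, b in sub if a == a2}
        new -= closure
        if not new:  # fixpoint reached: nothing new can be derived
            return closure
        closure |= new


explicit = {
    ("schema:Hotel", "rdfs:subClassOf", "schema:LodgingBusiness"),
    ("schema:LodgingBusiness", "rdfs:subClassOf", "schema:LocalBusiness"),
    ("ex:hotelPost", "rdf:type", "schema:Hotel"),
}
closure = materialize(explicit)
inferred = closure - explicit
print(len(explicit), len(inferred))  # 3 3
```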
  • 85. 4. The Proof Of The Pudding Is In The Eating There is a world beyond leisure: Utilities, Tourism, Retail, Financial Services, Education
  • 86. 5. Key Takeaway Our aim: • Establish a maximally automated knowledge lifecycle: NLU training, query generation, querying and representing world knowledge, as well as Natural Language Generation • Automatically distribute knowledge into all available channels • At the core are methodologies, methods, and tools to generate, host, curate, deploy, and access Knowledge Graphs containing frillions of statements from heterogeneous, distributed, and dynamic sources. (Figure: the Knowledge Graph connected to channels; images ©amazon.com, ©google.com, ©slack.com, ©facebook.com, ...)

Editor's Notes

  1. Understand the information needs and goals of the users (Natural Language Understanding): design intents and train NLU (scaling), especially entity detection. Map intent and parameters to create a query for accessing the KG. Query the Knowledge Graph: define views, since the KG integrates large volumes of heterogeneous, distributed, dynamic, and potentially inconsistent statements. Use Natural Language Generation to present the result to the user.
  2. If it worked, we would not need it.
  3. Sub graph consistent Data Lake
  4. [Acosta et al., 2013] M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, and J. Lehmann: Crowdsourcing linked data quality assessment. In Proceedings of the 12th International Semantic Web Conference (ISWC2013), Springer, LNCS 8219, Sydney, Australia, October 21-25, 2013. [Fürber & Hepp, 2010a] C. Fürber and M. Hepp: Using SPARQL and SPIN for data quality management on the semantic web. In Proceedings of the 13th International Conference on Business Information Systems (BIS2010), Springer, LNBIP 47, Berlin, Germany, May 3-5, 2010. [Fürber & Hepp, 2010b] C. Fürber and M. Hepp: Using semantic web resources for data quality management. In Proceedings of the 17th International Conference on Knowledge Engineering and Management by the Masses (EKAW2010), Springer, LNCS 6317, Lisbon, Portugal, October 11-15, 2010. [Kontokostas et al., 2014] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and A. Zaveri: Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web (WWW '14), ACM, Seoul, Korea, April 7-11, 2014. [Paulheim & Bizer, 2014] H. Paulheim and C. Bizer: Improving the Quality of Linked Data Using Statistical Distributions, International Journal on Semantic Web and Information Systems (IJSWIS), 10(2):63-86, 2014. [Pipino et al., 2002] L. L. Pipino, Y. W. Lee, and R. Y. Wang: Data Quality Assessment, Communications of the ACM, 45(4), 2002. [Wang, 1998] R. Y. Wang: A Product Perspective on Total Data Quality Management, Communications of the ACM, 41(2), 1998. [Zaveri et al., 2013] A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, and J. Lehmann: User-driven quality evaluation of DBpedia. In Proceedings of the 9th International Conference on Semantic Systems (I-SEMANTICS '13), ACM, Graz, Austria, September 4-6, 2013.
  5. LINK-QA: C. Guéret, P.T. Groth, C. Stadler, and J. Lehmann: Assessing linked data mappings using network measures. In Proceedings of the 9th Extended Semantic Web Conference: Research and Applications (ESWC2012), Springer, LNCS 7295, Heraklion, Greece, May 27-31, 2012. Luzzu (A Quality Assessment Framework for Linked Open Datasets): J. Debattista, S. Auer, and C. Lange: Luzzu: A Methodology and Framework for Linked Data Quality Assessment, Journal of Data and Information Quality (JDIQ), 8(1), 2016. Sieve: P. N. Mendes, H. Mühleisen, and C. Bizer: Sieve: Linked Data Quality Assessment and Fusion. In Proceedings of the Second International Workshop on Linked Web Data Management (LWDM 2012), in conjunction with EDBT 2012, Berlin, Germany, March 30, 2012. SWIQA (Semantic Web Information Quality Assessment Framework): C. Fürber and M. Hepp: SWIQA - a semantic web information quality assessment framework. In Proceedings of the 19th European Conference on Information Systems (ECIS2011), Association for Information Systems Electronic Library, ECIS 76, Helsinki, Finland, June 9-11, 2011. https://aisel.aisnet.org/ecis2011/76 Validata: J.B. Hansen, A. Beveridge, R. Farmer, L. Gehrmann, A.J.G. Gray, S. Khutan, T. Robertson, and J. Val: Validata: An online tool for testing RDF data conformance. In Proceedings of the 8th International Conference on Semantic Web Applications and Tools for Life Sciences (SWAT4LS2015), CEUR Workshop Proceedings, vol. 1546, Cambridge, UK, December 7-10, 2015.
  6. Scoring functions: TimeCloseness: measures the distance from the input date (obtained from the input metadata through a path expression) to the current (system) date. Dates outside the range (given in number of days) receive the value 0, and more recent dates receive values closer to 1. Preference: assigns decreasing, uniformly distributed real values to each graph URI provided in a space-separated list. SetMembership: assigns 1 if the value of the indicator provided as input belongs to the set given as a parameter, 0 otherwise. Threshold: assigns 1 if the value of the indicator provided as input is higher than a threshold given as a parameter, 0 otherwise. IntervalMembership: assigns 1 if the value of the indicator provided as input is within the interval given as a parameter, 0 otherwise. [Mendes et al., 2012] P. N. Mendes, H. Mühleisen, and C. Bizer: Sieve: Linked Data Quality Assessment and Fusion. In Proceedings of the Second International Workshop on Linked Web Data Management (LWDM 2012), in conjunction with EDBT 2012, Berlin, Germany, March 30, 2012.
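The five scoring functions can be sketched as plain Python functions. Some details are assumptions, not Sieve's Java classes: the linear decay in TimeCloseness, the (n - i)/n spacing in Preference, the inclusive interval bounds, and the function names.

```python
from datetime import date


def time_closeness(d: date, range_days: int, today: date) -> float:
    """1.0 for today, falling linearly to 0.0 at range_days; 0.0 outside the range."""
    age = (today - d).days
    if not 0 <= age <= range_days:
        return 0.0
    return 1.0 - age / range_days


def preference(graph_uri: str, preferred: str) -> float:
    """Decreasing, uniformly spaced values for a space-separated list of graph URIs."""
    uris = preferred.split()
    if graph_uri not in uris:
        return 0.0
    return (len(uris) - uris.index(graph_uri)) / len(uris)


def set_membership(value, allowed: set) -> float:
    return 1.0 if value in allowed else 0.0


def threshold(value: float, limit: float) -> float:
    return 1.0 if value > limit else 0.0


def interval_membership(value: float, low: float, high: float) -> float:
    return 1.0 if low <= value <= high else 0.0


today = date(2019, 5, 17)
print(time_closeness(date(2019, 5, 12), 10, today))  # 0.5
print(preference("g:b", "g:a g:b g:c g:d"))          # 0.75
```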
  7. HoloClean [Rekatsinas et al., 2017] uses various approaches, such as integrity constraints, external data, and quantitative statistics, to detect errors. HoloClean's workflow follows three steps. First, HoloClean takes a dataset, along with a set of methods for detecting erroneous data (such as denial constraints, outlier detection, external dictionaries, or labeled data), and separates the input dataset into noisy and clean parts. Second, HoloClean assigns an uncertainty score to the values of the noisy part, based on a probabilistic model generated using a DDlog program. Third, HoloClean computes a marginal probability for each value to be repaired, which expresses the confidence in this repair. [Rekatsinas et al., 2017] T. Rekatsinas, X. Chu, I. F. Ilyas, and C. Ré: HoloClean: Holistic data repairs with probabilistic inference. Proceedings of the VLDB Endowment (PVLDB), 10(11), 2017.
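The first step, error detection with a denial constraint, can be sketched on a toy table. The constraint used is "no two tuples may share a zip code but differ in city". The repair step is a naive frequency-based stand-in for HoloClean's probabilistic inference, so the reported confidence is only a relative frequency, not a marginal probability from a DDlog model. The rows are illustrative.

```python
from collections import Counter, defaultdict


def detect_noisy(rows, lhs, rhs):
    """Mark cells violating not(t1.lhs = t2.lhs and t1.rhs != t2.rhs) as noisy;
    all other cells count as clean."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[lhs]].append(r)
    noisy = set()
    for group in groups.values():
        if len({r[rhs] for r in group}) > 1:
            noisy |= {(r["id"], rhs) for r in group}
    return noisy


def repair(rows, noisy, lhs, rhs):
    """Naive stand-in for probabilistic inference: propose the most frequent
    value within the lhs group, with its relative frequency as confidence."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[lhs]][r[rhs]] += 1
    by_id = {r["id"]: r for r in rows}
    repairs = {}
    for rid, _ in noisy:
        c = counts[by_id[rid][lhs]]
        value, n = c.most_common(1)[0]
        repairs[rid] = (value, n / sum(c.values()))
    return repairs


rows = [
    {"id": 1, "zip": "6290", "city": "Mayrhofen"},
    {"id": 2, "zip": "6290", "city": "Mayrhofen"},
    {"id": 3, "zip": "6290", "city": "Mayrhofn"},
    {"id": 4, "zip": "6534", "city": "Serfaus"},
]
noisy = detect_noisy(rows, "zip", "city")
repairs = repair(rows, noisy, "zip", "city")
print(repairs[3])  # ('Mayrhofen', 0.6666666666666666)
```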
  8. SDValidate [Paulheim & Bizer, 2014] uses statistical distributions to assess the correctness of statements by assigning each statement a confidence score. It involves three main steps. First, it computes the relative frequency of the predicate/object combination of each statement; statements with a low frequency are selected for detailed analysis. Second, for each statement selected in the first step, SDValidate uses the statistical distributions of properties and types (the predicate's subject/object combinations) to assign a confidence score. Third, SDValidate applies a confidence threshold above which statements are considered true. Similarly, SDType applies statistical distributions to type assertions, predicting missing types. [Paulheim & Bizer, 2014] H. Paulheim and C. Bizer: Improving the Quality of Linked Data Using Statistical Distributions, International Journal on Semantic Web and Information Systems (IJSWIS), 10(2):63-86, 2014.
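The three steps can be sketched in simplified form. In this sketch, the confidence in step 2 is the cosine similarity between the object's own types and the distribution of object types observed for the predicate; that choice, the cutoffs, and the example data are assumptions rather than a faithful reimplementation of SDValidate.

```python
import math
from collections import Counter, defaultdict


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sdvalidate_sketch(statements, types, freq_cutoff=0.5, conf_threshold=0.5):
    """Return (statement, score) pairs judged likely wrong."""
    # Step 1: relative frequency of each predicate/object combination.
    po = Counter((p, o) for _, p, o in statements)
    per_p = Counter(p for _, p, _ in statements)
    suspicious = [st for st in statements if po[st[1:]] / per_p[st[1]] < freq_cutoff]

    # Step 2: object-type distribution per predicate vs. the object's own types.
    dist = defaultdict(Counter)
    for _, p, o in statements:
        dist[p].update(types.get(o, ()))
    scored = [(st, round(cosine(Counter(types.get(st[2], ())), dist[st[1]]), 3))
              for st in suspicious]

    # Step 3: statements scoring below the threshold are flagged.
    return [(st, score) for st, score in scored if score < conf_threshold]


statements = [
    ("ex:h1", "ex:locatedIn", "ex:Mayrhofen"),
    ("ex:h2", "ex:locatedIn", "ex:Mayrhofen"),
    ("ex:h3", "ex:locatedIn", "ex:Mayrhofen"),
    ("ex:h4", "ex:locatedIn", "ex:JohnDoe"),
]
types = {"ex:Mayrhofen": {"ex:Place", "ex:Town"}, "ex:JohnDoe": {"ex:Person"}}
flagged = sdvalidate_sketch(statements, types)
print(flagged)  # [(('ex:h4', 'ex:locatedIn', 'ex:JohnDoe'), 0.229)]
```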
  9. KATARA identifies correct and incorrect data and generates possible corrections for wrong data. KATARA involves three steps. First, the user selects the target data table and the trusted knowledge base. Second, KATARA identifies the pattern (coherence of types and relationships) of the target data in the trusted knowledge base, and the user validates the pattern. Third, KATARA annotates each value as correct if it has the expected type in the trusted knowledge base, and each tuple (pair of values) as correct if it has the expected relation there; otherwise, it annotates them as incorrect.
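The pattern discovery and annotation steps can be sketched for a two-column table, assuming a tiny in-memory "trusted knowledge base" and assuming that the user accepts the discovered pattern. The KB contents, the relation name `inCountry`, and the correction rule are illustrative assumptions, not KATARA's implementation.

```python
from collections import Counter

KB_TYPES = {
    "Innsbruck": {"City"}, "Vienna": {"City"}, "Munich": {"City"},
    "Austria": {"Country"}, "Germany": {"Country"},
}
KB_RELATIONS = {
    ("Innsbruck", "inCountry", "Austria"),
    ("Vienna", "inCountry", "Austria"),
    ("Munich", "inCountry", "Germany"),
}


def discover_pattern(table):
    """Most frequent KB type per column and most frequent KB relation between them."""
    col_types = [
        Counter(t for row in table for t in KB_TYPES.get(row[i], ())).most_common(1)[0][0]
        for i in range(2)
    ]
    rels = Counter(r for a, b in table for x, r, y in KB_RELATIONS if (x, y) == (a, b))
    return col_types, rels.most_common(1)[0][0]


def annotate(table, pattern):
    """Per row: (a_ok, b_ok, tuple_ok, KB-suggested corrections for b)."""
    (type_a, type_b), rel = pattern
    result = []
    for a, b in table:
        a_ok = type_a in KB_TYPES.get(a, set())
        b_ok = type_b in KB_TYPES.get(b, set())
        tuple_ok = (a, rel, b) in KB_RELATIONS
        fixes = sorted(y for x, r, y in KB_RELATIONS if x == a and r == rel and y != b)
        result.append((a_ok, b_ok, tuple_ok, fixes))
    return result


table = [("Innsbruck", "Austria"), ("Vienna", "Austria"), ("Munich", "Austria"), ("Innsbrook", "Austria")]
pattern = discover_pattern(table)  # assumed to be validated by the user
annotations = annotate(table, pattern)
for row, ann in zip(table, annotations):
    print(row, ann)
```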
  10. For example, missing datatype properties, functional dependency violations, mistyping errors, and unique value violations.
  11. [Volz et al., 2009] J. Volz, C. Bizer, M. Gaedke, and G. Kobilarov: Discovering and Maintaining Links on the Web of Data. In Proceedings of the 8th International Semantic Web Conference (ISWC 2009), Washington, DC, Springer, LNCS 5823, October 25-29, 2009.
  12. [Nikolov et al., 2008] A. Nikolov, V. Uren, E. Motta, and A. de Roeck: KnoFuss: Integration of Semantically Annotated Data by the KnoFuss Architecture. In Proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW2008), Springer, LNCS 5268, Acitrezza, Italy, September 29 - October 2, 2008.
  13. [Michelfeit & Necaský, 2012] J. Michelfeit and M. Necaský: Linked open data aggregation: Conflict resolution and aggregate quality. In Proceedings of the 36th Annual IEEE Computer Software and Applications Conference Workshops (COMPSAC2012), IEEE, Izmir, Turkey, July 16-20, 2012.
  14. Fusion gives the name and description of a data fusion policy, e.g. name="Fusion strategy for DBpedia City Entities". Class defines a subset of the input that belongs to a given class, e.g. Class name="dbpedia:City". Property defines a property to which a FusionFunction is applied, e.g. Property name="dbpedia:areaTotal". FusionFunction specifies the FusionFunction class used to fuse the values of a given property, e.g. FusionFunction class="KeepValueWithHighestScore" metric="sieve:lastUpdated". P. N. Mendes, H. Mühleisen, and C. Bizer: Sieve: Linked Data Quality Assessment and Fusion. In Proceedings of the Second International Workshop on Linked Web Data Management (LWDM 2012), in conjunction with EDBT 2012, Berlin, Germany, March 30, 2012.
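How such a policy could be applied can be sketched with an in-memory version of the example above. The names `KeepValueWithHighestScore`, `sieve:lastUpdated`, `dbpedia:City`, and `dbpedia:areaTotal` are taken from the example. The dict layout, the per-graph scores, and the values are illustrative assumptions, not Sieve's XML schema or Java API. Properties without a configured fusion function keep all their values.

```python
# Hypothetical in-memory version of the policy described above.
POLICY = {
    "name": "Fusion strategy for DBpedia City Entities",
    "class": "dbpedia:City",
    "properties": {"dbpedia:areaTotal": ("KeepValueWithHighestScore", "sieve:lastUpdated")},
}


def keep_value_with_highest_score(values, scores, metric):
    """values: list of (value, graph); keep the value whose graph scores highest."""
    return [max(values, key=lambda vg: scores[vg[1]][metric])]


FUSION_FUNCTIONS = {"KeepValueWithHighestScore": keep_value_with_highest_score}


def fuse_entity(entity_class, values_by_prop, scores, policy):
    """Apply the policy's fusion functions to an entity of the policy's class."""
    if entity_class != policy["class"]:
        return values_by_prop
    fused = {}
    for prop, values in values_by_prop.items():
        rule = policy["properties"].get(prop)
        if rule is None:
            fused[prop] = values  # no fusion function configured: keep all values
        else:
            func_name, metric = rule
            fused[prop] = FUSION_FUNCTIONS[func_name](values, scores, metric)
    return fused


# Quality scores per source graph, e.g. produced by a TimeCloseness-style function.
scores = {"g:en": {"sieve:lastUpdated": 0.4}, "g:de": {"sieve:lastUpdated": 0.9}}
values = {
    "dbpedia:areaTotal": [(100.0, "g:en"), (105.0, "g:de")],
    "rdfs:label": [("Example City", "g:en"), ("Beispielstadt", "g:de")],
}
fused = fuse_entity("dbpedia:City", values, scores, POLICY)
print(fused["dbpedia:areaTotal"])  # [(105.0, 'g:de')]
```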