4. 4
Onlim Overview
• The pioneer in automating customer communication via AI chatbots and conversational interfaces
• Enterprise solutions for making data and knowledge available to conversational interfaces
• Team of 25+ highly experienced AI experts, specialists in semantics and data science
• Spin-off of the University of Innsbruck
• HQ in Europe (Vienna, Telfs)
Current Focus Verticals
Onlim & STI Innsbruck
STI Innsbruck Overview
• Research group at the University of Innsbruck in the Austrian state of Tyrol
• Engaged in research and development to bring the information and communication technologies of the future into today's world
• Team of 20+ semantic experts
• Main research areas: Ontologies, Semantic Web, Knowledge Graphs
• More details at: https://www.sti-innsbruck.a
5. 5
Challenge - Automation
[Figure: User → (1) understand → Intent + Parameters → (2) map → Query → (3) query → Knowledge Graph → (4) NLG (Natural Language Generation) → User]
1. Understand the information needs and goals of the users (Natural Language Understanding)
a. Design intents
b. Train NLU (scaling)
c. Entity detection
2. Map intent & parameters to a query for accessing the KG
3. Knowledge Graph
a. Integrate large volumes of heterogeneous, distributed, potentially inconsistent statements
4. Natural Language Generation (NLG) to present the result to the user
6. 6
NLU
• Voice/text recognition is already good
• However, NLU still requires significant manual labour
Manual work
• Design intents based on schema of Knowledge Graph
• Define utterances (example questions) per intent
• Mark parameters that should be extracted from utterances
Automation
• Entity detection
• Push entities from Knowledge Graph
• Detect unanswered questions
• Use Knowledge Graph to update/extend NLU
• create utterances
• supervised learning
• extend utterances with unanswered questions
1. Natural Language Understanding
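The automation ideas above (push entities from the Knowledge Graph, create utterances) can be illustrated with a small Python sketch. It assumes that each intent is described by hypothetical utterance templates with a single `{slot}` placeholder and that the entity labels for each slot have already been exported from the Knowledge Graph. The intent name `find_webcams`, the templates, and the output format (text plus character offsets of the parameter) are illustrative assumptions, not the format of any particular NLU tool.

```python
import re

# Hypothetical intent templates; each template contains exactly one {slot}.
INTENT_TEMPLATES = {
    "find_webcams": [
        "Show me webcams near {place}",
        "Are there webcams at {place}?",
    ],
}

# Hypothetical entity labels, assumed to be pushed from the Knowledge Graph per slot.
KG_ENTITIES = {
    "place": ["Kölner Haus", "Serfaus"],
}

SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def generate_utterances(templates, entities):
    """Expand every template with every KG entity and mark the parameter span."""
    examples = []
    for intent, patterns in templates.items():
        for pattern in patterns:
            slots = SLOT_PATTERN.findall(pattern)
            if len(slots) != 1:
                continue  # keep the sketch simple: one parameter per utterance
            slot = slots[0]
            placeholder = "{" + slot + "}"
            start = pattern.index(placeholder)  # text before the slot stays unchanged
            for value in entities.get(slot, []):
                examples.append({
                    "intent": intent,
                    "text": pattern.replace(placeholder, value),
                    "entities": [{"slot": slot, "value": value,
                                  "start": start, "end": start + len(value)}],
                })
    return examples


for example in generate_utterances(INTENT_TEMPLATES, KG_ENTITIES):
    print(example["intent"], "|", example["text"])
```

Because the parameter values come from the Knowledge Graph, re-running the generator after the graph changes would keep the training utterances in sync; unanswered questions could be appended to the same list once they have been labelled.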
7. 7
• Basis: intent detected and parameters extracted during NLU
• Map the extracted information (intent & parameters) onto predefined rules
• Query: combination of rules
• Additional restriction rules
• Define a view on a relevant subgraph of the Knowledge Graph
A chatbot should not access the whole Knowledge Graph (to avoid trillions of statements, inconsistencies, and access-right violations)
2. Query generation
[Figure: Query generation: Intent (with parameters) + predefined rules → generated query]
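A minimal sketch of query generation as a combination of predefined rules, written in Python and producing SPARQL text. The rule set, the intent name `events_in_place`, and the named graph `https://example.org/graph/chatbot-view` (standing in for the restricted view on a relevant subgraph) are hypothetical; `schema:Event`, `schema:location`, and `schema:name` are schema.org terms, and the `http://schema.org/` namespace is an assumption about how the KG stores them.

```python
# Hypothetical predefined rules: per intent, the variable to return and the
# SPARQL patterns to combine. {place} marks where a parameter is inserted.
RULES = {
    "events_in_place": {
        "select": "?event",
        "patterns": [
            "?event a schema:Event .",
            "?event schema:location ?place .",
            "?place schema:name ?placeName .",
            "FILTER (str(?placeName) = {place})",
        ],
    },
}

# Hypothetical named graph acting as the restricted view for the chatbot.
VIEW_GRAPH = "https://example.org/graph/chatbot-view"


def sparql_literal(value: str) -> str:
    """Quote a parameter as a SPARQL string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def generate_query(intent: str, params: dict) -> str:
    """Combine the rules of an intent into one query, restricted to the view graph."""
    rule = RULES[intent]
    literals = {name: sparql_literal(value) for name, value in params.items()}
    body = "\n    ".join(p.format(**literals) for p in rule["patterns"])
    return (
        "PREFIX schema: <http://schema.org/>\n"
        f"SELECT DISTINCT {rule['select']} WHERE {{\n"
        f"  GRAPH <{VIEW_GRAPH}> {{\n"
        f"    {body}\n"
        "  }\n"
        "}"
    )


print(generate_query("events_in_place", {"place": "Serfaus"}))
```

Wrapping every generated pattern in the same `GRAPH` clause is one simple way to express the additional restriction rule: the chatbot can only ever see the view, never the whole Knowledge Graph.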
8. 8
• A query is a combination of predefined rules
• Generate rules from the Knowledge Graph
• manually
• based on the ontology used (schema.org)
• rules for all possible queries
• semi-automatically
• propose rules with the help of the Knowledge Graph
• minimize manual adaptations
3. Rule generation
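The semi-automatic option (propose rules with the help of the Knowledge Graph) could start from the type/property combinations that actually occur in the data. This Python sketch assumes the KG is available as simple `(subject, predicate, object)` tuples; the toy triples, the `ask_<property>_of_<Type>` naming scheme, and the `min_support` cut-off are all hypothetical. The proposals are meant to be reviewed by a person, which keeps manual adaptation to a minimum rather than removing it.

```python
from collections import Counter

# Tiny hypothetical Knowledge Graph as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:hotel1", "rdf:type", "schema:Hotel"),
    ("ex:hotel1", "schema:telephone", "tel-1"),
    ("ex:hotel2", "rdf:type", "schema:Hotel"),
    ("ex:hotel2", "schema:telephone", "tel-2"),
    ("ex:event1", "rdf:type", "schema:Event"),
    ("ex:event1", "schema:startDate", "date-1"),
]


def propose_rules(triples, min_support=1):
    """Propose one query rule per (type, property) pair observed in the graph."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    usage = Counter()
    for s, p, o in triples:
        if p != "rdf:type":
            for t in types.get(s, ()):
                usage[(t, p)] += 1
    proposals = []
    for (t, p), support in sorted(usage.items()):
        if support >= min_support:
            proposals.append({
                "intent": f"ask_{p.split(':')[-1]}_of_{t.split(':')[-1]}",
                "patterns": [f"?x a {t} .", f"?x {p} ?value ."],
                "support": support,
            })
    return proposals


for rule in propose_rules(TRIPLES, min_support=2):
    print(rule["intent"], rule["patterns"])
```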
9. 9
4. Natural Language Generation
Manual work
• Define templates based on
• structure of data
• information that should be given to the user
Automatic
• Generate
• templates from the Knowledge Graph
• textual answers from the Knowledge Graph
• follow-up questions to continue the dialog
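A minimal sketch of the template-based NLG step, using Python's `string.Template`. The templates, the intent name, and the event titles in the usage example are hypothetical; the point is that one template per result situation (no result, one result, several results) plus a follow-up question is enough to turn query results into a short dialog turn.

```python
from string import Template

# Hypothetical answer templates per intent, one per result situation.
TEMPLATES = {
    "events_in_place": {
        "none": Template("I could not find any events in $place."),
        "one": Template("There is one event in $place: $first."),
        "many": Template("There are $count events in $place, for example $first."),
        "follow_up": Template("Would you like more details about $first?"),
    },
}


def generate_answer(intent, params, results):
    """Turn query results into an answer plus a follow-up question."""
    templates = TEMPLATES[intent]
    if not results:
        return [templates["none"].substitute(params)]
    values = dict(params, count=len(results), first=results[0])
    key = "one" if len(results) == 1 else "many"
    return [templates[key].substitute(values), templates["follow_up"].substitute(values)]


print(generate_answer("events_in_place", {"place": "Serfaus"}, ["Ski Opening", "Jazz Night"]))
```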
10. 10
The quality of an intelligent assistant depends directly on the quality of its Knowledge Graph
Problem: “Garbage in, garbage out”
Requirements for the Knowledge Graph:
• well structured (using an ontology, e.g. schema.org)
• homogeneous structure/models
• accurate information (correctness)
• large and detailed coverage (completeness)
Knowledge Graph
=> “Knowledge Graph Lifecycle”
13. 13
schema.org
• to annotate websites
• as ontology for the KG (local properties (IC versus EI))
Knowledge Creation: Methods and Tools
Knowledge Creation
• manual
• using an Annotation Editor
• based on Domain Specifications that restrict and extend schema.org
• semi-automatic
• Annotation Editor suggests mappings/extracted information
• e.g. extract information from web pages (by HTML tags)
• manual adaptations needed
• mapping
• integrate large & fast-changing data sets
• map different formats to the ontology used in the Knowledge Graph
• automatic
• natural language processing (NLP)
• machine learning (ML)
• extract knowledge from text representations & web pages
• named entity recognition, concept mining, text mining, ...
Knowledge Graph
Knowledge Creation Knowledge Hosting Knowledge Curation Knowledge Deployment
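For the semi-automatic case (extract information from web pages by HTML tags), one concrete possibility is harvesting schema.org annotations that pages already embed as JSON-LD in `<script type="application/ld+json">` tags. This Python sketch uses only the standard library (`html.parser`, `json`); it assumes JSON-LD rather than Microdata or RDFa, and the example page and hotel name are hypothetical. Malformed blocks are skipped so that a curator can review them manually.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD <script> block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script content may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed annotations; left for manual review
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


PAGE = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Hotel", "name": "Alpenhof"}'
        '</script></head><body><script>var x = 1;</script></body></html>')
print(extract_jsonld(PAGE))
```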
14. 14
Knowledge Creation - Tools & Libraries
• GATE (text analysis & language processing)
• OpenNLP (supports most common NLP tasks)
• RapidMiner (data preparation, machine learning, deep learning, text mining, predictive analysis)
15. 15
Knowledge Hosting
[Figure: Hosting the Knowledge Graph. Components: annotation tool (e.g. semantify.it), document store (e.g. MongoDB), graph database (e.g. GraphDB); data flows labelled "edit data", "crawl data", and "map data"]
Hosting semantic web annotations
16. 16
Evaluation
• Overall process to determine the quality of a KG
• Select quality dimensions and metrics
• Evaluate representative subsets
Correctness
• Identify the number of wrong assertions
Completeness
• Identify the number of missing assertions
Knowledge Curation - Assessment
Knowledge Assessment, Knowledge Cleaning, Knowledge Enrichment
Evaluation, Correctness, Completeness
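Evaluating representative subsets can be made concrete with two ratios: correctness as the share of sampled assertions judged correct, and completeness as the share of reference assertions that the KG contains. This Python sketch assumes a small gold standard is available and uses an `is_correct` callback as a stand-in for a human judge; the toy triples and hotel names are hypothetical.

```python
import random


def sample_assertions(kg, k, seed=0):
    """Draw a reproducible random subset of assertions for manual checking."""
    rng = random.Random(seed)
    return rng.sample(sorted(kg), min(k, len(kg)))


def correctness(sample, is_correct):
    """Share of sampled assertions judged correct (is_correct stands in for a human judge)."""
    if not sample:
        raise ValueError("empty sample")
    return sum(1 for a in sample if is_correct(a)) / len(sample)


def completeness(kg, gold):
    """Share of assertions from a gold standard that are present in the KG."""
    if not gold:
        raise ValueError("empty gold standard")
    return len(kg & gold) / len(gold)


# Hypothetical toy data: one wrong assertion in the KG, two gold assertions missing.
KG = {("ex:h1", "schema:name", "Alpenhof"), ("ex:h1", "schema:telephone", "tel-1"),
      ("ex:h2", "schema:name", "Wrong name")}
GOLD = {("ex:h1", "schema:name", "Alpenhof"), ("ex:h1", "schema:telephone", "tel-1"),
        ("ex:h2", "schema:name", "Seehof"), ("ex:h3", "schema:name", "Bergblick")}

print(correctness(sample_assertions(KG, 10), lambda a: a in GOLD), completeness(KG, GOLD))
```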
17. 17
Methods
• Semi-automatic based on data integrity constraints
• user-driven assessment
• test-driven assessment
• manual assessment by the crowd or by experts
• statistical distribution
• SPARQL queries
• identification of functional dependency violations & missing values
Knowledge Curation - Assessment
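The last method (identification of functional dependency violations & missing values) is usually expressed as SPARQL queries; the same checks can be sketched in plain Python over `(subject, predicate, object)` tuples. Which properties count as functional and which are required per type are hypothetical constraints chosen for the example (schema.org itself declares neither).

```python
from collections import defaultdict


def find_violations(triples, functional, required):
    """Report functional-property violations and missing required properties."""
    values = defaultdict(set)   # (subject, property) -> set of objects
    types = defaultdict(set)    # subject -> set of rdf:type values
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        else:
            values[(s, p)].add(o)

    # A functional property must not have more than one value per subject.
    fd_violations = [
        (s, p, sorted(objs))
        for (s, p), objs in sorted(values.items())
        if p in functional and len(objs) > 1
    ]
    # Every instance of a type must have a value for each required property.
    missing = [
        (s, t, p)
        for s in sorted(types)
        for t in sorted(types[s])
        for p in required.get(t, [])
        if not values.get((s, p))
    ]
    return fd_violations, missing


TRIPLES = [
    ("ex:h1", "rdf:type", "schema:Hotel"),
    ("ex:h1", "schema:geo", "geo-1"),
    ("ex:h1", "schema:geo", "geo-2"),
    ("ex:h2", "rdf:type", "schema:Hotel"),
]
FUNCTIONAL = {"schema:geo"}                   # hypothetical constraint
REQUIRED = {"schema:Hotel": ["schema:name"]}  # hypothetical constraint

print(find_violations(TRIPLES, FUNCTIONAL, REQUIRED))
```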
18. 18
Knowledge Curation - Assessment
Tools
• WIQA (Web Information Quality Assessment Framework)
• filtering policies to evaluate information quality
• SWIQA (Semantic Web Information Quality Assessment Framework)
• data quality rules & quality scores for identifying wrong data
• LINK-QA
• using network metrics
• Sieve
• flexibly expressing quality assessment methods
• fusion methods
• Validata
• online tool for testing/validating RDF data against ShEx schemas
• Luzzu (Linked Open Datasets)
• thirty data quality metrics based on the Dataset Quality Ontology
19. 19
• improve correctness
• major objectives
• error detection
● wrong instance assertions: isElementOf(i, t)
● wrong property value assertions: p(i1, i2)
● wrong equality assertions: isSameAs(i1, i2)
• error correction
• of wrong instance assertions isElementOf(i, t):
• i is not a proper instance identifier: Delete assertion or correct i
• t is not an existing type name: Delete assertion or correct t
• The instance assertion is (semantically) wrong: Delete assertion or find a proper t; do NOT try to find a proper i (this would neither scale nor make sense)
Knowledge Curation - Cleaning
Error Detection Error Correction
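The first two correction cases for isElementOf(i, t) can be checked mechanically; the third (semantically wrong) cannot be decided from syntax alone. This Python sketch assumes identifiers are prefixed names like `ex:hotel1` (a hypothetical convention) and uses `difflib.get_close_matches` to suggest a replacement for a misspelled type. In line with the slide, it only ever proposes a corrected t, never a different i.

```python
import re
from difflib import get_close_matches

# Hypothetical identifier syntax: a prefixed name such as "ex:hotel1".
IDENTIFIER = re.compile(r"^[A-Za-z][\w-]*:[\w.-]+$")


def check_instance_assertion(i, t, known_types):
    """Classify isElementOf(i, t) and suggest a correction for t if one is close."""
    if not IDENTIFIER.match(i):
        return ("delete_or_correct_i", None)
    if t not in known_types:
        suggestion = get_close_matches(t, sorted(known_types), n=1, cutoff=0.8)
        return ("delete_or_correct_t", suggestion[0] if suggestion else None)
    return ("ok", None)


KNOWN = {"schema:Hotel", "schema:Event", "schema:SkiResort"}
print(check_instance_assertion("ex:h1", "schema:Hotle", KNOWN))
```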
20. 20
• Error correction of wrong property value assertions p(i1, i2):
• p is not a proper property name: Delete assertion or correct p
• i1 is not a proper instance identifier: Delete assertion or correct i1
• i1 is not in any domain of p: Delete assertion or add an assertion isElementOf(i1, t) where t is a domain of p
• i2 is not a proper instance identifier: Delete assertion or correct i2
• i2 is not in the range of p for any domain of i1: Delete assertion, or add a proper isElementOf assertion for i1 that adds a domain for which i2 is an instance of the range of the property, or add a proper isElementOf assertion for i2 that turns it into an instance of a range of the property applied to a domain of p of which i1 is an element
• The property assertion is (semantically) wrong: Delete assertion or correct it. In this case, you should most likely define a proper i2, or search for a better p or a better i1.
• Correction of wrong equality assertions isSameAs(i1, i2):
• i1 is not a proper instance identifier: Delete assertion or correct i1
• i2 is not a proper instance identifier: Delete assertion or correct i2
• The identity assertion is (semantically) wrong: Delete assertion or replace it with a SKOS operator, which, however, does not come with operational semantics. The informed reader may recognize here the implicit use of the closed versus open world assumption.
Knowledge Curation - Cleaning
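The domain and range cases for p(i1, i2) reduce to set intersections once the types of each instance are known. This Python sketch assumes explicit type sets per instance and declared domains/ranges per property; subclass reasoning is omitted, so `schema:Hotel` is listed in the domain explicitly, and the data is hypothetical. Each return code corresponds to one of the correction cases above.

```python
def check_property_assertion(p, i1, i2, types, domains, ranges):
    """Classify p(i1, i2) against declared domains and ranges."""
    if p not in domains:
        return "unknown_property"    # delete assertion or correct p
    if not types.get(i1, set()) & domains[p]:
        return "domain_violation"    # delete, or add isElementOf(i1, t) for a domain t of p
    if not types.get(i2, set()) & ranges.get(p, set()):
        return "range_violation"     # delete, or add a proper isElementOf assertion
    return "ok"


TYPES = {"ex:h1": {"schema:Hotel"}, "ex:p1": {"schema:Place"}, "ex:e1": {"schema:Event"}}
DOMAINS = {"schema:containedInPlace": {"schema:Place", "schema:Hotel"}}
RANGES = {"schema:containedInPlace": {"schema:Place"}}

for args in [("ex:h1", "ex:p1"), ("ex:e1", "ex:p1"), ("ex:h1", "ex:e1")]:
    print(args, check_property_assertion("schema:containedInPlace", *args, TYPES, DOMAINS, RANGES))
```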
21. 21
Knowledge Curation - Cleaning
• Methods
• Instance assertion
● statistical distribution of types and properties
● disjointness axioms
● supervised machine learning
● entity type dictionaries
● association rule mining
• Property value assertion
● statistical distribution
● ontology reasoners
● Wikipedia pages
● outlier detection
• Equality assertion
● outlier detection
● constraints
● logical validation
● local context of instances
22. 22
• Tools
• HoloClean
● integrity constraints
● external data
● quantitative statistics
● Steps
• separate the input dataset into noisy and clean parts
• assign uncertainty scores over the values of the noisy part
• compute the marginal probability of each value to be repaired
• SDValidate
● statistical distributions
● three steps
• compute the relative predicate frequency for each statement
• assign a confidence score to each statement selected in the first step
• apply a confidence threshold
Knowledge Curation - Cleaning
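The statistical idea behind the SDValidate steps (how typical is a statement, given the distribution for its predicate?) can be illustrated with a simplified Python sketch. This is not SDValidate's published algorithm: here the score of a statement is the average relative frequency of its object's types among all objects of the same predicate, and the threshold of 0.5 is arbitrary. The toy data is hypothetical.

```python
from collections import Counter, defaultdict


def type_frequency_scores(triples, types):
    """Score each statement by how typical its object's types are for the predicate."""
    statements = [(s, p, o) for s, p, o in triples if p != "rdf:type"]
    totals = Counter(p for _, p, _ in statements)   # statements per predicate
    object_types = defaultdict(Counter)              # predicate -> object type counts
    for _, p, o in statements:
        for t in types.get(o, ()):
            object_types[p][t] += 1

    scores = {}
    for s, p, o in statements:
        o_types = types.get(o, set())
        if not o_types:
            scores[(s, p, o)] = 0.0  # untyped objects get the lowest confidence
        else:
            scores[(s, p, o)] = sum(object_types[p][t] / totals[p] for t in o_types) / len(o_types)
    return scores


def flag_suspicious(scores, threshold=0.5):
    """Return statements whose confidence falls below the threshold."""
    return sorted(st for st, score in scores.items() if score < threshold)


TYPES = {"ex:p1": {"schema:Place"}, "ex:p2": {"schema:Place"}, "ex:p3": {"schema:Place"},
         "ex:p4": {"schema:Place"}, "ex:bob": {"schema:Person"}}
TRIPLES = [("ex:e1", "schema:location", "ex:p1"), ("ex:e2", "schema:location", "ex:p2"),
           ("ex:e3", "schema:location", "ex:p3"), ("ex:e4", "schema:location", "ex:p4"),
           ("ex:e5", "schema:location", "ex:bob")]

print(flag_suspicious(type_frequency_scores(TRIPLES, TYPES)))
```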
23. 23
Knowledge Curation - Cleaning
• Tools
• The LOD Laundromat
● cleans Linked Open Data
● takes a SPARQL endpoint or an archived dataset as input
● guesses the serialisation format
● identifies syntax errors using a library while parsing RDF
● saves RDF data in canonical format
• KATARA
● identifies correct & incorrect data
● generates possible corrections for wrong data
• SPIN
● SPARQL Inferencing Notation, usable as a SPARQL-based constraint language
● generates SPARQL query templates based on data quality problems
• inconsistency
• lack of comprehensibility
• heterogeneity
• redundancy
24. 24
Knowledge Source detection
• search for sources of additional assertions for the KG
Knowledge Source integration
• integrate new assertions into the KG
• align new statements with the existing ones
Duplicate detection
• Methods
• string similarity measures
• association rule mining
• topic modelling
• Support Vector Machine
• property-based
• crowd-sourced data
• graph-oriented
• formalizing entity resolution
Property-Value-Statements correction
Knowledge Curation - Enrichment
Knowledge Source detection, Knowledge Source integration, Duplicate detection, Property-Value-Statements correction
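The first duplicate-detection method listed above, string similarity measures, can be sketched with Python's `difflib.SequenceMatcher` after light normalization. The labels and the 0.85 threshold are hypothetical; the pairwise loop is quadratic, so a real KG would need blocking (e.g. by type or location) before comparing, and the other listed methods would typically be combined with it.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(label):
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def candidate_duplicates(labels, threshold=0.85):
    """Return entity pairs whose labels are similar enough to be reviewed as duplicates."""
    pairs = []
    for (id1, l1), (id2, l2) in combinations(sorted(labels.items()), 2):
        score = SequenceMatcher(None, normalize(l1), normalize(l2)).ratio()
        if score >= threshold:
            pairs.append((id1, id2, round(score, 2)))
    return pairs


LABELS = {"ex:a": "Kölner Haus", "ex:b": "Koelner  Haus", "ex:c": "Hotel Alpenhof"}
print(candidate_duplicates(LABELS))
```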
25. 25
• Tools
• Silk
• entity linking
• SERIMI
• matches instances between two datasets
• Legato
• linking tool based on indexing techniques
• …
Property-Value-Statements correction
• Tools
• Sieve
• Quality assessment module
• Data Fusion module (describes fusion policies)
• KnoFuss
• data fusion using different methods
• ODCleanStore
• cleaning, linking, quality assessment and fusing RDF data
• ...
Knowledge Curation - Enrichment
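Property-value correction via data fusion boils down to choosing one value per property when sources disagree, according to a fusion policy. This Python sketch shows two simple policies (keep the value from the highest-quality source, or take the most frequent value); the policy names, the quality scores, and the source names are hypothetical and do not reproduce the configuration format of Sieve or KnoFuss.

```python
from collections import Counter


def fuse(candidates, policy):
    """Pick one value from conflicting (value, source, quality_score) candidates."""
    if not candidates:
        return None
    if policy == "keep_highest_quality":
        return max(candidates, key=lambda c: c[2])[0]
    if policy == "most_frequent":
        return Counter(value for value, _, _ in candidates).most_common(1)[0][0]
    raise ValueError(f"unknown fusion policy: {policy}")


# Hypothetical fusion policies per property and conflicting values from three sources.
POLICIES = {"schema:name": "most_frequent", "schema:telephone": "keep_highest_quality"}
CONFLICTS = {
    "schema:name": [("Koelner Haus", "source_a", 0.9),
                    ("Kölner Haus", "source_b", 0.6),
                    ("Kölner Haus", "source_c", 0.5)],
    "schema:telephone": [("tel-1", "source_a", 0.9), ("tel-2", "source_b", 0.6)],
}

fused = {prop: fuse(values, POLICIES[prop]) for prop, values in CONFLICTS.items()}
print(fused)
```

Choosing the policy per property reflects that some values (names) are best decided by agreement between sources, while others (contact data) are better taken from the most trusted source.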
26. 26
Knowledge Deployment
[Figure: graph database hosting the Knowledge Graph (storage) → reasoning agent applying views defined via rules → output]
• connection of user requests to resources
• knowledge management technology (GraphDB)
• inference engines based on deductive reasoning (reasoning agents)
• data accessible through personalized agents
• Restrict: define partial views (via rules)
• Enrich: provide contextual & personalized reasoning on top of this knowledge
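The Restrict and Enrich ideas can be combined in a few lines of Python: a deductive step that forward-chains `rdf:type` along a subclass hierarchy, followed by a partial view that only exposes selected predicates. Both are deliberately simplified stand-ins (a single inference rule and a predicate allow-list) for rule-defined views and a real reasoning engine; the `ex:internalNote` property and the hotel data are hypothetical, while the two subclass links mirror schema.org's hierarchy.

```python
def infer_types(triples, subclass_of):
    """Forward-chain rdf:type along a subclass hierarchy until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for parent in subclass_of.get(o, ()):
                fact = (s, "rdf:type", parent)
                if fact not in inferred:
                    inferred.add(fact)
                    changed = True
    return inferred


def partial_view(triples, allowed_predicates):
    """Restrict the graph to the predicates an agent is allowed to see."""
    return {t for t in triples if t[1] in allowed_predicates}


SUBCLASS_OF = {"schema:Hotel": ["schema:LodgingBusiness"],
               "schema:LodgingBusiness": ["schema:LocalBusiness"]}
GRAPH = {
    ("ex:h1", "rdf:type", "schema:Hotel"),
    ("ex:h1", "schema:name", "Alpenhof"),
    ("ex:h1", "ex:internalNote", "not for chatbots"),
}

# Reason over the full graph first, then expose only the allowed part.
view = partial_view(infer_types(GRAPH, SUBCLASS_OF), {"rdf:type", "schema:name"})
for triple in sorted(view):
    print(triple)
```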
27. 27
• The Touristic Knowledge Graph integrates and connects data from several sources including:
• touristic data sources:
• open data sources:
• It includes entities of the following types:
• LocalBusiness
• POIs, Infrastructure
• SportsActivityLocations (e.g. Trails, SkiResorts)
• Events
• Offers
• WebCams
• Mobility and Transport
Pilot: Touristic Knowledge Graph for DACH
29. 29
• The Touristic KG is used to answer questions such as:
• “Where can I have traditional Tyrolean food when going cross-country skiing?”
• “Show me WebCams near Kölner Haus”
• “How many people are living in Serfaus?”
Pilot