The presentation of my public PhD defense on March 10, 2022. The related video is available at https://www.youtube.com/watch?v=NofQSwc3Svk
This doctoral thesis tackles how to support users in assessing, creating and using Knowledge Graph restrictions.
More concretely, this dissertation contributes the FAIR Montolo statistics, which support users in assessing existing Knowledge Graphs based on the restrictions they use.
The two visual notations ShapeUML and ShapeVOWL are presented and evaluated: they represent all constraint types of the Shapes Constraint Language (SHACL) and thus advance the state of the art.
Finally, the use of restrictions to represent formal meaning and to assess data quality is demonstrated for a social media archiving use case in the BESOCIAL project of the Royal Library of Belgium (KBR).
Statistics about Data Shape Use in RDF Data - Sven Lieber
The presentation of the poster paper "Statistics about Data Shape Use in RDF Data" presented during the demo/poster session at the International Semantic Web Conference (ISWC) 2020.
Joint work with Ben De Meester, Anastasia Dimou and Ruben Verborgh.
The related video is available on YouTube: https://www.youtube.com/watch?v=6-OdjYdEpeU
BESOCIAL: A Knowledge Graph for Social Media Archiving - Sven Lieber
The presentation of our paper "BESOCIAL: A Sustainable Knowledge Graph-based Workflow for Social Media Archiving" presented at the SEMANTiCS EU conference 2021 in Amsterdam.
Joint work with Dylan Van Assche, Sally Chambers, Fien Messens, Friedel Geeraert, Julie M. Birkholz and Anastasia Dimou.
The related video is available online at https://youtu.be/oYmzD3e8rBE?t=1912
Development of Semantic Web based Disaster Management System - NIT Durgapur
A Semantic Web model for the field of disaster management that structures the data so that any information needed during an emergency is easily available.
Context, Perspective, and Generalities in a Knowledge Ontology - Mike Bergman
This presentation to the Ontolog Forum in Dec 2016 presents the knowledge graph (ontology) design for KBpedia, a system of six major knowledge bases and 20 minor ones for conducting knowledge-based artificial intelligence (KBAI). The talk emphasizes the roots of the system in the triadic logic of Charles Sanders Peirce. It also discusses the use of KBpedia for the more-or-less automatic ways it can help create training corpora, training sets, and reference standards for supervised, unsupervised and deep machine learning. Uses of the system include entity and relation extraction and tagging, classification, clustering, sentiment analysis, and other AI tasks.
An introduction to the Joint Information Systems Committee Resource Discovery iKit. Includes a look at controlled vocabularies declared in the Resource Description Framework (RDF)/Simple Knowledge Organisation System (SKOS) and Wikipedia entries. Presented by Tony Ross at the CILIPS Centenary Conference Branch and Group Day which took place on 5 June 2008.
The document introduces a semantic wiki that aims to reduce the steep learning curve of developing and deploying semantic web applications. The semantic wiki allows for easy publishing and smart data propagation for end users, as well as fast prototyping in the browser and lightweight concept modeling for developers. It integrates semantic technologies like RDF, OWL, and SPARQL to enable knowledge management, data organization, data sharing, personalization, privacy protection, and provenance tracking within wikis. Challenges addressed include ontology modeling, relational modeling with rules, semantic querying across multiple wikis, and annotation extensions.
Research Data Sharing: A Basic Framework - Paul Groth
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
Knowledge Technologies: Opportunities and Challenges - Fariz Darari
How to be one step ahead of leveraging knowledge technologies for your apps!
When: Dec 8, 2017
Where: Fl. 6, Multimedia Tower, Central Jakarta
Thanks to Ragil for the invitation!
This document discusses open medical knowledge bases and Wikidata in particular. It describes Wikidata as a free and multilingual knowledge base that has grown from 30k facts in 2013 to over 346m facts in 2017. The document provides examples of medical information represented in Wikidata, including anatomy, diseases, and drugs. It also describes ongoing efforts to improve medicine data in Wikidata and gives examples of applications that utilize Wikidata's medical knowledge, such as virtual doctors that can identify potential diseases based on reported symptoms.
This document discusses approaches to developing globally interoperable metadata standards like RDA. It describes the failure of top-down approaches and issues with both top-down and bottom-up mapping strategies. Bottom-up risks multiple overlapping element sets while top-down may not fully represent local practices. The author advocates balancing global needs with flexibility for local implementation.
RDF and Open Linked Data, a first approach - horvadam
This document discusses the potential benefits of libraries publishing their data as linked open data using semantic web technologies. It describes how linked data allows for standardized access to data across the web as a single API. Libraries can make their data more discoverable on the web and searchable by services like Google by publishing it as linked open data. Semantic web technologies like RDF and SPARQL allow for more powerful search capabilities. Several large libraries are already publishing portions of their data as linked open data, including authority files and entire catalogs. The document outlines some semantic web applications libraries could use to enhance discovery and provides examples of vocabularies for describing different types of metadata.
The document discusses the evolution of the web from documents to data, and introduces linked data which publishes machine-readable data on the web that is explicitly defined and linked to other datasets. It then discusses question answering systems that take natural language questions and locate answers from document collections, including both closed-domain systems with restricted knowledge bases and open-domain systems that retrieve answers from the web. The document also presents the linked data technology stack and some examples of linked open data clouds from 2007 to 2011 to demonstrate the growth of linked data on the web.
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques - Prateek Jain
The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can “understand and satisfy the requests of people and machines to use the web content” – i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud – as we will illustrate – are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, then the LOD Cloud will merely be more data that suffers from the same kinds of problems, which plague the Web of Documents, and hence the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to address these issues using a bootstrapping based approach. It showcases using bootstrapping based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and applicability of the solution.
This document discusses Neo4j and its applications in bioinformatics. It describes Bio4j, an open source bioinformatics graph database built using Neo4j that integrates data from sources like Uniprot, NCBI taxonomy, Gene Ontology, and more. Bio4j models biological data as nodes and relationships in a graph structure rather than tables. This allows for more flexible querying and knowledge integration. The document provides examples of how Bio4j can be accessed through its Java API, Cypher query language, Gremlin traversal language, and REST API. It also describes some tools and visualizations for exploring and analyzing Bio4j data.
A QA system takes in a natural language question, analyzes it to understand the type of question and information sought, searches structured and unstructured data sources for relevant information, and generates a natural language answer. It consists of modules for question analysis, information retrieval from knowledge bases and documents, answer generation, and response formatting. The goal is to delegate more interpretation work to machines so users can get direct answers to complex questions over heterogeneous data.
Introduction to question answering for linked data & big data - Andre Freitas
This document discusses question answering (QA) systems in the context of big data and heterogeneous data scenarios. It outlines the motivation and challenges for developing natural language interfaces for databases. The document covers the basic concepts and taxonomy of QA systems, including question types, answer types, data sources, and domains. It also discusses the anatomy and components of a typical QA system.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Cataldo Musto
This document provides an overview and agenda for a tutorial on semantics-aware techniques for social media analysis, user modeling, and recommender systems. The tutorial will discuss how to represent content to improve information access and build new services for social media. It will cover why intelligent information access is needed to effectively cope with information overload, and how semantics can be introduced through natural language processing and by encoding endogenous and exogenous semantics. The agenda includes explaining recommendations, semantic user profiles based on social data, and semantic analysis of social streams.
Developing Linked Data and Semantic Web-based Applications (Expotec 2015) - Ig Bittencourt
The document discusses developing Linked Data and Semantic Web applications. It begins with key concepts related to Linked Data, the Semantic Web, and applications. It then describes two key steps in developing such applications: publishing data as Linked Data and consuming Linked Data to build applications. Examples are provided of extracting, enriching, and linking different datasets to build a real estate recommendation application that performs semantic searches over the integrated data. Ontologies are created and reused to represent the domains and support interoperability. The document emphasizes integrating the data and software engineering perspectives in developing Semantic Web applications.
The Datalift Project aims to publish and interconnect government open data. It develops tools and methodologies to transform raw datasets into interconnected semantic data. The project's first phase focuses on opening data by developing an infrastructure to ease publication. The second phase will validate the platform by publishing real datasets. The goal of Datalift is to move data from its raw published state to being fully interconnected on the Semantic Web.
Datalift is a project that aims to catalyze the publication and interconnection of data on the web. It provides tools and services to help with various steps in the data publication process including:
- Dataset publication and conversion tools to automate publishing raw data as linked data using RDF.
- Infrastructure for storing and querying published RDF data using SPARQL endpoints and RDF stores.
- Linkage tools to help interconnect published datasets by finding equivalence links between resources.
- Applications that visualize and make use of published and interlinked datasets to demonstrate the value of linked open data.
Riding the wave - Paradigm shifts in information access - datacite
The document discusses the paradigm shifts in scientific information access over time from empirical observation to computational simulation. It outlines the challenges libraries now face in providing access to non-textual scientific content like research data and simulations. The document also introduces DataCite, a global consortium that issues digital object identifiers (DOIs) to datasets to help make them accessible, citable, and traceable like scholarly articles.
The document discusses a webinar presented by NISO and DCMI on Schema.org and Linked Data. The webinar provides an overview of Schema.org and Linked Data, examines the advantages and challenges of using RDF and Linked Data, looks at Schema.org in more detail, and discusses how Schema.org and Linked Data can be combined. The goals of the webinar are to illustrate the different design choices for identifying entities and describing structured data, integrating vocabularies, and incentives for publishing accurate data, as well as to help guide adoption of Schema.org and Linked Data approaches.
This document provides an overview of the RDF data model. It discusses the history and development of RDF standards from 1997 to 2014. It explains that an RDF graph is made up of triples consisting of a subject, predicate, and object. It provides examples of RDF triples and their N-triples representation. It also describes RDF syntaxes like Turtle and features of RDF like literals, blank nodes, and language-tagged strings.
This document summarizes work being done to express the Data Documentation Initiative (DDI) metadata standard in Resource Description Framework (RDF) format to improve discovery and linking of microdata on the Web of Linked Data. It describes background on the DDI to RDF mapping effort, the goals of making microdata more accessible and interoperable online, and examples of how the RDF representation would support common discovery use cases. It also provides information on tools and next steps for the ongoing work, acknowledging contributions from participants in workshops where this effort was discussed.
Introduction to Ontology Concepts and Terminology - Steven Miller
The document introduces an ontology tutorial that will cover basic concepts of the Semantic Web, Linked Data, and the Resource Description Framework data model as well as the ontology languages RDFS and OWL. The tutorial is intended for information professionals who want to gain an introductory understanding of ontologies, ontology concepts, and terminology. The tutorial will explain how to model and structure data as RDF triples and create basic RDFS ontologies.
Objectification Is A Word That Has Many Negative Connotations - Beth Johnson
Here is an introduction to social web mining and big data:
Social web mining is the process of extracting useful information and knowledge from social media data. With the rise of big data, social media platforms are generating massive amounts of unstructured data every day in the form of posts, comments, shares, likes, etc. This user-generated data holds valuable insights about people's opinions, interests, behaviors and more.
Big data analytics provides tools and techniques to analyze this large, complex social data at scale. Social web mining applies data mining and machine learning algorithms to big social data to discover patterns and relationships. Areas of focus include sentiment analysis to understand public opinions on brands, products or issues; network analysis to map relationships and influence; and
Project Credit: Melissa Haendel - On the Nature of Credit - CASRAI
This document discusses credit and attribution in research. It provides an example scenario of researchers involved in a project and publication. It also discusses modeling relationships between people, publications, datasets and other research entities. The document recommends using standards like PROV-O and the W3C Dataset Description standard to represent these relationships and enable attribution and reproducibility. Questions that can be asked by representing roles and contributions in a formal language are also presented.
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation - Muhammad Javed
Paper published in the ISWC co-located workshop VOILA 2016.
Abstract: Scholars@Cornell is a new project of Cornell University Library (CUL) that provides linked data and novel visualizations of the scholarly record. Our goal is to enable easy discovery of explicit and latent patterns that can reveal high-impact research areas, the dynamics of scholarly collaboration, and expertise of faculty and researchers. We describe VIZ-VIVO, an extension for the VIVO framework that enables end-user exploration of a scholarly knowledge-base through a configurable set of data-driven visualizations. Unlike systems that provide web pages of researcher profiles using lists and directory-style metaphors, our work explores the power of visual metaphors for navigating a rich semantic network of scholarly data modeled with the VIVO-ISF ontology. We produce dynamic web pages using D3 visualizations and bridge the user experience layer with the underlying semantic triple-store layer. Our selection of visual metaphors enables end users to start with the big picture of scholarship and navigate to individual faculty and researchers within a macro visual context. The D3-enabled interactive environment can guide the user through a sea of scholarly data depending on the questions the user wishes to answer. In this paper, we discuss our process for selection, design, and development of an initial set of visualizations as well as our approach to the underlying technical architecture. By engaging an initial set of pilot partners we are evaluating the use of these data-driven visualizations by multiple stakeholders, including faculty, students, librarians, administrators, and the public.
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas - Angelo Salatino
Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 15K topics and 70K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data.
Searching for patterns in crowdsourced information - Silvia Puglisi
This document introduces crowdsourcing and discusses discovering patterns in crowdsourced data. It discusses defining the context of volunteered information on the internet in order to understand relationships between data. A network model is proposed where different types of context define nodes and relationships between context determine edges. Properties of small world networks are discussed including how they could be used to model relationships between crowdsourced data and evaluate data quality. Finally, applications to search ranking, privacy and security are briefly mentioned.
Scholars@Cornell: Visualizing the Scholarship Data - Muhammad Javed
Short paper published in the IEEE Visualizations in Practice workshop, Phoenix, AZ.
A new project of CUL is Scholars@Cornell, a data and visualization service built upon VIVO's semantic, linked data knowledge-base that represents the record of scholarship produced by Cornell faculty and researchers. While adhering to the VIVO ontology, our work on Scholars@Cornell helps move VIVO forward in the technology areas that require a looser coupling of backend and frontend technologies. One key question we set out to answer was "how can visual mediation help users navigate the rich semantic data that represent the scholarship data recorded in the VIVO knowledge-base?" Can visualizations be used to make the content more consumable and answer questions that cannot easily be answered by browsing list views?
UKOLN supports repositories and provides repository infrastructure support through several JISC-funded projects. It has developed a Dublin Core Application Profile for Scholarly Works that defines a richer metadata model based on FRBR and expresses it using Dublin Core. This profile aims to provide consistent, unambiguous metadata to enable added-value services for repositories. UKOLN is working to promote community adoption of the profile.
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums - Jon Voss
This document discusses practical applications of Linked Open Data (LOD) for libraries, archives, and museums. It describes how LOD allows these institutions to publish structured data on the web in ways that are interoperable and can be connected to other open datasets. Examples are given of how LOD is being used by various institutions to share metadata, images, and other cultural heritage assets on the web in open, machine-readable formats. The presenter argues that LOD represents a new paradigm that these cultural organizations should embrace to make their collections more accessible and useful on the web.
This document discusses the role of thesauri and standard vocabularies in linking data on the semantic web. It explains how thesauri were traditionally used to ensure consistency in library indexing but are now being used as building blocks for the semantic web. The document outlines how AGROVOC, FAO's multilingual controlled vocabulary, has been converted to SKOS and linked to other vocabularies to facilitate integration of agricultural data from different sources on the semantic web. It also describes how AGROVOC is being used to semantically tag unstructured text by tools like AgroTagger to help structure and link more agricultural information online.
This document describes using collaborative knowledge bases like Wikipedia to support exploratory search tasks. It presents an approach that extracts concepts and their relationships from Wikipedia to build a concept network. Documents are then ranked based on their relationships to these concepts. An experiment ranks journal abstracts given a seed abstract, comparing the proposed Wikipedia-based approach to a maximal marginal relevance technique. The Wikipedia approach provided more diverse results while maintaining high relevance, showing potential for improving exploratory search.
folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle,context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
Edited and revised: Overview of the international and interdisciplinary Gordon Research Conference on Visualization in Science and Education and info on key cognitive science and other visualization researchers. History of the conference, NSF workshop, and research on learning with visualizations.
Research Inventy: International Journal of Engineering and Science - researchinventy
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open access journal, available both online and in print, that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Similar to Assessing, Creating and Using Knowledge Graph Restrictions (20)
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data - Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... - Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Codeless Generative AI Pipelines (GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
Open Source Contributions to Postgres: The Basics, POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
End-to-end pipeline agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Build applications with generative AI on Google Cloud - Márton Kodok
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, and they come in different versions. We are going to cover how to use them via API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative AI industry trends.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... - Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... - Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Assessing, Creating and Using Knowledge Graph Restrictions
1. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
2. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
?
3. Assessing, Creating and Using ? ? ?
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
?
5. This PhD is about information processing
Telescope science?
=> Astronomy!
Microscope science?
=> Biology!
Computer science?
=> Information!
“Computer science involves the study of or the practice of computation, automation, and information” - Wikipedia
8. “24 hours in photos”, 2011 from Erik Kessels
350k printed images uploaded to Flickr in a single day
Large amount of unconnected data
What is on these two images, and are they connected somehow?
9. What is what? We need semantics!
“a separate seat for one person, typically with a back and four legs.” - Oxford Languages
“the person in charge of a meeting or of an organization (used as a neutral alternative to chairman or chairwoman)” - Oxford Languages
Please think of “a chair”
10. Different definitions and understanding about data
[Diagram: three data silos, each with its own definition of a person]
- A person is alive, has a first and last name and has a residence address
- A person is real or fictional
- A person is the user of the app identified via an Email address
How many persons can we reach with our marketing campaign in Ghent?
11. Data and data modeling using a graph
[Diagram: an example knowledge graph - classes Person, PhD student, Supervisor, Organization, University (with “is subclass” links); instances Sven, Anastasia, Ruben, UGent; relations “is a”, “knows”, “is enrolled at”]
A Knowledge Graph: (i) real world entities in a graph structure, (ii) classes and relations in a schema, (iii) linking of arbitrary entities, (iv) covers various topical domains - “Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods”, Semantic Web Journal, 2016, Heiko Paulheim
12. Link data in a flexible way
[Same example knowledge graph and definition as slide 11]
13. Express the data model in a flexible way
[Same example knowledge graph and definition as slide 11]
14. A uniform graph representation
[Same example knowledge graph and definition as slide 11]
15. Data integration because of reused definitions of things
[Diagram: the three data silos from slide 10, now integrated through a shared definition of a person]
“A vocabulary defines the concepts and relationships describing an area of concern” - World Wide Web Consortium (W3C)
16. Crash course about the context
-> Represent data in a uniform graph structure
PhD presentation
17. But how can this be used by a computer?
[Diagram: the three data silos from slide 10 with their differing person definitions]
18. Keep the flexible graph representation
in a computer readable text format by using “triples”
Person is a Class .
PhD student is subclass Person .
Supervisor is subclass Person .
University is subclass Organization .
Anastasia is a Supervisor .
Ruben is a Supervisor .
Sven is a PhD student .
UGent is a University .
Sven is enrolled at UGent .
19. Reuse the web as global information system
Person is a Class .
becomes, with web identifiers (URIs):
<http://xmlns.com/foaf/0.1/Person> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
abbreviated using prefixes:
foaf:Person rdf:type rdfs:Class .
20. Reuse the web as global information system
-> reuse of definitions for shared understanding
-> link to existing data
foaf:Person rdf:type rdfs:Class .
ex:PhDStudent rdfs:subClassOf foaf:Person .
ex:Supervisor rdfs:subClassOf foaf:Person .
ex:University rdfs:subClassOf foaf:Organization .
data:anastasia rdf:type ex:Supervisor .
data:ruben rdf:type ex:Supervisor .
data:sven rdf:type ex:PhDStudent .
data:ugent rdf:type ex:University .
data:sven ex:enrolledAt data:ugent .
data:sven foaf:givenName "Sven" .
data:sven foaf:familyName "Lieber" .
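Note that these triples only parse once the namespaces are declared; a minimal runnable Turtle sketch, where the ex: and data: namespaces are placeholders assumed for this example:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

foaf:Person rdf:type rdfs:Class .
ex:PhDStudent rdfs:subClassOf foaf:Person .
data:sven rdf:type ex:PhDStudent ;
    ex:enrolledAt data:ugent ;
    foaf:givenName "Sven" ;
    foaf:familyName "Lieber" .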
21. Crash course about the context
-> Data in a uniform graph structure
-> Use the web to represent the graph
22. This does not seem right to us …
(Figure: a graph linking Sven, UGent and “Train 123” via “is enrolled at” and “wroteBook”)
… but okay for a computer because we did not restrict possible links
23. Let’s talk about semantics … again
(the two “chair” definitions from slide 9)
24. Let’s talk about semantics … again
(the same two definitions)
25. We can now distinguish between different things
(the same two “chair” definitions, now told apart)
26. Without restrictions a computer cannot differentiate
(Figure: Sven “knows” an unknown node “?”)
Domain and Range axioms: “knows” connects two instances of class Person
Axioms are “statements that are asserted to be true in the domain being described” - OWL 2 Structural Specification and Functional-Style Syntax, W3C 2012
27. Provide formal meaning using axioms, which supports inferring new knowledge
(Figure: Sven “knows” another node; both are now inferred to be instances of Person)
Domain and Range axioms: “knows” connects two instances of class Person
new “is a” relationships inferred!
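In Turtle, such axioms and their effect could look as follows; a minimal sketch, assuming a hypothetical ex:knows property (FOAF's own foaf:knows carries exactly these domain and range axioms):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

# axioms: whoever knows or is known must be a Person
ex:knows rdfs:domain foaf:Person ;
         rdfs:range  foaf:Person .

# plain data without any type statements
data:sven ex:knows data:anastasia .

# a reasoner infers two new "is a" relationships:
#   data:sven      rdf:type foaf:Person .
#   data:anastasia rdf:type foaf:Person .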
28. What can be inferred here?
(Figure: Sven “knows” an unknown node “?”, which “has legs” 4)
Axiom: something with 4 legs is a chair
29. Oops, we created a Person-Chair
(Figure: the node Sven knows has 4 legs and is inferred to be both a Person and a Chair)
Axiom: something with 4 legs is a chair
new “is a” relationships inferred!
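The troublesome axiom is expressible in OWL as a subclass axiom over a hasValue restriction; a minimal sketch, reusing the hypothetical ex:knows from the previous sketch and introducing ex:hasLegs and data:something as hypothetical names:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace
@prefix data: <http://example.org/data#> .     # assumed placeholder namespace

# axiom: anything with the value 4 for ex:hasLegs is a Chair
[ rdf:type owl:Restriction ;
  owl:onProperty ex:hasLegs ;
  owl:hasValue 4 ] rdfs:subClassOf ex:Chair .

data:sven      ex:knows   data:something .
data:something ex:hasLegs 4 .

# together with the range axiom on ex:knows, a reasoner infers that
# data:something is both a foaf:Person and an ex:Chair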
30. Use constraints to define what is valid
Data shapes express “structural constraints to validate instance data” - SHACL Use Cases and Requirements, W3C 2017
(Figure: a visual Person shape with Birth date, Last name and First name)
For example: persons need a birth date, last name and first name (see the sketch below)
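In SHACL, this example shape could be written as follows; a minimal sketch, assuming FOAF name properties and a hypothetical ex:birthDate property:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/ontology#> . # assumed placeholder namespace

ex:PersonShape rdf:type sh:NodeShape ;
    sh:targetClass foaf:Person ;
    # every person needs at least a first name, a last name and a birth date
    sh:property [ sh:path foaf:givenName  ; sh:minCount 1 ] ;
    sh:property [ sh:path foaf:familyName ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:birthDate    ; sh:minCount 1 ] .

Unlike an axiom, this shape infers nothing: a person missing a birth date simply makes the data invalid.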
31. Vocabulary -> Ontology
“An Ontology is a formal, explicit specification of a shared conceptualization” - Thomas R. Gruber (1993)
(Figure: the schema part of the Knowledge Graph from slide 11: Person, PhD student, Supervisor, Organization and University with their “is subclass”, “knows” and “is enrolled at” relations)
“A Conceptualization is an intensional semantic structure which encodes the implicit rules constraining the structure of a piece of reality” - Guarino et al. (1995)
“The OWL 2 RDF-Based Semantics gives a formal meaning to every RDF graph” - OWL 2 RDF-Based Semantics, W3C 2012
32. The use of restrictions varies in practice
A spectrum: from only subclasses, e.g. structured metadata in websites using schema.org, to different restrictions defining formal meaning, e.g. the Neuro Behavior Ontology (NBO)
A program to infer knowledge (a reasoner) needs formal meaning
33. Crash course about the context
-> Data in a uniform graph structure
-> Use the web to represent the graph
-> We can restrict meaning using axioms or restrict what is valid using constraints
34. Crash course about the context
Congratulations, you passed Knowledge Graphs 101
35. Assessing, Creating and Using Knowledge Graph Restrictions
Sven Lieber, supervised by Anastasia Dimou and Ruben Verborgh
10.03.2022 - public PhD defense
36. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
37. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
38. Imagine you want to create an application (data model)
Reuse existing concepts which fit your use case, for example an event planning app
39. Reusing ontologies is usually a multi-step process
Discovery of reuse candidates
Selection of relevant ontologies
Customization and integration of reused ontologies
40. Imagine you want to create an application (data model)
Reuse existing concepts which fit your use case, for example an event planning app
Create your own local constraints, for example Corona measures that apply temporarily
41. Creating constraints
(Figure: a user next to the visual Person shape from slide 30, contrasted with the textual SHACL shape below)
schema:DatedMoneySpecification
    rdf:type sh:NodeShape ;
    sh:closed "true"^^xsd:boolean ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [
        sh:path schema:amount ;
        sh:datatype xsd:float ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path schema:currency ;
        rdfs:comment "The currency code (here) is a mandatory property consisting of three upper-case letters" ;
        sh:datatype xsd:string ;
        sh:flags "i" ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:pattern "^[A-Z]{3}$" ;
    ] .
What users get!
What users want is visual support!
42. Main research question
How can we support users in the assessment and in the
creation of Knowledge Graph restrictions?
43. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
44. The use of restrictions varies in practice
(the restriction-use spectrum from slide 32)
45. Different types of restrictions are available in RDFS/OWL
(the restriction-use spectrum, annotated with example restriction types: Domain, Disjoint, Properties, Literal ranges, feeding a Reasoner)
46. But some restriction types come with a high (computational) complexity … not always needed
(the same annotated spectrum as slide 45)
47. Reusing ontologies is usually a multi-step process
(the same three steps as slide 39: discovery, selection, customization and integration)
49. Does this vocabulary fit our use case?
Existing statistics do not provide any information about which restrictions exist in the vocabulary
50. Currently only a manual assessment of ontologies, one by one
(Screenshots: ontology documentation pages created by Widoco; an ontology loaded into the editor tool Protégé)
51. Discover and assess ontologies based on restriction use
(Figure: a use case matched against possible ontology reuse candidates; colors = different restriction types)
52. Discover and assess ontologies based on restriction use
(the same figure, now extended with restriction type use statistics)
54. Created statistics are FAIR
The statistics are described using Knowledge Graphs
Dataset available via a repository or consultable via a website
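Purely as an illustration of describing such statistics as a Knowledge Graph (the actual Montolo vocabulary may differ), a single restriction-use statistic could be recorded roughly like this, with all stat: terms hypothetical:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix stat: <http://example.org/statistics#> . # hypothetical namespace, not the real Montolo terms

stat:observation1 rdf:type stat:RestrictionUseObservation ; # hypothetical class
    stat:ontology <http://xmlns.com/foaf/0.1/> ;            # the assessed ontology
    stat:restrictionType stat:SubClassAxiom ;               # hypothetical restriction type term
    stat:count "12"^^xsd:integer .                          # hypothetical count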
55. How many ontologies use each restriction type?
A few often-used restriction types and a long tail, both in LOV and BioPortal
(chart: ontology count per restriction type)
56. Negligible number of literal value restrictions
Almost no literalRanges restrictions; literalPattern not used at all
57. Property and cardinality restrictions in the tail
The tail mostly consists of property-based and cardinality-based restrictions expressed using OWL terms
58. LOV vs BioPortal: qualified cardinalities
Qualified cardinalities are preferred in BioPortal ontologies
59. LOV vs BioPortal: unqualified cardinalities
Unqualified cardinalities are preferred in LOV ontologies
61. Domain and range used less in BioPortal
More domain/range restrictions in LOV
62. Commonly used constraint types and unused potential
Data shapes are relatively new; here we could only investigate 19 data sources
63. Besides assessment support, we learned from the statistics and can ask more questions
Only half of the ontologies use OWL-based axioms
Little attention is paid to literal values
Editing tools warrant attention regarding a self-fulfilling prophecy
64. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
65. Creating constraints
(the same user and textual SHACL shape as on slide 41)
What users want is visual support!
What users get!
66. Different constraint types need to be visualized
For example Or, Disjoint and Not constraints
What users want is visual support!
67. Existing tools do not specify how to visualize all SHACL core constraints
(Figure: existing visual tools cover only some constraint types, e.g. Or and Disjoint)
68. Based on existing cognitive theories and experiments we can define how to systematically visualize constraint types
Moody, Daniel. “The ‘Physics’ of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering.” IEEE Transactions on Software Engineering 35.6 (2009): 756-779.
70. Chapter: Constraint creation
How can we support users familiar with Linked Data in viewing RDF constraints?
Hypothesis: users familiar with Linked Data can answer questions about visually represented RDF constraints more accurately with a VOWL-based visual notation than with a UML-based visual notation
72. Compare visual notations in a user study with 12 participants
Two visual notations (ShapeVOWL and ShapeUML) to visualize the same semantic constructs
Test case     Group 1    Group 2
Test case 1   ShapeUML   ShapeVOWL
Test case 2   ShapeVOWL  ShapeUML
Test case 3   ShapeUML   ShapeVOWL
Test case 4   ShapeVOWL  ShapeUML
Pre-assessment (social demographics + skills)
Main questionnaire to assess accuracy of answers to provided questions
Post-assessment (opinion)
75. Besides having 2 new visual notations,
we gained new qualitative insights!
Space-efficient representation using ShapeUML
Good to have several notations because of familiarity bias
Visual features are important and can also improve ShapeUML
76. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
78. Valuable information in archived records
Historical records: historic government records or early climate data, e.g. demographics or taxes on crop yields
Invaluable data lost: NASA is unable to locate the original high-quality video of the 1960s moon landing
The web and social media, how about 21st-century data? Social media content influences the real world; what if Twitter and co. are gone?
79. BESOCIAL: a cross-institutional research project to develop a social media archiving strategy for Belgium
Follow-up of a project for general web archiving
Led by the Royal Library of Belgium
Research partners with different expertise
Funded by the Belgian Science Policy Office
84. Knowledge Graph-based workflow for data stewardship
(Figure: heterogeneous data sources from society, e.g. #meToo and #IchBinHanna, in data formats A and B, integrated into a Knowledge Graph, from which views on the data are generated in different formats)
85. Quality is use-case specific and can be systematically defined and measured
For example the quality dimension “Rich collection description”
86. Quality Assessment using Knowledge Graphs and restrictions
40 user stories such as “As an archive-user, I want to see descriptive information about the collection from the archivist, so I can assess if the content is relevant to me.”
Derive quality requirements such as “The description of each collection should at least have 200 characters” (a constraint sketch follows below)
Metric: Missing collection description (number of missing descriptions)
Metric: Insufficient collection description (number of insufficient descriptions)
(Figure: a quality assessment consumes these metrics and produces a report)
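Such a requirement maps naturally to a SHACL shape; a minimal sketch, with ex:CollectionShape, ex:Collection and the dcterms:description property as assumptions rather than the project's actual model:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/quality#> . # assumed placeholder namespace

# "The description of each collection should at least have 200 characters"
ex:CollectionShape rdf:type sh:NodeShape ;
    sh:targetClass ex:Collection ; # assumed placeholder class
    sh:property [
        sh:path dcterms:description ;
        sh:minCount 1 ;    # violations feed the "missing description" metric
        sh:minLength 200 ; # violations feed the "insufficient description" metric
    ] .

A SHACL engine's validation report then yields the counts behind both metrics.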
88. Knowledge Graph- and restriction-supported data stewardship for social media archiving
Provided an integrated view on the data (with formal meaning)
Assisted in an automated quality assessment by using constraints
The workflow is generalizable and thus helpful in other use cases
89. Users need support
Assessing restrictions using Montolo
Creating restrictions using visual notations
Using restrictions to enable data stewardship
Conclusion
90. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
Montolo statistics support restriction assessments with FAIR data, which was not possible before
We can rethink the value we give to restrictions: why and how do we use restrictions systematically?
91. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
There are now 2 visual notations covering all SHACL core constraints
First steps to make Knowledge Graph constraints more accessible to domain experts
92. How can we support users in the assessment and in the creation of Knowledge Graph restrictions?
The BESOCIAL use case demonstrated the use of restrictions to tackle data stewardship challenges
The future is less about tools and more about workflows and data!
93. A circle representing the human knowledge
“The illustrated guide to a Ph.D.” - Matt Might
94. Little knowledge after elementary school
“The illustrated guide to a Ph.D.” - Matt Might
95. More knowledge after high school
“The illustrated guide to a Ph.D.” - Matt Might
96. Gaining speciality with the Bachelor’s degree
“The illustrated guide to a Ph.D.” - Matt Might
97. Deepen speciality with the Master’s degree
“The illustrated guide to a Ph.D.” - Matt Might
98. Reading research papers takes you to the edge of human knowledge
“The illustrated guide to a Ph.D.” - Matt Might
99. You focus at the boundary
“The illustrated guide to a Ph.D.” - Matt Might
100. You focus at the boundary for a few years
“The illustrated guide to a Ph.D.” - Matt Might
101. One day the boundary gives way
“The illustrated guide to a Ph.D.” - Matt Might
102. The dent you have made is called a PhD
“The illustrated guide to a Ph.D.” - Matt Might
103. The world looks different to you now
“The illustrated guide to a Ph.D.” - Matt Might
104. Don’t forget the bigger picture
“The illustrated guide to a Ph.D.” - Matt Might
105. Newly raised questions: future work
Montolo provides metrics, but what about higher-level dimensions, tools and their usability?
Why and how are restrictions used in the first place?
How do we build our future Knowledge Graphs from a methodological point of view?
106. Questions & Answers
Dissertation available as PDF at https://sven-lieber.org/phd
SvenLieber sven-lieber.org
knows.idlab.ugent.be