Semantic Web Foundations for
Representing, Reasoning, and Traversing
Contextualized Knowledge Graphs
By: Vinh Nguyen
47
Distinguish Property Types
Property
Generic Property Singleton Property
Property Type:
Singleton property
Generic prope...
48
Distinguish Triple Types
Property Type:
Singleton property
Generic property
Regular property
Context:
Context-
associat...
• A property mapping function IEXTis a binary relation
that maps one property to a set of ordered pairs of
resources.
• Fo...
• A singleton mapping function IS_EXTis a binary relation that
maps one singleton property to one ordered pair of
resource...
• A generic property function IG is a binary relation that
maps a generic property to a set of its singleton
properties.
•...
• A generic mapping function IG_EXTis a binary relation that
maps a generic property to a set of ordered pairs of
resource...
53
Simple Interpretation
• Given a vocabulary V,
• IR: a non-empty set of resources,called domain or universe of discourse...
54
RDF Interpretation
Singleton Property vs. Generic Property
• if <xs, xg>∈ IEXT (rdf:singletonPropertyOf),
then xs ∈ IPs...
u rdf:singletonPropertyOf v .
u rdf:type SingletonProperty
55
Syntax-based Inference Rules: sp-1
ceo?id=1 rdf:singletonPro...
u rdf:singletonPropertyOf v .
v rdf:type GenericProperty
56
Syntax-based Inference Rules: sp-2
ceo?id=1 rdf:singletonPrope...
57
RDF Interpretation
Generic Mapping Extension
• if <xs, xg>∈ IEXT (rdf:singletonPropertyOf),
then IS_EXT(xs)∈ IG_EXT(xg)...
58
RDF Interpretation
Generic Triple Deriving
• if <xs, x>∈ IS_EXT (rdf:singletonPropertyOf), and
<u, v> = IS_EXT(xs)
then...
u rdf:singletonPropertyOf v .
x u y .
x v y .
59
Syntax-based Inference Rules: sp-3
ceo?id=1 rdf:singletonPropertyOf ceo ....
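Rule sp-3 above can be sketched in a few lines of Python. This is only an illustration, not the rdf-contextualizer implementation: triples are plain string tuples, and the singleton triple `chadHurley ceo?id=1 youtube` is an assumed input chosen to match the `ceo?id=1` example.

```python
# Triples are (subject, predicate, object) tuples of plain strings.
SP_OF = "rdf:singletonPropertyOf"


def apply_sp3(triples):
    """Rule sp-3: from (u SP_OF v) and (x u y), derive (x v y)."""
    triples = set(triples)
    # Map each singleton property u to its generic property v.
    generic_of = {s: o for s, p, o in triples if p == SP_OF}
    derived = {
        (x, generic_of[u], y)
        for x, u, y in triples
        if u in generic_of
    }
    # Return only the triples that were not already asserted.
    return derived - triples


facts = {
    ("ceo?id=1", SP_OF, "ceo"),
    ("chadHurley", "ceo?id=1", "youtube"),  # assumed singleton triple
}
inferred = apply_sp3(facts)
print(inferred)
```

The function returns only new generic triples, so applying it again to the union of asserted and inferred triples yields nothing further.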
60
Syntax-based Inference Rules: sp-4
SubProperty
• if <xs, x>∈ IS_EXT (rdf:singletonPropertyOf),
<x, y>∈ IEXT (rdfs:subPr...
61
Syntax-based Inference Rules: sp-4
Domain
• if <xs, x>∈ IS_EXT (rdf:singletonPropertyOf),
<x, y>∈ IEXT (rdfs:domain), <...
62
Syntax-based Inference Rules : sp-6
Range
• if <xs, x>∈ IS_EXT (rdf:singletonPropertyOf),
<x, y>∈ IEXT (rdfs:range), <u...
Subject Predicate Object Starts Ends
Bob Dylan marriedTo Sarah Lownds 2005 2010
Bob Dylan marriedTo Carolyn Dennis 2006 20...
Subject Predicate Object Starts Ends
Bob Dylan marriedTo Sarah Lownds 2005 2010
Bob Dylan marriedTo Carolyn Dennis 2006 20...
65
SP Inference Chain
chadHurley type?id=1 youtubeEmp . type?id=1 singleton type .
youtubeEmp subClassOf?id=2 googleEmp . ...
66
Context-based Reasoning
( X, A, Y) : v1, (X, A, Y): v2
(X, A, Y): v1 ⨁ v2
( X, A, Y) : v1, (X, A, Y): v2
(X, A, Y): v1 ...
67
Context-based Reasoning with Time
( X, A, Y) : v1, (X, A, Y): v2
(X, A, Y): v1 ⨂ v2
Ltime: partially ordered set of tim...
68
Context-based Inference Rule with SP Chain
Ltime: partially ordered set of time intervals
[t5, t6], [t1, t2], [t3, t4] ...
69
Context-based Inference Rule with SP Chain
A type?id=i B . type?id=i singleton type ; from t1 ; to t2 .
B subClassOf?id...
70
Context-based Inference Rule with SP Chain
chadHurley type?id=1 youtubeEmp . type?id=1 singleton type ; from 2005 ; to ...
Implementation
71
• rdf-contextualizer
– Transformer: Reification2SP, NamedGraph2SP,
NanoPub2SP
– Reasoner: compute inferr...
72
Evaluation
• Total number of triples transformed by RDF-
contextualizer
73
Evaluation
• Total number of singleton triples computed by
RDF-contextualizer
74
Evaluation
• SP inference rules in Oracle 12c
75
76
Part 3
Semantic Web Foundations for
Traversing
Contextualized Knowledge Graphs
Bill Clinton holds the political position President of the USA from 1993-01-20
to 2001-01-20, and is succeeded by GeorgeW....
78
# Subject Predicate Object # Subject Predicate Object
T1 BillClinton holdsPos#1 U.S.President T4 BillClinton holdsPos#2...
79
# Subject Predicate Object
T1 BillClinton holdsPos#1 U.S.President
T3 holdsPos#1 hasSuccessor GeorgeW.Bush
Bill
Clinton...
# Subject Predicate Object # Subject Predicate Object
T1 BillClinton holdsPos#1 U.S.President T4 BillClinton holdsPos#2 Ar...
81
Labeled Directed Multigraph with Triple Nodes (LDM-3N)
[Figure: LDM-3N graph; visible labels include Frank White, George W.Bush, holdsPos#1, holdsPos#2, e6, hasSu...]
Proposition 1. (Forward transformation).
Any set of RDF triples can be transformed into a labeled
directed multigraph with...
Proposition 2. (Backward transformation).
Given the graph GRDF (N, E, ε, τ, μ) transformed by Proposition 1,
a set of RDF ...
85
Traversing RDF Graph: Triple Path
Example: (T1, T3) is a triple path.
T1 = (BillClinton, holdsPos#1, U.S.President),
T3...
86
Traversing RDF Graph: Resource Path
A resource path is defined as a sequence of nodes such that
(1) every two adjacent ...
• Given a vocabulary V,
Model-Theoretic Semantics
Simple Interpretation I :
• IN: a set of nodes, IE: a set of directed ed...
Model-Theoretic Semantics
Simple Interpretation I (cont):
• IP, a set of property nodes, also a subset of IN, IP ⊂ IN.
• I...
• Engines
– GraphKE: on top of BerkeleyDB, written in C
– RDF-3X extension
• Dataset
– YAGO2S-SP: contains 267,161,278 tri...
91
Reachability Queries
92
Shortest Path Queries
93
Shortest Path Queries
(a) CS Politician Group (b) SS Politician Group
(c) HR Politician Group
Features N-ary Reification NG SP
Triple store
Quad store
Formal semantics
Inference rules
Metadata variety
Compact DS size...
Semantic Web Foundations for
(1) Representing, (2) Reasoning, and (3) Traversing
Contextualized Knowledge Graphs
95
Our go...
96
Acknowledgment
Committee Members
Amit Sheth
(advisor)
Krishnaprasad
Thirunarayan
Olivier
Bodenreider (NLM)
Kemafor
Anya...
97
Thank you!
98
Extra slides
99
BarackObama marriedTo MichelleObama
Original statement
Questions
1. Is it true that Barack Obama married to Michelle Ob...
100
Barack Obama married to Michelle Obama in 1992 and in Chicago
Original statement
Questions
1. Is it true that Barack O...
101
Barack Obama married to Michelle Obama in 1992 and in Chicago
Original statement
Inferred statements
1. Barack Obama m...
102
Barack Obama married to Michelle Obama in Chicago
Barack Obama becomes a spouse of Michelle Obama in Illinois
Which ru...
Inference types
Primary triple Context 1 Context 2
Barack Obama married to Michelle Obama in Chicago in 1992
Barack Obama ...
u rdf:singletonPropertyOf v .
x u y .
x v y .
104
RDF Deduction Rules: rdf-sp-3
marriedTo#1 rdf:singletonPropertyOf marrie...
105
RDFS Deduction Rules: rdfs-sp-3
Sub-property
marriedTo#1 rdf:singletonPropertyOf marriedTo .
marriedTo rdfs:subPropert...
106
Barack Obama married to Michelle Obama in Chicago
Barack Obama married to Michelle Obama in Illinois
BarackObama marri...
107
Barack Obama married to Michelle Obama in Chicago
Barack Obama becomes a spouse of Michelle Obama in Chicago
BarackOba...
108
Barack Obama married to Michelle Obama in Chicago
Barack Obama becomes a spouse of Michelle Obama in Illinois
BarackOb...
Model-Theoretic Semantics
RDF Interpretation IRDF:
• p ∈ IP if ∃e1,e2 ∈ IE :
IE (e1) = (p, rdf:typeI),
IE (e2) = (rdf:type...
Model-Theoretic Semantics
RDF Interpretation IRDF : (cont.)
• ps ∈ IPs if ∃e1,e2 ∈ IE :
IE (e1) = (ps, rdf:singletonProper...
Model-Theoretic Semantics
RDFS Interpretation I RDFS:
• ICEXT : IP → 2IN , a function assigning to each class a set of nod...
Model-Theoretic Semantics
RDFS Interpretation I RDFS: (cont)
• if ∃e1,e2 ∈ IE: IT (e1) = e2,
IE(e1) = (x, rdfs:subProperty...
Semantic Web Foundations for Representing, Reasoning, and Traversing Contextualized Knowledge Graphs: Vinh Nguyen's Dissertation Defense


Semantic Web technologies such as RDF and OWL have become World Wide Web Consortium (W3C) standards for knowledge representation and reasoning. RDF triples about triples, or meta triples, form the basis for a contextualized knowledge graph. They represent the contextual information about individual triples such as the source, the occurring time or place, or the certainty.

However, an efficient RDF representation for such meta-knowledge of triples remains a major limitation of the RDF data model. The existing reification approach allows such meta-knowledge of RDF triples to be expressed in RDF by using four triples per reified triple. While reification is simple and intuitive, it lacks a formal foundation and, as the RDF Primer notes, is not commonly used in practice.

This dissertation presents the foundations for representing, querying, reasoning over, and traversing contextualized knowledge graphs (CKGs) using Semantic Web technologies.

A triple-based compact representation for CKGs. We propose a principled approach and construct RDF triples about triples by extending the current RDF data model with a new concept, called singleton property (SP), as a triple identifier. The SP representation needs two triples per statement in the RDF dataset and can be queried with SPARQL.

A formal model-theoretic semantics for CKGs. We formalize the semantics of the singleton property and its relationships with the triple it represents. We extend the current RDF model-theoretic semantics to capture the semantics of the singleton properties and provide the interpretation at three levels: simple, RDF, and RDFS. It provides a single interpretation of the singleton property semantics across applications and systems.

A sound and complete inference mechanism for CKGs. Based on the semantics we propose, we develop a set of inference rules for validating and inferring new triples based on the SP syntax. We also develop different sets of context-based inference rules for provenance, time, and uncertainty.

A graph-based formalism for CKGs. We propose a formal contextualized graph model for the SP representation. We formalize RDF triples as a mathematical graph by combining model theory and graph theory into a hybrid RDF formal semantics. The unified semantics allows the RDF formal semantics to be leveraged in graph-based algorithms.

Published in: Technology

Semantic Web Foundations for Representing, Reasoning, and Traversing Contextualized Knowledge Graphs: Vinh Nguyen's Dissertation Defense

  1. Semantic Web Foundations for Representing, Reasoning, and Traversing Contextualized Knowledge Graphs
     PhD Dissertation
     By: Vinh Nguyen
     Committee Members: Amit Sheth (advisor), Krishnaprasad Thirunarayan, Olivier Bodenreider (NLM), Kemafor Anyanwu (NCSU), Ramanathan V. Guha (Schema.org)
  2. Semantic Web Layer Cake
  3. 2973 datasets with 149 billion triples
     Linked Data principles:
     • Use URIs as names for things
     • Use HTTP URIs so that those names can be looked up
     • When a URI is looked up, provide useful information using the standards
     • Include links to other URIs so that more can be discovered
  4. Semantic Web & IBM Watson
  5. Semantic Web & Enterprise
  6. Knowledge Graphs
  7. Research Problems
     • Knowledge representation for more complex knowledge
       – Time: a triple holds true during a time interval, not forever
       – Provenance: origins of a triple
       – N-ary relationships
       – Propositional attitude
     • Not easily represented in the triple form
     • Reification is not acceptable
  8. What do we need?
     • Syntax
       – RDF triple representation
       – Variety of metadata types
     • Semantics
       – Unambiguous interpretation
       – Enable reasoning
  9. Contextualized Knowledge Graph
     • Definition
       – The context of a triple represents an n-ary relationship, a propositional attitude, or metadata such as time, location, provenance, or certainty of that triple.
       – A contextualized triple is a triple that is associated with contextual information.
       – A contextualized knowledge graph is a knowledge graph in which every triple is qualified with a set of contextual properties.
  10. Our goal is to develop Semantic Web Foundations for (1) Representing, (2) Reasoning, and (3) Traversing Contextualized Knowledge Graphs
      Thesis statement: It is possible to develop (1) a compact and formal representation, (2) a sound and complete inference mechanism, and (3) a model-theoretic graph formalism for contextualized knowledge graphs that can be efficiently implemented.
  11. Overview
      • Part 1: Representing
         a compact and formal representation
          o SP representation
      • Part 2: Reasoning
         a sound and complete inference mechanism
          o Syntax-based inference rules
          o Context-based inference rules
      • Part 3: Traversing
         a model-theoretic graph formalism
          o Contextualized paths
  12. Part 1: Semantic Web Foundations for Representing Contextualized Knowledge Graphs
  13. Outline
      • Motivating example
      • Comparing existing approaches vs. singleton property
      • Model-theoretic semantics
      • Querying
      • Use cases
      • Experimental evaluation
      • External evaluation
      • Optimized SP representation
  14. Motivation Scenario
      Facts:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds |
      | Bob Dylan | marriedTo | Carolyn Dennis |

      The same facts with their time intervals:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |
      | Bob Dylan | marriedTo | Carolyn Dennis | 1986-06-## | 1992-10-## |

      Meta Queries:
      | Query type | Sample query |
      | --- | --- |
      | Provenance | P1. Where is this fact from? P2. When was it created? P3. Who created this fact? |
      | Time | T1. When did this fact occur? T2. What is the time span of this fact? T3. Which events happened in the same year? |
      | Location | L1. What is the location associated with this fact? L2. Which events happened at the same place? |
      | Certainty | C1. What is the author confidence of this fact? |
  15. RDF Reification
      Time-aware Facts:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      Form of Triples: RDF Reification
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | #stmt1 | type | Statement |
      | #stmt1 | hasSubject | BobDylan |
      | #stmt1 | hasProperty | marriedTo |
      | #stmt1 | hasObject | Sara Lownds |
      | Bob Dylan | marriedTo | Sarah Lownds |
      | #stmt1 | starts | 1965-11-22 |
      | #stmt1 | ends | 1977-06-29 |

      Pros:
      1. Intuitive, easy to understand
      Cons:
      1. Takes 3N triples (4N including the Statement typing triple) to represent N statements => Not scalable
      2. No formal semantics defined => Semantics is unclear
      3. Discouraged in LOD!
  16. RDF Reification vs. Singleton Property
      Time-aware Facts:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      RDF Reification:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | #stmt1 | type | Statement |
      | #stmt1 | hasSubject | BobDylan |
      | #stmt1 | hasProperty | marriedTo |
      | #stmt1 | hasObject | Sara Lownds |
      | Bob Dylan | marriedTo | Sarah Lownds |
      | #stmt1 | starts | 1965-11-22 |
      | #stmt1 | ends | 1977-06-29 |

      Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | marriedTo#1 | rdf:sp | marriedTo |
      | BobDylan | marriedTo#1 | Sarah Lownds |
      | marriedTo#1 | starts | 1965-11-22 |
      | marriedTo#1 | ends | 1977-06-29 |

      Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." In Proceedings of the 23rd international conference on World wide web, pp. 759-770. ACM, 2014.
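To make the size difference between the two encodings concrete, here is a minimal Python sketch that builds both sets of triples for the Bob Dylan example and counts them. The helper names `reify` and `singleton` are hypothetical, the property names follow the slide tables, and the original asserted triple is left out of both counts.

```python
def reify(stmt_id, s, p, o, meta):
    """Reification-style triples, using the property names from the slide."""
    triples = [
        (stmt_id, "type", "Statement"),
        (stmt_id, "hasSubject", s),
        (stmt_id, "hasProperty", p),
        (stmt_id, "hasObject", o),
    ]
    # Metadata is attached to the statement identifier.
    triples += [(stmt_id, k, v) for k, v in meta.items()]
    return triples


def singleton(sp_id, s, p, o, meta):
    """Singleton-property triples: one rdf:sp triple plus the singleton triple."""
    sp = f"{p}#{sp_id}"
    triples = [(sp, "rdf:sp", p), (s, sp, o)]
    # Metadata is attached to the singleton property itself.
    triples += [(sp, k, v) for k, v in meta.items()]
    return triples


meta = {"starts": "1965-11-22", "ends": "1977-06-29"}
reified = reify("#stmt1", "BobDylan", "marriedTo", "SarahLownds", meta)
sp_triples = singleton(1, "BobDylan", "marriedTo", "SarahLownds", meta)
print(len(reified), len(sp_triples))  # prints: 6 4
```

The fixed overhead per statement is four triples for reification (three without the Statement typing triple) and two for the singleton property; the metadata triples are the same in both.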
  17. Provenance-aware Context Entity
      Provenance-aware Facts:
      | Subject | Predicate | Object | Source | DateExtracted |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | wikipage:Bob_Dylan | 2009-06-07 |

      Form of Triples: PaCE
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | BobDylan_wp | rdf:type | Bob Dylan |
      | SaraLownds_wp | rdf:type | Sara Lownds |
      | BobDylan_wp | marriedTo | SaraLownds_wp |
      | BobDylan_wp | hasSource | wiki:Bob_Dylan |
      | BobDylan_wp | hasDateExt | 2009-06-07 |

      Pros:
      1. Saves ~50% of the triples compared to reification, thanks to the repeated subject, predicate, and object
      Cons:
      1. Not intuitive, hard to understand
      2. Limited expressiveness

      Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan. 2010. Provenance context entity (PaCE): scalable provenance tracking for scientific RDF data. In Proceedings of the 22nd international conference on Scientific and statistical database management (SSDBM'10).
  18. PaCE vs. Singleton Property
      Facts and Provenance:
      | Subject | Predicate | Object | Source | DateExtracted |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | wikipage:Bob_Dylan | 2009-06-07 |

      Provenance-aware Context Entity:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | BobDylan_wp | rdf:type | Bob Dylan |
      | SaraLownds_wp | rdf:type | Sara Lownds |
      | BobDylan_wp | marriedTo | SaraLownds_wp |
      | BobDylan_wp | hasSource | wiki:Bob_Dylan |
      | BobDylan_wp | hasDateExt | 2009-06-07 |

      Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | marriedTo#1 | rdf:sp | marriedTo |
      | BobDylan | marriedTo#1 | Sarah Lownds |
      | marriedTo#1 | hasSource | wp:Bob_Dylan |
      | marriedTo#1 | hasDateExt | 2009-06-07 |
  19. Named Graph
      Time-aware Facts:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      Form of Quadruples: Named Graph
      | Subject | Predicate | Object | NG |
      | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | ng_1 |
      | ng_1 | starts | 1965-11-22 | Prov_graph |
      | ng_2 | ends | 1977-06-29 | Prov_graph |

      Pros:
      1. Intuitive: as many named graphs are created as there are sources
      2. Attaches metadata to a set of triples
      3. SPARQL supported
      Cons:
      1. Defined for provenance only
      2. Ambiguous semantics when associating different types of metadata at the triple level

      * Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th international conference on World Wide Web. ACM, 2005.
  20. Named Graph vs. Singleton Property
      Time-aware Facts:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      Named Graph:
      | Subject | Predicate | Object | NG |
      | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | ng_1 |
      | ng_1 | starts | 1965-11-22 | Prov_graph |
      | ng_2 | ends | 1977-06-29 | Prov_graph |

      Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | marriedTo#1 | rdf:sp | marriedTo |
      | Bob Dylan | marriedTo#1 | Sarah Lownds |
      | marriedTo#1 | starts | 1965-11-22 |
      | marriedTo#1 | ends | 1977-06-29 |
  21. RDF+
      Facts and Temporal Information:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      Form of Quintuples: RDF+
      | Subject | Predicate | Object | Meta Property | Meta value |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | starts | 1965-11-22 |
      | Bob Dylan | marriedTo | Sarah Lownds | ends | 1977-06-29 |

      Cons:
      1. The representation is not in the form of RDF. Statement identifiers are used internally. Mappings from RDF to RDF+ and vice versa are required.
      2. The SPARQL query syntax and semantics need to be extended to support RDF+

      * Dividino, Renata, et al. "Querying for provenance, trust, uncertainty and other meta knowledge in RDF." Web Semantics: Science, Services and Agents on the World Wide Web 7.3 (2009): 204-219.
  22. Overall Goal
      A mechanism to make statements about statements should meet these requirements:
      1. Simple, easy to understand
      2. Formal semantics defined
      3. Scalable, e.g., to LOD
      4. Compatible with existing standards
      5. Multiple types of metadata
  23. Generic Property vs. Singleton Property
      Facts and Provenance:
      | Subject | Predicate | Object | Source | MarriageDate |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | wikipage:Bob_Dylan | 1965-11-22 |
      | BarackObama | marriedTo | MichelleObama | wikipage:Barack_Obama | 1992-10-03 |

      Generic Property:
      1. marriedTo is an RDF property
      2. marriedTo => { (Bob Dylan, Sarah Lownds), (Barack Obama, Michelle Obama), … }
      3. Any assertion about marriedTo applies to all pairs of entities!
      Singleton Property:
      1. marriedTo#1, marriedTo#2 are RDF properties
      2. Different property instances: marriedTo#1, marriedTo#2, … marriedTo#n
      3. Any assertion about marriedTo#1/marriedTo#2/…/marriedTo#n applies to only ONE pair <= KEY
      Diagram label: instanceOf
  24. Model-Theoretic Semantics
      Given a vocabulary V, the original* simple interpretation I consists of:
      • IR: a non-empty set of resources, alternatively called the domain or universe of discourse of I
      • IP: the set of generic properties of I
      • IEXT : IP → 2^(IR × IR), a function assigning to each property a set of pairs from IR, where IEXT(p) is called the extension of property p
      • IS: a function mapping URIs from V into the union of IR and IP
      • IL: a function from the typed literals in V into the set of resources IR
      • LV: a subset of IR, called the set of literal values
      The new simple interpretation I satisfies additional criteria as follows:
      • IPS: a subset of IR, called the set of singleton properties of I
      • IS_EXT : IPS → IR × IR, where IS_EXT(ps) is a function assigning to each singleton property a pair of entities from IR
      The new RDF interpretation I satisfies additional criteria as follows:
      • xs ∈ IPS if ⟨xs, rdf:SingletonProperty^I⟩ ∈ IEXT(rdf:type^I)
      • xs ∈ IPS if ⟨xs, x^I⟩ ∈ IEXT(rdf:singletonPropertyOf^I), x ∈ IP, and IS_EXT(xs) = ⟨s1, s2⟩
  25. Model-Theoretic Semantics: Example
      Example of vocabulary VEX:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | BobDylan | isMarriedTo | SaraLownds |
      | BobDylan | isMarriedTo#1 | SaraLownds |
      | isMarriedTo#1 | rdf:sp | isMarriedTo |
      | isMarriedTo#1 | hasStart | 1965-11-22 |
      | isMarriedTo#1 | hasEnd | 1977-06-29 |
      | BobDylan | isMarriedTo | CarolynDennis |
      | BobDylan | isMarriedTo#2 | CarolynDennis |
      | isMarriedTo#2 | rdf:sp | isMarriedTo |
      | isMarriedTo#2 | hasStart | 1986-06-## |
      | isMarriedTo#2 | hasEnd | 1992-10-## |

      RDF Interpretation of VEX:
      IS:
      • BobDylan → α
      • SaraLownds → β
      • CarolynDennis → γ
      • isMarriedTo → δ
      • isMarriedTo#1 → θ
      • isMarriedTo#2 → λ
      • hasStart → σ
      • hasEnd → φ
      IR = {α, β, γ, δ, θ, λ, σ, φ}
      IP = {δ, θ, λ, σ, φ}
      LV = {1965-11-22, 1977-06-29, 1986-06-##, 1992-10-##}
      IEXT:
      • θ → {⟨α, β⟩}
      • λ → {⟨α, γ⟩}
      • σ → {⟨θ, 1965-11-22⟩, ⟨λ, 1986-06-##⟩}
      • φ → {⟨θ, 1977-06-29⟩, ⟨λ, 1992-10-##⟩}
      • rdf:sp → {⟨θ, δ⟩, ⟨λ, δ⟩}
      • δ → {⟨α, β⟩, ⟨α, γ⟩}
      IPS = {θ, λ}
      IS_EXT:
      • θ → ⟨α, β⟩
      • λ → ⟨α, γ⟩
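The example interpretation can be checked mechanically. The sketch below encodes IEXT, IPS, and IS_EXT from the slide as Python dictionaries and tests three conditions: every subject of rdf:sp is a singleton property, the extension of each singleton property is exactly its one pair, and that pair also appears in the extension of its generic property. The function name `check_interpretation` and the exact set of checks are illustrative assumptions, not a full statement of the semantics.

```python
# Greek letters stand for the resources in IR, as on the slide.
IEXT = {
    "θ": {("α", "β")},
    "λ": {("α", "γ")},
    "σ": {("θ", "1965-11-22"), ("λ", "1986-06-##")},
    "φ": {("θ", "1977-06-29"), ("λ", "1992-10-##")},
    "rdf:sp": {("θ", "δ"), ("λ", "δ")},
    "δ": {("α", "β"), ("α", "γ")},
}
IPS = {"θ", "λ"}
IS_EXT = {"θ": ("α", "β"), "λ": ("α", "γ")}


def check_interpretation(iext, ips, is_ext):
    """Return a list of violated conditions (empty if all checks pass)."""
    problems = []
    for xs, generic in sorted(iext["rdf:sp"]):
        pair = is_ext.get(xs)
        if xs not in ips:
            problems.append(f"{xs} is not in IPS")
        if iext.get(xs) != {pair}:
            problems.append(f"IEXT({xs}) is not exactly one pair")
        if pair not in iext.get(generic, set()):
            problems.append(f"IS_EXT({xs}) missing from IEXT({generic})")
    return problems


print(check_interpretation(IEXT, IPS, IS_EXT))
```

Removing ⟨α, γ⟩ from the extension of δ, for instance, makes the third check report λ.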
  26. Querying Meta Triples Using SPARQL
      Singleton Graph Pattern:
      | Triple Type | Subject | Predicate | Object |
      | --- | --- | --- | --- |
      | Instantiating singleton property | predicate_i | rdf:sp | predicate |
      | Singleton triple | subject | predicate_i | object |
      | Meta triple | predicate_i | meta-predicate_j | meta-value_j |

      Data Query:
      1. Who married whom?
      2. SPARQL query:
         SELECT ?person1 ?person2
         WHERE {
           ?person1 ?married_sp ?person2 .
           ?married_sp rdf:sp :marriedTo .
         }
      Meta Query:
      1. Who married whom and when?
      2. SPARQL query:
         SELECT ?person1 ?person2 ?date
         WHERE {
           ?person1 ?married_sp ?person2 .
           ?married_sp rdf:sp :marriedTo .
           ?married_sp :happenedOn ?date .
         }
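The join performed by the meta query can be traced by hand in plain Python. The sketch below is not a SPARQL engine: it hard-codes the three triple patterns of the singleton graph pattern over a small in-memory list, and both the sample data and the function name `who_married_whom_and_when` are assumptions made for illustration.

```python
def who_married_whom_and_when(triples, generic="marriedTo", meta="happenedOn"):
    """Evaluate the three patterns of the meta query over a list of triples."""
    # ?married_sp rdf:sp :marriedTo .
    singleton_props = {s for s, p, o in triples if p == "rdf:sp" and o == generic}
    # ?married_sp :happenedOn ?date .   (assumes one date per singleton property)
    dates = {s: o for s, p, o in triples if p == meta}
    # ?person1 ?married_sp ?person2 .   joined with the two patterns above
    return sorted(
        (s, o, dates[p])
        for s, p, o in triples
        if p in singleton_props and p in dates
    )


data = [
    ("marriedTo#1", "rdf:sp", "marriedTo"),
    ("BobDylan", "marriedTo#1", "SarahLownds"),
    ("marriedTo#1", "happenedOn", "1965-11-22"),
    ("marriedTo#2", "rdf:sp", "marriedTo"),
    ("BobDylan", "marriedTo#2", "CarolynDennis"),  # no date recorded
]
rows = who_married_whom_and_when(data)
print(rows)
```

As in the SPARQL version, a singleton triple without a matching meta triple produces no row, because none of the patterns is optional.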
  27. Use Case: Temporal and Spatial YAGO2S
      FactID in YAGO2S:
      | FactID | Subject | Predicate | Object |
      | --- | --- | --- | --- |
      | #1 | GratefulDead | performed | TheClosingOfWinterLand |
      | #2 | #1 | occursIn | SanFrancisco |
      | #3 | #1 | occursOn | 1978-12-31 |

      Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | performed_12345 | rdf:singletonPropertyOf | performed |
      | GratefulDead | performed_12345 | TheClosingOfWinterLand |
      | performed_12345 | occursIn | SanFrancisco |
      | performed_12345 | occursOn | 1978-12-31 |
  28. Experiment: BKR with Provenance
      • Five data sets generated from the same seed BKR
         Singleton Property (SP)
         Reification (R)
         PaCE C1 (C1)
         PaCE C2 (C2)
         PaCE C3 (C3)
      All datasets are available at http://wiki.knoesis.org/index.php/Singleton_Property
  29. Experiment Results
      [Figure: (A) random-value queries vs. fixed-value queries in msec. (B) query length and execution time in msec.]
  30. External Evaluation
      – Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier. Exposing provenance metadata using different RDF models. In Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016. https://pubchem.ncbi.nlm.nih.gov/
      – Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with wikidata?." SSWS@ ISWC 1457 (2015): 32-47.
      – Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of Metadata Representations in RDF stores."
      – Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas, Enzo Zerega: Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. International Semantic Web Conference (2) 2016: 88-103
  31. PubChem
      • Five data sets generated from the same seed
         N-ary with cardinal assertion (Model I)
         N-ary without cardinal assertion (Model II)
         Singleton property with cardinal assertion (Model III)
         Singleton property without cardinal assertion (Model IV)
         NanoPublication (Model V)
      • Comparing sizes of generated datasets
         SP datasets are the most compact ones
      | Model I | Model II | Model III | Model IV | Model V |
      | --- | --- | --- | --- | --- |
      | 22,787,218 | 21,445,348 | 19,575,298 | 17,239,427 | 27,605,782 |

      Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier. Exposing provenance metadata using different RDF models. In Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016.
  32. PubChem
      • Query performance in secs
         SP models (III and IV) outperform other models in Virtuoso
  33. PubChem (cont)
  34. WikiData
      • Four data sets generated from the same seed
         Standard Reification (SR)
         N-ary relation (NR)
         Singleton property (SP)
         Named Graph (NG)
      • Comparing sizes of generated datasets
         SP dataset is the most compact one
      Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with wikidata?." SSWS@ ISWC 1457 (2015): 32-47.
  35. WikiData
      • Query performance in 4store and GraphDB
         SP models are not supported by 4store and GraphDB
      • Query performance in Virtuoso and BlazeGraph
         Reification and NG are well-supported by Virtuoso and BlazeGraph
         SP is a little faster than NR in Virtuoso, slower in BlazeGraph
  36. WikiData
      • Six data sets generated from the same seed
         Standard Reification (stdreif)
         N-ary relation (naryrel)
         Singleton property (sgprop)
         Companion property (cpprop)
         Named Graph (ngraphs)
         RDF* (rdr)
      • Comparing sizes of generated datasets
         SP dataset is the most compact triple representation
         Fastest in loading time for WikiData
         Best query performance for StarDog in all cases
         Slowest in Virtuoso but not by much for WikiData queries
         No performance issues encountered with SP
      Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of Metadata Representations in RDF stores."
  37. Experimental Comparison
      • Dataset size
         SP offers the most concise representation in all cases
      • Query performance
         SP performs reasonably well in Virtuoso, best in StarDog, OK in BlazeGraph
         SP has potential for performance gains if supported and optimized by the query engines
      Is SP representation optimal?
  38. Optimizing the SP syntax
      Temporal fact:
      | Subject | Predicate | Object | Starts | Ends |
      | --- | --- | --- | --- | --- |
      | Bob Dylan | marriedTo | Sarah Lownds | 1965-11-22 | 1977-06-29 |

      Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | marriedTo#1 | rdf:sp | marriedTo |
      | BobDylan | marriedTo#1 | Sarah Lownds |
      | marriedTo#1 | hasSource | wp:Bob_Dylan |
      | marriedTo#1 | hasDateExt | 2009-06-07 |

      Optimal Singleton Property:
      | Subject | Predicate | Object |
      | --- | --- | --- |
      | BobDylan | marriedTo?id=1 | Sarah Lownds |
      | marriedTo?id=1 | hasSource | wp:Bob_Dylan |
      | marriedTo?id=1 | hasDateExt | 2009-06-07 |
  39. Parameterized URIs
      • Borrowed from REST API: http://example.com/property?key1=value1&key2=value2
        – The URIs are de-referenceable
        – Can be crawled and indexed
        – Well-known concept
        – Easy to adopt
        – Compatible with existing apps
      Example: http://yago.org/marriedTo?id=1
  40. Parameterized URIs for grouping triples
      • Add the new parameter ds to the URIs of SPs
        – Grouping triples from the same datasets
        – No extra triple or quad
      • Coordination among publishers is easy
        – Only required for a unique ds name
        – Identifier id is local to each ds
      Example: http://yago.org/marriedTo?id=1&ds=yago2s
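Because the parameterized form is an ordinary URI with a query string, standard URL tooling can build and take apart singleton property URIs. The following is a minimal sketch using Python's `urllib.parse`; the helper names `make_sp_uri` and `split_sp_uri` are hypothetical, and only the `id` and `ds` parameters from the slides are handled.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_sp_uri(generic_uri, sp_id, dataset=None):
    """Append id (and optionally ds) to the generic property URI."""
    params = {"id": sp_id}
    if dataset is not None:
        params["ds"] = dataset
    return f"{generic_uri}?{urlencode(params)}"


def split_sp_uri(sp_uri):
    """Recover the generic property URI and the parameters of an SP URI."""
    parts = urlsplit(sp_uri)
    generic = f"{parts.scheme}://{parts.netloc}{parts.path}"
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return generic, params


uri = make_sp_uri("http://yago.org/marriedTo", 1, "yago2s")
print(uri)  # prints: http://yago.org/marriedTo?id=1&ds=yago2s
print(split_sp_uri(uri))
```

Since the generic property can be read off the URI itself, no separate rdf:sp triple is needed to recover it, which is what makes the optimal form one triple shorter per statement.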
  41. 41. 41 Named Graph vs. Singleton Property Time-aware Facts: Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 1965-11-22 1977-06-29 Named Graph: Subject Predicate Object NG Bob Dylan marriedTo Sarah Lownds ng_1 ng_1 starts 1965-11-22 Prov_graph ng_2 ends 1977-06-29 Prov_graph Singleton Property: Subject Predicate Object marriedTo#1 rdf:sp marriedTo Bob Dylan marriedTo#1 Sarah Lownds marriedTo#1 starts 1965-11-22 marriedTo#1 ends 1977-06-29 Optimal Singleton Property: Subject Predicate Object Bob Dylan marriedTo?id=1&ds=ex1&start=1965-11-22&ends=1977-06-29 Sarah Lownds
  42. 42. 42 RDF-Contextualizer • Transform existing knowledge bases into SP – Named Graph – NanoPublication – Reification • Creating IRIs for singleton properties – Appending parameters to the generic property IRI • Pipeline implementation – Using Jena API 3.0
  43. 43. 43 RDF-Contextualizer • Transforming existing RDF datasets into SP: Bio2RDF and DBpedia
  Dataset | # Quads | # Optimal SP Triples | # SP Triples
  NCBI-NG | 4,043,516,408 | 4,043,516,408 | 12,130,549,224
  NCBI-NG-NoDup | 2,010,283,374 | 2,010,283,374 | 6,030,850,122
  DBPedia-NG | 1,039,275,891 | 1,039,275,891 | 3,117,827,673
  DBPedia-NG-NoDup | 784,508,538 | 784,508,538 | 2,353,525,614
  CTD-NG | 644,147,853 | 644,147,853 | 1,932,443,312
  CTD-NG-NoDup | 327,648,659 | 327,648,659 | 982,945,977
  PHARMGB-NG | 462,682,871 | 462,682,871 | 1,388,048,613
  PHARMGB-NG-NoDup | 339,058,720 | 339,058,720 | 1,017,176,160
  GOA-NG | 159,255,577 | 159,255,577 | 477,766,709
  GOA-NG-NoDup | 97,522,988 | 97,522,988 | 292,568,964
  44. 44. 44 Overall Goal 1. Simple, easy to understand 2. Formal semantics defined 3. Scalable, e.g., to LOD 4. Compatible with existing standards 5. Multiple types of metadata Does SP representation meet these requirements?
  45. 45. 45 Part 2 Semantic Web Foundations for Reasoning with Contextualized Knowledge Graphs
  46. 46. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 2005 2010 Bob Dylan marriedTo Carolyn Dennis 2006 2017 2005 2010 ceo Motivating Scenario Facts: 46 Subject Predicate Object chadHurley rdf:type youtubeEmp youtubeEmp subClassOf googleEmp chadHurley ceo youtube ceo subPropertyOf worksFor Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 2006 2010 Bob Dylan marriedTo Carolyn Dennis 2006 2010 Subject Predicate Object chadHurley rdf:type googleEmp chadHurley worksFor youtube Inferred Facts: Which rules
  47. 47. 47 Distinguish Property Types Property Generic Property Singleton Property Property Type: Singleton property Generic property Regular property Context: Context-associated Context-free Context-agnostic
  48. 48. 48 Distinguish Triple Types Property Type: Singleton property Generic property Regular property Context: Context-associated Context-free Context-agnostic Triple Type: Singleton triple Generic triple Regular triple Examples: Subject Predicate Object Singleton triple: chadHurley ceo?id=1 youtube Generic triple: chadHurley ceo youtube Regular triple: chadHurley type Person
  49. 49. • A property mapping function IEXT is a binary relation that maps one property to a set of ordered pairs of resources. • Formally, let IP be the set of properties, IR be the set of resources. Then IEXT: IP → 2^(IR × IR). 49 Property Mapping Function larryPage chadHurley alphabet youtube Person Company ceo
  50. 50. • A singleton mapping function IS_EXT is a binary relation that maps one singleton property to one ordered pair of resources. • Formally, let IR be the set of resources, IPs be the set of singleton properties. Then IS_EXT: IPs → IR × IR. 50 Singleton Mapping Function larryPage chadHurley alphabet youtube Person Company ceo?id=2 ceo?id=1
  51. 51. • A generic property function IG is a binary relation that maps a generic property to a set of its singleton properties. • Formally, IG: IPg → 2^IPs such that IG(pg) = {ps | ⟨ps, pg⟩ ∈ IEXT(rdf:singletonPropertyOf)}. 51 Generic Property Function worksFor ceo?id=1 ceo?id=2 Generic Property Singleton Property
  52. 52. • A generic mapping function IG_EXT is a binary relation that maps a generic property to a set of ordered pairs of resources. • Formally, let IPg be the set of generic properties. • IG_EXT: IPg → 2^(IR × IR) such that IG_EXT(pg) = {IS_EXT(ps) | ps ∈ IG(pg)}. 52 Generic Property Mapping larryPage chadHurley alphabet youtube Person Company ceo?id=2 ceo?id=1 ceo
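The mapping functions defined on these slides can be tried out on the ceo example with plain Python dictionaries and sets. This is only an illustrative sketch: the data structures and the constant name `SINGLETON_PROPERTY_OF` are assumptions, and the pairing of people with companies is read off the slide diagrams.

```python
# IS_EXT: each singleton property maps to exactly one (subject, object) pair.
IS_EXT = {
    "ceo?id=1": ("chadHurley", "youtube"),
    "ceo?id=2": ("larryPage", "alphabet"),
}

# Pairs in IEXT(rdf:singletonPropertyOf): (singleton property, generic property).
SINGLETON_PROPERTY_OF = {
    ("ceo?id=1", "ceo"),
    ("ceo?id=2", "ceo"),
}


def IG(pg):
    """Generic property function: the set of singleton properties of pg."""
    return {ps for (ps, generic) in SINGLETON_PROPERTY_OF if generic == pg}


def IG_EXT(pg):
    """Generic mapping function: the pairs contributed by pg's singleton properties."""
    return {IS_EXT[ps] for ps in IG(pg)}


ceo_singletons = IG("ceo")
ceo_extension = IG_EXT("ceo")
```

Here `IG_EXT("ceo")` collects one pair per singleton property, which mirrors the definition IG_EXT(pg) = {IS_EXT(ps) | ps ∈ IG(pg)}.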
  53. 53. 53 Simple Interpretation • Given a vocabulary V, • IR: a non-empty set of resources, called the domain or universe of discourse of I. • IP: the set of all properties of I • IEXT: a function assigning to each property a set of pairs from IR, where IEXT(p) is called the extension of property p • IEXT: IP → 2^(IR × IR) • IPs: the set of singleton properties of I • IPg: the set of generic properties of I • IPr: the set of regular properties of I • IS_EXT: a function assigning to each singleton property a pair from IR • IS_EXT: IPs → IR × IR • IG_EXT: a function assigning to each generic property a set of pairs from IR • IG_EXT(pg) = {IS_EXT(ps) | ps ∈ IG(pg)}
  54. 54. 54 RDF Interpretation Singleton Property vs. Generic Property • if <xs, xg>∈ IEXT (rdf:singletonPropertyOf), then xs ∈ IPs is a singleton property, and xg ∈ IPg is a generic property • ceo?id=1 rdf:singletonPropertyOf ceo . Singleton Property Generic Property
  55. 55. u rdf:singletonPropertyOf v . u rdf:type SingletonProperty 55 Syntax-based Inference Rules: sp-1 ceo?id=1 rdf:singletonPropertyOf ceo . ceo?id=1 rdf:type SingletonProperty
  56. 56. u rdf:singletonPropertyOf v . v rdf:type GenericProperty 56 Syntax-based Inference Rules: sp-2 ceo?id=1 rdf:singletonPropertyOf ceo. ceo rdf:type GenericProperty
  57. 57. 57 RDF Interpretation Generic Mapping Extension • if <xs, xg> ∈ IEXT(rdf:singletonPropertyOf), then IS_EXT(xs) ∈ IG_EXT(xg) • ceo?id=1 rdf:singletonPropertyOf ceo . • chadHurley ceo?id=1 youtube . IS_EXT(ceo?id=1) = <chadHurley, youtube>, IS_EXT(ceo?id=1) ∈ IG_EXT(ceo)
  58. 58. 58 RDF Interpretation Generic Triple Deriving • if <xs, x> ∈ IEXT(rdf:singletonPropertyOf), and <u, v> = IS_EXT(xs) then <u, v> ∈ IG_EXT(x) • ceo?id=1 rdf:singletonPropertyOf ceo . • chadHurley ceo?id=1 youtube . IS_EXT(ceo?id=1) = <chadHurley, youtube>, IS_EXT(ceo?id=1) ∈ IG_EXT(ceo), so <chadHurley, youtube> ∈ IG_EXT(ceo) • chadHurley ceo youtube .
  59. 59. u rdf:singletonPropertyOf v . x u y . x v y . 59 Syntax-based Inference Rules: sp-3 ceo?id=1 rdf:singletonPropertyOf ceo . chadHurley ceo?id=1 youtube . chadHurley ceo youtube .
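Rules sp-1, sp-2 and sp-3 are simple enough to run as a single pass over a set of triples. The following is a rough sketch only, with triples modeled as Python tuples; the function name `apply_sp_rules` is hypothetical and this is not the reasoner used in the thesis.

```python
SP_OF = "rdf:singletonPropertyOf"


def apply_sp_rules(triples):
    """Return the triples newly derived by rules sp-1, sp-2 and sp-3."""
    triples = set(triples)
    inferred = set()
    # Every (u, rdf:singletonPropertyOf, v) statement in the input.
    sp_links = [(u, v) for (u, p, v) in triples if p == SP_OF]
    for u, v in sp_links:
        inferred.add((u, "rdf:type", "SingletonProperty"))  # sp-1
        inferred.add((v, "rdf:type", "GenericProperty"))    # sp-2
        for (x, p, y) in triples:
            if p == u:                                      # singleton triple x u y
                inferred.add((x, v, y))                     # sp-3: generic triple x v y
    return inferred - triples


facts = {
    ("ceo?id=1", SP_OF, "ceo"),
    ("chadHurley", "ceo?id=1", "youtube"),
}
derived = apply_sp_rules(facts)
```

On the two facts above, the pass derives the two typing triples and the generic triple `chadHurley ceo youtube`.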
  60. 60. 60 Syntax-based Inference Rules: sp-4 SubProperty • if <xs, x> ∈ IEXT(rdf:singletonPropertyOf), <x, y> ∈ IEXT(rdfs:subPropertyOf) then <xs, y> ∈ IEXT(rdf:singletonPropertyOf) ceo?id=1 rdf:singletonPropertyOf ceo . ceo rdfs:subPropertyOf worksFor . ceo?id=1 rdf:singletonPropertyOf worksFor .
  61. 61. 61 Syntax-based Inference Rules: sp-5 Domain • if <xs, x> ∈ IEXT(rdf:singletonPropertyOf), <x, y> ∈ IEXT(rdfs:domain), <u, v> = IS_EXT(xs) then u ∈ ICEXT(y) ceo?id=1 rdf:singletonPropertyOf ceo . chadHurley ceo?id=1 youtube . ceo rdfs:domain Person . ceo?id=1 rdfs:domain Person .
  62. 62. 62 Syntax-based Inference Rules: sp-6 Range • if <xs, x> ∈ IEXT(rdf:singletonPropertyOf), <x, y> ∈ IEXT(rdfs:range), <u, v> = IS_EXT(xs) then v ∈ ICEXT(y) ceo?id=1 rdf:singletonPropertyOf ceo . chadHurley ceo?id=1 youtube . ceo rdfs:range Company . ceo?id=1 rdfs:range Company .
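The sub-property, domain and range rules all follow one pattern: a schema statement about the generic property is propagated to its singleton properties, as in the example triples on these three slides. A hedged sketch of that propagation as a fixed-point loop follows; the function name `close_under_schema_rules` is an assumption, and the loop is one possible way to handle chains of rdfs:subPropertyOf.

```python
SP_OF = "rdf:singletonPropertyOf"


def close_under_schema_rules(triples):
    """Apply the sub-property, domain and range rules until nothing new is derived."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sp_links = [(u, x) for (u, p, x) in closed if p == SP_OF]
        for u, x in sp_links:
            for (s, p, y) in list(closed):
                if s != x:
                    continue
                if p == "rdfs:subPropertyOf":
                    new = (u, SP_OF, y)   # singleton property of the super property
                elif p in ("rdfs:domain", "rdfs:range"):
                    new = (u, p, y)       # domain/range carried over to the singleton
                else:
                    continue
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed - set(triples)


schema_facts = {
    ("ceo?id=1", SP_OF, "ceo"),
    ("ceo", "rdfs:subPropertyOf", "worksFor"),
    ("ceo", "rdfs:domain", "Person"),
    ("ceo", "rdfs:range", "Company"),
}
schema_derived = close_under_schema_rules(schema_facts)
```

The loop repeats because a newly derived rdf:singletonPropertyOf link can enable further derivations when the super property has its own schema statements.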
  63. 63. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 2005 2010 Bob Dylan marriedTo Carolyn Dennis 2006 2017 Motivating Scenario Facts: 63 Subject Predicate Object chadHurley rdf:type youtubeEmp youtubeEmp subClassOf googleEmp Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 2006 2010 Subject Predicate Object chadHurley rdf:type googleEmp Inferred Facts: Which rules
  64. 64. Subject Predicate Object Starts Ends Bob Dylan marriedTo Sarah Lownds 2005 2010 Bob Dylan marriedTo Carolyn Dennis 2006 2017 Motivating Scenario Facts: 64 Subject Predicate Object chadHurley rdf:type youtubeEmp youtubeEmp subClassOf googleEmp Singleton Property: Subject Predicate Object chadHurley type?id=1 youtubeEmp type?id=1 singleton type type?id=1 starts 2005 type?id=1 ends 2010 youtubeEmp subClassOf?id=2 googleEmp subClassOf?id=2 singleton subClassOf subClassOf?id=2 starts 2006 subClassOf?id=2 ends 2017
  65. 65. 65 SP Inference Chain chadHurley type?id=1 youtubeEmp . type?id=1 singleton type . youtubeEmp subClassOf?id=2 googleEmp . subClassOf?id=2 singleton subClassOf . ∃! type?id=10: chadHurley type?id=10 googleEmp . type?id=10 singleton type . type?id=10 derivedFrom type?id=1 . type?id=10 derivedFrom subClassOf?id=2 . General form: A type?id=i B . type?id=i singleton type . B subClassOf?id=j C . subClassOf?id=j singleton subClassOf . ∃! type?id=k: A type?id=k C . type?id=k singleton type . type?id=k derivedFrom type?id=i , subClassOf?id=j .
  66. 66. 66 Context-based Reasoning From (X, A, Y): v1 and (X, A, Y): v2, infer (X, A, Y): v1 ⨁ v2. From (X, A, Y): v1 and (X, A, Y): v2, infer (X, A, Y): v1 ⨂ v2. X, A, Y: meta variables L: partially ordered set of annotation values v1, v2 ∈ L
  67. 67. 67 Context-based Reasoning with Time From (X, A, Y): v1 and (X, A, Y): v2, infer (X, A, Y): v1 ⨂ v2. Ltime: partially ordered set of time intervals. For [t1, t2], [t3, t4], [t5, t6] ∈ Ltime with [t5, t6] = [t1, t2] ∩ [t3, t4]: from (X, A, Y): [t1, t2] and (X, A, Y): [t3, t4], infer (X, A, Y): [t5, t6].
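For time annotations, the ⨂ operation used on this slide is interval intersection. A small sketch, assuming closed intervals represented as `(start, end)` tuples of years; the function name `intersect` and the choice to return `None` for disjoint intervals are illustrative assumptions.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # closed interval [start, end]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return [t5, t6] = [t1, t2] ∩ [t3, t4], or None when the intervals are disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


overlap = intersect((2005, 2010), (2006, 2017))
disjoint = intersect((2000, 2001), (2005, 2006))
```

With the intervals [2005, 2010] and [2006, 2017], the overlap is [2006, 2010].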
  68. 68. 68 Context-based Inference Rule with SP Chain Ltime: partially ordered set of time intervals. For [t1, t2], [t3, t4], [t5, t6] ∈ Ltime with [t5, t6] = [t1, t2] ∩ [t3, t4]: from (X, A, Y): [t1, t2] and (X, A, Y): [t3, t4], infer (X, A, Y): [t5, t6]. A type?id=i B . type?id=i singleton type ; from t1 ; to t2 . B subClassOf?id=j C . subClassOf?id=j singleton subClassOf ; from t3 ; to t4 . ∃! type?id=k: A type?id=k C . type?id=k singleton type ; from t5 ; to t6 . type?id=k derivedFrom type?id=i , subClassOf?id=j .
  69. 69. 69 Context-based Inference Rule with SP Chain A type?id=i B . type?id=i singleton type ; from t1 ; to t2 . B subClassOf?id=j C . subClassOf?id=j singleton subClassOf ; from t3 ; to t4 . ∃! type?id=k: A type?id=k C . type?id=k singleton type ; from t5 ; to t6 . type?id=k derivedFrom type?id=i , subClassOf?id=j . Example: chadHurley type?id=1 youtubeEmp . type?id=1 singleton type ; from 2005 ; to 2010 . youtubeEmp sc?id=2 googleEmp . sc?id=2 singleton subClassOf ; from 2006 ; to 2017 . ∃! type?id=10: chadHurley type?id=10 googleEmp . type?id=10 singleton type . type?id=10 from 2006 ; to 2010 . type?id=10 derivedFrom type?id=1 , sc?id=2 .
  70. 70. 70 Context-based Inference Rule with SP Chain chadHurley type?id=1 youtubeEmp . type?id=1 singleton type ; from 2005 ; to 2010 . youtubeEmp sc?id=2 googleEmp . sc?id=2 singleton subClassOf ; from 2006 ; to 2017 . ∃! type?id=10: chadHurley type?id=10 googleEmp . type?id=10 singleton type . type?id=10 from 2006 ; to 2010 . type?id=10 derivedFrom type?id=1 , sc?id=2 . Extending to RDFS rules involving class/property hierarchy Extending to different types of annotation values, e.g. uncertainty
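The temporal SP chain rule can be sketched end to end: find a singleton `type` triple and a singleton `subClassOf` triple that join on the class, intersect their intervals, and mint a fresh singleton property that records its provenance. The data layout (a list of statements plus a dictionary of annotations), the function name `infer_types`, and the starting identifier 10 are assumptions made for illustration. By interval intersection, [2005, 2010] ∩ [2006, 2017] = [2006, 2010].

```python
def infer_types(statements, annotations, first_new_id=10):
    """Derive time-annotated singleton type triples via the subclass rule.

    statements:  list of (subject, singleton_property, object)
    annotations: singleton_property -> {"singleton": generic, "from": t, "to": t}
    """
    new_statements, new_annotations = [], {}
    next_id = first_new_id
    for (a, p_type, b) in statements:
        if annotations[p_type]["singleton"] != "type":
            continue
        for (b2, p_sc, c) in statements:
            if b2 != b or annotations[p_sc]["singleton"] != "subClassOf":
                continue
            # [t5, t6] = [t1, t2] ∩ [t3, t4]
            start = max(annotations[p_type]["from"], annotations[p_sc]["from"])
            end = min(annotations[p_type]["to"], annotations[p_sc]["to"])
            if start > end:
                continue  # disjoint validity intervals: nothing is derived
            new_p = f"type?id={next_id}"
            next_id += 1
            new_statements.append((a, new_p, c))
            new_annotations[new_p] = {
                "singleton": "type", "from": start, "to": end,
                "derivedFrom": [p_type, p_sc],
            }
    return new_statements, new_annotations


statements = [
    ("chadHurley", "type?id=1", "youtubeEmp"),
    ("youtubeEmp", "sc?id=2", "googleEmp"),
]
annotations = {
    "type?id=1": {"singleton": "type", "from": 2005, "to": 2010},
    "sc?id=2": {"singleton": "subClassOf", "from": 2006, "to": 2017},
}
derived_statements, derived_annotations = infer_types(statements, annotations)
```

Keeping `derivedFrom` on the new singleton property is what lets the inferred triple be traced back to the two asserted ones.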
  71. 71. Implementation 71 • rdf-contextualizer – Transformer: Reification2SP, NamedGraph2SP, NanoPub2SP – Reasoner: compute inferred triples for SP rules • Datasets – DBPedia – Bio2RDF: PubMed, PharmGKB, NCBI Genes, NCBO BioPortal, GO Annotations, etc.
  72. 72. 72 Evaluation • Total number of triples transformed by RDF-contextualizer
  73. 73. 73 Evaluation • Total number of singleton triples computed by RDF-contextualizer
  74. 74. 74 Evaluation • SP inference rules in Oracle 12c
  75. 75. 75
  76. 76. 76 Part 3 Semantic Web Foundations for Traversing Contextualized Knowledge Graphs
  77. 77. Bill Clinton holds the political position President of the USA from 1993-01-20 to 2001-01-20, and is succeeded by George W. Bush (extracted from the wiki page of Bill Clinton on 2009-06-07). Bill Clinton holds the political position Governor of Arkansas from 1979-01-09 to 1981-01-19, and is succeeded by Frank White (extracted from the wiki page of Bill Clinton on 2009-08-08). 77 Subject Predicate Object Subject Predicate Object holdsPos#1 rdf:singleton holdsPos holdsPos#2 rdf:singleton holdsPos BillClinton holdsPos#1 U.S.President BillClinton holdsPos#2 ArkansasGovernor holdsPos#1 starts 1993-01-20 holdsPos#2 starts 1979-01-09 holdsPos#1 ends 2001-01-20 holdsPos#2 ends 1981-01-19 holdsPos#1 hasSuccessor GeorgeW.Bush holdsPos#2 hasSuccessor FrankWhite holdsPos#1 hasSource wk:Bill_Clinton holdsPos#2 hasSource wk:Bill_Clinton holdsPos#1 hasExtDate 2009-06-07 holdsPos#2 hasExtDate 2009-06-07
  78. 78. 78 # Subject Predicate Object # Subject Predicate Object T1 BillClinton holdsPos#1 U.S.President T4 BillClinton holdsPos#2 ArkansasGovernor T2 holdsPos#1 rdf:singleton holdsPos T5 holdsPos#2 rdf:singleton holdsPos T3 holdsPos#1 hasSuccessor GeorgeW.Bush T6 holdsPos#2 hasSuccessor FrankWhite Bill Clinton Arkansas Governor U.S. President holdsPos#1 holdsPos#2 holdsPos#1 George W.Bush holdsPos holdsPos#2 Frank White rdf:singleton rdf:singleton hasSuccessor hasSuccessor Node-Labeled Arc-Node Diagram
  79. 79. 79 # Subject Predicate Object T1 BillClinton holdsPos#1 U.S.President T3 holdsPos#1 hasSuccessor GeorgeW.Bush Bill Clinton U.S. President holdsPos#1 holdsPos#1 George W.Bush hasSuccessor Node-Labeled Arc-Node Labeled Directed Multigraph with Triple Nodes Bill Clinton U.S. President GeorgeW.Bush holdsPos#1 e1^I e1^T hasSuccessor e3^T e3^I 1. Map predicates to nodes 2. Add initial edge: subject to predicate 3. Add terminal edge: predicate to object
  80. 80. # Subject Predicate Object # Subject Predicate Object T1 BillClinton holdsPos#1 U.S.President T4 BillClinton holdsPos#2 ArkansasGovernor T2 holdsPos#1 rdf:singleton holdsPos T5 holdsPos#2 rdf:singleton holdsPos T3 holdsPos#1 hasSuccessor GeorgeW.Bush T6 holdsPos#2 hasSuccessor FrankWhite 80 Bill Clinton Arkansas Governor Frank White holdsPos#2 e4^I e4^T e6^T e6^I U.S. President George W.Bush holdsPos#1 e1^T e1^I hasSuccessor e2^T e2^I holdsPos rdf:singleton e3^T e3^I e5^T e5^I Labeled Directed Multigraph with Triple Nodes (LDM-3N)
  81. 81. 81 Labeled Directed Multigraph with Triple Nodes (LDM-3N) Frank White holdsPos#2 e6^I George W.Bush holdsPos#1 hasSuccessor e3^I e6^T e3^T
  82. 82. Proposition 1. (Forward transformation). Any set of RDF triples T can be transformed into a labeled directed multigraph with triple nodes $G_{RDF}$. 82 Formalizing RDF Graph • Let V be the set of RDF terms in T. • Let N and E be the set of nodes and the set of directed edges in the graph, respectively. • The bijective function $\mu : V \to N$ maps an RDF term in V to a node in N. • Let $t_i$ be a triple in T: $t_i = (s_i, p_i, o_i) \in T$, $0 \le i \le n$. • Let $N_i \subset N$ such that $N_i = \{n_{s_i}, n_{p_i}, n_{o_i} \mid n_{s_i} = \mu(s_i),\ n_{p_i} = \mu(p_i),\ n_{o_i} = \mu(o_i)\}$. • The function $\varepsilon : E \to N \times N$ is defined to map every edge in E to an ordered pair of nodes. Let $e_i^I, e_i^T \in E$ : $\varepsilon(e_i^I) = (n_{s_i}, n_{p_i})$, $\varepsilon(e_i^T) = (n_{p_i}, n_{o_i})$. • The bijective function $\tau : E \to E$ maps an initial edge to a terminal edge of the same triple. Then $\tau(e_i^I) = e_i^T$. Let $E_i \subset E$ be the set of two edges representing $t_i$, $E_i = \{e_i^I, e_i^T\}$. Therefore, $G_i = (N_i, E_i, \varepsilon, \tau, \mu)$ is the labeled directed graph with triple nodes of the triple $t_i$. • Finally, the graph $G_{RDF} = (N, E, \varepsilon, \tau, \mu)$ is a labeled directed multigraph with triple nodes for all of the triples in T.
  83. 83. Proposition 2. (Backward transformation). Given the graph $G_{RDF} = (N, E, \varepsilon, \tau, \mu)$ transformed by Proposition 1, a set of RDF triples can be derived from $G_{RDF}$. 83 Formalizing RDF Graph • Let $e_i^I$ be any edge in E with $0 \le i \le n$; the corresponding terminal edge of $e_i^I$ in E is $e_i^T = \tau(e_i^I)$. • From this pair of edges $(e_i^I, e_i^T)$, we find the nodes connected by the two edges by using the $\varepsilon$ function. • Let $n_{s_i}, n_{p_i}, n_{o_i} \in N$ : $(n_{s_i}, n_{p_i}) = \varepsilon(e_i^I)$, $(n_{p_i}, n_{o_i}) = \varepsilon(e_i^T)$. • Let $\mu^{-1}$ be the inverse function of $\mu$; then $\mu^{-1} : N \to V$ returns the RDF term mapped to a graph node. • Let $s_i, p_i, o_i \in V$ : $s_i = \mu^{-1}(n_{s_i})$, $p_i = \mu^{-1}(n_{p_i})$, $o_i = \mu^{-1}(n_{o_i})$. • The three nodes form the original triple $t_i = (s_i, p_i, o_i)$. • The set T of all RDF triples $t_i$ derived from $G_{RDF}$ is as follows: $T = \{t_i \mid \forall\, t_i = (s_i, p_i, o_i) \ldots\}$
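Propositions 1 and 2 describe a round trip between triples and the graph, which can be sketched with a few dictionaries. In this illustrative sketch μ is taken to be the identity (each RDF term string serves as its own node), edges are named `e{i}_I` and `e{i}_T`, and the function names `to_ldm3n` and `from_ldm3n` are hypothetical.

```python
def to_ldm3n(triples):
    """Forward transformation: one initial and one terminal edge per triple."""
    nodes = set()
    epsilon = {}  # edge -> (source node, target node)
    tau = {}      # initial edge -> terminal edge of the same triple
    for i, (s, p, o) in enumerate(triples, start=1):
        e_init, e_term = f"e{i}_I", f"e{i}_T"
        nodes.update((s, p, o))      # the predicate becomes a node as well
        epsilon[e_init] = (s, p)     # initial edge: subject -> predicate
        epsilon[e_term] = (p, o)     # terminal edge: predicate -> object
        tau[e_init] = e_term
    return nodes, epsilon, tau


def from_ldm3n(epsilon, tau):
    """Backward transformation: read each triple off an initial/terminal edge pair."""
    triples = []
    for e_init, e_term in tau.items():
        s, p = epsilon[e_init]
        p_again, o = epsilon[e_term]
        assert p == p_again, "both edges of a triple must meet at the predicate node"
        triples.append((s, p, o))
    return triples


clinton = [
    ("BillClinton", "holdsPos#1", "U.S.President"),
    ("holdsPos#1", "rdf:singleton", "holdsPos"),
    ("holdsPos#1", "hasSuccessor", "GeorgeW.Bush"),
    ("BillClinton", "holdsPos#2", "ArkansasGovernor"),
    ("holdsPos#2", "rdf:singleton", "holdsPos"),
    ("holdsPos#2", "hasSuccessor", "FrankWhite"),
]
nodes, epsilon, tau = to_ldm3n(clinton)
round_trip = from_ldm3n(epsilon, tau)
```

Because a singleton property such as `holdsPos#1` is a node, the same node can appear as a predicate in one triple and as a subject in another, which is what the traversal slides rely on.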
  84. 84. 85 Traversing RDF Graph: Triple Path Example: (T1, T3) is a triple path. T1 = (BillClinton, holdsPos#1, U.S.President), T3 = (holdsPos#1, hasSuccessor, GeorgeW.Bush) Bill Clinton U.S. President George W.Bush holdsPos#1 e1^I e1^T hasSuccessor e3^T e3^I
  85. 85. 86 Traversing RDF Graph: Resource Path A resource path is defined as a sequence of nodes such that (1) every two adjacent nodes are connected by an edge, and (2) every three nodes connected by a pair of initial and terminal edges form a triple. Example: (BillClinton, holdsPos#1, hasSuccessor, GeorgeW.Bush) Bill Clinton U.S. President George W.Bush holdsPos#1 e1^I e1^T hasSuccessor e3^T e3^I
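One way to read the resource-path definition is as a check over candidate edges: each hop must follow some edge, and an initial edge followed directly by a terminal edge must come from the same triple. The sketch below encodes that reading. It is an interpretation rather than the thesis's algorithm, and the names `build_edges` and `is_resource_path` as well as the decision to reject sequences shorter than two nodes are assumptions.

```python
def build_edges(triples):
    """Edge (i, 'I') runs subject -> predicate; edge (i, 'T') runs predicate -> object."""
    edges = {}
    for i, (s, p, o) in enumerate(triples, start=1):
        edges[(i, "I")] = (s, p)
        edges[(i, "T")] = (p, o)
    return edges


def is_resource_path(path, triples):
    """Check conditions (1) and (2) of the resource-path definition."""
    if len(path) < 2:
        return False
    edges = build_edges(triples)
    # Edges that can realize the first hop.
    current = {e for e, ends in edges.items() if ends == (path[0], path[1])}
    for a, b in zip(path[1:], path[2:]):
        following = set()
        for f, ends in edges.items():
            if ends != (a, b):
                continue
            for e in current:
                # An initial edge followed by a terminal edge must belong to one triple.
                if e[1] == "I" and f[1] == "T" and e[0] != f[0]:
                    continue
                following.add(f)
                break
        current = following
    return bool(current)


t1 = ("BillClinton", "holdsPos#1", "U.S.President")
t3 = ("holdsPos#1", "hasSuccessor", "GeorgeW.Bush")
ok = is_resource_path(
    ("BillClinton", "holdsPos#1", "hasSuccessor", "GeorgeW.Bush"), [t1, t3])
```

The set `current` keeps every edge that could realize the latest hop, since a multigraph may hold several edges between the same two nodes.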
  86. 86. 87 Model-Theoretic Semantics Simple Interpretation I: • Given a vocabulary V, • IN: a set of nodes, IE: a set of directed edges of I • IE : IE → IN × IN, mapping each edge to an ordered pair of nodes. • IT : IE → IE, mapping an initial edge to a terminal edge of a triple. • IS: a function mapping URIs from V into the union set of IR and IP • IL: a function mapping typed literals of V into the set of nodes IN, • LV: a subset of IN, called the set of literal values. • EL, a set of distinct labels to be assigned to the edges in IE, • IEL : EL → IE mapping labels from EL into the set IE of edges. Let ·^I be the interpretation function that maps all the URIs and literals in V to the set of nodes IN. A ground triple (s p o .)^I is assigned true if all s, p, o ∈ V and ∃e1,e2 ∈ IE : IE(e1) = (s^I, p^I), IE(e2) = (p^I, o^I), and IT(e1) = e2.
  87. 87. 88 Model-Theoretic Semantics Simple Interpretation I (cont): • IP, a set of property nodes, also a subset of IN, IP ⊂ IN. • IEXT, a function assigning to each property a set of node pairs. IEXT : IP → 2^(IN × IN) where IEXT(p) is the extension of generic property p. • IEXT(p) = {(s,o) | ∃e1,e2 ∈ IE: IE(e1) = (s,p), IE(e2) = (p,o), and IT(e1) = e2} • IPs, a set of singleton property nodes, also a subset of IN. • IS_EXT: IPs → IN × IN, a function mapping a singleton property to a pair of nodes. Particularly, IS_EXT(ps) = (s,o) such that ∃e1,e2 ∈ IE: IE(e1) = (s,ps), IE(e2) = (ps,o), and IT(e1) = e2.
  88. 88. 90 Evaluation • Engines – GraphKE: on top of BerkeleyDB, written in C – RDF-3X extension • Dataset – YAGO2S-SP: contains 267,161,278 triples with 77,895,604 URI nodes and 31,110,161 literals – Extracted 3 politician groups • CS: White House Chief of Staff • SS: Secretary of State • HR: Speakers of the U.S. House of Representatives
  89. 89. 91 Reachability Queries
  90. 90. 92 Shortest Path Queries
  91. 91. 93 Shortest Path Queries (a) CS Politician Group (b) SS Politician Group (c) HR Politician Group
  92. 92. Features N-ary Reification NG SP Triple store Quad store Formal semantics Inference rules Metadata variety Compact DS size SPARQL compliance Comparison with Existing Work 94
  93. 93. Semantic Web Foundations for (1) Representing, (2) Reasoning, and (3) Traversing Contextualized Knowledge Graphs 95 Our goal Thesis statement It is possible to develop (1) a compact and formal representation, (2) a sound and complete inference mechanism, and (3) a model-theoretic graph formalism for contextualized knowledge graphs that can be efficiently implemented.
  94. 94. 96 Acknowledgment Committee Members Amit Sheth (advisor) Krishnaprasad Thirunarayan Olivier Bodenreider (NLM) Kemafor Anyanwu (NCSU) Ramanathan V. Guha (Schema.org) Satya Sahoo (Case Western) Clare Paul (AFRL)
  95. 95. 97 Thank you!
  96. 96. 98 Extra slides
  97. 97. 99 Original statement: BarackObama marriedTo MichelleObama Questions 1. Is it true that Barack Obama is married to Michelle Obama? 2. Who is Barack Obama married to? Inferred statement: BarackObama hasSpouse MichelleObama
  98. 98. 100 Barack Obama married to Michelle Obama in 1992 and in Chicago Original statement Questions 1. Is it true that Barack Obama married to Michelle Obama in Illinois? 2. When did Barack Obama become the spouse of Michelle Obama? 3. Did Barack Obama become the spouse of Michelle Obama before 2008 in Chicago? 4. Did Barack Obama become the spouse of Michelle Obama before 2008 in Illinois?
  99. 99. 101 Barack Obama married to Michelle Obama in 1992 and in Chicago Original statement Inferred statements 1. Barack Obama married to Michelle Obama before 2008 and in Chicago 2. Barack Obama married to Michelle Obama in 1992 and in Illinois 3. Barack Obama married to Michelle Obama before 2008 and in Illinois 4. Barack Obama becomes a spouse of Michelle Obama in 1992 and in Chicago 5. Barack Obama becomes a spouse of Michelle Obama before 2008 and in Chicago 6. Barack Obama becomes a spouse of Michelle Obama before 2008 and in Illinois
  100. 100. 102 Barack Obama married to Michelle Obama in Chicago Barack Obama becomes a spouse of Michelle Obama in Illinois Which rules
  101. 101. Inference types Primary triple Context 1 Context 2 Barack Obama married to Michelle Obama in Chicago in 1992 Barack Obama married to Michelle Obama in Chicago before 2008 Barack Obama married to Michelle Obama in Illinois before 2008 Barack Obama becomes a spouse of Michelle Obama in Chicago in 1992 Barack Obama becomes a spouse of Michelle Obama in Chicago before 2008 Barack Obama becomes a spouse of Michelle Obama in Illinois before 2008 Primary triple Context Type Barack Obama married to Michelle Obama in Chicago Asserted vs. asserted Barack Obama married to Michelle Obama in Illinois Asserted vs. inferred Barack Obama becomes a spouse of Michelle Obama in Chicago Inferred vs. asserted Barack Obama becomes a spouse of Michelle Obama in Illinois Inferred vs. inferred
  102. 102. u rdf:singletonPropertyOf v . x u y . x v y . 104 RDF Deduction Rules: rdf-sp-3 marriedTo#1 rdf:singletonPropertyOf marriedTo . BarackObama marriedTo#1 MichelleObama . BarackObama marriedTo MichelleObama . Statement derivation
  103. 103. 105 RDFS Deduction Rules: rdfs-sp-3 Sub-property marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo rdfs:subPropertyOf hasSpouse . marriedTo#1 rdf:singletonPropertyOf hasSpouse . u rdf:singletonPropertyOf x . x rdfs:subPropertyOf y . u rdf:singletonPropertyOf y .
  104. 104. 106 Barack Obama married to Michelle Obama in Chicago Barack Obama married to Michelle Obama in Illinois BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo#1 happenedIn Chicago . BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo#1 happenedIn Illinois . Location rule: if ?u happenedIn ?x and ?x partOf ?y then ?u happenedIn ?y marriedTo#1 happenedIn Chicago . Chicago partOf Illinois . marriedTo#1 happenedIn Illinois .
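The location rule on this slide is a one-line forward-chaining rule over the metadata of a singleton property. A minimal sketch with triples as tuples; the function name `apply_location_rule` is hypothetical, and the extra fact `Illinois partOf USA` used to exercise chaining is an assumption added for illustration, not a statement from the slides.

```python
def apply_location_rule(triples):
    """If ?u happenedIn ?x and ?x partOf ?y, then ?u happenedIn ?y (to a fixed point)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (u, p, x) in list(closed):
            if p != "happenedIn":
                continue
            for (x2, q, y) in list(closed):
                new = (u, "happenedIn", y)
                if q == "partOf" and x2 == x and new not in closed:
                    closed.add(new)
                    changed = True
    return closed - set(triples)


location_facts = {
    ("marriedTo#1", "happenedIn", "Chicago"),
    ("Chicago", "partOf", "Illinois"),
}
location_derived = apply_location_rule(location_facts)
```

Because the rule only touches the metadata triples of `marriedTo#1`, the primary singleton triple stays unchanged while its context is generalized.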
  105. 105. 107 Barack Obama married to Michelle Obama in Chicago Barack Obama becomes a spouse of Michelle Obama in Chicago BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo#1 happenedIn Chicago . BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf hasSpouse . marriedTo#1 happenedIn Chicago . marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo rdfs:subPropertyOf hasSpouse . marriedTo#1 rdf:singletonPropertyOf hasSpouse .
  106. 106. 108 Barack Obama married to Michelle Obama in Chicago Barack Obama becomes a spouse of Michelle Obama in Illinois BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf marriedTo . marriedTo#1 happenedIn Chicago . BarackObama marriedTo#1 MichelleObama . marriedTo#1 rdf:singletonPropertyOf hasSpouse . marriedTo#1 happenedIn Illinois . Sub-property rule Location rule
  107. 107. 109 Model-Theoretic Semantics RDF Interpretation IRDF: • p ∈ IP if ∃e1,e2 ∈ IE : IE(e1) = (p, rdf:type^I), IE(e2) = (rdf:type^I, rdf:Property^I), and IT(e1) = e2. A generic property is an instance of the rdf:Property class. • ps ∈ IPs if ∃e1,e2 ∈ IE : IE(e1) = (ps, rdf:type^I), IE(e2) = (rdf:type^I, rdf:SingletonProperty), and IT(e1) = e2. A singleton property is an instance of the rdf:SingletonProperty class.
  108. 108. 110 Model-Theoretic Semantics RDF Interpretation IRDF (cont.): • ps ∈ IPs if ∃e1,e2 ∈ IE : IE(e1) = (ps, rdf:singletonPropertyOf^I), IE(e2) = (rdf:singletonPropertyOf^I, p), and IT(e1) = e2. A singleton property is connected to a generic property via rdf:singletonPropertyOf. • If ps ∈ IPs then ∃!(e1,e2) : IE(e1) = (s,ps), IE(e2) = (ps,o), IT(e1) = e2, with s,o ∈ IN and e1,e2 ∈ IE. This ensures only one occurrence of a singleton property as a predicate of a triple.
  109. 109. 111 Model-Theoretic Semantics RDFS Interpretation IRDFS: • ICEXT : IP → 2^IN, a function assigning to each class a set of nodes from IN. ICEXT(c) is called the class extension of class c. Particularly, ICEXT(c) = {s | s ∈ IN, ∃e1,e2 ∈ IE : IE(e1) = (s, rdf:type^I), IE(e2) = (rdf:type^I, c), and IT(e1) = e2}. • if ∃e1,e2,e3,e4 ∈ IE: IT(e1) = e2, IT(e3) = e4, IE(e1) = (x, rdfs:domain^I), IE(e2) = (rdfs:domain^I, y), IE(e3) = (u,x), IE(e4) = (x,v), then ∃e5,e6 ∈ IE: IT(e5) = e6, IE(e5) = (u, rdf:type^I), IE(e6) = (rdf:type^I, y). If a class is a domain of a property, then the class extension includes all subjects of triples that use the property.
  110. 110. 112 Model-Theoretic Semantics RDFS Interpretation IRDFS (cont.): • if ∃e1,e2 ∈ IE: IT(e1) = e2, IE(e1) = (x, rdfs:subPropertyOf^I), and IE(e2) = (rdfs:subPropertyOf^I, y), then x, y ∈ IP and IEXT(x) ⊆ IEXT(y). The extension of a property is a subset of the extension of its super property. • if ∃e1,e2 ∈ IE: IT(e1) = e2, IE(e1) = (x, rdfs:subClassOf^I), and IE(e2) = (rdfs:subClassOf^I, y), then ICEXT(x) ⊆ ICEXT(y). The extension of a class is a subset of its super class extension.
