At KMWorld 2021, Enterprise Knowledge (EK) joined JPL to share the vision, approach, and delivery of the Institutional Knowledge Graph (IKG): a centrally maintained, ever-evolving knowledge graph that identifies and describes JPL’s enterprise-wide concepts, such as people, organizations, projects, and facilities, and the relationships between them. Since August 2020, the IKG has offered a single source of enterprise information that other JPL applications can leverage to reduce redundant, out-of-date, or inaccurate data. In production for two years and now with several releases under its belt, the IKG is beginning to fulfill its promise as a foundational layer in the semantic pyramid on which additional taxonomies and knowledge graphs can build.
At KMWorld 2022, Bess Schrader, Senior Solutions Consultant at EK, and Ann Bernath, Software Systems Engineer at JPL, shared a follow-up on the IKG journey, including a description of the Enterprise Semantic Platform and a look at new taxonomies and knowledge graphs at JPL (some enterprise-wide, others specific to engineering, technical, or science domains) and how they are beginning to leverage the IKG’s foundation of JPL concepts to place their data in a broader context. The presentation discussed different techniques for federating or synchronizing multiple knowledge graphs, and how these integrations benefit not only the new data sets but also the IKG as it continues to pursue its overarching dream: providing answers to questions such as “Who did what when?”, “Who should you call?”, and “Where is the Robotics Lab?”
JPL’s Institutional Knowledge Graph II: A Foundation for Constructing Enterprise and Domain-Specific Semantic Data Sets
1. Ann Bernath, Software Systems Engineer, JPL
Bess Schrader, Senior Consultant, Enterprise Knowledge
Approved for Public Release: JPL CL#22-5900
3. JPL: A NASA FFRDC Owned and Managed by Caltech
11/9/2022
4. Ann Bernath, Jet Propulsion Laboratory
[Figure: career timeline with milestones in 1978, 2015, and 2022, covering word processing, document management, software systems engineering, information modeling, taxonomies, and ontologies]
5. Bess Schrader, Enterprise Knowledge
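[Figure: a small graph about Bess Schrader, with edges labeled "employed by", "degree", "had internship", "focus", "blew mind", and "of", connecting nodes such as Enterprise Knowledge, Information Science, taxonomy, and linked data]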
6. Last Year at (virtual) KMWorld
JPL’s Institutional Knowledge Graph: Leveraging semantic technologies to add context to structured and unstructured data
7. We talked about our problem: multiple systems maintain their own copies of enterprise data, often resulting in out-of-sync data
[Figure: person info and institutional data copied across systems; the same project appears as "OCO 2", "OCO2", "OCO-2", "Orbiting Carbon Observatory-2", and "Orbiting Carbon Observatory 2"]
8. Institutional info & expertise was hard to find
Who has a background in chemistry?
Where’s the Robotics Lab?
Who do I call if there’s a hazard?
Who has experience on recent Mars lander missions? …
9. Desired state
[Figure: the Institutional Knowledge Graph as “one-stop shopping” for institutional context, answering questions such as “Who worked on recent Mars lander missions?” by connecting people (Jane Brown, John Brown, Red Green) with Mars projects (MSL, Mars 2020, Mars Sample Return)]
Topics
• Projects
• Mars 2020
• Mission Targets
• Mars
• Organizations
• Engineering and Science
• Applications
• Issue Tracking
• Publications
• Science
• People
• …
10. We realized we needed to build an institutional layer in our semantic pyramid
Semantic pyramid, from top to bottom (the institutional layer was missing):
• JPL Domains (Business, Eng, Science)
• Institutional Layer (Organizations, People, Projects, Facilities, …)
• Institutional Model/Ontology
• W3C Syntax and Protocol Standards (OWL, RDF, RDFS, SHACL, SKOS, SPARQL, …)
14. Our semantic pyramid takes shape
Semantic pyramid, from top to bottom:
• JPL Domains (Business, Eng, Science)
• Institutional Knowledge Graph (Organizations, People, Projects, Facilities, …)
• Institutional Model/Ontology (Unique identifiers for institutional concepts)
• W3C Syntax and Protocol Standards (OWL, RDF, RDFS, SHACL, SKOS, SPARQL, …)
Now that we’ve laid this foundation, domain-specific knowledge graphs can build upon it.
15. Domain data sets are beginning to leverage our semantic foundation, enriching the IKG in return
[Figure: data sets connected to the Institutional Knowledge Graph, with the kinds of information each contributes]
• Labs: responsibilities, facilities
• Master List of Controlled Records: retention schedules, roles
• Business Event Transactions: work activity, experience
• Infrastructure Technical Architecture Description: functions, systems, services, roles
• JPL Taxonomy: real-world concepts
• Science Taxonomy: projects, expertise
16. For instance, the Labs data set enriches what the IKG knows about people, organizations, and locations
[Figure: graph centered on the Robotics Lab, linking it to Jane Doe, John Smith, the Robotics Design Organization, and two rooms (Building 555, Room 000 and Room 000B) via the relationships "responsible for", "accountable for", "managed by", and "located in"]
17. The IKG enriches lifecycle event data, while lifecycle event data enriches what the IKG knows about a person’s expertise
19. Why bother connecting data sets?
20. Connecting data from multiple systems is important
Not only does it allow us to connect the dots …
21. Connecting data from multiple systems is important
…it also allows us to find discrepancies between different systems. For example, the IKG has found:
• Buildings appearing in one location application but not another
• People incorrectly listed as organization managers in the HR system
• People who are no longer at JPL assigned to active roles
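Once two systems describe entities with the same identifiers, many of these discrepancy checks reduce to set comparisons. A minimal Python sketch with hypothetical snapshots (only "Building 555" appears elsewhere in this deck); the helper names are illustrative, not the IKG's actual checks:

```python
# Hypothetical snapshots of what each system reports.
LOCATION_APP_A = {"Building 555", "Building 101", "Building 102"}
LOCATION_APP_B = {"Building 555", "Building 101"}

ACTIVE_USERNAMES = {"jdoe", "jsmith"}  # e.g. from the HR system
ROLE_ASSIGNMENTS = {                   # e.g. from a roles system
    ("jdoe", "Lab Lead"),
    ("exemployee", "Safety Coordinator"),
}


def one_sided(a, b):
    """Items present in only one of two systems, as (only_in_a, only_in_b)."""
    return sorted(a - b), sorted(b - a)


def stale_roles(assignments, active):
    """Role assignments held by people who are no longer active."""
    return sorted(pair for pair in assignments if pair[0] not in active)


print(one_sided(LOCATION_APP_A, LOCATION_APP_B))  # (['Building 102'], [])
print(stale_roles(ROLE_ASSIGNMENTS, ACTIVE_USERNAMES))
# [('exemployee', 'Safety Coordinator')]
```

The checks only work because both sides use a shared key (building number, username); without one, the linking methods discussed later have to run first.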
23. Show of hands: who knows what a URI is?
Responses: a lot of us, or only a few of us
24. How do we encode this in a way that machines can understand it?
Let’s Pause for Some RDF Basics
25. Resource Description Framework
RDF: Resource Description Framework
● “Things not strings”
● W3C standard
● Model for data interchange on the web
● Allows integration of differing schemas or representations of data
● Describes data as triples: subject, predicate, object
26. Uniform Resource Identifiers (URI)
[Figure: John Smith linked to the Robotics Lab by the relationship "accountable for"]
To be machine readable, all of our bubbles and lines (i.e., the elements of our triple) need a Uniform Resource Identifier (URI). In RDF 1.1 these identifiers are formally IRIs (Internationalized Resource Identifiers), a generalization of URIs that also permits Unicode characters.
URIs are unique identifiers that look like URLs, although they don’t have to resolve to an actual web page.
29. Uniform Resource Identifiers (URI)
[Figure: person:jsmith linked to lab:Robotics_Lab by ikg:accountableFor]
Using prefixes, our Robotics Lab example would look like this:
@prefix ikg: <http://example.jpl.nasa.gov/ontologies/ikg#> .
@prefix lab: <http://example.jpl.nasa.gov/ontologies/ikg/Lab/> .
@prefix person: <http://example.jpl.nasa.gov/ontologies/ikg/Person/> .
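To show what these declarations buy us, here is a minimal Python sketch that expands prefixed names into full URIs the way a Turtle parser would. The prefix map is copied from the declarations above; the `expand` helper is hypothetical and deliberately ignores Turtle details such as escaping and the empty prefix.

```python
# Prefix map copied from the @prefix declarations above.
PREFIXES = {
    "ikg": "http://example.jpl.nasa.gov/ontologies/ikg#",
    "lab": "http://example.jpl.nasa.gov/ontologies/ikg/Lab/",
    "person": "http://example.jpl.nasa.gov/ontologies/ikg/Person/",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'lab:Robotics_Lab' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


# The Robotics Lab statement as a (subject, predicate, object) triple.
triple = tuple(expand(t) for t in ("person:jsmith", "ikg:accountableFor", "lab:Robotics_Lab"))
for term in triple:
    print(term)
```

Each printed term is the full URI a machine actually compares, e.g. http://example.jpl.nasa.gov/ontologies/ikg/Person/jsmith; the prefixes exist only for human readers.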
30. Namespaces
In addition to making URIs easier for humans to read, namespaces can also help establish data ownership and governance. For example:
ikg:accountableFor
The “ikg” prefix indicates this relationship is owned by the Institutional
Knowledge Graph (IKG). It may have a specific meaning in that context, and
changes are controlled by the IKG team.
hr:accountableFor
The “hr” prefix indicates this relationship is owned by Human Resources. It
may have a specific meaning in that context, and changes are controlled by
the HR team.
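A minimal sketch, assuming a simple registry that maps each namespace to its owning team; the hr namespace URI, the team names, and `owner_of` are hypothetical. It shows that ikg:accountableFor and hr:accountableFor are different URIs, and that ownership can be read off the namespace with a longest-prefix lookup.

```python
from typing import Optional

# Hypothetical namespace registry: the hr namespace URI and the team names
# are illustrative assumptions, not actual JPL values.
NAMESPACE_OWNERS = {
    "http://example.jpl.nasa.gov/ontologies/ikg#": "IKG team",
    "http://example.jpl.nasa.gov/ontologies/hr#": "HR team",
}


def owner_of(uri: str) -> Optional[str]:
    """Return the owner of the longest registered namespace that prefixes uri."""
    matches = [ns for ns in NAMESPACE_OWNERS if uri.startswith(ns)]
    if not matches:
        return None
    return NAMESPACE_OWNERS[max(matches, key=len)]


ikg_rel = "http://example.jpl.nasa.gov/ontologies/ikg#accountableFor"
hr_rel = "http://example.jpl.nasa.gov/ontologies/hr#accountableFor"

# Same local name, different namespaces: two distinct relationships.
print(ikg_rel == hr_rel)  # False
print(owner_of(ikg_rel))  # IKG team
print(owner_of(hr_rel))   # HR team
```

Longest-prefix matching matters once namespaces nest (for example, a team that owns a sub-path of another team's namespace).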
31. Uniform Resource Identifiers (URI)
URIs are critical to building knowledge graphs, especially for ensuring that
different semantic data sets can talk to each other.
Reusing the same URI across data sets whenever they refer to the same concept helps ensure that:
• entities only need to be defined once
• entities have a clear owner
• semantic data sets can be linked
32. How to link data sets
33. Linking Methods
How do we match or link entities across different semantic data sets?
• Re-use URIs for institutional entities, enabling federated queries: ideal
• Match on “hooks” (important properties) such as key identifiers (employee numbers, usernames, …): pretty good
• Alternative labels/educated guesses (matching rules): probable
• Manual review: sometimes required
34. Linking Methods – Reusing URIs
• Use URIs for institutional entities, enabling federated queries: ideal
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy person:bschrader .
In the best-case scenario, owners/creators of semantic data sets reuse URIs across data sets at the time of creation, so there’s no guesswork involved in matching entities across data sets.
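Why reuse makes linking trivial, as a minimal Python sketch: each data set is modeled as a set of (subject, predicate, object) tuples, with prefixed names left as plain strings rather than expanded URIs (a simplification), and `objects` is an illustrative helper rather than a library call.

```python
# Each data set is a set of (subject, predicate, object) triples; "a" in
# Turtle is shorthand for rdf:type.
DATA_SET_1 = {
    ("person:bschrader", "rdf:type", "ikg:Person"),
    ("person:bschrader", "rdfs:label", "Bess P Schrader"),
}
DATA_SET_2 = {
    ("doc:1234", "rdf:type", "jpl:Document"),
    ("doc:1234", "rdfs:label", "KM World 2022 Presentation"),
    ("doc:1234", "jpl:createdBy", "person:bschrader"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Both data sets use person:bschrader, so the creator URI found in data set 2
# can be looked up directly in data set 1: no matching logic needed.
for creator in objects(DATA_SET_2, "doc:1234", "jpl:createdBy"):
    print(creator, objects(DATA_SET_1, creator, "rdfs:label"))
```

The join is a plain equality test on the shared URI; every other linking method below exists to recover that equality when it is missing.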
35. Linking Methods – Using Hooks
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" ;
ikg:username "bschrader" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy jpl:Person_123456 .
jpl:Person_123456 jpl:username "bschrader" .
• Hooks such as key identifiers (employee numbers, usernames, …): pretty good
If the same URIs aren’t used across data sets, commonly used
institutional identifiers (like usernames, department codes, etc.) can be
another good option for finding entity matches.
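A minimal sketch of hook-based matching, assuming usernames are unique and reliable in both data sets: index one data set by the hook, then look up the other. Recording each match with owl:sameAs is our assumption; the talk does not say which linking predicate JPL uses, and skos:exactMatch is a common, weaker alternative. `match_on_hook` is a hypothetical helper.

```python
DATA_SET_1 = {
    ("person:bschrader", "rdfs:label", "Bess P Schrader"),
    ("person:bschrader", "ikg:username", "bschrader"),
}
DATA_SET_2 = {
    ("doc:1234", "jpl:createdBy", "jpl:Person_123456"),
    ("jpl:Person_123456", "jpl:username", "bschrader"),
}


def match_on_hook(graph_a, key_a, graph_b, key_b):
    """Link entities in graph_b to entities in graph_a with equal hook values."""
    # Index graph_a: hook value -> set of subjects carrying that value.
    index = {}
    for s, p, o in graph_a:
        if p == key_a:
            index.setdefault(o, set()).add(s)
    # Probe the index with each hook value found in graph_b.
    links = set()
    for s, p, o in graph_b:
        if p == key_b:
            for match in index.get(o, ()):
                links.add((s, "owl:sameAs", match))
    return links


print(match_on_hook(DATA_SET_1, "ikg:username", DATA_SET_2, "jpl:username"))
# {('jpl:Person_123456', 'owl:sameAs', 'person:bschrader')}
```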
36. Linking Methods – Matching Rules
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" ;
ikg:username "bschrader" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:1234 .
org:1234 a ikg:Organization ;
ikg:organizationCode "1234" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy jpl:Person_123456 ;
jpl:organization "1234" .
jpl:Person_123456 rdfs:label "B. Schrader" .
• Alternative labels/educated guesses (matching rules): probable
Lacking re-used URIs or institutional identifiers, we often have to make
up our own matching logic to determine if two entities are the same.
37. Linking Methods – Matching Rules
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" ;
ikg:username "bschrader" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:1234 .
org:1234 a ikg:Organization ;
ikg:organizationCode "1234" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy jpl:Person_123456 ;
jpl:organization "1234" .
jpl:Person_123456 rdfs:label "B. Schrader" .
IF the first initial from data set 2 matches the first character of the first name in data set 1
AND the last name from data set 2 matches the last name from data set 1
AND the organization value from data set 2 matches the organization code of the organization of which the person is a member
THEN the two entities are a match
• Alternative labels/educated guesses (matching rules): probable
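The rule above can be sketched in Python as follows. Records are flattened into dictionaries for brevity, and `parse_initial_label` assumes data set 2 labels always follow the "Initial. Lastname" pattern; all helper names are hypothetical. Comparisons are exact and case-sensitive, which a production rule would likely relax.

```python
# Data set 1 (IKG): person records plus an organization-code lookup.
IKG_PEOPLE = {
    "person:bschrader": {"firstName": "Bess", "lastName": "Schrader", "memberOf": "org:1234"},
}
IKG_ORG_CODES = {"org:1234": "1234"}


def parse_initial_label(label):
    """Split a label like 'B. Schrader' into ('B', 'Schrader').

    Assumes the 'Initial. Lastname' pattern seen in data set 2.
    """
    initial, _, last = label.partition(". ")
    return initial, last


def rule_matches(label, org_value, person):
    """Apply the three IF conditions to one candidate person."""
    initial, last = parse_initial_label(label)
    return (
        person["firstName"][:1] == initial
        and person["lastName"] == last
        and IKG_ORG_CODES.get(person["memberOf"]) == org_value
    )


def find_matches(label, org_value):
    return [uri for uri, person in IKG_PEOPLE.items() if rule_matches(label, org_value, person)]


# Data set 2: jpl:Person_123456 is labelled "B. Schrader", and the document
# they created carries jpl:organization "1234".
print(find_matches("B. Schrader", "1234"))  # ['person:bschrader']
```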
38. Linking Methods – Matching Rules
Extraction, or label matching, against the data already in the graph helps with the transformation: references to a project in one system can be standardized and matched to our existing URI for that project.
• Alternative labels/educated guesses (matching rules): probable
Data Set 2
doc:5678 a jpl:Document ;
rdfs:label "OCO-2 Meeting Notes" ;
jpl:relatedProject project:OCO-2 .
project:OCO-2 rdfs:label "OCO-2" .
Data Set 1
mission:OCO2 skos:prefLabel
"Orbiting Carbon Observatory 2" ;
skos:altLabel "Orbiting Carbon Observatory-2",
"OCO 2", "OCO2", "OCO-2" .
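A minimal sketch of label matching against SKOS labels, assuming a normalization that ignores case, spaces, and hyphens (our choice, not necessarily JPL's). Returning a set rather than a single URI lets ambiguous labels surface instead of being silently resolved; `normalize` and `resolve` are hypothetical helpers.

```python
import re

# Data set 1: the IKG concept with its preferred and alternative labels.
LABELS = {
    "mission:OCO2": [
        "Orbiting Carbon Observatory 2",
        "Orbiting Carbon Observatory-2",
        "OCO 2",
        "OCO2",
        "OCO-2",
    ],
}


def normalize(label):
    """Case-fold and drop spaces/hyphens so 'OCO-2' and 'oco 2' compare equal."""
    return re.sub(r"[\s\-]+", "", label).casefold()


# Lookup from normalized label to the concept URI(s) that carry it.
LOOKUP = {}
for uri, labels in LABELS.items():
    for label in labels:
        LOOKUP.setdefault(normalize(label), set()).add(uri)


def resolve(label):
    """Concept URIs whose labels match; more than one means ambiguity."""
    return LOOKUP.get(normalize(label), set())


print(resolve("OCO-2"))  # {'mission:OCO2'}
print(resolve("oco 2"))  # {'mission:OCO2'}
```

With this lookup, the label of project:OCO-2 in data set 2 resolves to the existing mission:OCO2 URI.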
39. Linking Methods – Manual Review
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" ;
ikg:username "bschrader" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:1234 .
person:schraderb a ikg:Person ;
rdfs:label "Bess X Schrader" ;
ikg:username "schraderb" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:5678 .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy jpl:Person_123456 .
jpl:Person_123456 rdfs:label "B. Schrader" ;
jpl:organization "1234" .
• Manual review: sometimes required
In some cases, the only option is to manually reconcile entities.
40. Linking Methods – Manual Review
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" ;
ikg:username "bschrader" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:1234 .
person:schraderb a ikg:Person ;
rdfs:label "Bess X Schrader" ;
ikg:username "schraderb" ;
ikg:firstName "Bess" ;
ikg:lastName "Schrader" ;
ikg:memberOf org:5678 .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy jpl:Person_123456 .
jpl:Person_123456 rdfs:label "B. Schrader" ;
jpl:organization "1234" .
• Manual review: sometimes required
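jpl:Person_123456 in data set 2 is probably person:bschrader in data set 1, not person:schraderb: both are named Bess Schrader, but only person:bschrader is a member of org:1234, matching the organization value "1234" in data set 2.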
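One way to fold manual review into an automated pipeline, sketched under assumptions: organization codes are stored directly on each person (rather than via ikg:memberOf and ikg:organizationCode, and assuming org:5678 has code "5678"), and `triage` is a hypothetical helper. Anything other than a unique name match goes to a reviewer, with the organization used only to suggest an answer, mirroring the judgment above.

```python
# Data set 1: two IKG people who share a first and last name.
IKG_PEOPLE = {
    "person:bschrader": {"firstName": "Bess", "lastName": "Schrader", "orgCode": "1234"},
    "person:schraderb": {"firstName": "Bess", "lastName": "Schrader", "orgCode": "5678"},
}


def triage(initial, last_name, org_code):
    """Auto-link only a unique name match; otherwise queue for human review.

    Returns ('auto', uri) or ('review', suggestion_or_None, candidates).
    """
    candidates = sorted(
        uri for uri, p in IKG_PEOPLE.items()
        if p["firstName"][:1] == initial and p["lastName"] == last_name
    )
    if len(candidates) == 1:
        return ("auto", candidates[0])
    # Zero or several people fit the name: rank by organization, but only suggest.
    suggested = [uri for uri in candidates if IKG_PEOPLE[uri]["orgCode"] == org_code]
    return ("review", suggested[0] if len(suggested) == 1 else None, candidates)


print(triage("B", "Schrader", "1234"))
# ('review', 'person:bschrader', ['person:bschrader', 'person:schraderb'])
```

The design choice is to let the machine narrow the options and a person make the final call whenever the data admits more than one reading.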
41. Data Access Methods – Where does the linking happen?
So we’ve found entity matches across our data sets using our various linking methods… now what?
We usually link data sets in one of two ways:
• Copy data between data sets
• Leave data in place and run federated queries
42. Copy Data Method Example
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy person:bschrader .
Data Set 2 - Augmented
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy person:bschrader .
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" .
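Copying data can be sketched as a one-hop augmentation: pull in every data set 1 triple whose subject is referenced in data set 2. The person:other record and the `augment` helper are hypothetical; the result is a snapshot that must be re-synchronized whenever data set 1 changes.

```python
DATA_SET_1 = {
    ("person:bschrader", "rdf:type", "ikg:Person"),
    ("person:bschrader", "rdfs:label", "Bess P Schrader"),
    ("person:other", "rdfs:label", "Someone Else"),  # hypothetical, not referenced below
}
DATA_SET_2 = {
    ("doc:1234", "rdf:type", "jpl:Document"),
    ("doc:1234", "rdfs:label", "KM World 2022 Presentation"),
    ("doc:1234", "jpl:createdBy", "person:bschrader"),
}


def augment(target, source):
    """Return target plus every source triple whose subject target references."""
    referenced = {o for _, _, o in target}
    return target | {t for t in source if t[0] in referenced}


augmented = augment(DATA_SET_2, DATA_SET_1)
for t in sorted(augmented):
    print(t)
```

Only the two person:bschrader triples are copied, reproducing "Data Set 2 - Augmented" above; person:other stays behind because nothing in data set 2 points to it.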
43. Copying Data Based on Hooks – Labs and IKG
44. Federated Query Example
Data Set 1
person:bschrader a ikg:Person ;
rdfs:label "Bess P Schrader" .
Data Set 2
doc:1234 a jpl:Document ;
rdfs:label "KM World 2022 Presentation" ;
jpl:createdBy person:bschrader .
Question: Who created the KM World 2022 Presentation?
• Data Set 2: the KM World 2022 Presentation was created by person:bschrader.
• Data Set 1: person:bschrader has the name Bess P Schrader.
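A toy Python emulation of the federated pattern, not a real SPARQL client: each `Endpoint` object stands in for a separately owned store that only answers pattern lookups at query time, so nothing is copied. In practice, the SPARQL 1.1 SERVICE keyword does this across real endpoints. The class and function names are hypothetical.

```python
class Endpoint:
    """Stand-in for a remote store that only answers triple-pattern lookups."""

    def __init__(self, triples):
        self._triples = frozenset(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching a pattern; None acts as a wildcard."""
        return [t for t in self._triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


DATA_SET_1 = Endpoint({
    ("person:bschrader", "rdf:type", "ikg:Person"),
    ("person:bschrader", "rdfs:label", "Bess P Schrader"),
})
DATA_SET_2 = Endpoint({
    ("doc:1234", "rdf:type", "jpl:Document"),
    ("doc:1234", "rdfs:label", "KM World 2022 Presentation"),
    ("doc:1234", "jpl:createdBy", "person:bschrader"),
})


def who_created(title):
    """Ask data set 2 for the creator, then ask data set 1 for the name."""
    names = []
    for doc, _, _ in DATA_SET_2.match(p="rdfs:label", o=title):
        for _, _, creator in DATA_SET_2.match(s=doc, p="jpl:createdBy"):
            for _, _, name in DATA_SET_1.match(s=creator, p="rdfs:label"):
                names.append(name)
    return names


print(who_created("KM World 2022 Presentation"))  # ['Bess P Schrader']
```

Each answer is always current because it is fetched when asked, at the cost of one round trip per hop.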
45. Data Access Methods: Where and when does the linking happen?
Method: Copy data
Pros:
• Faster, straightforward queries
• More integration options
Cons:
• May be too much data for some tools
• Requires synchronization to keep it up to date (unless you want a snapshot in time)
• Possibilities for failure/downtime (the usual processing risks)
Method: Federated queries
Pros:
• Access permissions can be separated
• Queries are real-time
Cons:
• Queries could be slower and more complex
• Could limit integration options with other applications
47. Enabling applications to enrich their data sets
Connected data sets allow applications to pull in data from a variety of data sets on demand.
[Figure: the Facility Search App and Enterprise Search showing a floor plan (rooms 101 to 106) with a LAB INFO panel:]
Lab Name: Lab X
Managing Org.: Organization 1234
Lab Lead: Sally Smith
Safety Coordinator: William Safety
48. Enabling the ability to answer more complex queries
By Org
• Which members of org 123 were active in quality assurance activities over the past 3 months?
By Role
• Which mechanical engineers with experience on rover missions left the Lab this year?
By Project
• Which applications saw the most activity leading up to the Critical Design Review and in which applications?
By Person
• Which lifecycle activities was J. Engineer participating in that are still in progress?
By Topic
• What activity is currently in progress involving robotics?
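SPARQL
# The shared ?s joins entities across the two repositories;
# separate ?p/?o variables let each side return its own properties.
SELECT *
WHERE {
  SERVICE <repository:IKG> { ?s ?p1 ?o1 }
  SERVICE <repository:activity> { ?s ?p2 ?o2 }
}
LIMIT 100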
49. Enabling simple question-answer capability: natural language queries so anyone can ask simple questions and get answers
50. Before we go
skos:altLabel "In Conclusion"
51. Linking data sets takes effort …
• Thinking ahead
• Data cleanup or mitigation steps
• Semantic momentum
…but it’s worth it
52. “Standard” URIs are important – we might even say critical
Establish namespace owners for different data sets
• Avoids name clashes: …/Records#Liaison vs. …/Public_Relations#Liaison
• Identifies domain and data owner
Use standard URIs from the get-go
• Enables ideal concept matching right away
• Enables federated queries
Reuse existing identifiers where possible
• Keeps URI generation simple
• Usernames/employee ID numbers
• Organization codes/identifiers
• Building numbers/location identifiers
• Matches consumer expectations
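Reusing existing identifiers makes URI minting deterministic: any data set can derive the same URI from a username, organization code, or building number without a lookup. A minimal sketch, assuming a hypothetical base namespace modeled on the example URIs earlier in this deck; `mint_uri` and the type segments are illustrative, not JPL's actual URI scheme.

```python
from urllib.parse import quote

# Hypothetical base namespace, modeled on the deck's example URIs.
BASE = "http://example.jpl.nasa.gov/ontologies/ikg/"


def mint_uri(entity_type, identifier):
    """Build a URI from an identifier the institution already uses.

    Whitespace is trimmed and unsafe characters are percent-encoded, so the
    same identifier always yields the same URI.
    """
    return f"{BASE}{entity_type}/{quote(str(identifier).strip(), safe='')}"


print(mint_uri("Person", "jsmith"))
# http://example.jpl.nasa.gov/ontologies/ikg/Person/jsmith
print(mint_uri("Organization", "1234"))
# http://example.jpl.nasa.gov/ontologies/ikg/Organization/1234
```

Because the identifier already exists in source systems, a consumer who knows a username can predict the person's URI, which is what enables URI reuse and federated queries from the start.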
53. “Knowledge graphs are awesome.”