Open Source Infrastructure for Linked Data
CIDOC-CRM 2023, Mexico City
The LOD Gateway
David Newbury
Assistant Director, Software and User Experience, Getty
Hi! I’m David.
I lead the software and user experience teams at Getty.
Getty is a big museum/research hub in Los Angeles. We do lots of things with
data.
All of the actual work here was done by my fabulously talented team. I just talk.
And we’re not Getty Images. Same rich family, same last name, no connection.
2
Introduction
Getty has been doing Linked Data since 2014,
starting with the Getty Vocabularies.
It’s a collection of concepts, people, and places
deeply relevant to the study of art and
architecture.
3
Getty’s Linked Data: Getty Vocabularies
Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
4
Getty’s Linked Data: Archival Records
Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
… and our museum collection.
5
Getty’s Linked Data: Archival Records
We’ve also built a complex, powerful
infrastructure to support doing this across
our application landscape.
It’s been fun. We’ve learned a lot.
6
Getty’s Linked Data: APIs
Behind the scenes, all of these applications
are powered by a utility called The LOD
Gateway.
We’ve recently open-sourced this tool, and I’d
like to share it with you today.
7
Getty’s Linked Data: The LOD Gateway
This API system was designed to help Getty
manage one of the fundamental complications
that come with using Linked Data:
Graphs vs. Documents.
8
Getty’s Linked Data: The LOD Gateway
Let’s take a basic JSON-LD record:
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}
9
Getty’s Linked Data: The LOD Gateway
And a second, related one:
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}
10
Getty’s Linked Data: The LOD Gateway
These could be seen as two separate documents:
11
Getty’s Linked Data: The LOD Gateway
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}
Or as a single graph.
12
Getty’s Linked Data: The LOD Gateway
From the point of view of the data, these
two structures are equivalent—they contain
the same facts.
But from a usability perspective, they make
different things easy or hard.
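To make the "same facts" point concrete, here is a toy Python sketch. It flattens the two documents above into (subject, predicate, object) tuples and compares them with a single nested document that embeds the person inside the object. It is not a JSON-LD processor: it ignores the `@context`, keeps keys like `identified_by` instead of expanding them to CIDOC-CRM IRIs, and assumes every nested object carries an `id`. The function name `to_triples` is hypothetical.

```python
def to_triples(node, triples=None):
    """Flatten a JSON-LD-like tree into (subject, predicate, object) tuples.

    Toy version: ignores @context, never expands keys to IRIs, and assumes
    every nested object has its own "id".
    """
    if triples is None:
        triples = set()
    subject = node["id"]
    for key, value in node.items():
        if key in ("id", "@context"):
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                triples.add((subject, key, item["id"]))  # link to the nested node
                to_triples(item, triples)                # then describe it
            else:
                triples.add((subject, key, item))        # plain literal value
    return triples

irises = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "object/1",
    "type": "HumanMadeObject",
    "identified_by": {"id": "object/1/name", "type": "Name", "content": "Irises"},
    "produced_by": {"id": "object/1/production", "carried_out_by": {"id": "person/1"}},
}
van_gogh = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "person/1",
    "type": "Person",
    "identified_by": {"id": "person/1/name", "type": "Name", "content": "Vincent Van Gogh"},
}

# Two separate documents...
as_documents = to_triples(irises) | to_triples(van_gogh)

# ...versus one tree where the person's description is embedded in the object.
embedded = dict(irises, produced_by={
    "id": "object/1/production",
    "carried_out_by": {k: v for k, v in van_gogh.items() if k != "@context"},
})
as_graph = to_triples(embedded)

print(len(as_documents), as_documents == as_graph)  # 10 True
```

Both shapes yield the same ten facts; only the bundling differs.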
13
Getty’s Linked Data: The LOD Gateway
Documents are optimized for access:
They bundle together a specific set of data,
chosen by the data creator, that provides all
the facts you need…given a specific context.
14
Documents: For Access and Discovery
Graphs, alternatively, are optimized for querying:
they allow a user to define a specific context based
on novel criteria of interest and return that
subset of facts.
15
Graphs: For Queries
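With a graph, the user draws the boundary. The sketch below is a toy matcher for basic graph patterns, the core idea behind a SPARQL `WHERE` clause: each pattern is a triple whose terms starting with `?` are variables, and a result is any set of bindings that satisfies every pattern at once. The triples reuse the Irises example in the same simplified, un-expanded form; `match` and `unify` are hypothetical helpers, not part of the LOD Gateway.

```python
def unify(term, value, binding):
    """Match one pattern term against one triple term, extending `binding`."""
    if isinstance(term, str) and term.startswith("?"):
        if term in binding:
            return binding[term] == value  # variable already bound: must agree
        binding[term] = value
        return True
    return term == value  # constant term: must be equal

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    if binding is None:
        binding = {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(unify(p, t, candidate) for p, t in zip(first, triple)):
            yield from match(rest, triples, candidate)

TRIPLES = [
    ("object/1", "type", "HumanMadeObject"),
    ("object/1", "identified_by", "object/1/name"),
    ("object/1/name", "content", "Irises"),
    ("object/1", "produced_by", "object/1/production"),
    ("object/1/production", "carried_out_by", "person/1"),
    ("person/1", "type", "Person"),
    ("person/1", "identified_by", "person/1/name"),
    ("person/1/name", "content", "Vincent Van Gogh"),
]

# "Which objects were produced by someone named Vincent Van Gogh?"
PATTERNS = [
    ("?obj", "type", "HumanMadeObject"),
    ("?obj", "produced_by", "?prod"),
    ("?prod", "carried_out_by", "?artist"),
    ("?artist", "identified_by", "?name"),
    ("?name", "content", "Vincent Van Gogh"),
]

print([b["?obj"] for b in match(PATTERNS, TRIPLES)])  # ['object/1']
```

No document had to be designed around this question; the join across object, production, person, and name happens at query time.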
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
and
“What is the tombstone data about Irises?”
16
Imagine two Questions:
At the Getty, we have never asked:
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
…but we ask
“What is the tombstone data about Irises?”
several thousand times a day.
17
Imagine two Questions:
Having an interface for documents lets us
provide a simple, easily understandable
record that maps well to known contexts.
This is important, because people usually
expect these contexts. It makes answering
common questions simple.
18
Documents: For Access and Discovery
It also maps nicely to the sort of affordances
that work well on the internet—REST APIs,
cache control, JSON documents, webpages.
This is also important, because using these
well-known systems helps us make our
systems fast and easy to build.
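One reason documents suit the web is HTTP caching. As an illustration only (not a description of the LOD Gateway's actual headers), a document store can derive an `ETag` from a canonical serialization of each record and answer a conditional `GET` with `304 Not Modified` when the client's copy is current. The `max-age` value and the function names are assumptions, and real `If-None-Match` handling must also cope with tag lists and weak validators, which this sketch skips.

```python
import hashlib
import json

def etag_for(document):
    """Strong ETag from a canonical JSON serialization (sorted keys, no spaces)."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16] + '"'

def respond(document, if_none_match=None):
    """Return (status, headers, body) for a GET on one stored document.

    Simplified: compares If-None-Match against a single strong tag only.
    """
    tag = etag_for(document)
    headers = {
        "Content-Type": "application/ld+json",
        "ETag": tag,
        "Cache-Control": "public, max-age=300",  # illustrative value
    }
    if if_none_match == tag:
        return 304, headers, b""  # client's cached copy is still valid
    return 200, headers, json.dumps(document).encode("utf-8")

DOC = {"id": "object/1", "type": "HumanMadeObject"}
status, headers, _ = respond(DOC)
print(status, respond(DOC, if_none_match=headers["ETag"])[0])  # 200 304
```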
19
Documents: For Access and Discovery
Research is different—each scholar brings
their own question and their own context.
Meeting their needs means empowering
them to draw their own boundaries within
the data.
20
Graphs: For Asking Questions
Doing so is complex—it moves the burden of
defining the relevant context to the user of
the data, not the creator of the data.
But it makes asking new questions possible,
even if it might be inefficient or complicated.
21
Graphs: For Asking Questions
The LOD Gateway is a tool designed to
support both use cases.
It allows you to create, update, and delete
JSON-LD documents, and behind the
scenes it will keep a triplestore in sync with
those changes.
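One common way to keep a triplestore in step with a document store, and an assumption here rather than a description of the LOD Gateway's internals, is to give each document its own named graph and replace that graph wholesale on every write. The sketch below only builds the SPARQL 1.1 Update text; the helper names are hypothetical, and the CIDOC-CRM property IRIs are the ones we assume Linked Art's context maps `identified_by` and `content` to.

```python
def literal(text):
    """Serialize a plain string as an N-Triples/Turtle literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'

def iri(value):
    return f"<{value}>"

def replace_graph_update(graph_iri, triples):
    """SPARQL 1.1 Update: drop the document's named graph, then re-insert it."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return (
        f"DROP SILENT GRAPH {iri(graph_iri)} ;\n"
        f"INSERT DATA {{\n  GRAPH {iri(graph_iri)} {{\n{body}\n  }}\n}}"
    )

def delete_graph_update(graph_iri):
    """Deleting a document removes its whole named graph."""
    return f"DROP SILENT GRAPH {iri(graph_iri)}"

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
update = replace_graph_update("https://example.com/object/1", [
    (iri("https://example.com/object/1"), iri(CRM + "P1_is_identified_by"),
     iri("https://example.com/object/1/name")),
    (iri("https://example.com/object/1/name"), iri(CRM + "P190_has_symbolic_content"),
     literal("Irises")),
])
print(update)
```

Replacing the whole graph keeps the sync logic simple: the store never has to diff old and new triples, at the cost of rewriting every triple on each update.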
22
Meeting Both Needs
This works for Linked Art records, IIIF
Manifests, Web Annotations: any JSON-LD
document.
If you POST it to the LOD Gateway, that
record will be available at the URL defined
in that document’s id property.
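From a client's side, publishing a record is a single HTTP `POST`. This sketch builds, but does not send, such a request with Python's standard library. The base URL and the `ingest` path are hypothetical placeholders, not the LOD Gateway's documented endpoint, and authentication is omitted. `urljoin` shows how a relative `id` such as `object/1` would resolve to the record's public URL.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

BASE = "https://example.com/"  # hypothetical gateway base URL

def build_post(document, endpoint):
    """Prepare (but do not send) a POST of one JSON-LD document."""
    body = json.dumps(document).encode("utf-8")
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/ld+json"})

def resolved_url(document, base=BASE):
    """Where a relative id such as "object/1" would be served from."""
    return urljoin(base, document["id"])

IRISES = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "object/1",
    "type": "HumanMadeObject",
}

req = build_post(IRISES, urljoin(BASE, "ingest"))  # "ingest" is a placeholder path
print(req.get_method(), req.full_url, resolved_url(IRISES))
# urllib.request.urlopen(req) would actually send it.
```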
23
Meeting Both Needs
It’s also RDF-aware: if nested child resources are included in the main document,
it automatically makes them dereferenceable, too.
24
LOD Gateway: RDF Aware
https://example.com/object/1
https://example.com/object/1/identifier/1
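Making embedded children dereferenceable amounts to indexing every nested resource that has its own `id`. In this sketch (a hypothetical `index_nodes` helper, not the gateway's code), a node counts as an embedded description only if it has properties beyond `id`, so a bare reference like `{"id": "person/1"}` is not indexed as if it were the person's record. The ids follow the earlier Irises record rather than the `identifier/1` URL on the slide.

```python
def index_nodes(node, index=None):
    """Map each embedded resource's id to the JSON subtree describing it."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        # Index only real descriptions, not bare {"id": ...} references.
        if "id" in node and node.keys() - {"id", "@context"}:
            index.setdefault(node["id"], node)
        for key, value in node.items():
            if key != "@context":
                index_nodes(value, index)
    elif isinstance(node, list):
        for item in node:
            index_nodes(item, index)
    return index

IRISES = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "object/1",
    "type": "HumanMadeObject",
    "identified_by": {"id": "object/1/name", "type": "Name", "content": "Irises"},
    "produced_by": {"id": "object/1/production", "carried_out_by": {"id": "person/1"}},
}

index = index_nodes(IRISES)
print(sorted(index))  # ['object/1', 'object/1/name', 'object/1/production']
```

A server could then answer a request for `object/1/name` from this index instead of returning the whole parent record.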
You can also request documents in other RDF formats.
25
LOD Gateway: RDF Aware
https://example.com/object/1?format=turtle
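Requesting another serialization is just a matter of the query string. A small helper can add or replace the `format` parameter on a record URL while keeping any other parameters. The value `turtle` comes from the slide; which other format names the gateway accepts is not stated here, so treat other values as assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_format(url, fmt):
    """Return `url` with its ?format= parameter set to `fmt`, keeping other params."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "format"]
    params.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(params)))

print(with_format("https://example.com/object/1", "turtle"))
# https://example.com/object/1?format=turtle
```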
It also provides both a full SPARQL API and
an embedded GUI for testing queries.
It can be configured to use any SPARQL
triplestore; we use Apache Jena Fuseki in testing and
Amazon Neptune in production.
26
LOD Gateway: SPARQL-Enabled
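Under the SPARQL 1.1 Protocol, a query can be sent as a URL-encoded `query` parameter on a `GET`, with results requested in the standard SPARQL JSON results format. In this sketch the endpoint URL is a hypothetical placeholder, the property path assumes Linked Art's mapping of `produced_by` and `carried_out_by` to `crm:P108i_was_produced_by` and `crm:P14_carried_out_by`, and `SAMPLE` is a hand-written response used only to exercise the parser.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://example.com/sparql"  # hypothetical endpoint URL

QUERY = """
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?object WHERE {
  ?object crm:P108i_was_produced_by/crm:P14_carried_out_by <https://example.com/person/1> .
}
"""

def build_query_request(query, endpoint=SPARQL_ENDPOINT):
    """SPARQL 1.1 Protocol: query via GET with a URL-encoded "query" parameter."""
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": "application/sparql-results+json"})

def bindings(results_json, var):
    """Pull one variable's values out of a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]

SAMPLE = """{
  "head": {"vars": ["object"]},
  "results": {"bindings": [
    {"object": {"type": "uri", "value": "https://example.com/object/1"}}
  ]}
}"""

req = build_query_request(QUERY)
print(req.get_method(), bindings(SAMPLE, "object"))
```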
This makes it simple to write and retrieve
data in a form that matches your primary use case,
while still letting you go beyond
that, either for research or for unexpected
features, without needing to rewrite your API.
27
Two Views: One Set of Facts
You can also configure it to run without the
RDF integration as a JSON document store.
We do this all the time, because of another
feature of the LOD Gateway: Change Logs!
28
LOD Gateway: now SPARQL-Free
The third critical use for our data is synchronization across systems.
An editor changes a record, which means the API needs to be updated, which means the website
needs to be updated, and the search interfaces, and third-party systems…
29
LOD Gateway: Tracking Changes
Every time you create, update, or delete a
record in the LOD Gateway, it adds an entry
to an Activity Stream.
This lets a consuming system identify
only the records that have changed
since it last synced.
30
LOD Gateway: Tracking Changes
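A consumer that syncs periodically usually cares only about each record's latest state. This sketch collapses a list of activities into "records to re-fetch" and "records to delete" since the last sync time. The field names (`type`, `object.id`, `endTime`) follow the Activity Streams vocabulary as used by the IIIF Change Discovery API; the exact shape the LOD Gateway emits, the URLs, and the timestamps are assumptions for illustration.

```python
from datetime import datetime

def parse_time(stamp):
    # Python < 3.11's fromisoformat rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

def changes_since(activities, last_sync):
    """Return (ids_to_refetch, ids_to_delete) for activities after last_sync."""
    cutoff = parse_time(last_sync)
    latest = {}  # object id -> (time, activity type) of its newest activity
    for activity in activities:
        when = parse_time(activity["endTime"])
        if when <= cutoff:
            continue  # already handled during an earlier sync
        obj_id = activity["object"]["id"]
        if obj_id not in latest or when >= latest[obj_id][0]:
            latest[obj_id] = (when, activity["type"])
    refetch = sorted(i for i, (_, kind) in latest.items() if kind != "Delete")
    delete = sorted(i for i, (_, kind) in latest.items() if kind == "Delete")
    return refetch, delete

ACTIVITIES = [  # invented sample data
    {"type": "Create", "object": {"id": "https://example.com/object/1"}, "endTime": "2023-09-01T10:00:00Z"},
    {"type": "Create", "object": {"id": "https://example.com/person/1"}, "endTime": "2023-09-02T12:00:00Z"},
    {"type": "Update", "object": {"id": "https://example.com/object/1"}, "endTime": "2023-09-03T09:00:00Z"},
    {"type": "Delete", "object": {"id": "https://example.com/person/1"}, "endTime": "2023-09-04T08:00:00Z"},
]

print(changes_since(ACTIVITIES, "2023-09-01T12:00:00Z"))
```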
You can do this for the whole dataset, for a
given entity type, or even for a single entity.
This happens automatically, every time you
update a record in the LOD Gateway.
It’s even smart enough to not generate a
change event if the data didn’t change.
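Suppressing no-op events needs a notion of "same content". A simple approach, assumed here and not necessarily how the LOD Gateway does it, is to hash a canonical JSON serialization (sorted keys, fixed separators) and emit an activity only when the hash changes. This treats documents with reordered keys as identical, but it would still report a change if an array's items were merely reordered; true RDF-level equality would need graph canonicalization. `ChangeTracker` is a hypothetical class.

```python
import hashlib
import json

def fingerprint(document):
    """Hash of a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class ChangeTracker:
    """Hypothetical store that logs an activity only when content changes."""

    def __init__(self):
        self.hashes = {}      # document id -> fingerprint of stored version
        self.activities = []  # simplified change log

    def upsert(self, document):
        doc_id = document["id"]
        new_hash = fingerprint(document)
        old_hash = self.hashes.get(doc_id)
        if old_hash == new_hash:
            return None  # identical content: no change event
        kind = "Create" if old_hash is None else "Update"
        self.hashes[doc_id] = new_hash
        self.activities.append({"type": kind, "object": {"id": doc_id}})
        return kind

    def delete(self, doc_id):
        if self.hashes.pop(doc_id, None) is not None:
            self.activities.append({"type": "Delete", "object": {"id": doc_id}})

tracker = ChangeTracker()
print(tracker.upsert({"id": "object/1", "type": "HumanMadeObject"}))  # Create
print(tracker.upsert({"type": "HumanMadeObject", "id": "object/1"}))  # None
```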
31
LOD Gateway: Tracking Changes
These change logs follow the
W3C Activity Streams 2.0 standard and are
implemented using the patterns from the
IIIF Change Discovery API.
Using standards makes it easy for external
consumers to build integrations against
these flows.
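In the IIIF Change Discovery API pattern, a consumer starts at the collection's `last` page (the newest activity), walks each page's `orderedItems` from newest to oldest, follows `prev` links, and stops once it reaches activities older than its previous crawl. The sketch below implements that traversal against an in-memory dictionary standing in for HTTP fetches; the URLs, timestamps, and the `newest_activities` name are hypothetical.

```python
from datetime import datetime

def parse_time(stamp):
    # Python < 3.11's fromisoformat rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

def newest_activities(fetch, collection_url, since):
    """Yield activities newer than `since`, newest first."""
    cutoff = parse_time(since)
    page_ref = fetch(collection_url).get("last")
    while page_ref:
        page = fetch(page_ref["id"])
        for activity in reversed(page.get("orderedItems", [])):
            if parse_time(activity["endTime"]) <= cutoff:
                return  # everything older was handled by the previous crawl
            yield activity
        page_ref = page.get("prev")

STREAM = "https://example.com/activity-stream"  # hypothetical URL

def _activity(kind, obj, when):
    return {"type": kind, "object": {"id": obj}, "endTime": when}

PAGES = {
    STREAM: {"type": "OrderedCollection",
             "first": {"id": STREAM + "/page/1"},
             "last": {"id": STREAM + "/page/2"}},
    STREAM + "/page/1": {"type": "OrderedCollectionPage",
                         "next": {"id": STREAM + "/page/2"},
                         "orderedItems": [
                             _activity("Create", "https://example.com/object/1", "2023-09-01T10:00:00Z"),
                             _activity("Create", "https://example.com/person/1", "2023-09-02T12:00:00Z")]},
    STREAM + "/page/2": {"type": "OrderedCollectionPage",
                         "prev": {"id": STREAM + "/page/1"},
                         "orderedItems": [
                             _activity("Update", "https://example.com/object/1", "2023-09-03T09:00:00Z"),
                             _activity("Delete", "https://example.com/person/1", "2023-09-04T08:00:00Z")]},
}

for activity in newest_activities(PAGES.__getitem__, STREAM, "2023-09-02T00:00:00Z"):
    print(activity["type"], activity["object"]["id"])
```

Reading backwards means a frequent consumer usually touches only the newest page or two, however long the stream grows.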
32
LOD Gateway: ActivityStreams and Standards
The change log only describes which
records changed. But for some kinds of
data, it's valuable to also be able to see what
has changed over time for a given record.
To do so, the LOD Gateway also supports
Memento (RFC 7089), the time-travel standard
supported by the Internet Archive’s Wayback Machine.
33
LOD Gateway: ActivityStreams and Standards
This feature lets you automatically open older
versions of the record—providing an audit log
and the ability for scholars to understand
how knowledge changes over time.
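Memento (RFC 7089) exposes a record's history as a TimeMap: a list of links in `application/link-format` where each past version carries a `rel` containing `memento` and a `datetime`. This sketch parses such a TimeMap and picks the latest version at or before a given moment. The `?version=` URLs and dates are invented for illustration, the regular expressions assume quoted parameter values, and a real TimeGate may apply its own selection rule.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

LINK = re.compile(r'<([^>]+)>([^<]*)')  # a <uri> followed by its parameters
PARAM = re.compile(r'(\w+)="([^"]*)"')  # quoted key="value" parameters only

def mementos(timemap_text):
    """Return sorted (datetime, uri) pairs for every link whose rel includes "memento"."""
    found = []
    for uri, params in LINK.findall(timemap_text):
        attrs = dict(PARAM.findall(params))
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            found.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return sorted(found)

def memento_at(timemap_text, when):
    """Latest memento at or before `when`, or None if the record did not exist yet."""
    candidates = [m for m in mementos(timemap_text) if m[0] <= when]
    return candidates[-1][1] if candidates else None

TIMEMAP = """<https://example.com/object/1>; rel="original",
<https://example.com/object/1?version=1>; rel="first memento"; datetime="Fri, 01 Sep 2023 10:00:00 GMT",
<https://example.com/object/1?version=2>; rel="last memento"; datetime="Sun, 03 Sep 2023 09:00:00 GMT"
"""

print(memento_at(TIMEMAP, datetime(2023, 9, 2, tzinfo=timezone.utc)))
```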
34
LOD Gateway: ActivityStreams and Standards
How do we use this?
How can you use this?
35
36
Getty’s Data Infrastructure: Managing Complexity
37
Getty’s Data Infrastructure: 14 Instances
One tool, many needs.
Building this tool has let a small team support 14 different APIs, and stand up
new ones whenever we need them.
Our smallest instance is 250 records. Our largest is over 1 million.
38
LOD Gateway: Consistent Patterns, Consistent Tools
Critical Infrastructure.
The only way we’ve built what we have is using this tool.
Every research tool, every API.
39
LOD Gateway: Consistent Patterns, Consistent Tools
And now you can, too.
As of today, we’ve released this tool as open source software under the BSD-3
license.
https://github.com/thegetty/lod-gateway
40
LOD Gateway: Consistent Patterns, Consistent Tools
This is a “Third System”:
This is heavily tested infrastructure, built because we have made so many
mistakes.
It’s not perfect, but our hope is that it helps you avoid at least the mistakes we know
about, and allows the brilliant modeling ecosystem that CIDOC builds to be used in
production by others around the world.
41
LOD Gateway: Built on top of our mistakes
Thank you!
Find me or ask me questions at:
dnewbury@getty.edu
42
