Presented at the CIDOC conference in Mexico City, 2023, this talk provides a walkthrough of the digital infrastructure behind the LOD Gateway, a critical part of Getty's digital API infrastructure.
It discusses the difference between graphs and documents, and how both are important for different use cases.
The LOD Gateway: Open Source Infrastructure for Linked Data
1. Open Source Infrastructure for Linked Data
CIDOC-CRM 2023, Mexico City
The LOD Gateway
David Newbury
Assistant Director, Software and User Experience, Getty
2. Hi! I’m David.
I lead the software and user experience teams at Getty.
Getty is a big museum/research hub in Los Angeles. We do lots of things with
data.
All of the actual work here was done by my fabulously talented team. I just talk.
And we’re not Getty Images. Same rich family, same last name, no connection.
Introduction
3. Getty has been doing Linked Data since 2014,
starting with the Getty Vocabularies.
It’s a collection of concepts, people, and places
deeply relevant to the study of art and
architecture.
Getty’s Linked Data: Getty Vocabularies
4. Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
Getty’s Linked Data: Archival Records
5. … and our museum collection.
Getty’s Linked Data: Archival Records
6. We’ve also built a complex, powerful
infrastructure to support doing this across
our application landscape.
It’s been fun. We’ve learned a lot.
Getty’s Linked Data: APIs
7. Behind the scenes, all of these applications
are powered by a utility called The LOD
Gateway.
We’ve recently open-sourced this tool, and I’d
like to share it with you today.
Getty’s Linked Data: The LOD Gateway
8. This API system was designed to help Getty
manage one of the fundamental complications
that comes with using Linked Data:
Graphs vs. Documents.
Getty’s Linked Data: The LOD Gateway
10. And a second, related one:
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}
Getty’s Linked Data: The LOD Gateway
11. These could be seen as two separate documents:
Getty’s Linked Data: The LOD Gateway
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}
12. Or as a single graph.
Getty’s Linked Data: The LOD Gateway
13. From the point of view of the data, these
two structures are equivalent—they contain
the same facts.
But from a usability perspective, they make
different things easy or hard.
Getty’s Linked Data: The LOD Gateway
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}
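To make "the same facts" concrete, here is a minimal Python sketch that flattens the two documents above into one set of subject/predicate/object statements. It is an illustration only: `to_triples` is a hypothetical helper, it ignores the `@context` entirely, and it is not a real JSON-LD processor (which would expand keys like `produced_by` into full CIDOC-CRM predicates).

```python
def to_triples(node, triples=None):
    """Flatten a nested JSON-like node into (subject, predicate, object) tuples."""
    if triples is None:
        triples = set()
    subject = node["id"]
    for key, value in node.items():
        if key in ("id", "@context"):
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                # Nested node: link to it by id, then flatten it as well.
                triples.add((subject, key, item["id"]))
                to_triples(item, triples)
            else:
                triples.add((subject, key, item))
    return triples


person = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "person/1",
    "type": "Person",
    "identified_by": {
        "id": "person/1/name",
        "type": "Name",
        "content": "Vincent Van Gogh",
    },
}

artwork = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "object/1",
    "type": "HumanMadeObject",
    "identified_by": {"id": "object/1/name", "type": "Name", "content": "Irises"},
    "produced_by": {
        "id": "object/1/production",
        "carried_out_by": {"id": "person/1"},
    },
}

# Two documents in, one graph out: the statement linking the production
# to person/1 is what joins them.
graph = to_triples(person) | to_triples(artwork)
```

The document boundaries disappear in `graph`; only the shared identifier `person/1` connects the painting to the painter.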
14. Documents are optimized for Access:
They provide a specific set of data, bundled
together by the data creator, that provides all
the facts you need…given a specific context.
Documents: For Access and Discovery
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}
15. Graphs, alternatively, are optimized for querying:
They allow a user to define a specific context based
on novel criteria of interest, and return that
subset of facts.
Graphs: For Queries
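As a rough illustration of what "defining your own context" means, the sketch below answers a question that neither document was bundled to answer: which objects were made by a given person? It uses plain Python tuples and a hypothetical `match` helper rather than a real triplestore, and the short predicate names stand in for full RDF predicates.

```python
TRIPLES = [
    ("person/1", "type", "Person"),
    ("object/1", "type", "HumanMadeObject"),
    ("object/1", "produced_by", "object/1/production"),
    ("object/1/production", "carried_out_by", "person/1"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def objects_made_by(triples, person_id):
    """Join two patterns: productions by the person, then objects of those productions."""
    productions = {s for s, _, _ in match(triples, p="carried_out_by", o=person_id)}
    return sorted(s for s, _, o in match(triples, p="produced_by") if o in productions)


made_by_van_gogh = objects_made_by(TRIPLES, "person/1")
```

The person who wrote the query, not the person who created the data, decided which facts belong together here.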
16. “What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
and
“What is the tombstone data about Irises?”
Imagine two Questions:
17. At the Getty, we have never asked:
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
…but we ask
“What is the tombstone data about Irises?”
several thousand times a day.
Imagine two Questions:
18. Having an interface for documents lets us
provide a simple, easily understandable
record that maps well to known contexts.
This is important, because people usually
expect these contexts. It makes answering
common questions simple.
Documents: For Access and Discovery
19. It also maps nicely to the sort of affordances
that work well on the internet—REST APIs,
cache control, JSON documents, webpages.
This is also important, because using these
well-known systems helps us make our
systems fast and easy to build.
Documents: For Access and Discovery
20. Research is different—each scholar brings
their own question and their own context.
Meeting their need means empowering
them to draw their own boundaries within
the data.
Graphs: For Asking Questions
21. Doing so is complex—it moves the burden of
defining the relevant context to the user of
the data, not the creator of the data.
But it makes asking new questions possible,
even if it might be inefficient or complicated.
Graphs: For Asking Questions
22. The LOD Gateway is a tool designed to
allow for both use cases.
It allows you to create, update, and delete
JSON-LD documents, and behind the
scenes it will keep a triplestore in sync with
those changes.
Meeting Both Needs
23. This works for Linked.Art records, IIIF
Manifests, Web Annotations: any JSON-LD
document.
If you POST it to the LOD Gateway, that
record will be available at the URL defined
in that document’s id property.
Meeting Both Needs
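The idea of one write path feeding two views can be sketched in a few lines of Python. This is a toy, in-memory model and not the LOD Gateway's actual code: `ToyGateway`, `flatten`, and the method names are all hypothetical, and a Python dict stands in for both the document store and the triplestore.

```python
def flatten(node):
    """Yield (subject, predicate, object) tuples from a nested JSON-like node."""
    for key, value in node.items():
        if key in ("id", "@context"):
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                yield (node["id"], key, item["id"])
                yield from flatten(item)
            else:
                yield (node["id"], key, item)


class ToyGateway:
    """Stores documents by id and keeps a set of triples in step with them."""

    def __init__(self):
        self.documents = {}  # id -> document, the "document" view
        self.triples = {}    # id -> triples derived from that document

    def post(self, doc):
        # The document's own id decides where it lives.
        self.documents[doc["id"]] = doc
        # Replacing the whole entry drops triples from any older version.
        self.triples[doc["id"]] = set(flatten(doc))

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)
        self.triples.pop(doc_id, None)

    def get(self, doc_id):
        return self.documents[doc_id]

    def graph(self):
        """The "graph" view: every triple from every stored document."""
        return set().union(*self.triples.values())
```

Callers only ever write documents; the graph view is derived, so it cannot drift out of date on its own.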
24. It’s also RDF-aware: If there are nested children included in the main document,
it automatically makes those dereferenceable, too.
LOD Gateway: RDF Aware
https://example.com/object/1
https://example.com/object/1/identifier/1
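One way to picture this is an index from every identified node to its data, built by walking the document. The Python sketch below is an assumption about the general approach, not the Gateway's implementation: `index_nodes` is a hypothetical helper, the sample identifier content is made up, and what the real service returns for a child URL (the child alone or its parent document) is not covered by the talk.

```python
def index_nodes(node, index=None):
    """Map each identified node that carries its own data to its id."""
    if index is None:
        index = {}
    if isinstance(node, list):
        for item in node:
            index_nodes(item, index)
    elif isinstance(node, dict):
        # A bare {"id": ...} is only a pointer to a record stored elsewhere.
        if "id" in node and len(node) > 1:
            index[node["id"]] = node
        for value in node.values():
            index_nodes(value, index)
    return index


document = {
    "id": "https://example.com/object/1",
    "type": "HumanMadeObject",
    "identified_by": [
        {
            "id": "https://example.com/object/1/identifier/1",
            "type": "Identifier",
            "content": "example-identifier",  # placeholder value
        }
    ],
    "produced_by": {
        "id": "https://example.com/object/1/production",
        "carried_out_by": {"id": "https://example.com/person/1"},
    },
}

dereferenceable = index_nodes(document)
```

The reference to `https://example.com/person/1` is deliberately left out of the index, since that record belongs to a different document.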
25. You can also request documents in other RDF formats.
LOD Gateway: RDF Aware
https://example.com/object/1?format=turtle
26. It also provides both a full SPARQL API and
an embedded GUI for testing queries.
It can be configured to use any SPARQL
Triplestore—we use Fuseki in testing and
Amazon Neptune in production.
LOD Gateway: SPARQL-Enabled
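For a sense of what a query against such an endpoint might look like, here is a hedged Python sketch that builds a standard SPARQL Protocol GET URL. Several things are assumptions: the endpoint path `https://example.com/sparql` is a placeholder, the full record URLs are illustrative, and the query assumes the Linked.Art context expands `produced_by` and `carried_out_by` to the CIDOC-CRM properties `P108i_was_produced_by` and `P14_carried_out_by`. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# "Which objects were produced by person/1?" expressed against the graph.
QUERY = """\
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?object WHERE {
  ?object crm:P108i_was_produced_by ?production .
  ?production crm:P14_carried_out_by <https://example.com/person/1> .
}
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET request URL (query passed as ?query=...)."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; check the deployment's documentation for the real path.
url = sparql_get_url("https://example.com/sparql", QUERY)
```

Because the request shape comes from the SPARQL Protocol rather than from any one triplestore, the same call should work whether the store behind it is Fuseki or Neptune.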
27. This flexibility makes it simple to write and retrieve
data in a form that matches your primary use case,
but still allows you the flexibility to go beyond
that—either for research or for unexpected
features—without needing to rewrite your API.
Two Views: One Set of Facts
28. You can also configure it to run without the
RDF integration as a JSON document store.
We do this all the time, because of another
feature of the LOD Gateway: Change Logs!
LOD Gateway: now SPARQL-Free
29. The third critical use for our data is synchronization across systems.
An editor changes a record, which means the API needs to be updated, which means the website
needs to be updated, and the search interfaces, and third-party systems…
LOD Gateway: Tracking Changes
30. Every time you create, update, or delete a
record in the LOD Gateway, it adds an entry
to an Activity Stream.
This lets a consuming system identify
only the records that have changed
since the last time it synced.
LOD Gateway: Tracking Changes
31. You can do this for the whole dataset, for a
given entity type, or even for a single entity.
This happens automatically, every time you
update a record in the LOD Gateway.
It’s even smart enough to not generate a
change event if the data didn’t change.
LOD Gateway: Tracking Changes
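The behaviour described on the last two slides can be sketched as a small in-memory change log. This is a simplified model under stated assumptions, not the Gateway's code: `ChangeLog` and its methods are hypothetical, and an integer sequence number stands in for the timestamps a real stream would carry.

```python
import copy


class ChangeLog:
    """Records Create/Update/Delete events, skipping writes that change nothing."""

    def __init__(self):
        self.records = {}
        self.events = []

    def _emit(self, kind, record_id, entity_type):
        event = {
            "seq": len(self.events) + 1,
            "type": kind,
            "object": {"id": record_id, "type": entity_type},
        }
        self.events.append(event)
        return event

    def save(self, record):
        previous = self.records.get(record["id"])
        if previous == record:
            return None  # identical data: no change event
        # Keep a private copy so later edits by the caller are detected.
        self.records[record["id"]] = copy.deepcopy(record)
        kind = "Create" if previous is None else "Update"
        return self._emit(kind, record["id"], record["type"])

    def delete(self, record_id):
        record = self.records.pop(record_id, None)
        if record is None:
            return None
        return self._emit("Delete", record_id, record["type"])

    def changes_since(self, seq, entity_type=None, record_id=None):
        """Events after `seq`, for everything, one entity type, or one record."""
        return [
            e for e in self.events
            if e["seq"] > seq
            and (entity_type is None or e["object"]["type"] == entity_type)
            and (record_id is None or e["object"]["id"] == record_id)
        ]
```

A consumer remembers the last `seq` it processed and asks only for what came after, which is what keeps a sync cheap even on a large dataset.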
32. These change logs follow the
W3C Activity Streams standard and are
implemented using the patterns from the
IIIF Change Discovery API.
Using standards makes it easy for external
consumers to build integrations against
these flows.
LOD Gateway: ActivityStreams and Standards
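For consumers, the IIIF Change Discovery pattern is to start at the last page of the stream and walk backwards until reaching activities already seen. The Python sketch below follows that pattern over in-memory pages; the page URLs, record ids, and timestamps are invented for illustration, and a real consumer would fetch each page over HTTP instead of reading from a dict.

```python
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_since(pages, last_page_id, last_sync):
    """Walk pages newest-first; return {record id: most recent activity type}."""
    cutoff = parse_time(last_sync)
    changed = {}
    page_id = last_page_id
    while page_id:
        page = pages[page_id]
        # Items within a page run oldest to newest, so read them in reverse.
        for activity in reversed(page["orderedItems"]):
            if parse_time(activity["endTime"]) <= cutoff:
                return changed  # everything older was handled last time
            # setdefault keeps the newest activity seen for each record.
            changed.setdefault(activity["object"]["id"], activity["type"])
        page_id = page.get("prev", {}).get("id")
    return changed


# Hypothetical two-page stream.
PAGES = {
    "https://example.com/activity-stream/page/1": {
        "type": "OrderedCollectionPage",
        "orderedItems": [
            {"type": "Create", "endTime": "2023-01-01T00:00:00Z",
             "object": {"id": "https://example.com/object/1", "type": "HumanMadeObject"}},
            {"type": "Update", "endTime": "2023-02-01T00:00:00Z",
             "object": {"id": "https://example.com/object/2", "type": "HumanMadeObject"}},
        ],
    },
    "https://example.com/activity-stream/page/2": {
        "type": "OrderedCollectionPage",
        "prev": {"id": "https://example.com/activity-stream/page/1",
                 "type": "OrderedCollectionPage"},
        "orderedItems": [
            {"type": "Update", "endTime": "2023-03-01T00:00:00Z",
             "object": {"id": "https://example.com/object/1", "type": "HumanMadeObject"}},
            {"type": "Delete", "endTime": "2023-04-01T00:00:00Z",
             "object": {"id": "https://example.com/object/3", "type": "HumanMadeObject"}},
        ],
    },
}

to_refresh = changed_since(
    PAGES, "https://example.com/activity-stream/page/2", "2023-01-15T00:00:00Z"
)
```

Stopping at the first already-seen activity means a consumer that syncs often reads only the tail of the stream.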
33. The change log only describes which
records changed. But for some kinds of
data, it's valuable to also be able to see what
has changed over time for a given record.
To do so, the LOD Gateway also supports
Memento, the standard underneath the
Internet Archive.
LOD Gateway: ActivityStreams and Standards
34. This feature lets you automatically open older
versions of the record—providing an audit log
and the ability for scholars to understand
how knowledge changes over time.
LOD Gateway: ActivityStreams and Standards
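In Memento (RFC 7089), a client asks for a resource as it existed at a given moment by sending an `Accept-Datetime` header, and the server picks a stored version. The Python sketch below shows only the selection step, with a hypothetical `VersionHistory` class. Choosing the latest version at or before the requested time is an assumption made here; the standard leaves the exact selection rule to the server, and the talk does not say how the Gateway decides.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class VersionHistory:
    """Keeps dated versions of one record and answers "as of" lookups."""

    def __init__(self):
        self._times = []     # sorted datetimes, one per version
        self._versions = []  # documents, parallel to _times

    def add(self, when, document):
        if self._times and when <= self._times[-1]:
            raise ValueError("versions must be added in chronological order")
        self._times.append(when)
        self._versions.append(document)

    def as_of(self, when):
        """Return the latest version saved at or before `when`, or None."""
        position = bisect_right(self._times, when)
        return self._versions[position - 1] if position else None


history = VersionHistory()
history.add(datetime(2023, 1, 1, tzinfo=timezone.utc), {"id": "object/1", "label": "first version"})
history.add(datetime(2023, 6, 1, tzinfo=timezone.utc), {"id": "object/1", "label": "second version"})

in_march = history.as_of(datetime(2023, 3, 1, tzinfo=timezone.utc))
```

Keeping every version instead of overwriting is what turns an ordinary record store into an audit log.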
38. One tool, many needs.
Building this tool has let a small team support 14 different APIs, and put
new ones in place whenever we need them.
Our smallest instance is 250 records. Our largest is over 1 million.
LOD Gateway: Consistent Patterns, Consistent Tools
39. Critical Infrastructure.
The only way we’ve built what we have is using this tool.
Every research tool, every API.
LOD Gateway: Consistent Patterns, Consistent Tools
40. And now you can, too.
As of today, we’ve released this tool as open source software under the BSD-3
license.
https://github.com/thegetty/lod-gateway
LOD Gateway: Consistent Patterns, Consistent Tools
41. This is a “Third System”:
This is heavily tested infrastructure, built because we have made so many
mistakes.
It’s not perfect, but our hope is that it helps you avoid at least the mistakes we know
about, and allows the brilliant modeling ecosystem CIDOC builds to be used in
production by others around the world.
LOD Gateway: Built on top of our mistakes