So…how does anyone do this stuff, for real?
MW 2023, Washington DC
Linked Data on a Budget
David Newbury
Assistant Director, Software and User Experience,
Getty Digital
Hi! I’m David.
I lead the software and user experience teams at Getty.
Getty is a big museum/research hub in Los Angeles. We do lots of things with
data.
All of the actual work here was done by my fabulously talented team. I just talk.
2
Introduction
Part 1:
Linked Data is amazing!
3
Linked Data is another name for the
Semantic Web, a good idea by Tim
Berners-Lee, whose previous good idea
turned out to be very good.
4
What is Linked Data: The Standard Story
There are three main concepts in Linked Data:
1. Data is represented as a graph.
2. Meaning is determined by ontologies.
3. IDs are dereferenceable URLs.
5
What is Linked Data: The Standard Story
A Graph is a way to represent data.
Think of a fact.
6
What is Linked Data: Data as a Graph
David --Favorite Drink--> Coffee
A Graph is a way to represent data.
Think of a fact.
Think of another fact.
7
What is Linked Data: Data as a Graph
David --Favorite Drink--> Coffee
John --Favorite Drink--> Beer
A Graph is a way to represent data.
Think of a fact.
Think of another fact.
And another.
8
What is Linked Data: Data as a Graph
David --Favorite Drink--> Coffee
John --Favorite Drink--> Beer
Betsy --Favorite Drink--> Chai
You could imagine these as a table of data:
9
What is Linked Data: Data as a Graph
Person  Fav Drink
David   Coffee
John    Beer
Betsy   Chai
You could imagine these as a table of data:
…and add other information about
the people involved.
10
What is Linked Data: Data as a Graph
Person  Fav Drink  Hometown
David   Coffee     Pittsburgh
John    Beer       Boston
Betsy   Chai       Pittsburgh
This does get duplicative, though,
if you want to add additional
information about a different
column.
11
What is Linked Data: Data as a Graph
Person  Fav Drink  Hometown    State
David   Coffee     Pittsburgh  PA
John    Beer       Boston      MA
Betsy   Chai       Pittsburgh  PA
You can solve this with a relational
database…
12
What is Linked Data: Data as a Graph
Person  Fav Drink  Place ID
David   Coffee     1
John    Beer       2
Betsy   Chai       1

Place ID  Hometown    State
1         Pittsburgh  PA
2         Boston      MA
You can solve this with a relational
database…
…or with a graph.
13
What is Linked Data: Data as a Graph
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Betsy --Fav Drink--> Chai
Betsy --Hometown--> Pittsburgh
John --Fav Drink--> Beer
John --Hometown--> Boston
Pittsburgh --State--> PA
Boston --State--> MA
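As a rough sketch of what "data as a graph" means in practice, the facts on this slide can be written as (subject, predicate, object) triples. The helper names `objects` and `home_state` are made up for illustration; nothing here is how Getty stores its data.

```python
# Each fact from the slide, written as a (subject, predicate, object) triple.
TRIPLES = [
    ("David", "Fav Drink", "Coffee"),
    ("David", "Hometown", "Pittsburgh"),
    ("Betsy", "Fav Drink", "Chai"),
    ("Betsy", "Hometown", "Pittsburgh"),
    ("John", "Fav Drink", "Beer"),
    ("John", "Hometown", "Boston"),
    ("Pittsburgh", "State", "PA"),
    ("Boston", "State", "MA"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def home_state(person):
    """Follow two hops across the graph: person -> Hometown -> State."""
    return [
        state
        for town in objects(person, "Hometown")
        for state in objects(town, "State")
    ]


print(home_state("David"))
```

Note that the state is stored once per town rather than once per person, which is the duplication the relational version was also trying to avoid.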
Tables are great when you have lots of
data about “a thing”, a limited
number of kinds of things, and
consistent links between them.
14
What is Linked Data: Data as a Graph
Graphs are great when the number
of kinds of things and number of
links between them is high and
inconsistent.
15
What is Linked Data: Data as a Graph
Another problem is meaning:
Words are great, but they require a
shared understanding of what’s
being described.
16
What is Linked Data: Data as an Ontology
David --State--> PA
David --State--> Solid
David --State--> Confused
Linked Data uses ontologies to
include, as data, context and
definition around the terms used to
define how things are connected.
17
What is Linked Data: Data as an Ontology
David --State--> PA        (State defined as: Geographical region within a country)
David --State--> Solid     (State defined as: Distinct form of matter)
David --State--> Confused  (State defined as: Emotional or mental condition)
It also assumes that each of
these concepts is represented
by a unique identifier, which lets
people—and computers—be
unambiguous.
18
What is Linked Data: Data as an Ontology
geo_state     label: State  defined as: Geographical region within a country
matter_state  label: State  defined as: Distinct form of matter
mental_state  label: State  defined as: Emotional or mental condition
By making these identifiers into
URLs, they can be made globally
unique—and can also carry with
them the identity of the
concept’s creator.
19
What is Linked Data: Data as an Ontology
getty.edu/geo_state     defined as: Geographical region within a country
getty.edu/matter_state  defined as: Distinct form of matter
getty.edu/mental_state  defined as: Emotional or mental condition
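To make the point about identifiers concrete, here is a small sketch in which three concepts share the label "State" but stay distinct because each has its own identifier. The `getty.edu/...` identifiers are the illustrative ones from the slides, not real URLs, and `ids_for_label` is a made-up helper.

```python
# Identifier -> description. The identifiers are the illustrative ones
# from the slides, not real, resolvable URLs.
CONCEPTS = {
    "getty.edu/geo_state": {
        "label": "State",
        "defined_as": "Geographical region within a country",
    },
    "getty.edu/matter_state": {
        "label": "State",
        "defined_as": "Distinct form of matter",
    },
    "getty.edu/mental_state": {
        "label": "State",
        "defined_as": "Emotional or mental condition",
    },
}


def ids_for_label(label):
    """Return every identifier whose human-readable label matches."""
    return sorted(k for k, v in CONCEPTS.items() if v["label"] == label)


# The label alone is ambiguous; the identifier is not.
for identifier in ids_for_label("State"):
    print(identifier, "->", CONCEPTS[identifier]["defined_as"])
```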
And it also means that if you
dereference that URL, you can
provide access to the data!
20
What is Linked Data: Dereferenceable Data
getty.edu/matter_state  defined as: Distinct form of matter
It also means that the
information can come from
outside of our own ecosystem.
21
What is Linked Data: Dereferenceable Data
getty.edu/matter_state  defined as: Distinct form of matter
getty.edu/matter_state  same as: wikidata.org/Q35758
wikidata.org/Q35758     Spanish label: materia
Linked Data is Amazing!
22
Part 1: Summary
Linked Data is Amazing!
But…
23
Part 1: Summary
Part 2:
Linked Data is annoying.
24
Relational databases are optimized
for performance and data locality.
If you keep all the information
about a person in one place—it’s
very fast to pull it back.
25
Annoyances: Performance
Person  Fav Drink  Place ID
David   Coffee     1
John    Beer       2
Betsy   Chai       1

Place ID  Hometown    State
1         Pittsburgh  PA
2         Boston      MA
It’s also easy to understand
“What is a person” from the
perspective of the application:
It’s the information in the
“Person” table.
26
Annoyances: Concept Boundaries
Person  Fav Drink  Place ID
David   Coffee     1
John    Beer       2
Betsy   Chai       1
It also makes it easy to include metadata
about the record.
27
Annoyances: Metadata
Person  Fav Drink  Place ID  Updated
David   Coffee     1         2022-01-05
John    Beer       2         1970-01-01
Betsy   Chai       1         2023-04-01
This idea of a “record” is a construct—
remember, these are just facts,
organized into a table.
But we’re trained to think about data as
collections of grouped facts, relevant
within a specific context.
28
Annoyances: Record Boundaries
Person  Fav Drink  Place ID  Updated
David   Coffee     1         2022-01-05
John    Beer       2         1970-01-01
Betsy   Chai       1         2023-04-01
Graphs don’t provide clear
boundaries the same way—they
don’t have the concept of a record.
Each triple is a stand-alone
record—and often collecting all
the information you want requires
many hops across the graph.
29
Annoyances: Graph Structure
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Betsy --Fav Drink--> Chai
Betsy --Hometown--> Pittsburgh
Pittsburgh --State--> PA
Graphs are optimized for querying:
Defining a query-specific context
that includes a set of facts based on
novel criteria of interest, and
returning that subset of
information.
30
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Betsy --Fav Drink--> Chai
Betsy --Hometown--> Pittsburgh
Pittsburgh --State--> PA
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
is just as easy to ask as
“What is the tombstone data about Irises?”
31
Annoyances: Queries
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
is just as absurdly difficult to ask as
“What is the tombstone data about Irises?”
32
Annoyances: Queries
Doing so moves the burden of defining the
relevant context to the user of the data, not
the creator of the data.
This is great for research, but not so great
for ease of use.
33
Annoyances: Queries
We have never asked:
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
…but we ask
“What is the tombstone data about Irises?”
several thousand times a day.
34
Annoyances: Queries
Dereferenceability could solve
this…but it requires network
requests.
35
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Dereferenceability could solve
this…but it requires network
requests.
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Pittsburgh --geo_state--> PA
Dereferenceability could solve
this…but it requires network
requests.
So many requests.
37
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Pittsburgh --geo_state--> PA
geo_state --label--> State
geo_state --defined as--> Geographical region within a country
geo_state --same as--> wikidata.org/Q35758
Dereferenceability could solve
this…but it requires network
requests.
So many requests.
…when do you stop?
38
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Pittsburgh --geo_state--> PA
geo_state --label--> State
geo_state --defined as--> Geographical region within a country
geo_state --same as--> wikidata.org/Q106458883
wikidata.org/Q106458883 --spanish label--> división administrativa de primer nivel en varios países
Dereferenceability could solve
this…but it requires network
requests.
So many requests.
…when do you stop?
…and can you rely on other
systems?
39
Annoyances: Queries
David --Fav Drink--> Coffee
David --Hometown--> Pittsburgh
Pittsburgh --geo_state--> PA
geo_state --label--> State
geo_state --defined as--> Geographical region within a country
geo_state --same as--> wikidata.org/Q106458883
wikidata.org/Q106458883 --spanish label--> división administrativa de primer nivel en varios países
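The "so many requests" problem can be sketched without touching the network. Below, `FAKE_WEB` is a made-up in-memory stand-in for the documents you would get by dereferencing each identifier on the slide, and `crawl` is a hypothetical helper that counts how many requests a given depth limit would cost. It is a toy under those assumptions, not a real client.

```python
# In-memory stand-in for the web: identifier -> document.
# No real network requests are made; the contents mirror the slide.
FAKE_WEB = {
    "David": {"Fav Drink": "Coffee", "Hometown": "Pittsburgh"},
    "Pittsburgh": {"geo_state": "PA"},
    "geo_state": {
        "label": "State",
        "defined as": "Geographical region within a country",
        "same as": "wikidata.org/Q106458883",
    },
    "wikidata.org/Q106458883": {
        "spanish label": "división administrativa de primer nivel en varios países",
    },
}


def crawl(start, max_depth):
    """Dereference breadth-first up to max_depth hops.

    Returns (documents, request_count). Both predicates and values are
    treated as identifiers that might be dereferenceable.
    """
    seen, frontier, requests = {}, [start], 0
    for _ in range(max_depth + 1):
        next_frontier = []
        for identifier in frontier:
            if identifier in seen or identifier not in FAKE_WEB:
                continue
            requests += 1  # each dereference would be one HTTP request
            document = FAKE_WEB[identifier]
            seen[identifier] = document
            for predicate, value in document.items():
                next_frontier.extend([predicate, value])
        frontier = next_frontier
    return seen, requests


for depth in range(4):
    print(depth, crawl("David", depth)[1])
```

Even in this tiny graph, every extra hop of context costs another request, and the last hop depends on a system someone else runs.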
Linked Data is annoying.
None of these are theoretical concerns about Linked Data.
They’re just practical concerns when you try to build something on top of it.
40
Part 2: Summary
Part 3:
Getty builds stuff on linked data.
41
Getty has been doing Linked Data since 2014,
starting with the Getty Vocabularies.
It’s a collection of concepts, people, and places
deeply relevant to the study of art and
architecture.
42
Getty’s Linked Data: Getty Vocabularies
Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
43
Getty’s Linked Data: Archival Records
Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
… and our museum collection.
44
Getty’s Linked Data: Archival Records
We’ve also built a complex, powerful
infrastructure to support doing this across
our application landscape.
It’s been fun. We’ve learned a lot.
45
Getty’s Linked Data: APIs
A Hard-won lesson:
No application that we’ve built required Linked Data.
46
Getty’s Linked Data: What we learned
A Hard-won lesson:
No application that we’ve built required Linked Data.
Which, if you think about it, makes sense. Each application has
a specific, known context with clear record boundaries.
47
Getty’s Linked Data: What we learned
Why keep doing it?
The value is in the ecosystem—when we present information in multiple contexts.
It’s also in the community—allowing our data to be used beyond our
organization’s boundaries.
48
Getty’s Linked Data: What we learned
Why should YOU do it?
Because what makes cultural data interesting is not contained within the walls of
any one institution.
It’s shared across our entire, world-wide community. We should work together.
That’s the reason—not any particular data structure or ontology.
49
Getty’s Linked Data: What we learned
Part 4:
So…what can YOU do?
50
You don’t need to do what we’ve done.
Enabling connections across silos and organizations doesn’t mean that you need a
triplestore with Linked.Art data provided via JSON-LD documents reconciled to
ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with
cross-references via Web Annotations, associated with IIIF Manifests.
51
Linking Data: The Six Levels
You don’t need to do what we’ve done.
Enabling connections across silos and organizations doesn’t mean that you need a
triplestore with Linked.Art data provided via JSON-LD documents reconciled to
ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with
cross-references via Web Annotations, associated with IIIF Manifests.
That would just be showing off.
52
Linking Data: The Six Levels
You just need to make it easy for people to
understand what you have done.
There are, in our experience, six levels of Linked Data that build on one another,
but all provide value—both within an organization and across the community.
53
Linking Data: The Six Levels
#1: Authority
Provide a consistent way to identify both entities and the institution providing
information in your data.
54
Level 1: Authority
Give everything an identifier.
Other people can’t talk about your data without a way to refer unambiguously to
the record they mean.
URLs as IDs are great for this: they’re unique, and they let others know
who produced the data.
55
Level 1: Authority
Give everything a human-friendly identifier.
https://data.getty.edu/research/collections/object/97e8fd22-92a4-4831-aa63-33255c1aaefe
This is not friendly.
56
Level 1: Authority
Give everything a human-friendly identifier.
This is friendly:
https://data.getty.edu/archives/AK3098
57
Level 1: Authority
Identifiers are for other PEOPLE to use.
Identifiers are most commonly used by machines—but most of the effort around
identifiers is done by humans typing them.
Optimize for people, not for machines.
58
Level 1: Authority
Identifiers Identify Documents.
You have the best sense of what “relevant context” might be. It’s wonderful to
provide query capabilities—but you should determine what information is usually
relevant for a given identifier.
Make easy things easy, and hard things possible.
59
Level 1: Authority
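One way to read "identifiers identify documents" is that the publisher, not the user, decides how much context comes back for an identifier. The sketch below reuses the toy facts from earlier in the talk; `EMBED` and `document_for` are made-up names, and the choice of what to embed is an assumption for illustration.

```python
# Toy facts from earlier in the talk, as (subject, predicate, object) triples.
TRIPLES = [
    ("David", "Fav Drink", "Coffee"),
    ("David", "Hometown", "Pittsburgh"),
    ("Pittsburgh", "State", "PA"),
]

# Publisher's choice: which links are expanded inside the document
# instead of being left as a bare reference.
EMBED = {"Hometown"}


def document_for(identifier):
    """Build the 'usually relevant' document for one identifier."""
    document = {"id": identifier}
    for subject, predicate, obj in TRIPLES:
        if subject != identifier:
            continue
        if predicate in EMBED:
            # Pull in the linked thing's own facts, one hop only.
            nested = {"id": obj}
            nested.update({p: o for s, p, o in TRIPLES if s == obj})
            document[predicate] = nested
        else:
            document[predicate] = obj
    return document


print(document_for("David"))
```

The easy question (what do we know about David?) becomes one lookup, while the underlying triples stay available for harder queries.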
#2: Reconciliation
Use authorities and thesauri to disambiguate between similar real world entities.
60
Level 2: Reconciliation
Reference, even if you can’t link.
Give people a sense of how your data might be connected to others by adding a
pointer to a shared, common point of reference.
61
Level 2: Reconciliation
Reference, even if you can’t link.
The Getty Vocabularies are great for this. So is Wikidata. So is VIAF. Doesn’t
matter—just give us a way to confirm that what we’re thinking is what you’re
thinking.
62
Level 2: Reconciliation
Publish that reference.
It only works, though, if you let people KNOW.
63
Level 2: Reconciliation
If this is all you can do, you’ve done enough.
Almost all the value of linked data is present at this point. If you publish data,
provide identifiers, and you include links to others—you’ve done linked data.
Everything after this is extra credit.
64
Level 2: Reconciliation
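A minimal sketch of why a published reference is enough: two institutions that never coordinated can still find their overlap if both point at the same authority entry. Every identifier below is a hypothetical placeholder (the `.example` domains are not real), and `shared_entities` is a made-up helper.

```python
# All identifiers are hypothetical placeholders, not real records.
OUR_RECORDS = [
    {
        "id": "https://museum-a.example/person/12",
        "name": "Example, Artist",
        "same_as": "https://authority.example/agent/500",
    },
    {
        "id": "https://museum-a.example/person/13",
        "name": "Unreconciled Person",
        "same_as": None,  # no shared reference published yet
    },
]

THEIR_RECORDS = [
    {
        "id": "https://archive-b.example/agents/x9",
        "name": "Artist Example",
        "same_as": "https://authority.example/agent/500",
    },
]


def shared_entities(ours, theirs):
    """Pair records from two datasets that reference the same authority entry."""
    by_reference = {r["same_as"]: r["id"] for r in theirs if r["same_as"]}
    return [
        (r["id"], by_reference[r["same_as"]])
        for r in ours
        if r["same_as"] in by_reference
    ]


print(shared_entities(OUR_RECORDS, THEIR_RECORDS))
```

The names differ between the two datasets, so string matching would be a guess; the shared reference is what confirms both records mean the same person.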
#3: Bidirectional Linking
Establish and publish connections between systems or institutions.
65
Level 3: Bidirectional Linking
Links go both ways.
It’s valuable to know that a given artwork is mentioned in a book—but it’s just as
valuable to know that a book mentions an artwork!
66
Level 3: Bidirectional Linking
Sync is hard.
We’ve learned that trying to keep this in sync within systems is hard. Most of our
applications are not designed to deal with information outside of their own sphere
of control.
Instead, we maintain these references outside systems of record, and look them
up when needed for presentation.
67
Level 3: Bidirectional Linking
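Keeping links outside the systems of record could look something like the sketch below: a small crosswalk that stores each link once and answers lookups in both directions. The `Crosswalk` class and the identifiers are hypothetical; this illustrates the idea, not Getty's implementation.

```python
from collections import defaultdict


class Crosswalk:
    """Stores links once, outside any system of record, and reads them both ways."""

    def __init__(self):
        self._forward = defaultdict(set)
        self._reverse = defaultdict(set)

    def link(self, source, target):
        """Record that `source` mentions `target`."""
        self._forward[source].add(target)
        self._reverse[target].add(source)

    def mentions(self, source):
        """What does this source point at?"""
        return sorted(self._forward.get(source, ()))

    def mentioned_in(self, target):
        """What points at this target?"""
        return sorted(self._reverse.get(target, ()))


# Hypothetical identifiers for a book and two artworks.
crosswalk = Crosswalk()
crosswalk.link("book/1", "artwork/7")
crosswalk.link("book/1", "artwork/9")
print(crosswalk.mentions("book/1"))
print(crosswalk.mentioned_in("artwork/7"))
```

Neither the book system nor the artwork system has to know about the other; the presentation layer asks the crosswalk when it needs the link.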
Links are often surprising!
Publishing bidirectional crosswalks between linked things creates
networks of information—and helps people discover unexpected relationships.
68
Level 3: Bidirectional Linking
#4: Aggregation
Enhance discovery by providing search and access to information across
collections.
69
Level 4: Aggregation
This is where you start doing things for other people.
The previous levels are about what you do in your data, often for yourself—
but now, you’re doing things explicitly to help other people do things with your
data.
70
Level 4: Aggregation
The best place for data is where people are looking for it.
Often, that’s not with you.
Share your data with other people, and let them point back to you.
71
Level 4: Aggregation
But: Change Discovery.
If other people are using your data, they’re going to cache it.
They don’t trust you.
72
Level 4: Aggregation
Change Discovery.
If other people are using your data, they’re going to cache it.
I don’t trust you.
73
Level 4: Aggregation
Change Discovery.
If other people are using your data, they’re going to cache it.
I don’t trust you.
Please don’t trust my systems.
74
Level 4: Aggregation
Change Discovery.
Cache our data: We’ll let you know if the data changes.
75
Level 4: Aggregation
Change Discovery.
It doesn’t matter what the change is—just letting someone know to look for
changes provides most of the value.
Recaching everything is hard, but pulling just the changed records is easy.
76
Level 4: Aggregation
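Here is a toy version of change discovery, assuming the publisher keeps an append-only log of which identifiers changed. The `Publisher` and `Consumer` classes are made up for illustration and run entirely in memory; a real feed would be served over HTTP.

```python
class Publisher:
    """Holds records plus an append-only log of what changed."""

    def __init__(self):
        self.records = {}
        self.changes = []  # (sequence number, identifier)

    def save(self, identifier, data):
        self.records[identifier] = data
        self.changes.append((len(self.changes) + 1, identifier))

    def changes_since(self, sequence):
        """Say only *that* something changed, not what the change was."""
        return [change for change in self.changes if change[0] > sequence]


class Consumer:
    """Caches the publisher's data and refreshes only the changed records."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.cache = {}
        self.last_seen = 0
        self.fetches = 0

    def sync(self):
        for sequence, identifier in self.publisher.changes_since(self.last_seen):
            self.cache[identifier] = dict(self.publisher.records[identifier])
            self.fetches += 1  # one fetch per changed record
            self.last_seen = sequence


publisher = Publisher()
publisher.save("object/1", {"title": "Irises"})
consumer = Consumer(publisher)
consumer.sync()
print(consumer.fetches, consumer.cache)
```

A second `sync()` with no new changes costs nothing, which is the whole point: the consumer can keep its cache without re-pulling everything.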
#5: Interoperability
Develop interfaces that present information from many sources in a single way.
77
Level 5: Interoperability
Data Standards matter now.
Up to this point in the process, I haven’t mentioned anything about linked.art, or
CIDOC-CRM, or Schema.org, or SKOS-XL.
They don’t matter until you want to create an automatically interoperable
application.
78
Level 5: Interoperability
Data Standards have other value, of course.
Standards are great for consistency and ensuring quality—
and for letting other people write the documentation.
79
Level 5: Interoperability
Externally, they’re for robots.
The external value of standards means that I can write code that consumes your
data without needing to talk to you—or even know you exist.
This is why Schema.org is so widely used—Google doesn’t know I exist, but they
can still extract my event data and share it.
80
Level 5: Interoperability
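For a sense of what "for robots" looks like, here is a small sketch that builds a Schema.org `Event` as JSON-LD. The `Event` and `Place` types and the `name`, `startDate`, and `location` properties are part of Schema.org; the `event_jsonld` helper and the event details are made up for illustration.

```python
import json


def event_jsonld(name, start_date, place_name):
    """Describe an event using Schema.org terms, ready to embed in a web page."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date
        "location": {"@type": "Place", "name": place_name},
    }


# Hypothetical event details, for illustration only.
document = event_jsonld("Example Gallery Talk", "2023-04-01", "Example Museum")
print(json.dumps(document, indent=2))
```

A crawler that understands Schema.org can read this without any documentation from the publisher, which is the external value of the standard.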
IIIF is our community’s shining example of this.
A standard widely-enough used that there are multiple applications that can be
used across the field to show other people’s data in yet other people’s applications.
81
Level 5: Interoperability
Linked.art is just beginning to demonstrate this.
We’re on the precipice of having enough data at this level for it to be worth
building applications for artwork. Stay tuned!
82
Level 5: Interoperability
This is Level 5.
A reminder here. This is my penultimate level of linked data.
You don’t need to start here, and you don’t need to get to here to provide value.
83
Level 5: Interoperability
#6: Reuse
Allowing one institution to import information from another while maintaining the
provenance of the data.
84
Level 6: Reuse
We haven’t gotten here.
The final goal here would be if I could use your data in my application—and have it
still be your data.
This is the dream.
I’m still dreaming about this.
85
Level 6: Reuse
We will get here.
An ecosystem of shared, reusable, linked data will open potential beyond what we
can do at any organization—even Getty.
But it can’t be done without others. Without you.
86
Level 6: Reuse
Start Small.
Each of these levels provides value.
Decide what you can do—and do that—it’s enough, and it helps us build the
community.
87
Level 6: Reuse
Work Together—and complain!
The only way we’ll know what works (and, more importantly, what doesn’t) is if we
hear from others when things don’t work!
Linked Data is not valuable outside of a community—and if it’s not working for the
community, it’s not working.
We’re making mistakes—let us know when, so we can learn—and we can share.
88
Level 6: Reuse
Thank you! Complaints go here:
dnewbury@getty.edu
89

Linked Data on a Budget

  • 1.
    So…how does anyonedo this stuff, for real? MW 2023, Washington DC Linked Data on a Budget David Newbury Assistant Director, Software and User Experience, Getty Digital
  • 2.
    Hi! I’m David. Ilead the software and user experience teams at Getty. Getty is a big museum/research hub in Los Angeles. We do lots of things with data. All of the actual work here was done by my fabulously talented team. I just talk. 2 Introduction
  • 3.
    Part 1: Linked Datais amazing! 3
  • 4.
    ● Linked Datais another name for the Semantic Web, a good idea by Tim Berners-Lee, whose previous good idea turned out to be very good. 4 What is Linked Data: The Standard Story
  • 5.
    There are threemain concepts in Linked Data: 1. Data is represented as a graph. 2. Meaning is determined by ontologies. 3. IDs are dereferencable URLs. 5 What is Linked Data: The Standard Story
  • 6.
    A Graph isa way to represent data. Think of a fact. 6 What is Linked Data: Data as a Graph Favorite Drink Coffee David
  • 7.
    A Graph isa way to represent data. Think of a fact. Think of another fact. 7 What is Linked Data: Data as a Graph Favorite Drink Coffee Favorite Drink Beer David John
  • 8.
    A Graph isa way to represent data. Think of a fact. Think of another fact. And another. 8 What is Linked Data: Data as a Graph Favorite Drink Coffee Favorite Drink Beer Favorite Drink Chai David John Betsy
  • 9.
    You could imaginethese as a table of data: 9 What is Linked Data: Data as a Graph David John Betsy Fav Drink Coffee Beer Chai
  • 10.
    You could imaginethese as a table of data: …and add other information about the people involved. 10 What is Linked Data: Data as a Graph David John Betsy Fav Drink Coffee Beer Chai Hometown Pittsburgh Boston Pittsburgh
  • 11.
    This does getduplicative, though, if you want to add additional information about a different column. 11 What is Linked Data: Data as a Graph David Fav Drink Coffee John Beer Betsy Chai Hometown Pittsburgh Boston Pittsburgh State PA MA PA
  • 12.
    You can solvethis with a relational database… 12 What is Linked Data: Data as a Graph David Fav Drink Coffee John Beer Betsy Chai Place ID 1 2 1 Hometown Pittsburgh Boston State PA MA Place ID 1 2
  • 13.
    You can solvethis with a relational database… …or with with a graph. 13 What is Linked Data: Data as a Graph David Fav Drink Coffee State Hometown Pittsburgh PA Betsy Fav Drink Chai Hometown John Fav Drink Beer State Hometown Boston MA
  • 14.
    Tables are greatfor lots of data about “a thing”, with a limited number of kinds of things with consistent links between things. 14 What is Linked Data: Data as a Graph
  • 15.
    Graphs are greatwhen the number of kinds of things and number of links between them is high and inconsistent. 15 What is Linked Data: Data as a Graph
  • 16.
    Another problem ismeaning: Words are great, but they require a shared understanding of what’s being described. 16 What is Linked Data: Data as a Ontology David State PA David State Solid David State Confused
  • 17.
    Linked Data usesontologies to include, as data, context and definition around the terms used to define how things are connected. 17 What is Linked Data: Data as a Ontology David State PA David State Solid David State Confused defined as Geographical region within a country defined as Distinct form of matter defined as Emotional or mental condition
  • 18.
    It also assumesthat each of these concepts is represented by a unique identifier, which lets people—and computers—be unambiguous. 18 What is Linked Data: Data as a Ontology geo_state State matter_state State mental_state State defined as Geographical region within a country defined as Distinct form of matter defined as Emotional or mental condition label label label
  • 19.
    By making theseidentifiers into URLs, they can be made globally unique—and can also carry with them the identity of the concept’s creator. 19 What is Linked Data: Data as a Ontology getty.edu/geo_state getty.edu/matter_state getty.edu/mental_state defined as Geographical region within a country defined as Distinct form of matter defined as Emotional or mental condition
  • 20.
    And it alsomeans that if you dereference that URL, you can provide access to the data! 20 What is Linked Data: Dereferencable Data getty.edu/matter_state defined as Distinct form of matter
  • 21.
    It also meansthat the information can come from outside of our own ecosystem. 21 What is Linked Data: Dereferencable Data getty.edu/matter_state defined as Distinct form of matter spanish label materia same as wikidata.org/Q35758
  • 22.
    Linked Data isAmazing! 22 Part 1: Summary
  • 23.
    Linked Data isAmazing! But… 23 Part 1: Summary
  • 24.
    Part 2: Linked Datais annoying. 24
  • 25.
    Relational databases areoptimized for performance and data locality. If you keep all the information about a person in one place—it’s very fast to pull it back. 25 Annoyances: Performance David Fav Drink Coffee John Beer Betsy Chai Place ID 1 2 1 Hometown Pittsburgh Boston State PA MA Place ID 1 2
  • 26.
    It’s also easyto understand “What is a person” from the perspective of the application: It’s the information in the “Person” table. 26 Annoyances: Concept Boundaries David Fav Drink Coffee John Beer Betsy Chai Place ID 1 2 1
  • 27.
    It also makesit easy to include metadata about the record. 27 Annoyances: Metadata David Fav Drink Coffee John Beer Betsy Chai Place ID 1 2 1 Updated 2022-01-05 1970-01-01 2023-04-01
  • 28.
    This idea ofa “record” is a construct— remember, these are just facts, organized into a table. But we’re trained to think about data as collections of grouped facts, relevant within a specific context. 28 Annoyances: Record Boundaries David Fav Drink Coffee John Beer Betsy Chai Place ID 1 2 1 Updated 2022-01-05 1970-01-01 2023-04-01
  • 29.
    Graphs don’t provideclear boundaries the same way—they don’t have the concept of a record. Each triple is a stand-alone record—and often collecting all the information you want requires many hops across the graph. 29 Annoyances: Graph Structure David Fav Drink Coffee State Hometown Pittsburgh PA Betsy Fav Drink Chai Hometown
  • 30.
    Graphs are optimizedfor querying: Defining a query-specific context that includes a set of facts based on novel criteria of interest, and returning that subset of information. 30 Annoyances: Queries David Fav Drink Coffee State Hometown Pittsburgh PA Betsy Fav Drink Chai Hometown
  • 31.
    “What objects doesGetty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850?” is just as easy to ask as “What is the tombstone data about Irises?” 31 Annoyances: Queries
  • 32.
    “What objects doesGetty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850?” is just as easy absurdly difficult to ask as “What is the tombstone data about Irises?” 32 Annoyances: Queries
  • 33.
    Doing so movesthe burden of defining the relevant context to the user of the data, not the creator of the data. This is great for research, but not so great for ease of use. 33 Annoyances: Queries
  • 34.
    We have neverasked: “What objects does Getty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850? …but we ask What is the tombstone data about Irises? Several thousand times a day. 34 Annoyances: Queries
  • 35.
    Dereferencability could solve this…butit requires network requests. 35 Annoyances: Queries David Fav Drink Coffee Hometown Pittsburgh
  • 36.
    Dereferencability could solve this…butit requires network requests. Annoyances: Queries David Fav Drink Coffee geo_state Hometown Pittsburgh PA
  • 37.
    Dereferencability could solve this…butit requires network requests. So many requests. 37 Annoyances: Queries David Fav Drink Coffee geo_state Hometown Pittsburgh PA State defined as Geographical region within a country label same as wikidata.org/Q35758
  • 38.
    Dereferencability could solve this…butit requires network requests. So many requests. …when do you stop? 38 Annoyances: Queries David Fav Drink Coffee geo_state Hometown Pittsburgh PA State defined as Geographical region within a country label same as wikidata.org/Q106458883 spanish label división administrativa de primer nivel en varios países
  • 39.
    Dereferencability could solve this…butit requires network requests. So many requests. …when do you stop? …and can you rely on other systems? 39 Annoyances: Queries David Fav Drink Coffee geo_state Hometown Pittsburgh PA State defined as Geographical region within a country label same as wikidata.org/Q106458883 spanish label división administrativa de primer nivel en varios países
  • 40.
    Linked Data isannoying. None of these are theoretical concerns about Linked Data. They’re just practical concerns when you try and build something on top of it. 40 Part 2: Summary
  • 41.
    Part 3: Getty buildsstuff on linked data. 41
  • 42.
    Getty has beendoing Linked Data since 2014, starting with the Getty Vocabularies. It’s a collection of concepts, people, and places deeply relevant to the study of art and architecture. 42 Getty’s Linked Data: Getty Vocabularies
  • 43.
    Since then, we’vemoved most of our major systems to use Linked Data—including our archives… 43 Getty’s Linked Data: Archival Records
  • 44.
    Since then, we’vemoved most of our major systems to use Linked Data—including our archives… … and our museum collection. 44 Getty’s Linked Data: Archival Records
  • 45.
    We’ve also builta complex, powerful infrastructure to support doing this across our application landscape. It’s been fun. We’ve learned a lot. 45 Getty’s Linked Data: APIs
  • 46.
    A Hard-won lesson: Noapplication that we’ve built required Linked Data. 46 Getty’s Linked Data: What we learned
  • 47.
    A Hard-won lesson: Noapplication that we’ve built required Linked Data. Which, if you think about it, makes sense. Each application has a specific, known context with clear record boundaries. 47 Getty’s Linked Data: What we learned
  • 48.
    Why keep doingit? The value is in the ecosystem—when we present information in multiple contexts. It’s also in the community—allowing our data to be used beyond our organization’s boundaries. 48 Getty’s Linked Data: What we learned
  • 49.
    Why should YOUdo it? Because what makes cultural data interesting is not contained within the walls of any one institution. It’s shared across our entire, world-wide community. We should work together. That’s the reason—not any particular data structure or ontology. 49 Getty’s Linked Data: What we learned
  • 50.
  • 51.
    You don’t needto do what we’ve done. Enabling connections across silos and organizations doesn’t mean that you need a triplestore with Linked.Art data provided via JSON-LD documents reconciled to ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with cross-references via Web Annotations, associated with IIIF Manifests. 51 Linking Data: The Six Levels
  • 52.
    You don’t needto do what we’ve done. Enabling connections across silos and organizations doesn’t mean that you need a triplestore with Linked.Art data provided via JSON-LD documents reconciled to ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with cross-references via Web Annotations, associated with IIIF Manifests. That would just be showing off. 52 Linking Data: The Six Levels
  • 53.
    You just needto make it easy for people to understand what you have done. There are, in our experience, six levels of Linked Data that build on one another, but all provide value—both within an organization and across the community. 53 Linking Data: The Six Levels
  • 54.
    #1: Authority Provide aconsistent way to identify both entities and the institution providing information in your data. 54 Level 1: Authority
  • 55.
    Give everything anidentifier. Other people can’t talk about your data without a way to unambiguously refer to the record that you’re talking about. URLs as IDs are great for this—they’re both unique—and they let others know who produced the data. 55 Level 1: Authority
  • 56.
    Give everything anhuman-friendly identifier. https://data.getty.edu/research/collections/object/97e8fd22-92a4-4831-aa63-33255c1aaefe This is not friendly. 56 Level 1: Authority
  • 57.
    Give everything anhuman-friendly identifier. This is friendly: https://data.getty.edu/archives/AK3098 57 Level 1: Authority
  • 58.
    Identifiers are forother PEOPLE to use. Identifiers are most commonly used by machines—but most of the effort around identifiers is done by humans typing them. Optimize for people, not for machines. 58 Level 1: Authority
  • 59.
    Identifiers Identify Documents. Youhave the best sense of what “relevant context” might be. It’s wonderful to provide query capabilities—but you should determine what information is usually relevant for a given identifier. Make easy things easy, and hard things possible. 59 Level 1: Authority
  • 60.
    #2: Reconciliation Use authoritiesand thesauri to disambiguate between similar real world entities. 60 Level 2: Reconciliation
  • 61.
    Reference, even ifyou can’t link. Give people a sense of how your data might be connected to others by adding in pointer to a shared, common point of reference. 61 Level 2: Reconciliation
  • 62.
    Reference, even ifyou can’t link. The Getty Vocabularies are great for this. So is Wikidata. So is VIAF. Doesn’t matter—just give us a way to confirm that what we’re thinking is what you’re thinking. 62 Level 2: Reconciliation
  • 63.
    Publish that reference. It only works, though, if you let people KNOW.
  • 64.
    If this is all you can do, you’ve done enough. Almost all the value of linked data is present at this point. If you publish data, provide identifiers, and include links to others, you’ve done linked data. Everything after this is extra credit.
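Reconciliation can be sketched as nothing more than a published list of pointers on each record. In the example below, every URL and the `same_as` field name are placeholders chosen for illustration. In practice the pointers would be Getty Vocabularies, Wikidata, or VIAF URIs.

```python
def shared_references(record_a: dict, record_b: dict) -> set:
    """Return the external authority URIs that both records point to."""
    return set(record_a.get("same_as", [])) & set(record_b.get("same_as", []))


# Two institutions describe the same person under different local labels.
# All identifiers here are made-up placeholders.
ours = {
    "id": "https://data.example.org/person/123",
    "label": "R. van Rijn",
    "same_as": ["https://authority.example.net/people/EXAMPLE-1"],
}
theirs = {
    "id": "https://collection.example.com/agents/abc",
    "label": "Rembrandt",
    "same_as": [
        "https://authority.example.net/people/EXAMPLE-1",
        "https://other.example.net/EXAMPLE-2",
    ],
}

overlap = shared_references(ours, theirs)
print(bool(overlap))  # True: both records cite the same shared reference
```

The labels differ, but the shared pointer is enough to confirm that both records mean the same entity.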
  • 65.
    #3: Bidirectional Linking. Establish and publish connections between systems or institutions. 65 Level 3: Bidirectional Linking
  • 66.
    Links go both ways. It’s valuable to know that a given artwork is mentioned in a book, but it’s just as valuable to know that a book mentions an artwork!
  • 67.
    Sync is hard. We’ve learned that trying to keep these links in sync within systems is hard. Most of our applications are not designed to deal with information outside of their own sphere of control. Instead, we maintain these references outside systems of record, and look them up when needed for presentation.
  • 68.
    Links are often surprising! Publishing bidirectional crosswalks between linked things creates networks of information and helps people discover unexpected relationships.
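A crosswalk kept outside the systems of record can be as small as two lookup tables, one per direction. The `Crosswalk` class and the identifiers below are hypothetical and only illustrate the idea. A production version would need persistent storage.

```python
from collections import defaultdict


class Crosswalk:
    """Stores each link once and answers lookups in both directions."""

    def __init__(self):
        self._forward = defaultdict(set)  # source -> targets it mentions
        self._reverse = defaultdict(set)  # target -> sources mentioning it

    def link(self, source: str, target: str) -> None:
        """Record that `source` mentions `target`."""
        self._forward[source].add(target)
        self._reverse[target].add(source)

    def mentions(self, source: str) -> list:
        """Everything a given source (e.g. a book) mentions."""
        return sorted(self._forward.get(source, ()))

    def mentioned_by(self, target: str) -> list:
        """Every source that mentions a given target (e.g. an artwork)."""
        return sorted(self._reverse.get(target, ()))


crosswalk = Crosswalk()
crosswalk.link("book/catalogue-1", "artwork/irises")
crosswalk.link("book/catalogue-2", "artwork/irises")

print(crosswalk.mentions("book/catalogue-1"))    # ['artwork/irises']
print(crosswalk.mentioned_by("artwork/irises"))  # ['book/catalogue-1', 'book/catalogue-2']
```

Neither the book system nor the artwork system has to know about the other. Both ask the crosswalk at presentation time.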
  • 69.
    #4: Aggregation. Enhance discovery by providing search and access to information across collections. 69 Level 4: Aggregation
  • 70.
    This is where you start doing things for other people. The previous levels are about what you do in your data, often for yourself. Now, you’re doing things explicitly to help other people do things with your data.
  • 71.
    The best place for data is where people are looking for it. Often, that’s not with you. Share your data with other people, and let them point back to you.
  • 72.
    But: Change Discovery. If other people are using your data, they’re going to cache it. They don’t trust you.
  • 73.
    Change Discovery. If other people are using your data, they’re going to cache it. I don’t trust you.
  • 74.
    Change Discovery. If other people are using your data, they’re going to cache it. I don’t trust you. Please don’t trust my systems.
  • 75.
    Change Discovery. Cache our data: we’ll let you know if the data changes.
  • 76.
    Change Discovery. It doesn’t matter what the change is: just letting someone know to look for changes provides most of the value. Recaching everything is hard, but pulling just the changed records is easy.
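The "pull just the changed records" idea can be sketched with a feed of (timestamp, record ID) pairs. The feed shape, the `changed_since` and `refresh` helpers, and the record IDs are all assumptions made for this example. They are not the format of any particular change-discovery standard.

```python
def changed_since(feed, last_seen):
    """Return IDs changed after `last_seen`, deduplicated, in feed order."""
    ids = []
    for timestamp, record_id in feed:
        if timestamp > last_seen and record_id not in ids:
            ids.append(record_id)
    return ids


def refresh(cache, feed, last_seen, fetch):
    """Re-fetch only changed records; return the new high-water mark."""
    for record_id in changed_since(feed, last_seen):
        cache[record_id] = fetch(record_id)
    newest = max((timestamp for timestamp, _ in feed), default=last_seen)
    return max(newest, last_seen)


# The publisher's current data and its change feed (same-format UTC
# timestamps, so plain string comparison orders them correctly).
publisher = {"obj/1": "v2", "obj/2": "v1", "obj/3": "v1"}
feed = [
    ("2023-04-01T00:00:00Z", "obj/1"),
    ("2023-04-03T00:00:00Z", "obj/1"),
]

# The consumer's stale cache, last synced on 2 April.
cache = {"obj/1": "v1", "obj/2": "v1", "obj/3": "v1"}
fetched = []


def fetch(record_id):
    fetched.append(record_id)
    return publisher[record_id]


last_seen = refresh(cache, feed, "2023-04-02T00:00:00Z", fetch)
print(fetched)  # ['obj/1']: one request instead of three
```

The feed never says what changed, only that something did. That is enough for the consumer to skip the two untouched records.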
  • 77.
    #5: Interoperability. Develop interfaces that present information from many sources in a single way. 77 Level 5: Interoperability
  • 78.
    Data Standards matter now. Up to this point in the process, I haven’t mentioned anything about linked.art, or CIDOC-CRM, or Schema.org, or SKOS-XL. They don’t matter until you want to create an automatically interoperable application.
  • 79.
    Data Standards have other value, of course. Standards are great for consistency and ensuring quality, and for letting other people write the documentation.
  • 80.
    Externally, they’re for robots. The external value of standards is that I can write code that consumes your data without needing to talk to you, or even know you exist. This is why Schema.org is so widely used: Google doesn’t know I exist, but they can still extract my event data and share it.
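To make the Schema.org point concrete, here is a rough sketch of a publisher emitting an event as JSON-LD and an unrelated consumer reading it. `Event`, `Place`, `name`, `startDate`, `location`, and `url` are Schema.org terms. The event details and the `event_jsonld` and `summarize_event` helpers are invented for illustration.

```python
import json


def event_jsonld(name, start_date, place_name, url):
    """Publisher side: describe an event using Schema.org vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,
        "location": {"@type": "Place", "name": place_name},
        "url": url,
    }


def summarize_event(document):
    """Consumer side: relies only on the shared vocabulary, not the publisher."""
    if document.get("@type") != "Event":
        return None
    place = document["location"]["name"]
    return f'{document["name"]} on {document["startDate"]} at {place}'


# Hypothetical event data, serialized the way it would be embedded in a page.
published = json.dumps(
    event_jsonld(
        "Evening Gallery Talk",
        "2023-04-05",
        "Example Museum",
        "https://www.example.org/events/gallery-talk",
    )
)

summary = summarize_event(json.loads(published))
print(summary)  # Evening Gallery Talk on 2023-04-05 at Example Museum
```

The consumer was written without any knowledge of the publisher. Agreement on the vocabulary is the whole contract.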
  • 81.
    IIIF is our community’s shining example of this. It is a standard used widely enough that multiple applications across the field can show other people’s data in yet other people’s applications.
  • 82.
    Linked.art is just beginning to demonstrate this. We’re on the precipice of having enough data at this level for it to be worth building applications for artwork. Stay tuned!
  • 83.
    This is Level 5. A reminder here: this is my penultimate level of linked data. You don’t need to start here, and you don’t need to get here to provide value.
  • 84.
    #6: Reuse. Allow one institution to import information from another while maintaining the provenance of the data. 84 Level 6: Reuse
  • 85.
    We haven’t gotten here. The final goal would be that I could use your data in my application and have it still be your data. This is the dream. I’m still dreaming about this.
  • 86.
    We will get here. An ecosystem of shared, reusable, linked data will open potential beyond what we can do at any one organization, even Getty. But it can’t be done without others. Without you.
  • 87.
    Start Small. Each of these levels provides value. Decide what you can do, and do that. It’s enough, and it helps us build the community.
  • 88.
    Work Together, and complain! The only way we’ll know what works (and, more importantly, what doesn’t) is if we hear from others that things don’t work! Linked Data is not valuable outside of a community, and if it’s not working for the community, it’s not working. We’re making mistakes. Let us know when, so we can learn and share.
  • 89.
    Thank you! Complaints go here: dnewbury@getty.edu