Tim Hill
1. Transforming Access to Culture
& History with Connected Data
The case of Europeana
Netherlands, Public Domain
1660 - 1625, Rijksmuseum
Anonymous
Arrival of a Portuguese ship
2. Netherlands, Public Domain
1615, Rijksmuseum
Anonymous
Elegant Party on a Terrace of a Venetian-inspired Setting
Who we are
Europeana: Transforming the World with Culture
3. Europeana: Cultural Heritage Metadata
and Content from across Europe
• We aggregate data from Cultural Heritage organisations across
Europe
• predominantly, but not only, EU member states
• In most cases we only harvest metadata
• increasingly, however, we also host the content itself alongside the
metadata
• Make it available through our portal site:
https://www.europeana.eu/portal/en
• ultimately linking back to the originating institution
CC BY-SA
Transforming Access to Culture & History with Connected Data
4. Europeana in numbers
• 53 million+ items
• 30+ languages
• 4500+ GLAM (Galleries, Libraries, Archives, and Museums)
institutions
5. Europeana as ‘Big Data’
• Volume: relatively low, by Big Data standards (< 2TB metadata)
• Velocity: continuous updating, flushed to datastore every 15 minutes
• Veracity: significant issues of data quality
• Variety: immense
• multiple languages
• multiple formats
• different institutions
• etc … extremely heterogeneous
Analysed as the four ‘V’s …
6. Norway, CC BY-SA
1921, Oslo Museum
Ernest Rude
Ernest Marini - dancer in a costume
Who they are
Users: what they want and what they do
7. Who are they?
• "Culture Vultures"
• Academic researchers
• Teachers and students
• Visual artists
• Graphic designers
• Amateurs (in the original sense of the word)
• "Culture snackers"
• casual browsers looking for entertainment
8. What are they looking for?
• Query pattern is extremely flat
• analysis of logs shows no search term shared by > 6 users
• further analysis needed here
• “serendipity search” is important: users are trying to surprise
themselves
It seems literally impossible to say ….
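The "no search term shared by > 6 users" figure is the kind of number that falls out of a simple count of distinct users per term. A minimal sketch of that count follows, assuming a log of (user id, raw query) pairs; the log format, the normalisation rule, and the sample rows are all invented for illustration and are not Europeana's actual log schema.

```python
from collections import defaultdict

def users_per_term(log):
    """Map each normalised search term to the set of distinct users who issued it."""
    seen = defaultdict(set)
    for user_id, term in log:
        # Assumption: terms are compared after trimming and lower-casing.
        seen[term.strip().lower()].add(user_id)
    return seen

def max_shared(log):
    """Largest number of distinct users sharing any one term (0 for an empty log)."""
    return max((len(users) for users in users_per_term(log).values()), default=0)

# Toy log, invented for illustration: (user id, raw query)
sample_log = [
    ("u1", "Rembrandt"),
    ("u2", "rembrandt "),
    ("u2", "rembrandt"),   # a repeat by the same user is counted once
    ("u3", "Oslo 1921"),
]

print(max_shared(sample_log))  # 2
```

A flat query pattern shows up as a small value of `max_shared` relative to the number of users in the log.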
9. What are they like?
• Culture vultures
• engagement is extremely high
• mean rank of clicked items: 82 (!)
• session length once an item is clicked in the SERP can stretch into
hours
• Culture snackers
• bounce rate difficult to estimate, but high (> 85%)
User engagement
10. What are they doing?
• school reports
• university essays
• presentations
• exhibitions
• research papers
• new artworks
Making new stuff!
11. United Kingdom, CC BY
The Wellcome Library
Luigi Garzi
The birth of Adonis and
the transformation of Myrrha
Where we’ve been, where
we’re going
Visions for cultural heritage and connected data, past
and present
12. Original Vision: as Linked Open Data
provider
Linking Open Data cloud diagram 2011, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
13. The original vision, today
• Ontological modelling
• Europeana Data Model (EDM)
• expressed in RDF for data-model mediation
• internationally shared (DPLA, BBC, etc.)
• Served on our SPARQL endpoint
• … but more frequently as JSON-LD over our APIs
• plug plug: received API World award this year for best Data API
Continued contributions
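To make "EDM as JSON-LD" a little more concrete, here is a rough sketch of reading such a record with nothing but the standard library. The record is hand-written, not real API output: the `example.org` identifiers and the overall shape are assumptions, and only the class and property names (`edm:ProvidedCHO`, `ore:Aggregation`, `edm:aggregatedCHO`, `edm:isShownAt`, `edm:dataProvider`) come from EDM itself. The code treats the document as plain JSON and does no JSON-LD expansion.

```python
import json

# Hand-written record for illustration only; not an actual API response.
RECORD_JSONLD = """
{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/"
  },
  "@graph": [
    {"@id": "http://example.org/item/1", "@type": "edm:ProvidedCHO",
     "dc:title": "Arrival of a Portuguese ship",
     "dc:creator": "Anonymous"},
    {"@id": "http://example.org/aggregation/1", "@type": "ore:Aggregation",
     "edm:aggregatedCHO": {"@id": "http://example.org/item/1"},
     "edm:dataProvider": "Rijksmuseum",
     "edm:isShownAt": {"@id": "http://example.org/object/1"}}
  ]
}
"""

def nodes_by_type(doc, rdf_type):
    """Return the nodes in @graph whose @type equals rdf_type."""
    return [n for n in doc.get("@graph", []) if n.get("@type") == rdf_type]

def landing_page(doc, cho_id):
    """Find the originating institution's page for a cultural heritage object."""
    for agg in nodes_by_type(doc, "ore:Aggregation"):
        if agg.get("edm:aggregatedCHO", {}).get("@id") == cho_id:
            return agg.get("edm:isShownAt", {}).get("@id")
    return None

doc = json.loads(RECORD_JSONLD)
print(landing_page(doc, "http://example.org/item/1"))
```

The lookup mirrors the "ultimately linking back to the originating institution" idea: the object is described once, and the aggregation node carries the link to the provider's own page.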
14. LOD: New Directions
• “Entity-fication”
• 70%-80% of our searches are for named entities
• People
• Places
• Concepts (subject headings)
• Information on these can be harvested from:
• DBpedia
• Wikidata
• Geonames
• …
Structuring content through LOD harvesting (i)
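One way to picture "entity-fication" is a label index that maps every harvested name of an entity, in any language, to a single identifier. The sketch below assumes labels have already been harvested into a dictionary; the `ex:` identifiers, the entity records, and the normalisation rule are hypothetical and stand in for whatever DBpedia, Wikidata, or Geonames actually return.

```python
import unicodedata

def normalise(label):
    """Casefold, strip diacritics, and collapse whitespace before comparing labels."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

def build_index(entities):
    """Map each normalised label to the set of entity ids that carry it."""
    index = {}
    for entity_id, entity in entities.items():
        for label in entity["labels"].values():
            index.setdefault(normalise(label), set()).add(entity_id)
    return index

# Hypothetical harvested entities; ids and label sets are invented.
ENTITIES = {
    "ex:place/1": {"type": "Place",
                   "labels": {"en": "Venice", "it": "Venezia", "de": "Venedig"}},
    "ex:person/1": {"type": "Person",
                    "labels": {"en": "Rembrandt", "nl": "Rembrandt van Rijn"}},
}

INDEX = build_index(ENTITIES)

def lookup(query):
    """Return the ids of all entities whose label matches the query."""
    return INDEX.get(normalise(query), set())

print(lookup("VENEZIA"))  # {'ex:place/1'}
```

A query in Italian, German, or English then resolves to the same place entity, which is the point of moving from search strings to entities.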
15. LOD: New Directions
• “Workification” (FRBR data model)
• creating abstract artistic or intellectual entities from numerous
instantiations
• for example, the novel “Oliver Twist” from its many printed editions
and translations
• Harvested (or at least seeded) from OCLC and VIAF
Structuring content through LOD harvesting (ii)
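"Workification" can be sketched as grouping item-level records under a shared work identifier. The example assumes each record already carries such an identifier (for instance one seeded from an external authority); the `Edition` type, the `work_id` field, and all the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Edition:
    item_id: str
    title: str
    language: str
    work_id: str  # hypothetical: identifier of the abstract work

def group_into_works(editions):
    """Collect the concrete editions that instantiate each abstract work."""
    works = defaultdict(list)
    for edition in editions:
        works[edition.work_id].append(edition)
    return dict(works)

# Invented sample records.
EDITIONS = [
    Edition("item/1", "Oliver Twist", "en", "work/oliver-twist"),
    Edition("item/2", "Oliver Twist", "en", "work/oliver-twist"),
    Edition("item/3", "Oliver Twist", "fr", "work/oliver-twist"),
    Edition("item/4", "Example Title", "en", "work/example"),
]

works = group_into_works(EDITIONS)
for work_id, members in works.items():
    languages = sorted({e.language for e in members})
    print(work_id, len(members), languages)
```

The hard part in practice is assigning the work identifier in the first place; the grouping step itself is trivial once that is done.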
16. LOD: New Directions
• Knowledge Graphs linking …
• authors to works
• artists to their paintings, and other artists
• concepts to concepts
• …
• Obvious applications
• educational
• research
• “serendipity”
• improved “snacker” engagement
Structuring content through LOD harvesting (iii)
17. Case Study
Linking Rembrandt to Jahangir
• https://www.thetimes.co.uk/article/from-rhinos-to-rembrandt-how-india-inspired-the-world-hdsr8kls5
“Self-portrait” (Rembrandt van Rijn), “The Great-Mughal Jahangir” (Rembrandt van Rijn),
and “Prince Salim, the future Jahangir, Enthroned” (Anonymous), all in the public domain.
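The link from Rembrandt to Jahangir can be read as a short path in a knowledge graph. A minimal sketch follows, using only the three titles named above as nodes; the `created` and `depicts` edges are assumptions inferred from those titles and attributions, not relations taken from Europeana's data.

```python
from collections import deque

# (subject, predicate, object) triples; predicates are illustrative assumptions.
EDGES = [
    ("Rembrandt van Rijn", "created", "Self-portrait"),
    ("Rembrandt van Rijn", "created", "The Great-Mughal Jahangir"),
    ("The Great-Mughal Jahangir", "depicts", "Jahangir"),
    ("Prince Salim, the future Jahangir, Enthroned", "depicts", "Jahangir"),
]

def neighbours(edges):
    """Build an undirected adjacency map, ignoring the predicate."""
    adjacency = {}
    for subject, _predicate, obj in edges:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    return adjacency

def shortest_path(edges, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    adjacency = neighbours(edges)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(adjacency[node]):  # sorted for deterministic output
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(EDGES, "Rembrandt van Rijn", "Jahangir"))
# ['Rembrandt van Rijn', 'The Great-Mughal Jahangir', 'Jahangir']
```

The same search connects "Self-portrait" to "Prince Salim, the future Jahangir, Enthroned" in four hops, which is the kind of route a "serendipity" feature could surface.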
18. How we do it
Technical stack
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
19. The webapp stack
• Data ingestion: Java + XSLT behemoth
• Data enrichment: Java
• Source-of-truth datastore: MongoDB
• Information retrieval: Solr + Neo4j
• API: Swagger with Java
• UI: JS, variety of libraries
• SPARQL endpoint: Virtuoso
20. France, Public Domain
1588, Bibliothèque municipale de Lyon
Hendrik Goltzius
Le dragon dévorant les compagnons
de Cadmus
Reality check
Where we are and how fast we can go
21. Dirty Data (i)
• getting from strings to things is a non-trivial process
• Named Entity Recognition technology relatively unhelpful in this
domain
• exact-string matching only: precision good, but recall poor
• the data is strongly multilingual
• Limited number of tools to help with cleaning, enhancing, validating this
data
• OpenRefine potentially helpful
• ShEx, SHACL not yet fully mature
Source data
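The "precision good, but recall poor" trade-off of exact-string matching is easy to reproduce on a toy example. In the sketch below the authority file, the gold-standard links, and the `ex:` identifier are all invented; the point is only to show why name variants sink recall while precision stays high.

```python
def exact_match(mentions, authority):
    """Link a mention only when it equals an authority label character for character."""
    return {m: authority[m] for m in mentions if m in authority}

def precision_recall(predicted, gold):
    """Precision = correct / predicted links; recall = correct / gold links."""
    correct = sum(1 for mention, entity in predicted.items() if gold.get(mention) == entity)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Invented authority file with a single preferred label.
AUTHORITY = {"Rembrandt van Rijn": "ex:person/1"}

# Invented gold standard: four spellings that all denote the same person.
GOLD = {
    "Rembrandt van Rijn": "ex:person/1",
    "Rembrandt": "ex:person/1",
    "Rijn, Rembrandt van": "ex:person/1",
    "REMBRANDT VAN RIJN": "ex:person/1",
}

predicted = exact_match(GOLD.keys(), AUTHORITY)
precision, recall = precision_recall(predicted, GOLD)
print(precision, recall)  # 1.0 0.25
```

Every link made is right, but three of the four variants are missed, which is the pattern described for this domain.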
22. Dirty Data (ii)
• Irregular data models
• Large number of Wikidata, DBpedia properties applied irregularly
• “defensive querying”
• Incorrect data
• more often questions of structure than inaccurate field values
• e.g. Geonames hierarchies
• Uncurated or aggregated data
• e.g., many variants provided by VIAF
Linked Data resources
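"Defensive querying" here means never assuming a property is present or has one fixed shape. A minimal sketch of such an accessor is below, assuming harvested records arrive as JSON-like dictionaries in which a property may be missing, a bare string, a language-tagged object, or a list of these; the record shapes and the `label` property name are hypothetical.

```python
def values_of(record, prop, lang=None):
    """Return all values of prop as a flat list, whatever shape the source used.

    If lang is given, only language-tagged values in that language are returned;
    untagged plain strings are returned only when no language is requested.
    """
    raw = record.get(prop)
    if raw is None:
        return []
    items = raw if isinstance(raw, list) else [raw]
    found = []
    for item in items:
        if isinstance(item, str):
            if lang is None:
                found.append(item)
        elif isinstance(item, dict):
            value = item.get("@value") or item.get("@id")
            if value and (lang is None or item.get("@language") == lang):
                found.append(value)
        # Any other shape (numbers, nested lists) is skipped rather than raising.
    return found

# Hypothetical records showing the same property in different shapes.
plain = {"label": "Venice"}
tagged = {"label": [{"@value": "Venice", "@language": "en"},
                    {"@value": "Venezia", "@language": "it"}]}
missing = {}

print(values_of(plain, "label"))         # ['Venice']
print(values_of(tagged, "label", "it"))  # ['Venezia']
print(values_of(missing, "label"))       # []
```

Callers always get a list back, so downstream code does not need a separate branch for each irregularity in the source.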
23. Directions forward
• Manual or at least heavily-supervised curation a requirement for the
foreseeable future
• Tools to aid NER and entity-matching are the focus of two US efforts:
• Institute of Museum and Library Services (IMLS) Local Authority Files
project
• Linked Data for Libraries Reconciliation Service Group
• Work division
• devolution to partners
• crowdsourcing
Dealing with dirty data
25. PANEL: LINKED OPEN DATA - IS IT FAILING
OR JUST GETTING OUT OF THE BLOCKS?
Tweet your questions via Direct Message to @Connected_Data or #ConnectedData
MODERATOR
James Phare
Connected Data London
PANELIST
Chris Taggart
CEO
OpenCorporates
@CountCulture
PANELIST
Chris Gutteridge
Linked Open Data
Architect
University of Southampton
PANELIST
Leigh Dodds
Data Infrastructure
Programme Lead
Open Data Institute
@ldodds
PANELIST
Sebastian Hellmann
Executive Director and
Board Member
DBpedia