53 million objects and then what?
On the challenge of abundance
David Haskiya | Erasme - Descartes 2016
The challenge
Today I want to talk about abundance, the deluge of
content that we produce, also in the Galleries,
Libraries, Archives and Museums (GLAM)-sector.
How can we make such abundance of content
meaningful and useful to citizens, researchers,
educators and students?
How can we make it easier for them to find that
specific needle in the haystack?
Outline of my talk
• An introduction to Europeana (Collections)
• Curation - what do I mean by it and why do we need it?
• Examples of curating content for different audiences and use
cases
• Summary and takeaways
What is Europeana?
In a couple of Tweet lengths, tops!
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
What and who is Europeana?
• We’re a non-profit foundation - idealists and true believers
• A network of like-minded heritage and technology professionals
• An open data platform offering many services and drawing on the
collections of nearly 4000 European GLAMs
• Europeana Collections, Europeana APIs
The GLAMwiki toolset
CC BY-SA
“We want to build on Europe’s rich heritage and make it
easier for people to use, whether for work, for learning
or just for fun!”
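The Europeana APIs are how developers reach the aggregated collections programmatically. The sketch below only builds a Search API request URL and makes no network call. It assumes the endpoint `https://api.europeana.eu/record/v2/search.json` and the `wskey`, `query`, `rows` and `reusability` parameters as documented at the time of writing, so please check the current API documentation before relying on them. `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed Search API endpoint; verify against the current Europeana API documentation.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(api_key: str, query: str, rows: int = 12,
                     open_only: bool = True) -> str:
    """Return a Search API request URL. No network request is made."""
    params = {
        "wskey": api_key,  # personal API key (placeholder in the example below)
        "query": query,    # free-text search terms
        "rows": rows,      # number of results per page
    }
    if open_only:
        # Assumed filter value: only items whose rights statement allows open reuse
        params["reusability"] = "open"
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("YOUR_API_KEY", "Enghien 1914"))
```

Defaulting to openly reusable items mirrors the "be open" takeaway: results that can be reused without further clearance are the easiest to re-package for a given audience.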
Curation
What do I mean by it? Why do we need it?
Norway, CC BY-SA
1921, Oslo Museum
Ernest Rude
Ernest Marini - dancer in a costume
What is curation?
“Content curation is...the gathering, organizing and online
presentation of content related to a particular theme or
topic.”
• So, in contemporary web lingo, not the same as how most
museum curators would define it.
• But is the quote missing something? Any suggestions?
Users!
Here represented by Personas
Europeana Music Collections
CC BY-SA
What is curation?
“Content curation is...the gathering, organizing and online
presentation of content related to a particular theme or
topic, for a particular audience (or user).”
• There, I fixed it.
• Curation should not be audience-agnostic
Some examples
For different audiences
National Library of France, Public Domain
Agence de presse Mondial Photo-Presse,
Tournoi royal de motos à Londres :
changement d'une roue de side-car en
marche
For digital humanists: Newspapers
• 10 libraries, 426 newspaper titles, c. 11 million pages,
70 gigabytes of text (compressed)
• Enables unprecedented research into the role of news
from a pan-European perspective
Digital humanities (DH) is an area of scholarly activity at the intersection
of computing and the disciplines of the humanities.
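At 70 gigabytes of compressed text, a researcher will usually want to stream the pages rather than decompress everything first. A minimal sketch, assuming a hypothetical layout of one gzip file per newspaper title with one OCR'd page per line (the actual distribution format may differ); `count_term_by_title` is an illustrative helper, not part of any Europeana tool.

```python
import gzip
import re
from collections import Counter
from pathlib import Path


def count_term_by_title(corpus_dir: str, term: str) -> Counter:
    """Count case-insensitive whole-word matches of `term` per newspaper title.

    Hypothetical layout: one '<title>.txt.gz' file per title, one page per line.
    Files are decompressed on the fly, one line at a time.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts: Counter = Counter()
    for path in sorted(Path(corpus_dir).glob("*.txt.gz")):
        title = path.name[: -len(".txt.gz")]
        with gzip.open(path, "rt", encoding="utf-8") as pages:
            for page in pages:  # streaming keeps memory use flat
                counts[title] += len(pattern.findall(page))
    return counts
```

Calling `count_term_by_title("newspapers/", "Sarajevo").most_common(10)` on such a layout would list the ten titles that mention the term most often, a first step towards comparing coverage across countries.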
For the teacher: World War I - A Battle of Perspectives
• Created by teacher Gwen Vergouwen, Apple Distinguished
Educator
• iBook - allowing interactive, teacher-guided exploration of
contextualised primary sources
• iTunes U - a course with expanded materials from the book
Sources and interpretation concerning the origins of the First World War
For the citizen: WWI on Wikipedia
• 993 files in total, a small curated subset of what users have
contributed to the Europeana 1914-1918 storytelling platform
• Not Europeana’s content, it’s the users’ content, but we have
uploaded it on their behalf
• Various World War I related imagery: photographs, postcards,
documents, trench art, militaria, etc.
Wikipedia is the top online source for information
• c. 1.2 million views of the files in Wikipedia articles - per month*
• The postcard of Franz Ferdinand minutes before his assassination is viewed
c. 150 000 times per month
• The files are used in about 50 language versions of Wikipedia
• Technical quality is medium, with images typically in the 2-3 MP range
Some stats
For art lovers: Art on Wikidata and Wikipedia
• 30 countries, 300 artworks
• 816 Wikipedia articles, 10 000 artwork title pairs
• Engaged dozens of art lovers in editing and translating
articles
• Articles will be read millions of times per month
The world’s most used encyclopedia and linked open database; it fuels Google’s
Knowledge Graph
For art professionals: Hi-res altarpieces
• Microsites with hi-res multi-spectral imagery allow viewers to see
what otherwise couldn’t be seen
• Of interest to art historians, conservators and people with a
great love of art!
• Such solutions are costly and typically siloed and proprietary, but
with IIIF (International Image Interoperability Framework) as the
emerging image-sharing standard, the imagery can be accessed and
used by other applications.
• Developed by our partner project Europeana Space
Ghent Altarpiece and the Rode altarpieces of Lübeck and Tallinn
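Because IIIF is an open specification, any viewer or script can ask a server for just the region and size of an image it needs, instead of downloading a whole multi-gigapixel file. The sketch below assembles an Image API request following the `{identifier}/{region}/{size}/{rotation}/{quality}.{format}` pattern. The server base URL and identifier are hypothetical, and `size="max"` assumes a server implementing Image API 2.1 or later (servers implementing only 2.0 expect `full`).

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: int = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}."""
    # Identifiers must be URI-encoded, including any '/' they contain
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def detail_url(base: str, identifier: str, x: int, y: int, w: int, h: int,
               width: int) -> str:
    """Request the w-by-h pixel region at (x, y), scaled to `width` pixels wide."""
    return iiif_image_url(base, identifier,
                          region=f"{x},{y},{w},{h}", size=f"{width},")


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only
    print(detail_url("https://iiif.example.org/iiif/3", "ghent/panel-01",
                     1000, 2000, 800, 600, 400))
```

The same URL pattern works against any compliant server, which is what lets imagery escape a single proprietary microsite.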
Takeaways to remember
• Be open - don’t enclose the public domain, use Creative
Commons licences
• Be generous - share your highest quality digital objects
• Be humble - work with partners, use platforms other than
your own, meet your audience where they already are
• Be aware - of your users’ needs and package your digital
content accordingly
If you forget all else, remember this!
Big Data, built by aggregating and de-siloing multiple
Small Data(sets), needs to become Small Data(sets)
again: segmented along different dimensions,
contextualised, re-packaged, curated if you will, to
become meaningful to the users it aims to serve.
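One way to read this in code: filter the aggregated collection for a given audience, then group what remains along the dimensions that matter to that audience. The sketch uses a made-up `Record` type and made-up sample records; the field names and the "openly licensed WWI images, by country" filter are illustrative assumptions, loosely modelled on the Wikipedia example earlier in this talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, minimal metadata record; aggregated records carry far more fields."""
    title: str
    media_type: str  # e.g. "IMAGE", "TEXT", "SOUND"
    rights: str      # e.g. "PD", "CC BY-SA", "IN COPYRIGHT"
    theme: str
    country: str


def curate(records: Iterable[Record], keep: Callable[[Record], bool],
           dims: Tuple[str, ...]) -> Dict[tuple, List[str]]:
    """Filter a collection for one audience, then segment it along `dims`."""
    segments: Dict[tuple, List[str]] = defaultdict(list)
    for record in records:
        if keep(record):
            key = tuple(getattr(record, dim) for dim in dims)
            segments[key].append(record.title)
    return dict(segments)


# Made-up sample records standing in for an aggregated collection
collection = [
    Record("WWI postcard", "IMAGE", "CC BY-SA", "WWI", "AT"),
    Record("Trench art shell case", "IMAGE", "CC BY-SA", "WWI", "BE"),
    Record("Soldier's diary", "TEXT", "CC BY-SA", "WWI", "BE"),
    Record("Altarpiece panel", "IMAGE", "PD", "Art", "BE"),
    Record("Family letter", "IMAGE", "IN COPYRIGHT", "WWI", "FR"),
]

# Audience: Wikipedia readers. Openly licensed WWI images, segmented by country.
wiki_ready = curate(
    collection,
    keep=lambda r: (r.theme == "WWI" and r.media_type == "IMAGE"
                    and r.rights != "IN COPYRIGHT"),
    dims=("country",),
)
# wiki_ready == {("AT",): ["WWI postcard"], ("BE",): ["Trench art shell case"]}
```

Serving a different audience means changing only `keep` and `dims`, not the underlying aggregated data.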
09 November 2015
The Music Lesson, Louis Moritz,
1808, Rijksmuseum, Public Domain
For computational musicologists: Music recording features and
metadata
• 35 000 music recordings - traditional and folk music, classical
music
• Metadata for all the recordings for download
• Extracted audio features for download
• Jupyter (IPython) Notebook documentation
Computational musicology is defined as the study of music with
computational modelling and simulation.
Really wanted to feature this research dataset but no time!
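The value of such a dataset comes from combining the downloadable metadata with the extracted features. A minimal sketch, assuming both are CSV files sharing a `record_id` column and that the features include a `tempo_bpm` column. These names are hypothetical; the real files and their notebook documentation define the actual schema, and the tiny inline tables are made up.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable, List, TextIO


def join_features(metadata_csv: TextIO, features_csv: TextIO,
                  key: str = "record_id") -> List[Dict[str, str]]:
    """Inner-join feature rows to metadata rows on `key`."""
    metadata = {row[key]: row for row in csv.DictReader(metadata_csv)}
    joined = []
    for features in csv.DictReader(features_csv):
        if features[key] in metadata:  # drop features without a metadata record
            joined.append({**metadata[features[key]], **features})
    return joined


def mean_by(rows: Iterable[Dict[str, str]], group_field: str,
            value_field: str) -> Dict[str, float]:
    """Average a numeric column per group, e.g. tempo per genre."""
    groups: Dict[str, List[float]] = {}
    for row in rows:
        groups.setdefault(row[group_field], []).append(float(row[value_field]))
    return {group: mean(values) for group, values in groups.items()}


# Tiny made-up tables standing in for the downloadable files
METADATA = "record_id,title,genre\nr1,Polska,folk\nr2,Sonata,classical\nr3,Reel,folk\n"
FEATURES = "record_id,tempo_bpm\nr1,120\nr2,72\nr3,140\nr9,99\n"

rows = join_features(io.StringIO(METADATA), io.StringIO(FEATURES))
tempo_by_genre = mean_by(rows, "genre", "tempo_bpm")
print(tempo_by_genre)  # {'folk': 130.0, 'classical': 72.0}
```

Keeping the join explicit makes it easy to see which recordings lack metadata (here the made-up `r9`) before any analysis starts.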