A practical case study of how to create Linked Open Data for 1,300 Dutch underground newspapers from World War 2 using Wikipedia, DBpedia and an old paper book.
Lecture given by Olaf Janssen - Wikimedia & Open Data coordinator for the National Library of the Netherlands (KB) - for students of the master's course "Digital Access to Cultural Heritage" at Leiden University on 3-3-2016
Connecting 1,300 Dutch WWII newspapers
1. LOD case study: WW2 underground
newspapers on Wikipedia
Digital Access to Cultural Heritage, Leiden University, 3-3-2016
Olaf Janssen (Koninklijke Bibliotheek)
olaf.janssen@kb.nl - @ookgezellig
2. What I hope you’ll learn today
1. How to give a new life to an old paper book
2. How to get 1,300 newspapers from WW2 on Wikipedia
While doing 1 and 2:
3. The advantages of linked open data
(= downsides of unconnected data sources)
Olaf Janssen
Wikipedia & Open Data coordinator
National Library of the Netherlands
7. After the war many titles were
(physically) preserved at the NIOD …
The national Institute for War, Holocaust and
Genocide Studies in Amsterdam
By Romaine - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=37072767
17. Say I want to know more about this newspaper
• What sort/style of underground paper was De Geus?
• What is the history of this newspaper?
• Who worked on it?
• Where was this newspaper printed?
• How was De Geus distributed and financed?
• Were there any relations with other illegal newspapers or resistance
groups?
• Etc…
Under “Details” perhaps?
18. OK ok, some metadata…
... but I want to know mooore…
21. Problem with Delpher
(and KB-catalogue)
Very little contextual information
about the newspaper(title)s
https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg
22. Question:
Where would most people start searching for contextual
information about De Geus onder studenten?
Probably Wikipedia! (via Google)
25. Report on interest in WW2 among Dutch population
http://www.oorlogsbronnen.nl/gebruikersonderzoek2015, May 2015
26. Many of us use the internet to search for
information [..]. We often mention Wikipedia…
27. Everything is of course on Wikipedia. Just type in a
name and you can read entire essays... (man 70s)
28. Over half of us think that Wikipedia and Google
contribute to our knowledge and understanding of
history
When we have to find information about WW2 outside
the class setting, we fully concentrate on digital
resources like Google and Wikipedia. (school kids)
33. De Ondergrondse Pers 1940-1945
By Lydia E. Winkel & H. de Vries
1989, ISBN 9021837463
Veen Uitgevers
This book
(“De Winkel”) contains
contextual articles
about
(nearly) all ± 1,300
illegal WW2
newspapers
34. “De Winkel” – nr. 199
De Ondergrondse Pers 1940-1945 ,
Lydia E. Winkel, H. de Vries , 1989,
ISBN 9021837463,
Veen Uitgevers
36. Every article has metadata
• Title, subtitle, motto
• Place of publication
• Period of publication
• Publication frequency (daily, weekly, one-off, irregular)
• Reproduction method (stenciled, printed, typed, handwritten)
• Contents (news, opinions, poems, illustrations, humor)
• Print run (min – max)
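The metadata fields listed above map naturally onto a small record type. The following is only a sketch: the class and field names are illustrative choices, not the project's actual schema, and the example values are taken from the De Geus snippet shown later in this lecture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NewspaperRecord:
    """One article from De Winkel. Field names are illustrative, not the real schema."""
    title: str
    subtitle: Optional[str] = None
    motto: Optional[str] = None
    place_of_publication: Optional[str] = None
    period_start: Optional[str] = None   # kept as text, dates in the book can be vague
    period_end: Optional[str] = None
    frequency: Optional[str] = None      # daily, weekly, one-off, irregular
    reproduction: List[str] = field(default_factory=list)  # stenciled, printed, ...
    contents: List[str] = field(default_factory=list)      # news, opinions, ...
    print_run_min: Optional[int] = None
    print_run_max: Optional[int] = None


# Values below come from the De Geus snippet quoted later in the slides.
de_geus = NewspaperRecord(
    title="De Geus onder studenten",
    place_of_publication="Den Haag",
    period_start="1940-10-04",
    period_end="1944-07-13",
    frequency="monthly, later irregular",
    reproduction=["stenciled", "printed"],
    contents=["opinions"],
    print_run_min=250,
    print_run_max=8000,
)
```

Keeping every field optional except the title reflects that not every article in the book fills in every field.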
45. We need it digital!!
https://knowledgeutopia.files.wordpress.com/2014/01/hollandhouselibraryblitz1940.jpg
46. We need it digital!!
1. Clear copyright with copyright holder (NIOD)
Open CC-BY-SA license
2. Scan & OCR
3. Convert into PDF
4. Put online: NIOD site & Wikimedia Commons
47. De Winkel as PDF on NIOD website (CC-BY-SA)
http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945
48. De Winkel as PDF on Wikimedia Commons
https://commons.wikimedia.org/wiki/File:PDF_of_De_Ondergrondse_Pers_1940-1945_-_derde_druk_-_1989.pdf
49. De Winkel as PDF on NIOD website (CC-BY-SA)
http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945
Saved us €13,330!
http://www.brill.com/dutch-underground-press-1940-1945
50. Wikipedia article about De Winkel
http://nl.wikipedia.org/wiki/De_ondergrondse_pers_1940-1945
54. To summarize:
a lot of information is available about these WW2
underground newspapers
1. Metadata (KB-cat)
2. Content (full-text, Delpher)
3. Context (Winkel, PDF)
4. Relations: titles ↔ places, persons, other titles (Winkel, PDF)
5. External resources about titles, places and persons
... but the data sources are unconnected
(and for 3+4 unstructured & not machine-readable)
57. Wikiproject(*) Verzetskranten
Systematically and uniformly describe & link all
1,300 Dutch underground newspapers from WW2
on Dutch Wikipedia
tinyurl.com/verzetskranten (in Dutch)
* https://nl.wikipedia.org/wiki/Wikipedia:Wikiproject, https://en.wikipedia.org/wiki/Wikipedia:Wikiproject
58. From 14 to 1,300 titles
https://nl.wikipedia.org/wiki/Categorie:Illegale_pers_in_de_Tweede_Wereldoorlog
59. Wikiproject(*) Verzetskranten
Systematically and uniformly describe & link all
1,300 Dutch underground newspapers from WW2
on Dutch Wikipedia
tinyurl.com/verzetskranten (in Dutch)
We need a
database!
73. Build central database
Step 2: Convert Excel into
RDF triplestore
(=special kind of online database anybody can access)
• Steps 1-4 from http://linda-project.eu/linked-data-primer-2
• Step 4: Vocabulary used = Bibframe
(http://bibframe.org/vocab)
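To make the Excel-to-RDF step concrete, here is a minimal sketch that turns spreadsheet rows (exported as CSV) into triples in N-Triples syntax. The base URI, the column names and the property names "title" and "place" are assumptions for illustration; only the vocabulary namespace comes from the slide, and a real conversion would follow the primer and load the result into a triplestore.

```python
import csv
import io

BF = "http://bibframe.org/vocab/"            # vocabulary namespace named on the slide
BASE = "http://example.org/verzetskranten/"  # hypothetical base URI for our newspapers

# Hypothetical mapping from spreadsheet columns to vocabulary property names.
COLUMN_TO_PROPERTY = {"title": "title", "place": "place"}


def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def row_to_triples(row):
    """Turn one spreadsheet row into subject-predicate-object lines."""
    subject = f"<{BASE}{row['id']}>"
    triples = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells instead of emitting empty literals
            triples.append(f"{subject} <{BF}{prop}> {literal(value)} .")
    return triples


# A two-line stand-in for the Excel sheet, exported as CSV.
SHEET = "id,title,place\ngeus,De Geus onder studenten,Den Haag\n"

rows = list(csv.DictReader(io.StringIO(SHEET)))
ntriples = [triple for row in rows for triple in row_to_triples(row)]
```

Each row becomes one subject, and each filled cell becomes one triple about that subject.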
74. Build central database
Step 3: Link to external resources
• Step 5 from http://linda-project.eu/linked-data-primer-2
• DBpedia = machine-readable, structured version of Wikipedia
• DBpedia = hub for linking different data sets on the Web to each
other (the Linked Open Data cloud)
• Connect persons & places in newspaper database to external
resources via DBpedia
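As a sketch of what such a link looks like: DBpedia resource URIs are built from the Wikipedia article title with spaces replaced by underscores, and an owl:sameAs triple states that our record and the DBpedia resource describe the same thing. The base URI and local identifier are hypothetical, and in practice matching a name from the book to the right Wikipedia title needs checking by hand or by a reconciliation step that is not shown here.

```python
from urllib.parse import quote

DBPEDIA = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/verzetskranten/"  # hypothetical base URI for our database


def dbpedia_uri(wikipedia_title):
    """Build the DBpedia resource URI for an (English) Wikipedia article title."""
    return DBPEDIA + quote(wikipedia_title.replace(" ", "_"), safe="_(),'")


def same_as_triple(local_id, wikipedia_title):
    """State that a local place or person is the same thing as a DBpedia resource."""
    return f"<{BASE}{local_id}> <{OWL_SAME_AS}> <{dbpedia_uri(wikipedia_title)}> ."


# "place/den-haag" is an illustrative local identifier.
link = same_as_triple("place/den-haag", "The Hague")
```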
75. Step 1c: Link to external resources
• Step 5 from http://linda-project.eu/linked-data-primer-2
• DBpedia = machine-readable, structured version of Wikipedia
DBpedia allows you to ask sophisticated queries against Wikipedia, and to link
the different data sets on the Web to Wikipedia data
• Connect persons & places in newspaper database to external
resources via DBpedia
http://lod-cloud.net/versions/2010-09-22/lod-cloud_colored.png
Linked Open Data cloud
76. Build central database
Step 3: Link to external resources
• Step 5 from http://linda-project.eu/linked-data-primer-2
• DBpedia = machine-readable, structured version of Wikipedia
• DBpedia = hub for linking different data sets on the Web to each
other (the Linked Open Data cloud)
• We use DBpedia to connect persons & places in our newspaper
database to information in other databases
78. Build central database
Added value of Linked Open Data & DBpedia
Software can automatically query for additional
information about places and persons mentioned in
De Winkel that is not available in
• KB-catalogue
• Delpher
• De Winkel
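A rough sketch of such an automatic query: the code below builds a SPARQL query asking DBpedia for the abstract of a linked resource, turns it into a request URL for the public endpoint, and reads values out of a result in the standard SPARQL JSON format. Nothing is sent over the network here. The "format" request parameter is an assumption about how the endpoint is configured, and the sample result is made up to show the shape of a response.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint


def abstract_query(resource_uri, lang="en"):
    """SPARQL query for the abstract of one DBpedia resource in one language."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def request_url(query):
    """URL for a GET request; the 'format' parameter is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


def values(result, variable):
    """Pull the plain values for one variable out of a SPARQL JSON result."""
    return [binding[variable]["value"]
            for binding in result["results"]["bindings"]
            if variable in binding]


query = abstract_query("http://dbpedia.org/resource/The_Hague")
url = request_url(query)

# Made-up response, only to show the structure a real answer would have.
sample = {"results": {"bindings": [
    {"abstract": {"type": "literal", "xml:lang": "en",
                  "value": "(abstract text would appear here)"}}
]}}
abstracts = values(sample, "abstract")
```

Fetching the URL (for example with urllib.request) would return the real answer, which is exactly the information missing from the KB-catalogue, Delpher and De Winkel.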
80. Summary: data about 1,300 newspapers
• Available online
• Open license (CC-BY-SA)
• Contextual information
• Relations: titles ↔ places, titles ↔ persons, titles ↔ other titles
• Structured data (RDF triples)
• Open standard (RDF)
• Links between titles and Delpher & KB-cat
• Links between titles, places & persons and external sources (via DBpedia)
82. Summary: data about 1,300 newspapers
• Available online
• Open license (CC-BY-SA)
• Contextual information
• Relations: titles ↔ places, titles ↔ persons, titles ↔ other titles (via PPNs)
• Structured data (RDF triples)
• Open standard (RDF)
• Links between titles and Delpher & KB-cat (via PPNs)
• Links between places & persons and external sources (via DBpedia)
85. Using an article template we can generate
1,300 uniform and interlinked Wikipedia articles
from the LOD-database
https://c1.staticflickr.com/9/8281/7699231918_11a7356c38_b.jpg
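The idea of an article template can be sketched in a few lines. The wording of the template below is invented for illustration (the real articles are written in Dutch and follow the Wikiproject's own template); only the category name is taken from the Dutch Wikipedia category linked earlier, and the field names are hypothetical.

```python
from string import Template

# Hypothetical template text in wikitext; [[...]] creates a link to another article.
ARTICLE = Template(
    "'''$title''' was an underground newspaper from the Second World War, "
    "published in [[$place]] from $start until $end. "
    "It appeared in a print run of between $min_run and $max_run copies.\n\n"
    "[[Categorie:Illegale pers in de Tweede Wereldoorlog]]\n"
)


def render_article(record):
    """Fill the template from one database record (a dict of field values)."""
    # substitute() raises KeyError when a field is missing, so gaps are noticed.
    return ARTICLE.substitute(record)


wikitext = render_article({
    "title": "De Geus onder studenten",
    "place": "Den Haag",
    "start": "4 October 1940",
    "end": "13 July 1944",
    "min_run": 250,
    "max_run": 8000,
})
```

Because every article comes from the same template and the same database, the results are uniform, and the links between titles, places and persons are created consistently.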
108. Problem with Delpher
(and KB-catalogue)
Very little contextual information
about the newspaper(title)s
https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg
The KB can re-use (embed) the
Wikipedia content in its own
websites to tackle this problem
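One possible way to re-use the Wikipedia content, sketched under assumptions: the MediaWiki API can return the plain-text introduction of an article (prop=extracts with exintro and explaintext), which a site could shorten into a snippet with a link back to the full article. The article title used here is a guess at the page name, and the snippet length and wording are arbitrary choices. The code only builds the request URL and formats a snippet; it does not call the API.

```python
from urllib.parse import quote, urlencode

API = "https://nl.wikipedia.org/w/api.php"
WIKI = "https://nl.wikipedia.org/wiki/"


def extract_request_url(title):
    """URL asking the MediaWiki API for the plain-text intro of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the introduction
        "explaintext": 1,   # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def make_snippet(extract, title, max_chars=200):
    """Shorten the intro at a word boundary and add a link to the full article."""
    text = extract.strip()
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0] + " …"
    link = WIKI + quote(title.replace(" ", "_"))
    return f"{text} Read more on Wikipedia: {link}"


# The title is an assumption about how the article is named on Dutch Wikipedia.
url = extract_request_url("De Geus onder studenten")
snippet = make_snippet("De geus was een verzetsblad uit de Tweede Wereldoorlog.",
                       "De Geus onder studenten")
```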
111. De geus; (onder studenten) was a resistance paper from the Second World War,
which from 4 October 1940 until 13 July 1944 …. Read more on Wikipedia
Embedded contextual
snippet from Wikipedia
http://www.delpher.nl/nl/kranten/results?coll=dddtitel&cql[]=ppn+any+(107123223)
Delpher - search results
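The Delpher link on this slide shows how a PPN (the catalogue identifier of a title) ties the data sources together. As a small sketch, the function below rebuilds that search URL from a PPN. The URL pattern is copied from the slide; the check that a PPN is a short alphanumeric string is an assumption.

```python
# URL pattern copied from the Delpher link shown on the slide.
DELPHER_RESULTS = (
    "http://www.delpher.nl/nl/kranten/results"
    "?coll=dddtitel&cql[]=ppn+any+({ppn})"
)


def delpher_url(ppn):
    """Build the Delpher search URL for all issues of the title with this PPN."""
    # Assumption: a PPN consists of letters and digits only.
    if not ppn.isalnum():
        raise ValueError(f"unexpected PPN: {ppn!r}")
    return DELPHER_RESULTS.format(ppn=ppn)


# The PPN that appears in the slide's URL.
url = delpher_url("107123223")
```

Storing the PPN with every newspaper record means the same identifier can point to the catalogue entry, the scanned issues and the Wikipedia article.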
113. About De Geus onder
studenten
De geus; (onder studenten) was a
resistance paper from the Second
World War that was published in
The Hague from 4 October 1940
until 13 July 1944. The paper
appeared monthly in 1940, 1941 and
1943, and otherwise irregularly, in
a print run of between 250 and 8000
copies. It was initially stenciled,
and from November 1942 printed,
and its contents consisted mainly
of opinion articles.
The paper was published by Jan
Drion and Huib Drion, two Leiden…
Read more on Wikipedia
Embedded contextual snippet
from Wikipedia
http://resolver.kb.nl/resolve?urn=ddd:010424553:mpeg21:p001
Delpher - object presentation
114. Suggested reading
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web
20 years ago, Tim Berners-Lee invented the World Wide Web. For his next project, he's building a web for open, linked data
that could do for numbers what the Web did for words, pictures, video: unlock our data and reframe the way we use it
together.
• https://en.wikipedia.org/wiki/Linked_data
Wikipedia article related to the above video
• http://5stardata.info/en/
The 5 stars of Linked Open Data (Tim Berners-Lee)
• http://linda-project.eu/linked-data-primer-2/
Short primer about creating LOD in practice, starting from an Excel sheet
• http://www.programmableweb.com/news/how-linked-data-solved-digital-age-marketing-problem/analysis/2015/08/31
The figure near the bottom of the first page is a good illustration of the concept of (linked) triples
• https://en.wikipedia.org/wiki/DBpedia
• https://en.wikipedia.org/wiki/Semantic_network
http://www.gettyimages.nl/detail/nieuwsfoto's/three-women-of-the-ats-light-up-together-ats-regulations-nieuwsfotos/3094265