Linked Data for
Digital Heritage
and History
Victor de Boer
VU University Amsterdam
Keynote CSWS 2013 Shanghai
About me
Victor de Boer
Assistant professor at VU University Amsterdam
Domain-driven Semantic Technologies, Linked Data
Cu...
Linked Data is ``a term used to describe a
recommended best practice for exposing,
sharing, and connecting pieces
of data,...
The evolution of Science
Antonie van Leeuwenhoek’s
microscope (17th C.)
Large Hadron Collider in
Switzerland (21st C.)
Why Linked Data for E-science
Large amounts of data
Efficient analysis, data mining
Sharing data, information and knowledg...
OpenPhacts explorer
http://www.openphacts.org/
But what about the humanities?
Cultural Heritage
MultimediaN E-Culture project
• Museums have increasingly nice websites
• But: most of them are driven by stand-alone coll...
http://e-culture.multimedian.nl/
Search for objects which are linked
via concepts (semantic link)
China
Kanton
PartOf
Query
“China”
Use the type of semanti...
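The search idea on this slide (a query for "China" finds an object about Kanton because Kanton is linked to China via PartOf) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries are a hypothetical stand-in for the project's real graph, and `semantic_search` is an invented helper name, not part of the E-Culture software.

```python
# Hypothetical mini knowledge base; identifiers are illustrative only.
PART_OF = {"Kanton": "China"}  # place -> containing place
DEPICTS = {                    # object title -> depicted place
    "View of Canton, with two Dutch ships": "Kanton",
}


def semantic_search(query):
    """Return (object, link path) pairs whose depicted place is the
    query itself or lies inside it via a chain of PartOf links."""
    results = []
    for obj, place in DEPICTS.items():
        path = [place]
        seen = {place}  # guards against cycles in the PartOf data
        while path[-1] != query and path[-1] in PART_OF:
            parent = PART_OF[path[-1]]
            if parent in seen:
                break
            seen.add(parent)
            path.append(parent)
        if path[-1] == query:
            results.append((obj, path))
    return results


hits = semantic_search("China")
for obj, path in hits:
    # The path records which semantic links connected query and object.
    print(obj, "via", " PartOf ".join(path))
```

Returning the path along with the object is what allows the type of semantic link to be shown next to each search result.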
Vocabulary alignment
• In large virtual collections
there are always multiple
vocabularies, each with its own
perspective
– In m...
Vocabulary alignment
“Easel-pieces”
RMA concept
“Schilderij”
RMA is the thesaurus
of Rijksmuseum
AAT artefact type
“Easel ...
http://e-culture.multimedian.nl/
Amsterdam Museum
as Linked Open Data
Amsterdam Museum
• Formerly Amsterdam Historic
Museum
– “The rich collection of works of art,
objects and archaeological f...
Requirements for conversion and linking
• Transparent conversion
and linking of the data
– Use of provenance and
reproduci...
Methods
ClioPatria
XMLRDF
1. XML ingestion (OAI)
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
...
ClioPatria.swi-prolog.org
ClioPatria is powered by
XMLRDF rewriting rules examples
Mapping to popular vocabularies
am:obj_22093 “Job Cohen”
am:contentPersonName
rdfs:subPropertyOf
dcterms:subject
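The effect of this mapping can be illustrated with a small sketch. It assumes a plain in-memory set of triples rather than a real triple store, and `infer_superproperties` is a hypothetical helper that applies the standard RDFS subproperty rule (if `s p o` and `p rdfs:subPropertyOf q`, then `s q o`). The original, more specific property is kept, so nothing is lost by the mapping.

```python
RDFS_SUBPROP = "rdfs:subPropertyOf"

# The two statements shown on the slide, written as (subject, predicate, object).
triples = {
    ("am:obj_22093", "am:contentPersonName", "Job Cohen"),
    ("am:contentPersonName", RDFS_SUBPROP, "dcterms:subject"),
}


def infer_superproperties(graph):
    """Add (s q o) for every (s p o) with (p rdfs:subPropertyOf q),
    repeating until no new triple appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        sub_pairs = [(s, o) for s, p, o in inferred if p == RDFS_SUBPROP]
        for p, q in sub_pairs:
            for s, pred, o in list(inferred):
                if pred == p and (s, q, o) not in inferred:
                    inferred.add((s, q, o))
                    changed = True
    return inferred


closure = infer_superproperties(triples)

# A generic Dublin Core query now finds the museum-specific value.
subjects = sorted(
    o for s, p, o in closure
    if s == "am:obj_22093" and p == "dcterms:subject"
)
print(subjects)
```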
Amalgame alignment platform
• Semi-automatic
linking
– Simple automatic
techniques,
– chained together by
hand
• Transpare...
Amsterdam Museum as Linked Open Data
http://thedatahub.org/dataset/amsterdam-museum-as-edm-lod
E-history
(digital history)
BiographyNet
(Narrative) historical methodology
• Historical facts derived mainly from archival
findings and existing literature
• Hist...
Where do eScience and Biographical History meet?
• Quantitative analyses of a
larger group of people
(prosopography).
Surp...
Johan Rudolph Thorbecke werd
in 1798 geboren op 14 januari
in Zwolle en komt uit een half-Duit
Johan Rudolph Thorbecke wer...
Prototype under development
The information provided by the first system can
be used to:
1. Identify alternative descripti...
Verrijkt Koninkrijk
History of German occupied Dutch society
(1940-1945)
Published between 1969 and 1991 in 14
volumes, 30 parts, 18,000 pages...
country, collection, doc-type, volume, chapter, section, sub-section, paragraph
Back of the Book Vocabulary
+
Named Entity Vocabulary
SKOS vocabularies as stepping stones
http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
niod:Blitzkrieg
niod:parRef
niod:oai_wo2_niod_nl_rec_102045
dct:subject
http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg....
skos:exactMatch
skos:exactMatch
botb:sjanghai
dbpedianl:sjanghai
dbpedia:sjanghai
owl:sameAs
Шанхай
Thượng Hải
上海市
Xangai
Šanghaj
Shanghai
Shanghai
rdfs:l...
SELECT * WHERE
{ ?s skos:prefLabel ?pl.
?s skos:closeMatch ?geo.
?geo gn:parentADM1 ?prov.
?prov gn:name ?provname.
?s nio...
Results are links to paragraphs
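To make the join in the SPARQL query concrete, here is a minimal Python sketch that evaluates the same five triple patterns over a toy in-memory graph. Everything in `TRIPLES` is hypothetical sample data (the `botb:` and `geo:` identifiers and the page values are invented for illustration), and counting page references per province is only one plausible way a geographical chart could be derived from the result rows.

```python
from collections import Counter

# Hypothetical sample data mirroring the predicates used in the query.
TRIPLES = [
    ("botb:rotterdam", "skos:prefLabel", "Rotterdam"),
    ("botb:rotterdam", "skos:closeMatch", "geo:rotterdam"),
    ("botb:rotterdam", "niod:pageRef", "page-1"),
    ("botb:rotterdam", "niod:pageRef", "page-2"),
    ("botb:zwolle", "skos:prefLabel", "Zwolle"),
    ("botb:zwolle", "skos:closeMatch", "geo:zwolle"),
    ("botb:zwolle", "niod:pageRef", "page-3"),
    ("geo:rotterdam", "gn:parentADM1", "geo:zuid-holland"),
    ("geo:zwolle", "gn:parentADM1", "geo:overijssel"),
    ("geo:zuid-holland", "gn:name", "Zuid-Holland"),
    ("geo:overijssel", "gn:name", "Overijssel"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def run_query():
    """Nested-loop join over the five patterns of the SELECT query."""
    rows = []
    concepts = sorted({s for s, p, _ in TRIPLES if p == "skos:prefLabel"})
    for s in concepts:
        for pl in objects(s, "skos:prefLabel"):
            for geo in objects(s, "skos:closeMatch"):
                for prov in objects(geo, "gn:parentADM1"):
                    for provname in objects(prov, "gn:name"):
                        for pref in objects(s, "niod:pageRef"):
                            rows.append({"s": s, "pl": pl, "geo": geo,
                                         "prov": prov, "provname": provname,
                                         "pref": pref})
    return rows


rows = run_query()
refs_per_province = Counter(row["provname"] for row in rows)
print(refs_per_province)
```

The concept in the book index never mentions a province itself; the province name only becomes available by following the `skos:closeMatch` link into the geographical background data.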
SPARQL for R
National-
Socialist
29%
Social-
Democrat
21%
Protestant
13%
Liberal
12%
R-Catholic
12%
Communist
8%
Jewish
5%...
Dutch Ships and Sailors
gz:Mercuur
1782
gz:Buijksloot
gz:Batavia
gz:Claas Roem
voc:Claas Roem
voc:Buijksloot
1752das:Mercuur
das:Departure
das:Roe...
DataLab
Lessons Learned
Be humble, transparent and
interactive in your data
conversion and linking
Lessons Learned
Lessons Learned
Retain complexities of the
data and establish layers
of interoperability
Lessons Learned
A Little Semantics goes a
Long Way…and so does
a small amount of links
Lessons Learned
Make sure your solutions
and tools fit the
methodology of the field
Lessons Learned
Show added benefit for
scientific research and
(unexpected) re-use
Lessons Learned
Linked Data is a good fit
for Humanities research
Thank you!
Victor de Boer
http://victordeboer.com
v.de.boer@vu.nl
Image credits
• Wikipedia lemmas
• Flickr images (cc-licensed)
– RMTip21
– Argonne National Laboratory
– thegarethwiscombe...
Keynote presentation for CSWS 2013 Conference in Shanghai, China.
Some slides borrowed from Jan Wielemaker, Guus Schreiber, Jacco van Ossenbruggen, Niels Ockeloen, Antske Fokkens, Serge ter Braake.

Published in: Education
Keynote csws2013

  1. 1. Linked Data for Digital Heritage and History Victor de Boer VU University Amsterdam Keynote CSWS 2013 Shanghai
  2. 2. About me Victor de Boer Assistant professor at VU University Amsterdam Domain-driven Semantic Technologies, Linked Data Cultural Heritage Digital History Linked Data for Development
  3. 3. Linked Data is ``a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.’’ --Wikipedia
  4. 4. The evolution of Science Antonie van Leeuwenhoek’s microscope (17th C.) Large Hadron Collider in Switzerland (21st C.)
  5. 5. Why Linked Data for E-science Large amounts of data Efficient analysis, data mining Sharing data, information and knowledge between scientists Across continents Across disciplines
  6. 6. OpenPhacts explorer http://www.openphacts.org/
  7. 7. But what about the humanities?
  8. 8. Cultural Heritage
  9. 9. MultimediaN E-Culture project • Museums have increasingly nice websites • But: most of them are driven by stand-alone collection databases • Data is isolated, both syntactically and semantically • If users can do cross-collection search, the individual collections become more valuable! • Semantic Search
  10. 10. http://e-culture.multimedian.nl/
  11. 11. Search for objects which are linked via concepts (semantic link) China Kanton PartOf Query “China” Use the type of semantic link to provide meaningful presentation of the search results Rijksmuseum: View of Canton, with two Dutch ships Semantic Search
  12. 12. Vocabulary alignment • In large virtual collections there are always multiple vocabularies, each with its own perspective – In multiple languages – You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links • It is surprising what you can do with just a few links
  13. 13. Vocabulary alignment “Easel-pieces” RMA concept “Schilderij” RMA is the thesaurus of Rijksmuseum AAT artefact type “Easel Piece” “Painting” AAT is Getty’s Art & Architecture Thesaurus
  14. 14. http://e-culture.multimedian.nl/
  15. 15. Amsterdam Museum as Linked Open Data
  16. 16. 17
  17. 17. Amsterdam Museum • Formerly Amsterdam Historic Museum – “The rich collection of works of art, objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.” • In March 2010 published their whole collection online – 70,000 objects – CC license
  18. 18. Requirements for conversion and linking • Transparent conversion and linking of the data – Use of provenance and reproducibility • Keep original complexities of the data • while making it interoperable with other (Europeana) data • Retain the relation to original data
  19. 19. Methods ClioPatria XMLRDF 1. XML ingestion (OAI) 2. Direct transformation to ‘crude’ RDF 3. Interactive RDF restructuring 4. Create a metadata mapping schema 5. Align vocabularies with external sources 6. Publish as Linked Data Amalgame Tools
  20. 20. ClioPatria.swi-prolog.org ClioPatria is powered by
  21. 21. XMLRDF rewriting rules examples
  22. 22. Mapping to popular vocabularies am:obj_22093 “Job Cohen” am:contentPersonName rdfs:subPropertyOf dcterms:subject
  23. 23. Amalgame alignment platform • Semi-automatic linking – Simple automatic techniques, – chained together by hand • Transparent and interactive
  24. 24. Amsterdam Museum as Linked Open Data http://thedatahub.org/dataset/amsterdam-museum-as-edm-lod
  25. 25. E-history (digital history)
  26. 26. BiographyNet
  27. 27. (Narrative) historical methodology • Historical facts derived mainly from archival findings and existing literature • Historians put them together into a narrative/synthesis. – The Narrative: a historical synthesis which cannot be scientifically proven (only made likely), based on facts which can be proven or falsified. There is necessarily a creative element in drawing up a narrative. Slides by BiographyNet team
  28. 28. Where do eScience and Biographical History meet? • Quantitative analyses of a larger group of people (prosopography). Surpassing the anecdotal. • Finding relations/networks between people which are otherwise hard to detect
  29. 29. Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duit Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duit Linked Data for BiographyNet Thorbecke Biographical Description Provenance Meta Data NNBW Person Meta Data “Thorbecke” Biography Parts Birth 1798 Event Biographical Description Enrichment NLP Tool Person Meta Data Event Birth Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duit Zwolle 1798-01-14
  30. 30. Prototype under development The information provided by the first system can be used to: 1. Identify alternative descriptions of events (same time, location and/or participants) 2. Identify relations between events (same locations & time, consequent events, same participants, etc.) 3. Initial networks of people http://www.biographynet.nl
  31. 31. Verrijkt Koninkrijk
  32. 32. History of German-occupied Dutch society (1940-1945) Published between 1969 and 1991 in 14 volumes, 30 parts, 18,000 pages 1. Digitization, 2. Open Data, 3. Enriched access with Linked Open Data Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog (The Kingdom of the Netherlands During World War II)
  33. 33. country, collection, doc-type, volume, chapter, section, sub-section, paragraph
  34. 34. Back of the Book Vocabulary + Named Entity Vocabulary SKOS vocabularies as stepping stones
  35. 35. http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
  36. 36. niod:Blitzkrieg niod:parRef niod:oai_wo2_niod_nl_rec_102045 dct:subject http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386 botb:Blitzkrieg skos:exactMatch
  37. 37. skos:exactMatch skos:exactMatch
  38. 38. botb:sjanghai dbpedianl:sjanghai dbpedia:sjanghai owl:sameAs Шанхай Thượng Hải 上海市 Xangai Šanghaj Shanghai Shanghai rdfs:label dbpedia: Shanghai_Jiao_Tong_University dbp:is_city_of
  39. 39. SELECT * WHERE { ?s skos:prefLabel ?pl. ?s skos:closeMatch ?geo. ?geo gn:parentADM1 ?prov. ?prov gn:name ?provname. ?s niod:pageRef ?pref. } [Chart: NE index vs. BotB index] Geographical analysis using background knowledge from GeoNames
  40. 40. Results are links to paragraphs
  41. 41. SPARQL for R. [Chart] National-Socialist 29%, Social-Democrat 21%, Protestant 13%, Liberal 12%, R-Catholic 12%, Communist 8%, Jewish 5%. [Table, columns Pillar1 / Pillar2 / Co] Liber. / Protestant / 0.29; Protestant / R-Cath. / 0.22; Liber. / R-Cath. / 0.21; Comm / Soc-dem / 0.20; Liber. / Soc-dem / 0.15
  42. 42. Dutch Ships and Sailors
  43. 43. gz:Mercuur 1782 gz:Buijksloot gz:Batavia gz:Claas Roem voc:Claas Roem voc:Buijksloot 1752das:Mercuur das:Departure das:Roem, Klaas 19-12-1780 das:Texel das:Arrival 20-7-1781 das:Batavia das:Voyage1 Web of Data
  44. 44. DataLab
  45. 45. Lessons Learned
  46. 46. Be humble, transparent and interactive in your data conversion and linking Lessons Learned
  47. 47. Lessons Learned Retain complexities of the data and establish layers of interoperability
  48. 48. Lessons Learned A Little Semantics goes a Long Way…and so does a small amount of links
  49. 49. Lessons Learned Make sure your solutions and tools fit the methodology of the field
  50. 50. Lessons Learned Show added benefit for scientific research and (unexpected) re-use
  51. 51. Lessons Learned Linked Data is a good fit for Humanities research
  52. 52. Thank you! Victor de Boer http://victordeboer.com v.de.boer@vu.nl
  53. 53. Image credits • Wikipedia lemmas • Flickr images (cc-licensed) – RMTip21 – Argonne National Laboratory – thegarethwiscombe • “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/” • http://blogs.voanews.com/science-world/tag/cern/ • Gezicht op Canton, Vingboons-atlas, Bussum 1981, p. 35 VOC Kenniscentrum Links • http://semanticweb.cs.vu.nl • http://biographynet.nl • http://e-culture.multimedian.nl • http://cliopatria.swi-prolog.org
