ESWC 2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study

Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata.
We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first 'small' cultural heritage institution with a node in the Linked Data cloud.

  • Rather than having Linked Data ingestion done automatically by large aggregators, we present a methodology that is both transparent and interactive. The methodology covers data ingestion, conversion, alignment and Linked Data publication. It is highly modular with clearly recognizable data transformation steps, which can be evaluated and adapted based on these evaluations. This design allows the institute’s collection managers, who are most knowledgeable about their own data, to perform or oversee the process themselves. We describe a stack of tools that allow collection managers to produce a Linked Data version of their metadata that maintains the richness of the original data including the institute-specific metadata classes and properties. By providing a mapping to a common schema, interoperability is achieved. Flickr: givingnot@rocketmail.com, aoppelaar, hhesterr, Grufnik, moria, Banjaxx, Paradasos
  • 2.4 million texts, images, videos and sounds gathered by Europeana. These objects come from data providers who have reacted early and positively to Europeana's initiative of promoting more open data and new data exchange agreements. These collections come from 8 direct Europeana providers encompassing over 200 cultural institutions from 15 countries. 
  • Not completely straightforward XML (nestedness).
  • XMLRDF tool: clean up, link to resources etc.
  • XMLRDF tool: clean up, link to resources etc. 58 XMLRDF rewrite rules (object metadata), 23 rewriting rules (thesaurus), 2 rules (person authority list).
  • Apps for Amsterdam; Plaatsen van Betekenis

Transcript

  • 1. Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study. Victor de Boer, Jan Wielemaker, Judith van Gent, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber. EuropeanaConnect
  • 2. Aggregator
  • 3. Europeana: “Europeana enables people to explore the digital resources of Europe's museums, libraries, archives and audio-visual collections.” www.europeana.eu. From portal… …to data aggregator.
  • 4. data.europeana.eu: 2.4 million objects exposed as Linked Data. 8 aggregators, 200 institutions, 15 countries. Europeana Semantic Elements converted to RDF Europeana Data Model (EDM).
  • 5. Linked data-ify. Aggregate and convert.
  • 6. Linked data-ify. Convert to Linked Data. Mapped to EDM. Aggregate and convert.
  • 7. Methodology and tool stack
    • Focus on transparency and interactivity
      – Reproducibility
      – Both in conversion and alignment
    • Maintain detail and complexity of original data
    • Interoperability through schema mapping
  • 8. Methods: 1. XML ingestion (OAI); 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5). Powered by cliopatria.swi-prolog.org
  • 9. Case study: Amsterdam Museum
    • Formerly Amsterdam Historic Museum
      – “The rich collection of works of art, objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.”
    • In March 2010 published their whole collection online
      – 73,000 objects
      – CC license
  • 10. Methods: 1. XML ingestion; 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5).
  • 11. Ingested AM metadata
    • Adlib database XML API
    • Object metadata
      – 73,000 objects, 256MB
      – Nested XML
      – Example: <record priref="10541"> <acquisition.date>1997</acquisition.date> <dimension> <dimension.type>hoogte</dimension.type> <dimension.unit>cm</dimension.unit> <dimension.value>6</dimension.value> </dimension> … </record>
    • Concept Thesaurus
      – 27,000 concepts, 9MB
      – Different types (geo, motif, event)
      – Example: <record priref="28024"> <term>Kalverstraat 124</term> <broader_term>Kalverstraat</broader_term> <term.type>GEOKEYW</term.type> </record>
    • Person Authority File
      – 67,000 persons, 10MB
      – Consolidated from object metadata fields
      – Creators, annotators, reproduction creators, institutions
      – Example: <record priref="6"> <biography>boekverkoper en uitgever van cartografie</biography> <birth.date.start>1659</birth.date.start> <death.date.start>1733</death.date.start> <name>Aa, Pieter van der</name> <nationality>Nederlands</nationality> <use>Aa, Pieter van der (I)</use> </record>
  • 12. XMLRDF (1): Syntactic RDF conversion
    • Input: <record priref="19319"> <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> … </record>
    • Output (figure): blank node _:bn1 of type am:Record with priref “19319”, date “1651”, maker “Rembrandt (1606-1669)” and object.type “etsplaat”
    • An XML element is attributes + content
    • Map to RDF blank node + attributes
      – Attributes → Literals (+xml:lang)
      – Content: if plain → Literal (+xml:lang); otherwise → RDF blank node (recursive)
  • 13. ClioPatria: Intermediate Statistics
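The syntactic conversion on slide 12 can be illustrated with a small Python sketch. This is not the XMLRDF implementation (which is a SWI-Prolog package); it only mirrors the stated rules: attributes and plain content become literals, nested elements become blank nodes, recursively. The function name, the tuple encoding of literals, and the choice to type each blank node as `am:` plus the capitalized element name are assumptions made for illustration, and `xml:lang` handling is left out.

```python
import itertools
import xml.etree.ElementTree as ET


def xml_to_crude_triples(xml_text, ns="am:"):
    """Convert an XML record to 'crude' (subject, predicate, object) triples.

    Blank nodes are written as "_:bnN"; literals as ("literal", value).
    """
    counter = itertools.count(1)
    triples = []

    def convert(element):
        node = f"_:bn{next(counter)}"
        # Assumption: the element name, capitalized, is used as the node's class.
        triples.append((node, "rdf:type", ns + element.tag.capitalize()))
        # XML attributes become literal-valued properties.
        for name, value in element.attrib.items():
            triples.append((node, name, ("literal", value)))
        for child in element:
            if len(child) == 0 and not child.attrib:
                # Plain content becomes a literal.
                text = (child.text or "").strip()
                triples.append((node, child.tag, ("literal", text)))
            else:
                # Structured content becomes a nested blank node (recursive).
                triples.append((node, child.tag, convert(child)))
        return node

    root = convert(ET.fromstring(xml_text))
    return root, triples


record = """<record priref="19319">
  <date>1651</date>
  <maker>Rembrandt (1606-1669)</maker>
  <object.type>etsplaat</object.type>
</record>"""

root, triples = xml_to_crude_triples(record)
for triple in triples:
    print(triple)
```

The output keeps the original element and attribute names as predicates, which matches the intent of staying close to the source XML before any restructuring takes place.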
  • 14. Methods: 1. XML ingestion; 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5).
  • 15. XMLRDF: Graph rewrite rule language
    • Declarative committed-choice language based on CHR (Constraint Handling Rules)
    • Triples <=> Guard, NewTriples
    • Keep Triples <=> Guard, NewTriples
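To make the rule form `Triples <=> Guard, NewTriples` concrete, here is a minimal Python sketch of committed-choice rewriting. It is only an analogy: the real rules are written in a CHR-based Prolog language, can match several triples at once, and include the `Keep` form, none of which is modelled here. The `Rule` class, its field names and the single-pass `rewrite` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Rule:
    head: Optional[str]                          # predicate to match; None matches any
    guard: Callable[[Triple], bool]              # extra condition on the matched triple
    new_triples: Callable[[Triple], List[Triple]]  # replacement triples


def rewrite(triples, rules):
    """One pass: each triple is replaced by the output of the first rule that fires."""
    result = []
    for triple in triples:
        for rule in rules:
            head_matches = rule.head is None or rule.head == triple[1]
            if head_matches and rule.guard(triple):
                result.extend(rule.new_triples(triple))
                break  # committed choice: later rules are not tried
        else:
            result.append(triple)  # no rule fired, the triple is left as it is
    return result


rules = [
    # Delete triples whose literal value is empty.
    Rule(head=None, guard=lambda t: t[2] == "", new_triples=lambda t: []),
    # Rename the crude "date" property to a namespaced one.
    Rule(head="date", guard=lambda t: True,
         new_triples=lambda t: [(t[0], "am:date", t[2])]),
]

crude = [
    ("_:bn1", "date", "1651"),
    ("_:bn1", "title", ""),
    ("_:bn1", "maker", "Rembrandt (1606-1669)"),
]
rewritten = rewrite(crude, rules)
print(rewritten)
```

Because every rule is a small, separate transformation, its effect can be inspected on its own, which is the property the methodology relies on for transparency.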
  • 16. Example
  • 17. AM rewriting rules examples
  • 18. RDF rewriting conversion
    • Input: <record priref="19319"> <date>1651</date> <maker>Rembrandt (1606-1669)</maker> <object.type>etsplaat</object.type> … </record>
    • Crude RDF (figure): blank node _:bn1 of type am:Record with priref “19319”, date “1651”, maker “Rembrandt (1606-1669)” and object.type “etsplaat”
    • Rewritten RDF (figure):
      – am:proxy-19319 (am:Record): am:priref “19319”, am:date “1651”, am:maker am:p-1234
      – am:p-1234 (am:Person): rda:name “Rembrandt”, am:birthdate “1606” (the figure also shows the literal “1234”)
      – am:etsplaat (skos:Concept): skos:prefLabel “etsplaat”
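The restructuring shown on slide 18 can be approximated in Python as follows. This is a sketch under stated assumptions, not one of the museum's actual rewrite rules: the regular expression, the `person_ids` lookup (standing in for the person authority file) and the predicate `am:objectType` linking a proxy to its concept are all hypothetical. The resource names `am:proxy-19319`, `am:p-1234` and `am:etsplaat` follow the slide.

```python
import re

# "Name (birth-death)" as it appears in the crude maker literal.
MAKER_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<birth>\d{4})-(?P<death>\d{4})?\)$")


def rewrite_record(priref, crude, person_ids):
    """Restructure one crude record.

    crude: dict mapping crude property names to literal values.
    person_ids: hypothetical lookup from person name to authority-file priref.
    """
    proxy = f"am:proxy-{priref}"
    triples = [(proxy, "rdf:type", "am:Record"), (proxy, "am:priref", priref)]
    if "date" in crude:
        triples.append((proxy, "am:date", crude["date"]))

    # Split the maker literal into a person resource with name and birth date.
    match = MAKER_PATTERN.match(crude.get("maker", ""))
    if match and match["name"] in person_ids:
        person = "am:p-" + person_ids[match["name"]]
        triples += [
            (proxy, "am:maker", person),
            (person, "rdf:type", "am:Person"),
            (person, "rda:name", match["name"]),
            (person, "am:birthdate", match["birth"]),
        ]

    # Turn the object type literal into a SKOS concept.
    if "object.type" in crude:
        concept = "am:" + crude["object.type"]
        triples += [
            (proxy, "am:objectType", concept),  # predicate name is an assumption
            (concept, "rdf:type", "skos:Concept"),
            (concept, "skos:prefLabel", crude["object.type"]),
        ]
    return triples


example = rewrite_record(
    "19319",
    {"date": "1651", "maker": "Rembrandt (1606-1669)", "object.type": "etsplaat"},
    {"Rembrandt": "1234"},
)
for triple in example:
    print(triple)
```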
  • 19. Some statistics: Amsterdam Museum
    • Object metadata: 58 rules used, 73,447 resources (proxies), 100 predicates, 5,700,371 triples
    • Thesaurus: 23 rules used, 28,000 resources (concepts), 13 predicates, 601,819 triples
    • Person Auth List: 2 rules used, 66,966 resources (persons), 21 predicates, 301,143 triples
    • 558,161 Proxy-Concept relations
    • 80,432 Proxy-Person relations
    • 243,532 Proxy-Proxy relations
  • 20. Methods: 1. XML ingestion; 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5).
  • 21. Mapping to EDM: am:contentPersonName rdfs:subPropertyOf dcterms:subject; am:proxy_22093 am:contentPersonName “Job Cohen”
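The mapping on slide 21 works through RDFS reasoning: because am:contentPersonName is declared a subproperty of dcterms:subject, a client that only understands Dublin Core can still find “Job Cohen” as a subject. A minimal Python sketch of that inference is given below; the function name and the dictionary encoding of the schema are assumptions, and a real deployment would rely on the triple store's RDFS support rather than code like this.

```python
def infer_superproperties(triples, sub_property_of):
    """Add (s, P, o) for every direct or transitive superproperty P of each predicate."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()
        todo = [p]
        while todo:
            current = todo.pop()
            for parent in sub_property_of.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    inferred.add((s, parent, o))
                    todo.append(parent)
    return inferred


# Mapping schema: museum-specific property -> more general property.
schema = {"am:contentPersonName": ["dcterms:subject"]}
data = [("am:proxy_22093", "am:contentPersonName", "Job Cohen")]

for triple in sorted(infer_superproperties(data, schema)):
    print(triple)
```

The original, more specific triple is kept next to the inferred one, so no detail is lost by adding the interoperability layer.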
  • 22. Europeana Data Model (EDM)
    • Dublin Core for metadata representation
      – creator, date, title etc.
    • SKOS for vocabularies
      – preferredLabel, hasBroader, etc.
    • RDA Group 2 elements for persons
      – dateOfBirth, name etc.
    • OAI-ORE to allow for aggregations etc.
    • Some EDM-specific properties
      – edm:wasPresentAt, …
  • 23. Methods: 1. XML ingestion; 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5).
  • 24. Amalgame Alignment Platform. semanticweb.cs.vu.nl/amalgame
  • 25. AM Alignments
    • 3500+ links put in RDF
      – 143 places linked to GeoNames
      – 1076 persons linked to ULAN (VIAF)
      – 34 persons linked to DBpedia
      – 2498 concepts linked to AATNed
  • 26. Methods: 1. XML ingestion; 2. Direct transformation to ‘crude’ RDF; 3. Interactive RDF restructuring; 4. Create a metadata mapping schema; 5. Align vocabularies with external sources; 6. Publish as Linked Data. Tools: ClioPatria, with XMLRDF (steps 2 and 3) and Amalgame (step 5).
  • 27. Architecture (diagram): SPARQL app, Browser, Purl.org redirect, SPARQL, Web interface, HTTP server, RDF(S) storage, Logic, Prolog. http://semanticweb.cs.vu.nl/europeana/
  • 28. Content negotiation
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix ore: <http://www.openarchives.org/ore/terms/> .
      @prefix ens: <http://www.europeana.eu/schemas/edm/> .
      @prefix ahm: <http://purl.org/collections/nl/am/> .
      ahm:proxy-66970 a ore:Proxy ;
          ahm:title "Zegelstempel Felix Meritis"@nl ;
          ahm:material ahm:t-12463 , ahm:t-5447 ;
          ahm:objectCategory ahm:t-5504 ;
          ahm:objectName ahm:t-13817 , ahm:t-8489 ;
          ahm:objectNumber "KA 7653.1" ;
          ahm:priref "66970" .
      ahm:proxy-66972 a ore:Proxy ;
          ahm:acquisitionDate "0000" ;
          ahm:title "Zegelstempel mogelijk van familiewapen"@nl .
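Content negotiation means that the same resource URI returns an HTML page to a browser and RDF (such as the Turtle above) to a Linked Data client, depending on the HTTP `Accept` header. The Python sketch below shows the selection step only. It is a simplified illustration, not the ClioPatria server code: the function name and the list of available media types are assumptions, and partial wildcards such as `text/*` are not handled.

```python
def choose_representation(
    accept_header,
    available=("text/html", "text/turtle", "application/rdf+xml"),
):
    """Return the best available media type for an Accept header, or None."""
    preferences = []
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        preferences.append((quality, media_type))

    # Highest q-value first; sorted() is stable, so header order breaks ties.
    for quality, media_type in sorted(preferences, key=lambda pref: -pref[0]):
        if quality <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


print(choose_representation("text/turtle"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A server would then either serve the chosen representation directly or redirect to a format-specific URL.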
  • 29. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  • 30. Wrapping up
    Methodology
    • Stay as close as possible to original (XML) metadata
    • Separate syntactic transformation, semantic interpretations
    • Interactive workflow, simple steps
    • Use RDF Schema to map to interoperability layer
    • Keep provenance, reproducibility
    Tools
    • XMLRDF: realised clean workflow for RDF production
    • Amalgame: interactive and transparent vocabulary alignment
    • ClioPatria semantic server: statistics at any moment + Full expressivity of Some Prolog
  • 31. Issues
    • Validate with real collection managers
      – Making good rules is sometimes hard
      – Graphical tools can help
    • Integrate in normal collection workflow (tools)
      – LD as another view on the data
      – Live updates
    • RDFS reasoning needed to have interoperability
  • 32. http://semanticweb.cs.vu.nl/lod/am/; v.de.boer@vu.nl; amsterdammuseum.nl. ClioPatria: the SWI-Prolog RDF toolkit (includes XMLRDF and Amalgame packages), http://cliopatria.swi-prolog.org