co-funded by the European Union 
Linked Data Mapping Cultures 
An Evaluation of Metadata Usage and Distribution in a Linked Data Environment 
Konstantin Baierer, Evelyn Dröge, Vivien Petras, Violeta Trkulja
Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
Presentation at the International Conference on Dublin Core and Metadata Applications, Austin, October 9, 2014
Outline 
Linked Data Mapping Cultures 
2 
09.10.2014 
1. Linked Data mapping cultures
2. Digitised Manuscripts to Europeana
3. EDM and DM2E model
4. Evaluation: aim, datasets, methods
5. Results of the evaluation
6. Conclusion
Linked Data mapping cultures 
• Linked Data offers great expressivity
→ With great freedom comes great responsibility
• Data in DM2E:
– Different data formats
– Different data curation backgrounds
= Different cultures in Linked Data
• Data providers ≠ data mapping institutions
• Mapping is influenced by policies, technology, best practices, personal preferences…
Digitised Manuscripts to Europeana (DM2E) 
Heterogeneous object data in independent resources
EDM and DM2E model 
EDM = Europeana Data Model 
• Used to describe Cultural Heritage Objects (CHOs)
• Very generic but can be specialized
DM2E model: Specialization of EDM for manuscripts 
dm2e: <http://onto.dm2e.eu/schemas/dm2e/1.0/>
dm2edata: <http://data.dm2e.eu/data/>
edm: <http://www.europeana.eu/schemas/edm/>
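The namespace prefixes above abbreviate full IRIs. As a minimal sketch of how a prefixed name resolves, the following uses the three prefixes from the slide; the helper name `expand` is hypothetical and not part of any DM2E tooling.

```python
# Prefix table copied from the slide; nothing else is assumed about the model.
PREFIXES = {
    "dm2e": "http://onto.dm2e.eu/schemas/dm2e/1.0/",
    "dm2edata": "http://data.dm2e.eu/data/",
    "edm": "http://www.europeana.eu/schemas/edm/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dm2e:Paragraph' into a full IRI.

    Hypothetical helper for illustration only.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    print(expand("dm2e:Paragraph"))
    print(expand("dm2edata:item/uib/wab/Ms-115/Ms-115-2"))
    print(expand("edm:ProvidedCHO"))
```

Only the first colon is treated as the separator, so local names that contain slashes (as the `dm2edata:` identifiers do) pass through unchanged.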
DM2E model: Example 
[Figure: example graph in the DM2E model. Recoverable labels:]
Nodes (class, resource):
– foaf:Person, dm2edata:agent/uib/wab/Ludwig_Wittgenstein
– ore:Aggregation, dm2edata:aggregation/uib/wab/Ms-115/Ms-115-2
– edm:ProvidedCHO, dm2edata:item/uib/wab/Ms-115/Ms-115-2
– foaf:Organization, dm2edata:agent/uib/wab/Wittgenstein_Archives
– edm:WebResource, http://wab.uib.no/cost-a32_fax/115/Ms-115%2c1.jpg
– dm2e:Paragraph
Edge labels: skos:prefLabel, dc:type
Literals: “Ludwig Wittgenstein”@de; “remark Ms-115,1[2]et2[1] from Wittgenstein Nachlass MS 115”@en
Aim of the evaluation 
• Evaluation of datasets from the DM2E project
– Based on mappings to the DM2E model
• Aim: discover similarities and differences between datasets from different mapping institutions
Do mapping preferences of individual institutions influence the resulting data from a mapping process?
Analyzed datasets 
• Datasets as of May 1, 2014
• Analyzed datasets:
– Eight data providers (DP I – DP VIII)
– Ten datasets (Dataset 1 – 10)
– Six mapping institutions (MI A – F)
– Variety of metadata formats
DP        Dataset     Metadata format     MI
DP I      Dataset 1   proprietary format  MI A
DP I      Dataset 2   proprietary format  MI A
DP II     Dataset 3   MAB2                MI B
DP II     Dataset 4   MAB2                MI B
DP III    Dataset 5   METS/MODS           MI C
DP IV     Dataset 6   METS/MODS           MI C
DP V      Dataset 7   TEI P5              MI D
DP VI     Dataset 8   EAD                 MI D
DP VII    Dataset 9   TEI P5              MI E
DP VIII   Dataset 10  TEI P5              MI F
DP: Data provider
MI: Mapping institution
Evaluation methods 
• Count (SPARQL)
– per dataset
– globally
– per rdf:type and dc:type
• Create metrics (Python)
• Visualize (Google Charts)
• All visualizations:
http://data.dm2e.eu/visualize
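The counting step can be sketched in a few lines. The slides do not show the actual queries or scripts, so everything below is an assumption for illustration: the SPARQL string presumes each dataset lives in its own named graph and is not executed here, and the toy quads, dataset names, and function names are made up.

```python
from collections import Counter

# A plausible per-type counting query, assuming one named graph per dataset.
# Illustrative only; it is not the query used in the evaluation.
COUNT_PER_TYPE_QUERY = """
SELECT ?g ?type (COUNT(?s) AS ?n)
WHERE { GRAPH ?g { ?s a ?type } }
GROUP BY ?g ?type
"""

RDF_TYPE = "rdf:type"

# Toy quads (dataset, subject, predicate, object); all values are invented.
QUADS = [
    ("dataset-a", "s1", RDF_TYPE, "edm:ProvidedCHO"),
    ("dataset-a", "s1", "dc:type", "dm2e:Page"),
    ("dataset-a", "s2", RDF_TYPE, "edm:ProvidedCHO"),
    ("dataset-a", "s2", "dc:type", "dm2e:Manuscript"),
    ("dataset-b", "s3", RDF_TYPE, "edm:ProvidedCHO"),
    ("dataset-b", "s3", "dc:type", "bibo:Book"),
    ("dataset-b", "s3", "dc:title", '"A title"'),
]


def count_statements(quads):
    """Return (statements per dataset, global statement count)."""
    per_dataset = Counter(dataset for dataset, _, _, _ in quads)
    return per_dataset, sum(per_dataset.values())


def count_by_type(quads, type_predicate):
    """Count statements per (dataset, type value) for one typing predicate."""
    return Counter(
        (dataset, obj)
        for dataset, _, pred, obj in quads
        if pred == type_predicate
    )


if __name__ == "__main__":
    per_dataset, total = count_statements(QUADS)
    print(per_dataset, total)
    print(count_by_type(QUADS, RDF_TYPE))
    print(count_by_type(QUADS, "dc:type"))
```

Counting rdf:type and dc:type separately mirrors the slide's distinction: in the example graph, rdf:type carries the generic EDM class while dc:type carries the more specific DM2E or BIBO type.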
Results
CHO types
Dataset     bibo:Series  bibo:Book  dm2e:Manuscript  dm2e:Paragraph  bibo:Journal  bibo:Issue  fabio:Article  bibo:Letter  dm2e:Page
Dataset 1   -            -          24               -               -             -           -              -            10,427
Dataset 2   -            1,251      10               -               -             -           -              -            530,314
Dataset 3   4,552        39,873     -                -               -             -           -              -            -
Dataset 4   -            -          175              -               -             -           -              -            46,006
Dataset 5   -            -          1,012            -               -             -           -              -            307,202
Dataset 6   -            2,916      -                -               -             -           -              -            472,994
Dataset 7   -            1,295      -                -               -             -           -              -            416,172
Dataset 8   -            -          -                -               -             -           -              3,630        34,596
Dataset 9   -            -          -                -               1             346         42,173         -            159,277
Dataset 10  -            -          20               9,635           -             -           -              -            -
Total       4,552        45,335     1,241            9,635           1             346         42,173         3,630        1,976,988
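The Total row can be cross-checked against the per-dataset counts. The sketch below does this in plain Python. One assumption is flagged in the code: the extracted slide gives only three values for Dataset 2 without column positions, and they are assigned here to bibo:Book, dm2e:Manuscript and dm2e:Page, which is the only placement that reproduces the printed totals. The variable and function names are hypothetical.

```python
from collections import Counter

# Per-dataset CHO type counts from the table; missing cells ("-") are omitted.
COUNTS = {
    "Dataset 1": {"dm2e:Manuscript": 24, "dm2e:Page": 10_427},
    # Assumption: column placement for Dataset 2 is inferred from the totals.
    "Dataset 2": {"bibo:Book": 1_251, "dm2e:Manuscript": 10, "dm2e:Page": 530_314},
    "Dataset 3": {"bibo:Series": 4_552, "bibo:Book": 39_873},
    "Dataset 4": {"dm2e:Manuscript": 175, "dm2e:Page": 46_006},
    "Dataset 5": {"dm2e:Manuscript": 1_012, "dm2e:Page": 307_202},
    "Dataset 6": {"bibo:Book": 2_916, "dm2e:Page": 472_994},
    "Dataset 7": {"bibo:Book": 1_295, "dm2e:Page": 416_172},
    "Dataset 8": {"bibo:Letter": 3_630, "dm2e:Page": 34_596},
    "Dataset 9": {
        "bibo:Journal": 1,
        "bibo:Issue": 346,
        "fabio:Article": 42_173,
        "dm2e:Page": 159_277,
    },
    "Dataset 10": {"dm2e:Manuscript": 20, "dm2e:Paragraph": 9_635},
}

# Total row as printed on the slide.
TOTALS = {
    "bibo:Series": 4_552,
    "bibo:Book": 45_335,
    "dm2e:Manuscript": 1_241,
    "dm2e:Paragraph": 9_635,
    "bibo:Journal": 1,
    "bibo:Issue": 346,
    "fabio:Article": 42_173,
    "bibo:Letter": 3_630,
    "dm2e:Page": 1_976_988,
}


def column_totals(counts):
    """Sum the counts of every CHO type across all datasets."""
    totals = Counter()
    for per_type in counts.values():
        totals.update(per_type)
    return dict(totals)


if __name__ == "__main__":
    computed = column_totals(COUNTS)
    for cho_type, expected in TOTALS.items():
        status = "ok" if computed.get(cho_type, 0) == expected else "MISMATCH"
        print(f"{cho_type:16} {computed.get(cho_type, 0):>10,} {status}")
```

With this placement every column sums to the printed total, which also shows how strongly dm2e:Page dominates the CHO counts.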
Distribution of classes 
Distribution of properties 
Usage of different ontologies 
Resources vs. literals 
Literal statements 
Predicate-Object-Equality-Ratio (POER-n) 
Triples 
S1 P1 O1 
S2 P1 O1 
S3 P1 O1 
S3 P2 O2 
S4 P1 O2 
S4 P2 O2 
S4 P2 O3 
POER-n
POER-1: 85.71 %
POER-2: 57.14 %
POER-3: 57.14 % 
POER-4: 0 % 
POER-1 in DM2E datasets: 0.08 – 2.48 % 
Graph
Average number of statements (ANOS) 
Conclusion 
• Linked Data quality assurance is vital
• Structural metrics help everybody
• Ontology engineering as a cyclic process
• “Ontology pruning”
• People > data in metadata mapping
Thank you for your attention! 
Konstantin Baierer 
Evelyn Dröge 
Berlin School of Library and Information Science 
Humboldt-Universität zu Berlin 
www.ibi.hu-berlin.de 
Digitised Manuscripts to Europeana 
www.dm2e.eu 
konstantin.baierer@ibi.hu-berlin.de 
evelyn.droege@ibi.hu-berlin.de 
