All WP Meeting Athens - Preliminary Results of the Contextualisation - Klaus Thoden


  1. Preliminary Results from the Contextualization • Dominique Ritze, Klaus Thoden • Co-funded by the European Union
  2. Why contextualization? • Disambiguation • Linked Open Data
  3. Contextualization in Year 1 • Baseline • Identification of global identifiers • Authority and type of identity • BBAW, SBB, NLI, UB Frankfurt, MPIWG, ÖNB: mostly contextualization of persons and corporate bodies • What more can we do? (27.11.13)
  4. Sources • GND/VIAF – Persons, corporations, titles • LCSH, DDC – Subject headings • Wikipedia/DBpedia – Everything • GeoNames – Places • InPho – Argumentation structure • ISIL – Libraries • CERL – Historical places, printers
  5. GND: <title>Der zerbrochene Krug</title>
  6. VIAF: <author>Ludwig Wittgenstein</author>
  7. Wikipedia/DBpedia: <subfield code="a">Aus der Bibliothek des Prinzen Eugen von Savoyen</subfield>
  8. LCSH: <subject>Adminstration</subject>
  9. DDC: The 1914 - 1918 Collection of the American Jewish Joint Distribution Committee is comprised of the records of the New York headquarters for the period from the Joint's origins providing emergency relief through World War I.
  10. GeoNames: <pubPlace>Berlin</pubPlace>
  11. Sources • GND/VIAF – Persons, corporations, titles • LCSH, DDC – Subject headings • Wikipedia/DBpedia – Everything • GeoNames – Places • InPho – Argumentation structure • ISIL – Libraries • CERL – Historical places, printers
  12. Workflow • Ingestion through Omnom • Contextualization in the DM2E triplestore • Common input vocabulary, but not applied consistently • Links saved as independent triples; the original data is not changed
  13. SILK Demo • Workbench to create linkage rules with a GUI • Transformations and normalizations • Similarity metrics to compare values • Aggregators to combine various comparisons
  14. Structured Data: Unique Identifier • a1 (project data): GND "118650130" • a2 (GND): GND "118650130" • Comparison: a1 GND equals a2 GND
  15. Structured Data: Datatype Properties • a1 (project data): name "C. Brodley" • a2 (GND): name "Brodley, Carla" • Comparison: similarity of the two names
  16. Structured Data: Combination of Datatype Properties • a1 (project data): name "C. Brodley", year "1991" • a2 (GND): name "Brodley, Carla", year "1991" • Comparison: similarity of the names combined with similarity of the years
  17. Structured Data: Excluding Links • a1 (project data): name "C. Brodley", year of birth "1956" • a2 (GND): name "Brodley, Carla", year of death "1820" • The names are similar, but the years rule out a link
  18. Unstructured Data • Real-world example
  19. Limitations • Needs high computing power • No on-the-fly change of linkage rules • Not well-suited for structured data • Sparse metadata: get information out of transcriptions? Named Entity Recognition? • Know your data! • Results have to be checked.
  20. DM2E Silk Workbench • Put behind SSO • No user management • Keep own sources (at least GND) • Possibly restrict contextualization jobs to a few power users
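
The four matching patterns from the "Structured Data" slides (14 to 17) can be sketched in plain Python. This is only an illustration under stated assumptions, not Silk's own rule language or API: the `Record` fields, the name normalization, the use of `difflib` as the similarity metric, and the "minimum" aggregator are all choices made here for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Record:
    """A person record. All field names are hypothetical, chosen for this sketch."""
    name: str
    gnd_id: Optional[str] = None
    year: Optional[str] = None         # a year present on both sides, as on slide 16
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def normalize_name(name: str) -> str:
    """Normalization: lowercase, drop dots, turn 'Last, First' into 'first last'."""
    name = name.replace(".", "").strip().lower()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.split())


def name_similarity(a: str, b: str) -> float:
    """Similarity metric: character-level ratio between the normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def plausible_lifespan(a: Record, b: Record) -> bool:
    """Exclusion rule: no link if one side died before the other side was born."""
    for x, y in ((a, b), (b, a)):
        if (x.birth_year is not None and y.death_year is not None
                and y.death_year < x.birth_year):
            return False
    return True


def link_confidence(a: Record, b: Record) -> float:
    """Return a confidence in [0, 1] that a and b describe the same person."""
    # Slide 14: identical unique identifiers settle the question.
    if a.gnd_id is not None and a.gnd_id == b.gnd_id:
        return 1.0
    # Slide 17: contradictory life dates exclude the link.
    if not plausible_lifespan(a, b):
        return 0.0
    # Slide 15: compare a datatype property (the name).
    scores = [name_similarity(a.name, b.name)]
    # Slide 16: combine with a second property when both sides have it.
    if a.year is not None and b.year is not None:
        scores.append(1.0 if a.year == b.year else 0.0)
    # Aggregator: the weakest comparison decides.
    return min(scores)


if __name__ == "__main__":
    project = Record(name="C. Brodley", year="1991")
    gnd = Record(name="Brodley, Carla", year="1991")
    print(link_confidence(project, gnd))
```

A link would be accepted only above some threshold, which has to be tuned against the data; as the Limitations slide notes, the results still have to be checked.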
