All WP Meeting Athens - Preliminary Results of the Contextualisation - Klaus Thoden

Transcript

  • 1. Preliminary Results from the Contextualization • Dominique Ritze, Klaus Thoden • co-funded by the European Union
  • 2. Why contextualization? • Disambiguation • Linked Open Data
  • 3. Contextualization in Year 1 • Baseline • Identification of global identifiers • Authority and type of identity • BBAW, SBB, NLI, UB Frankfurt, MPIWG, ÖNB: – mostly contextualization of persons and corporate bodies • What more can we do? 27.11.13
  • 4. Sources • GND/VIAF – Persons, corporations, titles • LCSH, DDC – Subject headings • Wikipedia/DBpedia – Everything • Geonames – Places • InPho – Argumentation structure • ISIL – Libraries • CERL – Historical places, printers
  • 5. GND • <title>Der zerbrochene Krug</title>
  • 6. VIAF • <author>Ludwig Wittgenstein</author>
  • 7. Wikipedia/DBpedia • <subfield code="a">Aus der Bibliothek des Prinzen Eugen von Savoyen</subfield>
  • 8. LCSH • <subject>Administration</subject>
  • 9. DDC • The 1914 - 1918 Collection of the American Jewish Joint Distribution Committee is comprised of the records of the New York headquarters for the period from the Joint's origins providing emergency relief through World War I.
  • 10. Geonames • <pubPlace>Berlin</pubPlace>
  • 11. Sources • GND/VIAF – Persons, corporations, titles • LCSH, DDC – Subject headings • Wikipedia/DBpedia – Everything • Geonames – Places • InPho – Argumentation structure • ISIL – Libraries • CERL – Historical places, printers
  • 12. Workflow • Ingestion through Omnom • Contextualization in DM2E Triplestore • Common input vocabulary – but not really consistent • Saved as independent triples – no change of original data
  • 13. SILK Demo • Workbench to create Linkage Rules with a GUI • Transformations and Normalizations • Similarity metrics to compare values • Aggregators to combine various comparisons
  • 14. Structured Data: Unique Identifier • a1 (project data): GND "118650130" • equals • a2 (GND): GND "118650130"
  • 15. Structured Data: Datatype Properties • a1 (project data): name "C. Brodley" • similarity • a2 (GND): name "Brodley, Carla"
  • 16. Structured Data: Combination of Datatype Properties • a1 (project data): name "C. Brodley", year "1991" • similarity on name, similarity on year • a2 (GND): name "Brodley, Carla", year "1991"
  • 17. Structured Data: Excluding Links • a1 (project data): name "C. Brodley", year of birth "1956" • similarity on name • a2 (GND): name "Brodley, Carla", year of death "1820"
  • 18. Unstructured Data • Real-world example
  • 19. Limitations • Needs high computing power • No on-the-fly change of linkage rules • Not well-suited for structured data • Sparse metadata: get information out of transcriptions? Named Entity Recognition? • Know your data! Results have to be checked.
  • 20. DM2E Silk Workbench • Put behind SSO • No user management • Keep own sources (at least GND) • Possibly restrict the contextualization job to some power users
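
The linkage logic walked through on slides 14 to 17 (identifier equality, name similarity, combining name and year, excluding implausible links) can be sketched in a few lines. This is plain Python for illustration only, not Silk's rule language or API. The function names, the record keys (`gnd`, `name`, `year`, `year_of_birth`, `year_of_death`), the 0.7/0.3 weighting, the `min` aggregation and the 0.8 threshold are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, drop periods, and turn 'Family, Given' into 'given family'."""
    name = name.strip().lower().replace(".", "")
    if "," in name:
        family, given = [part.strip() for part in name.split(",", 1)]
        name = f"{given} {family}"
    return " ".join(name.split())


def name_similarity(a: str, b: str) -> float:
    """Score two personal names in [0, 1]; an initial matches a given name it starts."""
    tokens_a, tokens_b = normalize_name(a).split(), normalize_name(b).split()
    if not tokens_a or not tokens_b:
        return 0.0
    # After normalization the family name is the last token.
    family = SequenceMatcher(None, tokens_a[-1], tokens_b[-1]).ratio()
    given_a, given_b = " ".join(tokens_a[:-1]), " ".join(tokens_b[:-1])
    if not given_a or not given_b:
        given = 0.5  # no given name on one side: treated as neutral evidence
    elif len(given_a) == 1 or len(given_b) == 1:
        given = 1.0 if given_a[0] == given_b[0] else 0.0
    else:
        given = SequenceMatcher(None, given_a, given_b).ratio()
    return 0.7 * family + 0.3 * given  # hypothetical weighting


def should_link(project: dict, authority: dict, threshold: float = 0.8) -> bool:
    """Decide whether a project record and an authority record denote the same person."""
    # Slide 14: if both sides carry a unique identifier, it decides the link.
    if project.get("gnd") and authority.get("gnd"):
        return project["gnd"] == authority["gnd"]
    # Slide 17: exclude the link if the person was born after the candidate died.
    born, died = project.get("year_of_birth"), authority.get("year_of_death")
    if born is not None and died is not None and int(born) > int(died):
        return False
    # Slide 15: compare a datatype property (the name) by similarity.
    scores = [name_similarity(project["name"], authority["name"])]
    # Slide 16: combine with a second comparison when both sides give a year.
    if "year" in project and "year" in authority:
        scores.append(1.0 if project["year"] == authority["year"] else 0.0)
    # Aggregation: every comparison has to pass (assumed choice of aggregator).
    return min(scores) >= threshold


if __name__ == "__main__":
    a1 = {"name": "C. Brodley", "year": "1991"}
    a2 = {"name": "Brodley, Carla", "year": "1991"}
    print(should_link(a1, a2))
```

Taking the minimum means a good name match cannot outweigh a mismatching year. An averaging aggregator would be more lenient, which is one reason the results still have to be checked by hand, as slide 19 notes.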