All WP Meeting Athens - Preliminary Results of the Contextualisation - Klaus Thoden
 

Presentation Transcript

    • Preliminary Results from the Contextualization
      Dominique Ritze, Klaus Thoden
      Co-funded by the European Union
    • Why contextualization?
      – Disambiguation
      – Linked Open Data
    • Contextualization in Year 1
      – Baseline
        – Identification of global identifiers
        – Authority and type of identity
      – BBAW, SBB, NLI, UB Frankfurt, MPIWG, ÖNB: mostly contextualization of persons and corporate bodies
      – What more can we do?
      27.11.13
    • Sources
      – GND / VIAF: persons, corporations, titles
      – LCSH, DDC: subject headings
      – Wikipedia / DBpedia: everything
      – GeoNames: places
      – InPho: argumentation structure
      – ISIL: libraries
      – CERL: historical places, printers
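The slide pairs each kind of entity with the sources consulted for it. As a sketch, that pairing can be written as a lookup table. Only the source names and what they cover come from the slide; the entity-type keys and the function name `sources_for` are hypothetical.

```python
from typing import Dict, List

# Hypothetical entity-type keys; the sources and their coverage follow the "Sources" slide.
SOURCES: Dict[str, List[str]] = {
    "person": ["GND", "VIAF"],
    "corporation": ["GND", "VIAF"],
    "title": ["GND", "VIAF"],
    "subject heading": ["LCSH", "DDC"],
    "place": ["GeoNames", "CERL"],  # CERL covers historical places
    "printer": ["CERL"],
    "argumentation structure": ["InPho"],
    "library": ["ISIL"],
}

# Wikipedia/DBpedia is listed as covering "everything", so it is always a candidate.
GENERAL_SOURCES: List[str] = ["Wikipedia/DBpedia"]


def sources_for(entity_type: str) -> List[str]:
    """Return the sources to query for an entity type, most specific first."""
    return SOURCES.get(entity_type, []) + GENERAL_SOURCES


print(sources_for("place"))  # ['GeoNames', 'CERL', 'Wikipedia/DBpedia']
```

Unknown entity types fall back to the general-purpose source only.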
    • GND example: <title>Der zerbrochene Krug</title>
    • VIAF example: <author>Ludwig Wittgenstein</author>
    • Wikipedia/DBpedia example: <subfield code="a">Aus der Bibliothek des Prinzen Eugen von Savoyen</subfield>
    • LCSH example: <subject>Administration</subject>
    • DDC example: "The 1914 - 1918 Collection of the American Jewish Joint Distribution Committee is comprised of the records of the New York headquarters for the period from the Joint's origins providing emergency relief through World War I."
    • GeoNames example: <pubPlace>Berlin</pubPlace>
    • Workflow
      – Ingestion through Omnom
      – Contextualization in the DM2E triplestore
      – Common input vocabulary, but not applied consistently
      – Results saved as independent triples; the original data is not changed
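A minimal sketch of the last point, assuming triples are plain Python tuples, that place resources carry a skos:prefLabel, and that links are expressed as owl:sameAs (the slide names neither property). The function `contextualize` and the example.org URIs are hypothetical. What matters is that the output is a separate list of new triples and the ingested data is only read.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def contextualize(original: List[Triple], authority: Dict[str, str]) -> List[Triple]:
    """Return new link triples without modifying `original`.

    `authority` maps a label to an authority URI; it stands in for a real
    lookup against GND, GeoNames, etc. (hypothetical).
    """
    links: List[Triple] = []
    for subject, predicate, obj in original:
        if predicate == SKOS_PREF_LABEL and obj in authority:
            links.append((subject, OWL_SAME_AS, authority[obj]))
    return links


ingested = [("http://example.org/place/1", SKOS_PREF_LABEL, "Berlin")]
links = contextualize(ingested, {"Berlin": "http://example.org/geonames/berlin"})
# `ingested` is unchanged; `links` holds one owl:sameAs triple that can be
# stored on its own, e.g. in a separate named graph.
```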
    • Silk demo
      – Workbench to create linkage rules with a GUI
      – Transformations and normalizations
      – Similarity metrics to compare values
      – Aggregators to combine various comparisons
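The four components listed on the slide can be mirrored in plain Python to show how they fit together. This is not Silk's API or its rule format; the function names, the normalized Levenshtein distance as similarity metric, the minimum as aggregator and the 0.9 threshold are all assumptions for illustration.

```python
import re
from typing import List


def normalize(value: str) -> str:
    """Transformation: lower-case, replace punctuation with spaces, collapse whitespace."""
    value = re.sub(r"[^\w\s]", " ", value.lower())
    return " ".join(value.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming over two rows)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                          # deletion
                               current[j - 1] + 1,                       # insertion
                               previous[j - 1] + (char_a != char_b)))    # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity metric: 1.0 for equal normalized strings, lower as they diverge."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def aggregate_min(scores: List[float]) -> float:
    """Aggregator: the combined score is that of the weakest comparison."""
    return min(scores)


def is_link(scores: List[float], threshold: float = 0.9) -> bool:
    """Accept a candidate link if the aggregated score reaches the threshold."""
    return aggregate_min(scores) >= threshold
```

For example, `similarity("Berlin", "berlin.")` is 1.0 because the transformation removes the case and punctuation differences before the metric is applied.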
    • Structured data: unique identifier
      – Project data a1: GND "118650130"
      – GND record a2: GND "118650130"
      – Comparison: equals
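When both sides carry the same identifier, no similarity measure is needed: an exact comparison backed by an index is enough. A sketch, assuming both datasets are dictionaries from record ID to GND identifier; `link_by_identifier` and the second test identifier are hypothetical.

```python
from typing import Dict, List, Tuple


def link_by_identifier(project: Dict[str, str], authority: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair records whose identifiers are exactly equal (after trimming whitespace).

    Both arguments map a record ID to its GND identifier (hypothetical layout).
    """
    index: Dict[str, List[str]] = {}
    for record, identifier in authority.items():
        index.setdefault(identifier.strip(), []).append(record)
    return [(record, match)
            for record, identifier in project.items()
            for match in index.get(identifier.strip(), [])]


print(link_by_identifier({"a1": "118650130"}, {"a2": "118650130", "a3": "999999999"}))
# [('a1', 'a2')]
```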
    • Structured data: datatype properties
      – Project data a1: name "C. Brodley"
      – GND record a2: name "Brodley, Carla"
      – Comparison: similarity of the names
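Plain string distance scores "C. Brodley" and "Brodley, Carla" poorly because of the inverted order and the abbreviated forename, so the values need a transformation before they are compared. A sketch, assuming names come either as "Surname, Forename" or as "Initial. Surname"; the function names and the 1.0 / 0.5 / 0.0 scoring scheme are arbitrary choices for illustration.

```python
from typing import Tuple


def parse_name(name: str) -> Tuple[str, str]:
    """Return (surname, first initial) for "Surname, Forename" or "F. Surname"."""
    name = name.strip()
    if not name:
        return "", ""
    if "," in name:
        surname, forename = (part.strip() for part in name.split(",", 1))
    else:
        *forenames, surname = name.split()
        forename = " ".join(forenames)
    return surname.lower(), forename[:1].lower()


def name_similarity(a: str, b: str) -> float:
    """1.0 if surname and initial agree, 0.5 if an initial is missing, else 0.0."""
    surname_a, initial_a = parse_name(a)
    surname_b, initial_b = parse_name(b)
    if not surname_a or surname_a != surname_b:
        return 0.0
    if initial_a and initial_b:
        return 1.0 if initial_a == initial_b else 0.0
    return 0.5
```

With this transformation, `name_similarity("C. Brodley", "Brodley, Carla")` is 1.0, while "D. Brodley" scores 0.0 against the same record.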
    • Structured data: combination of datatype properties
      – Project data a1: name "C. Brodley", year "1991"
      – GND record a2: name "Brodley, Carla", year "1991"
      – Comparisons: similarity of the names and of the years, combined
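Combining several comparisons requires an aggregator. A sketch using a weighted average, which is only one possible choice; the function names, the weights and the name score of 0.5 (standing for an inconclusive name comparison) are made-up inputs.

```python
from typing import Dict


def year_similarity(a: str, b: str) -> float:
    """Exact comparison of two year strings."""
    return 1.0 if a.strip() == b.strip() else 0.0


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregator: combine per-property scores, weighting some properties more."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical case: the name comparison alone is inconclusive (0.5),
# but the matching year raises the combined score.
scores = {"name": 0.5, "year": year_similarity("1991", "1991")}
combined = weighted_average(scores, {"name": 2.0, "year": 1.0})  # (0.5*2 + 1.0*1) / 3 = 2/3
```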
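    • Structured data: excluding links
      – Project data a1: name "C. Brodley", year of birth "1956"
      – GND record a2: name "Brodley, Carla", year of death "1820"
      – The names are similar, but the dates rule out a link: a person born in 1956 cannot be one who died in 1820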
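An excluding rule acts as a veto: however similar the names, contradictory data drops the candidate. A sketch, assuming years are integers and that a lifespan of more than 120 years is implausible; the function names and that bound are hypothetical.

```python
from typing import Optional

MAX_LIFESPAN = 120  # assumed upper bound on a human lifespan, in years


def dates_compatible(birth_a: Optional[int], death_b: Optional[int]) -> bool:
    """False if a1's year of birth rules out a2's year of death."""
    if birth_a is None or death_b is None:
        return True  # missing dates cannot exclude a link
    return birth_a <= death_b <= birth_a + MAX_LIFESPAN


def final_score(name_score: float, birth_a: Optional[int], death_b: Optional[int]) -> float:
    """Keep the name score only if the dates do not exclude the link."""
    return name_score if dates_compatible(birth_a, death_b) else 0.0


print(final_score(1.0, 1956, 1820))  # 0.0: born in 1956, cannot have died in 1820
```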
    • Unstructured data: real-world example
    • Limitations
      – Needs high computing power
      – No on-the-fly change of linkage rules
      – Not well-suited for structured data
      – Sparse metadata: get information out of transcriptions? Named Entity Recognition?
      – Know your data!
      – Results have to be checked.
    • DM2E Silk Workbench
      – Put behind SSO
      – No user management
      – Keep own sources (at least GND)
      – Possibly restrict contextualization jobs to a few power users