Vk niod jan_2013

Verrijkt Koninkrijk presentation given at the NIOD lunch meeting

Published in: Education
Speaker notes:
  • Also: how did the perspective change
  • VIC: Add numbers

Transcript of "Vk niod jan_2013"

    1. Het Verrijkt Koninkrijk. NIOD lunch lecture (Lunchlezing), 08/01/2013. Johan van Doornik (UvA), Victor de Boer (VUA)
    2. The Kingdom of the Netherlands During World War II
       • History of German-occupied Dutch society (1940-1945)
       • 14 volumes, 30 parts, 18,000 pages
       • Digitized version went online in 2011, crashing the server
       “Published between 1969 and 1991, the 30 volumes still combine the qualities of an authoritative work for a general audience, and an inevitable point of reference for scholars”
    3. Clarin-VK: Verrijkt Koninkrijk
       “The aim of this project is twofold; in the demonstrator part of the project advanced tools and techniques are applied to gather data on De Jong's perception of the much debated issue of pillarization (Dutch: verzuiling) and group identity. In the resource curation part of the project the corpus will be enriched and made available to the CLARIN community for further research.”
    4. Verrijkt Koninkrijk Project
       • NIOD: historical research questions
       • UvA: representation of digital text, named entity extraction and consolidation, search prototype
       • VUA: enrichment of structured sources, internal and external linking; hackathon
       • DANS: data storage and access
    5. Digitization and Search (the UvA part)
    6. <book xmlns="http://www.loedejongdigitaal.nl" vk:id="nl.vk.d.5-I">
         <index vk:title="Inhoud" vk:id="nl.vk.d.5-I.1">
         <chapter vk:title="Lente 4 1" vk:number="1" vk:id="nl.vk.d.5-I.2">
           <section vk:title="" vk:id="nl.vk.d.5-I.2.1">
           <section vk:title="Oorlogsverloop en -perspectiej?" vk:id="nl.vk.d.5-I.2.2">
           <section vk:title="II. Midden-Oosten, lente 1941" vk:id="nl.vk.d.5-I.2.3">
             <subsection vk:id="nl.vk.d.5-I.2.3.1">
             <subsection vk:id="nl.vk.d.5-I.2.3.2">
               <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.1">Hoe kon Engeland ooit de oorlog winnen?</p>
               <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.2">Het is, achteraf gezien, volstrekt duidelijk ...
               <p vk:pdf-page-ref="22" vk:id="nl.vk.d.5-I.2.3.2.3">Deze conceptie was bemoedigend en dit ...
                 <page vk:pdf-page="22" vk:original-page="14" vk:id="nl.vk.d.5-I.2.3.2.3.14">
                   <backofbook-ref>
                 </page>
                 <header vk:id="nl.vk.d.5-I.2.3.2.3.15">HET BRITSE OORLOGSPLAN</header>men zich in Londen: in de ...
               <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.4">Hoe dat zij vooral Churchill ...
               <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.5">Had men dat in bezet Nederland vernomen ...
             </subsection>
           </section>
           <section vk:title="Publieke opinie" vk:id="nl.vk.d.5-I.2.4">
             <subsection vk:id="nl.vk.d.5-I.2.4.1">
               <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.1">Het verwachtingspatroon van een volk ...
               <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.2">1 Aangehaald in Butler ....
                 <page vk:pdf-page="23" vk:original-page="15" vk:id="nl.vk.d.5-I.2.4.1.2.4">
                   <backofbook-ref>
                     <lemma-ref>Azoren</lemma-ref>
                     <lemma-ref>Bomber Command</lemma-ref>
                     <lemma-ref>Canarische eilanden</lemma-ref>
                     <lemma-ref>Madeira</lemma-ref>
                     <lemma-ref>Portugal</lemma-ref>
                     <lemma-ref>Spanje</lemma-ref>
                     <lemma-ref>Tsjechoslowakije</lemma-ref>
                   </backofbook-ref>
                 </page>
    7. Back of the Book
       Required specialized parsing:
       • Pages (312, 316, …) and page ranges (210-215, …)
       • "See" and "See also" references
       • OCR correction for numbers (3I2 = 312, …)
       • Verification of all page references
       • Mapping page references to paragraph references
       • Terms that span multiple pages in the back of the book
       • Layout not always as consistent as you would like
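A minimal Python sketch of two of these steps: parsing page locators (single pages, ranges, OCR-damaged digits) and mapping original page numbers to paragraph ids. It assumes the `vk:original-page`/`vk:pdf-page` pairs of the `<page>` elements and the `vk:pdf-page-ref` of each `<p>` have already been extracted into plain Python structures. The OCR substitution table and the function names `parse_page_refs` and `pages_to_paragraphs` are hypothetical, not the project's actual code.

```python
import re

# Hypothetical OCR fix-up table: letters that OCR often produces in place of digits,
# e.g. "3I2" for "312". Apply it only to locator strings, never to index terms.
OCR_DIGIT_FIXES = str.maketrans({"I": "1", "l": "1", "O": "0"})

def parse_page_refs(refs: str) -> list[int]:
    """Parse a locator string such as '3I2, 316, 210-215' into sorted page numbers."""
    pages: set[int] = set()
    for part in refs.split(","):
        part = part.strip().translate(OCR_DIGIT_FIXES)
        if not part:
            continue
        m = re.fullmatch(r"(\d+)\s*[-–]\s*(\d+)", part)
        if m:  # page range: expand inclusively
            pages.update(range(int(m.group(1)), int(m.group(2)) + 1))
        elif part.isdigit():
            pages.add(int(part))
        else:  # leave verification failures visible instead of guessing
            raise ValueError(f"unrecognized page reference: {part!r}")
    return sorted(pages)

def pages_to_paragraphs(pages, original_to_pdf, paragraphs):
    """Map original (printed) page numbers to paragraph ids.

    original_to_pdf: {original page: pdf page}, e.g. read from <page> elements.
    paragraphs: list of (paragraph id, pdf page ref) pairs, e.g. read from <p> elements.
    """
    wanted = {original_to_pdf[p] for p in pages if p in original_to_pdf}
    return [pid for pid, pdf_page in paragraphs if pdf_page in wanted]

print(parse_page_refs("3I2, 316, 210-212"))  # [210, 211, 212, 312, 316]

# Values taken from the XML excerpt: original page 14 is pdf page 22, page 15 is pdf page 23.
paras = [("nl.vk.d.5-I.2.3.2.2", 21), ("nl.vk.d.5-I.2.3.2.3", 22), ("nl.vk.d.5-I.2.4.1.1", 23)]
print(pages_to_paragraphs([14], {14: 22, 15: 23}, paras))
```

Matching on the pdf page is a coarse approximation: a paragraph that straddles a page break would need the position of the `<page>` marker inside it to be assigned precisely.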
    8. Counting elements
       Element        Count
       vk:book           30
       vk:chapter       226
       vk:section     1,885
       vk:subsection  4,708
       vk:p          86,257
       vk:quote      56,547
       vk:page       16,922
       vk:lemma      16,186
       vk:lemma-ref 148,370
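Counts like these can be produced with a single pass over each volume. The sketch below uses Python's standard `xml.etree.ElementTree` and tallies elements by local name. It assumes the `vk` prefix is bound to a namespace URI in the files; the URI `http://example.org/vk` and the miniature sample document are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_elements(xml_text: str) -> Counter:
    """Count elements by local name (the namespace URI is stripped)."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for element in root.iter():  # iter() also yields the root element itself
        # Namespaced tags look like "{uri}local"; keep only the local part.
        counts[element.tag.rsplit("}", 1)[-1]] += 1
    return counts

# Hypothetical miniature volume; the namespace URI is a placeholder.
SAMPLE = """<vk:book xmlns:vk="http://example.org/vk">
  <vk:chapter>
    <vk:section>
      <vk:p>Hoe kon Engeland ooit de oorlog winnen?</vk:p>
      <vk:p>Het is, achteraf gezien, volstrekt duidelijk</vk:p>
    </vk:section>
  </vk:chapter>
</vk:book>"""

for name, n in sorted(count_elements(SAMPLE).items()):
    print(f"vk:{name:<12} {n}")
```

For the full 30-part corpus, summing the per-file `Counter` objects gives corpus totals; `ET.iterparse` would avoid holding a whole volume in memory.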
    9. Resolver
       http://resolver.loedejongdigitaal.nl/nl.vk.d.5-II.6.1.2.2
       (country, collection, doc-type, volume, chapter, section, sub-section, paragraph)
       <p vk:pdf-page-ref="338" vk:id="nl.vk.d.5-II.6.1.2.2">En in het algemeen leed de Geallieerde koopvaardij in de eerste zes maanden van 42 opnieuw zeer zware verliezen. Zij waren vooral gevolg van het feit dat de Amerikanen traag waren met het treffen van veiligheidsmaatregelen in de Caraïbische Zee en in de zeegebieden bij de Amerikaanse oostkust. Maandenlang vonden<i>U-Boote</i>daar een uiterst profijtelijk jachtterrein. Het aantal<i>U-Boote</i>nam ook steeds toe; in juli 41 waren er constant 65 in de vaart, in juli 42 140. Hitler bezat er toen 331 en er waren, doordat de<i>U-Boote</i>zich zo verspreid hadden, in de zeven maandenvan januari t.e.m. juli 42 slechts weinige vernietigd: 31. In die periode verloren de Geallieerden daartentegen per maand gemiddeld meer dan een half miljoen ton aan scheepsruimte. Het waren vooral die scheepsverliezen die de Geallieerde oorlogsleiders in de eerste helft van 42 voortdurend aanleiding gaven tot diepe bezorgdheid. Hoe haakten zij naar de dag waarop de Duitsers en Italianen uit NoordAfrika verdreven zouden zijn! Dan zou eindelijk de lange, schepen verslindende toevoerroute naar Egypte om Afrika heen door de zoveel kortere via de Straat van Gibraltar vervangen kunnen worden.</p>
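Because every component of a `vk:id` is positional, an id can be split back into its parts with a plain string split. The sketch below reads the fields in the order given on the slide; the function names, the `extra` key for deeper ids (such as `<page>` and `<header>` elements), and the choice of the `loedejongdigitaal.nl` resolver host are assumptions for illustration. The positional reading only fits body-text ids; ids such as the index entry `nl.vk.d.5-I.1` or `nl.vk.d.reg.4.1386` follow a different scheme.

```python
RESOLVER_BASE = "http://resolver.loedejongdigitaal.nl/"

# Positional fields, in the order listed on the slide for a paragraph-level id.
FIELDS = ("country", "collection", "doc_type", "volume",
          "chapter", "section", "subsection", "paragraph")

def parse_vk_id(vk_id: str) -> dict:
    """Split a dotted vk:id into named components (hypothetical helper)."""
    parts = vk_id.split(".")
    if len(parts) < 4:
        raise ValueError(f"too few components in {vk_id!r}")
    parsed = dict(zip(FIELDS, parts))  # shorter ids simply stop at a higher level
    if len(parts) > len(FIELDS):
        # Ids below paragraph level (e.g. <page>, <header>) carry extra components.
        parsed["extra"] = ".".join(parts[len(FIELDS):])
    return parsed

def resolver_url(vk_id: str) -> str:
    """Build the resolver URL for an id (hypothetical helper)."""
    return RESOLVER_BASE + vk_id

print(parse_vk_id("nl.vk.d.5-II.6.1.2.2"))
print(resolver_url("nl.vk.d.5-II.6.1.2.2"))
```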
    10. Named Entities + Wikification
        1. Natural Language Processing with FROG
        2. Detecting names: machine-learned detection using POS tags and capitalization
        3. Linking to Wikipedia with ILPS tools
        Examples: Mussert / Anton Mussert; Avondklok / Spertijd; Nationale Padvindersraad / Padvinder
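To make the role of capitalization in step 2 concrete, here is a deliberately simple rule-based stand-in. It is not the machine-learned detector on the slide and uses no POS tags: it collects runs of capitalized tokens that are not sentence-initial, allowing lowercase Dutch name particles such as "van" inside a run. The particle list and the function name are illustrative assumptions.

```python
import re

# Illustrative (incomplete) set of Dutch name particles (tussenvoegsels).
PARTICLES = {"van", "de", "der", "den", "ter", "ten"}

def candidate_names(sentence: str) -> list[str]:
    """Return runs of capitalized, non-sentence-initial tokens as name candidates."""
    tokens = re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)
    names, current, pending = [], [], []
    for i, tok in enumerate(tokens):
        if i > 0 and tok[:1].isupper():
            current.extend(pending)  # particles between two capitalized tokens belong to the name
            pending = []
            current.append(tok)
        elif current and tok in PARTICLES:
            pending.append(tok)      # keep only if another capitalized token follows
        else:
            if current:
                names.append(" ".join(current))
            current, pending = [], []
    if current:
        names.append(" ".join(current))
    return names

print(candidate_names("Op die dag sprak Anton Mussert in Utrecht."))
print(candidate_names("Dit schreef Johan van Doornik gisteren."))
```

A rule like this misses names at the start of a sentence and fires on capitalized common words, which is exactly why POS information and a trained model are used in the actual pipeline.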
    11. Verrijkt Koninkrijk and Linked Data (the VUA part)
    12. What is Linked Open Data?
        • Open data is about open licenses
        • Linked (Open) Data is about interoperability
        “A term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” -- Wikipedia
        “Sharable, spreadable and nerd-friendly” -- Charlotte S H Jensen, kulturweb
    13. Web of Documents (WWW): Linked Documents
    14. Web of Data: Linked Data
    15. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    16. Linked Data: NIOD and VK
        [Diagram nodes: bbwo2:plaatje1.jpg, 4en5mei:Avonklok, 4en5mei:monumentX, “Spertijd”, niod:Avondklok, Dbpedia:Avondklok, VK:paragraaf 1.2.3.4, DBPedia:Curfew]
    17. [Diagram: Niod thesaurus, Named Entity Results, Back-of-the-Book index, Verrijkt Koninkrijk]
    18. [Same diagram as slide 17: Niod thesaurus, Named Entity Results, Back-of-the-Book index, Verrijkt Koninkrijk]
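    19. Niod thesaurus: NIOD List of terms
        • Used by the NIOD library, archive and AV archive
        • Used externally by 29 institutions
        • 1408 terms, e.g. “Civil servants”, “Anti-fascism”, “Arrival”
          – 12 ‘categories’, e.g. “Law”, “Military history”, “Countries”
        Excerpt of the term list (Rub = rubriek, i.e. category number):
        Rub | Term
        4   | Repressie
            | Voorlichting
            | Kernwapens - Zie: Atoomwapens
        3   | Atoomwapens
        2   | Kolonialisme - Zie ook: Dekolonisatie
        8   | Religie - Zie ook bij soorten afzonderlijk, bijv.: Christendom
        (Zie = see; Zie ook = see also; Zie ook bij soorten afzonderlijk, bijv. = see also under the individual types, e.g.)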
    20. Niod Thesaurus (SKOS)
        [Diagram: niod:Uitrusting, niod:Gasmaskers, niod:Transport; niod:Transport has preferred label “Transport” and alternative label “Vracht”; source: Niod termenlijst (term list, XML)]
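A minimal sketch of how term-list entries could become SKOS triples. It assumes "Zie" (see) marks a non-preferred term, recorded as `skos:altLabel` on the preferred concept (matching the Transport/Vracht preferred/alternative labels above), and maps "Zie ook" (see also) to `skos:related`; the second mapping, the dictionary input format, and the `niod:` URI scheme built from the label are assumptions, not the actual conversion. Triples are plain string tuples, with literals written in Turtle style.

```python
def concept_uri(label: str) -> str:
    # Hypothetical URI scheme: namespace prefix plus the label with spaces replaced.
    return "niod:" + label.replace(" ", "_")

def literal(label: str) -> str:
    return f'"{label}"@nl'  # Dutch-language literal, Turtle notation

def termlist_to_skos(entries: list[dict]) -> set[tuple[str, str, str]]:
    """entries: dicts with 'term' and optionally 'see' (str) or 'see_also' (list of str)."""
    triples = set()
    for entry in entries:
        term = entry["term"]
        if "see" in entry:
            # "Zie: X": the term is not preferred; attach it to X as an alternative label.
            triples.add((concept_uri(entry["see"]), "skos:altLabel", literal(term)))
            continue
        c = concept_uri(term)
        triples.add((c, "rdf:type", "skos:Concept"))
        triples.add((c, "skos:inScheme", "niod:ConceptScheme"))
        triples.add((c, "skos:prefLabel", literal(term)))
        for other in entry.get("see_also", []):
            # "Zie ook: Y" mapped to an associative link (assumption).
            triples.add((c, "skos:related", concept_uri(other)))
    return triples

# Entries modeled on the term-list excerpt (hypothetical input format).
ENTRIES = [
    {"term": "Kernwapens", "see": "Atoomwapens"},
    {"term": "Atoomwapens"},
    {"term": "Kolonialisme", "see_also": ["Dekolonisatie"]},
]

for triple in sorted(termlist_to_skos(ENTRIES)):
    print(*triple)
```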
    21. Back-of-the-Book Index (SKOS)
        [Diagram: botb:Amsterdam, botb:Blitzkrieg, niod:botb-Blitzkrieg, and the resolver URI http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386]
    22. Named Entities (SKOS)
        [Diagram: entity:Maassluis, entity:Amsterdam, entity:Abraham Kuijper, niod:botb-Blitzkrieg, and the resolver URI http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386]
    23. Linked Data
        [Diagram: Niod thesaurus, Named Entity Results, Back-of-the-Book index, Verrijkt Koninkrijk]
    24. Niod thesaurus and Koninkrijk Back-of-the-Book Index
        niod:oai_wo2_niod_nl_rec_102045  subject          niod:Blitzkrieg
        niod:Blitzkrieg                  skos:exactMatch  niod:botb-Blitzkrieg
        niod:botb-Blitzkrieg             hasParRef        http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
    25. [Diagram: gtaa:Oorlog (GTAA thesaurus); Niod:Oorlog and niod:Blitzkrieg (Niod thesaurus); http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386 (Koninkrijk Back-of-the-Book Index); edge labels: subject, sameAs]
    26. [Diagram: dbpedia:Minister-President, dbpedia:Barend Biesheuvel, dbpedia:Abraham Kuijper; entity:Barend Biesheuvel and Entity:Abraham Kuijper (Koninkrijk)]
    27. [Diagram: Geonames:Zuid-Holland; Geonames:Maassluis (population: 32780; coordinates: N 51° 55′ 24″, E 4° 15′ 0″); Botb:Maassluis (Koninkrijk)]
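The coordinates on this slide are in degrees, minutes and seconds, while map tools and SPARQL filters usually work with decimal degrees: decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A small sketch (the function name is illustrative):

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# Maassluis, as shown on the slide: N 51° 55′ 24″, E 4° 15′ 0″
lat = dms_to_decimal(51, 55, 24, "N")
lon = dms_to_decimal(4, 15, 0, "E")
print(f"{lat:.4f}, {lon:.4f}")  # 51.9233, 4.2500
```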
    28. The semantic server
    29. “Give me all BBWO2 images linked to a VK paragraph through a niod thesaurus entity found in the text”
        PREFIX niod: <http://purl.org/collections/nl/niod/>
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT DISTINCT * WHERE {
          ?object dc:subject ?subj ;
                  dc:relation ?img .
          ?subj skos:inScheme niod:ConceptScheme .
          ?subj skos:exactMatch ?bc .
          ?bc skos:inScheme niod:EntityScheme .
          ?bc niod:pRef ?pRef .
        }
        LIMIT 100
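What the triple store does with the WHERE block of this query is match a basic graph pattern: each triple pattern extends the variable bindings found so far. The sketch below is a naive nested-loop version over an in-memory list of triples, for illustration only (real stores use indexes and join ordering). The data reuses identifiers from the earlier slides, but the `dc:relation` link to `bbwo2:plaatje1.jpg` and the scheme memberships are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one triple, consistent with an existing binding."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable: bind it, or check it against its existing value
            if new.setdefault(p, t) != t:
                return None
        elif p != t:  # constant term must match exactly
            return None
    return new

def select_distinct(triples: list[Triple], patterns: list[Triple]) -> list[Binding]:
    """Naive nested-loop evaluation of a basic graph pattern, then DISTINCT."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := extend(pattern, t, b)) is not None]
    unique, seen = [], set()
    for b in bindings:
        key = tuple(sorted(b.items()))
        if key not in seen:
            seen.add(key)
            unique.append(b)
    return unique

# Illustrative data; the image link and scheme memberships are hypothetical.
TRIPLES = [
    ("niod:oai_wo2_niod_nl_rec_102045", "dc:subject", "niod:Blitzkrieg"),
    ("niod:oai_wo2_niod_nl_rec_102045", "dc:relation", "bbwo2:plaatje1.jpg"),
    ("niod:Blitzkrieg", "skos:inScheme", "niod:ConceptScheme"),
    ("niod:Blitzkrieg", "skos:exactMatch", "niod:botb-Blitzkrieg"),
    ("niod:botb-Blitzkrieg", "skos:inScheme", "niod:EntityScheme"),
    ("niod:botb-Blitzkrieg", "niod:pRef", "http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386"),
]

QUERY_1 = [  # the WHERE block of the query above, as triple patterns
    ("?object", "dc:subject", "?subj"),
    ("?object", "dc:relation", "?img"),
    ("?subj", "skos:inScheme", "niod:ConceptScheme"),
    ("?subj", "skos:exactMatch", "?bc"),
    ("?bc", "skos:inScheme", "niod:EntityScheme"),
    ("?bc", "niod:pRef", "?pRef"),
]

for row in select_distinct(TRIPLES, QUERY_1):
    print(row["?img"], row["?pRef"])
```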
    30. “What placenames occur on which page and to which province do they belong”
        PREFIX niod: <http://purl.org/collections/nl/niod/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX gn: <http://www.geonames.org/ontology#>
        SELECT ?pl ?provname ?pref WHERE {
          ?s skos:inScheme niod:BotBScheme .
          ?s skos:prefLabel ?pl .
          ?s skos:closeMatch ?geo .
          ?geo gn:parentADM1 ?prov .
          ?prov gn:name ?provname .
          ?s niod:pageRef ?pref .
        }
        LIMIT 100
    31. “Give me all occurrences of Prime Ministers in Het Koninkrijk”
        PREFIX dcterms: <http://purl.org/dc/terms/>
        PREFIX niod: <http://purl.org/collections/nl/niod/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX dbp-prop: <http://nl.dbpedia.org/property/>
        PREFIX dbp-res: <http://nl.dbpedia.org/resource/>
        SELECT * WHERE {
          ?entity niod:nerClass niod:nerclass-per ;
                  owl:sameAs ?dbpedia_entry ;
                  niod:pRef ?pref .
          ?dbpedia_entry dbp-prop:functie dbp-res:Minister-president_van_Nederland .
        }
        LIMIT 100
    32. Hackathon (photos from Flickr user HackNY)
    33. Some issues
        • Quality issues
          – OCR
          – Named Entity Recognition/Reconciliation
          – Linkage
        • Pillarization question
        • Acceptability for historical research
    34. ?