Dutch Ships and Sailors Project @ WAI 2014

VU Weekly AI Meeting (WAI) talk showing the current status of the CLARIN-DSS Dutch Ships and Sailors project.

  1. Dutch Ships and Sailors
     Victor de Boer - WAI - 17-3-2014
  2. Dutch History = Maritime history
  3. The problem: 25+ maritime datasets; heterogeneous
  4. Dutch Ships and Sailors
     • CLARIN Call 4 project (9 mo., ends April)
       – VU Hist (Matthias van Rossum)
       – Huygens ING (Jur Leinenga)
       – VU CS (me)
     • Inventory of maritime DBs
     • Create Linked Data cloud for subset
       – Link places, persons, ships, concepts, events
     • Link to KB newspapers
     • Reusable components
  5. DSS data sources (17-3-2014)
     Datasets on ‘ships’, ‘places’, ‘persons’:
     • VOC Opvarenden
     • Dutch-Asiatic Shipping
     • Generale Zeemonsterrollen
     • Monsterrollen Noordelijke Scheepvaart
     Textual historical data on ‘ship movements’, ‘events’:
     • Historische Kranten (KB)
  6. Matthias van Rossum – Generale Zeemonsterrollen VOC
     Matthias van Rossum studied the relations between European and Asian sailors under the Dutch East India Company (Verenigde Oost-Indische Compagnie, VOC, 1602-1795) and found that they were on a very equal footing. This is in sharp contrast with the later 19th-century situation, when Asian sailors worked in an unequal and sometimes less free position, with worse treatment and pay. Working under the VOC was moreover characterized by a down-to-earth multiculturalism.
  7. Jur Leinenga – Monsterrollen Noordelijke provincies
     Monsterrollen database 1803-1937: Monsterrollen (muster rolls) are crew lists giving the name, rank, wage, place of residence and age of each sailor on board, as well as the name, type and size of the ship. […] for Groningen and Friesland they only start in the nineteenth century. They give us a glimpse of the working life of sailors in the nineteenth and early twentieth century.
  8. Dutch Ships and Sailors
  9. Why Linked Data?
  10. Why Linked Data? (diagram: Web of Data)
      Node labels: gz:Mercuur; 1782; gz:Buijksloot; gz:Batavia; gz:Claas Roem; voc:Claas Roem; voc:Buijksloot; 1752; das:Mercuur; das:Departure; das:Roem, Klaas; 19-12-1780; das:Texel; das:Arrival; 20-7-1781; das:Batavia; das:Voyage1
  11. Why Linked Data? (diagram of person classes)
      Node labels: mdb:Persoon; das:Persoon; gzmvoc:Schipper; dss:Person; foaf:Person; mdb:Begunstigde; mdb:Opvarende
  12. Why Linked Data? (diagram)
      • mdb:Schip1 mdb:scheepsType mdb:Kof
      • das:ShipX das:typeOfShip das:Kofship
      • dss:has_shipType, with two rdfs:subPropertyOf links
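The property mapping on slide 12 can be illustrated with a minimal sketch. It assumes that the two `rdfs:subPropertyOf` links run from `mdb:scheepsType` and `das:typeOfShip` to `dss:has_shipType` (the flattened diagram does not show the arrows), and it uses plain Python tuples rather than the project's ClioPatria triple store. The function names are hypothetical.

```python
# Triples as (subject, predicate, object) tuples; prefixes follow the slide.
# The direction of the two subPropertyOf links is an assumption.
TRIPLES = {
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
    ("mdb:scheepsType", "rdfs:subPropertyOf", "dss:has_shipType"),
    ("das:typeOfShip", "rdfs:subPropertyOf", "dss:has_shipType"),
}


def infer_superproperties(triples):
    """Apply the RDFS rule once: (s p o) + (p subPropertyOf q) => (s q o)."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subPropertyOf":
            parents.setdefault(s, set()).add(o)
    inferred = set(triples)
    for s, p, o in triples:
        for q in parents.get(p, ()):
            inferred.add((s, q, o))
    return inferred


def ship_types(triples):
    """Query at project level: every (ship, type) pair via dss:has_shipType."""
    return sorted(
        (s, o)
        for s, p, o in infer_superproperties(triples)
        if p == "dss:has_shipType"
    )


print(ship_types(TRIPLES))
```

Each dataset keeps its own property, while one query on the shared super-property returns ships from both. The sketch applies the rule a single time, so it does not follow chains of sub-properties.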
  13. Why Linked Data? (diagram)
      • mdb:Schip1 mdb:scheepsType mdb:Kof
      • das:ShipX das:typeOfShip das:Kofship
      • Aat:Kof, Aat:Platbodems, with three skos:exactMatch links
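A small sketch of how `skos:exactMatch` links let two dataset-specific ship types resolve to one shared concept. Which nodes the three links on slide 13 connect cannot be recovered from the transcript, so the two links below (both local terms matched to `Aat:Kof`) are an assumption, and the helper names are hypothetical.

```python
# Assumed alignment links; the slide shows three skos:exactMatch edges
# but the transcript does not say which terms they connect.
EXACT_MATCH = [
    ("mdb:Kof", "Aat:Kof"),
    ("das:Kofship", "Aat:Kof"),
]


def concept_clusters(pairs):
    """Group terms connected by exactMatch, treated as symmetric and transitive."""
    cluster_of = {}
    for a, b in pairs:
        merged = cluster_of.get(a, {a}) | cluster_of.get(b, {b})
        for term in merged:
            cluster_of[term] = merged
    return cluster_of


def same_concept(a, b, pairs=EXACT_MATCH):
    """True if the two terms end up in the same exactMatch cluster."""
    return b in concept_clusters(pairs).get(a, {a})


print(same_concept("mdb:Kof", "das:Kofship"))
print(same_concept("mdb:Kof", "Aat:Platbodems"))
```

With these assumed links, the two local terms fall in one cluster through the external vocabulary, while `Aat:Platbodems` stays separate.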
  14. Why Linked Data
      • Heterogeneous models, one data format
        – Link what can be linked
      • Keep specificity, allow integration at project level
      • Links to other sources: re-use knowledge
      • Extensible
      • Allow multiple levels of semantic enrichment/normalization
        – through Named Graphs
        – Provenance
  15. Methods
      Tools: ClioPatria, XMLRDF, Amalgame (powered by ClioPatria)
      1. XML ingestion (OAI)
      2. Direct transformation to ‘crude’ RDF
      3. Interactive RDF restructuring
      4. Create a metadata mapping schema
      5. Align vocabularies with external sources
      6. Publish as Linked Data
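Step 2 of the method, the direct transformation to ‘crude’ RDF, can be sketched roughly as follows. This is not the XMLRDF tool itself (which runs inside ClioPatria); it is a plain Python illustration of the idea that every XML field becomes one triple, with restructuring left for a later step. The XML element names and the `http://example.org/crude/` namespace are invented; the field values come from the Noordelijke Monsterrollen example on slide 17.

```python
import xml.etree.ElementTree as ET

# Hypothetical input record; element names are invented for illustration.
SAMPLE_XML = """
<records>
  <record id="gron_nsm-1868-2">
    <schip>Frouwke</schip>
    <scheepstype>Smak</scheepstype>
    <datum>1868-01-21</datum>
  </record>
</records>
"""

BASE = "http://example.org/crude/"  # placeholder namespace, not the project's


def crude_triples(xml_text, base=BASE):
    """Turn each child element of a record into one (subject, predicate, literal) triple."""
    triples = []
    for record in ET.fromstring(xml_text).iter("record"):
        subject = base + "record/" + record.get("id")
        for field in record:
            value = (field.text or "").strip()
            if value:  # skip empty elements
                triples.append((subject, base + field.tag, value))
    return triples


for triple in crude_triples(SAMPLE_XML):
    print(triple)
```

The output is deliberately flat: all values are literals attached to the record. Turning the ship or the sailor into resources of their own would belong to the interactive restructuring in step 3.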
  16. Noordelijke Monsterrollen
  17. Model (diagram)
      Nodes: mdb:aanmonstering-gron_nsm-1868-2; gzmvoc:schip-gron_nsm-1868-2-Frouwke; gzmvoc:persoon-gron_nsm-1868-2-Harm_Klaassens_Heins; gzmvoc:persoonscontract-gron_nsm-1868-2-Harm_Klaassens_Heins
      Literals: "1868-01-21"; "66"; Frouwke; Smak; Harm Klaassens Heins; "kapitein"; 46
  18. Conversion: Generale Zeemonsterrollen
  19. Model (diagram)
      Nodes: gzmvoc:telling-3659-Marsseveen; gzmvoc:schip-3659-Marsseveen; gzmvoc:schipper-3659-Tollen
      Literals: "NB: Ervaren onderstuurman Thomas Aldermark (Stokholm, 32 g, Meijenberg 1734), derdewaak Pieter Terduijn (Altena, 26 g, Opperdoes 1735)"; "5188 -> F6095"; Marsseveen; Schip; Gerrit van der Tollen; "21 gemeene zoldaaten"
  20. (diagram)
      Nodes: gzmvoc:telling-7271-Marsseveen; gzmvoc:schip-3659-Marsseveen; gzmvoc:telling-2881-Eendracht; gzmvoc:schipper-2881-Tollen
      Literals: "5188 -> F6095"; Marsseveen; Schip; "55 soldaten"; Gerrit v.d. Tollen
  21. (same diagram as slide 20, with a "?" added)
  22. Identifying ships – Robin Ponstein
      • Identify ships within a dataset
        – Based on: name, size, type, destinations etc.
        – Background knowledge
      • Gold standard created by Jur Leinenga
      • Baseline algorithm: 74%
      • How dataset-specific is this task?
      • Save results as separate graphs, with provenance

      | Date       | ShipName   | ShipType | ShipSize | HomePort | CurrentPort         | Captain                     |
      | 1852-02-27 | Alberdiena | kof      | NULL     | NULL     | Noorwegen (N)       | Wolkammer Albert Augustinus |
      | 1852-07-31 | Alberdina  | kof      | NULL     | Farmsum  | Friedrichstadt (D)  | Wolkammer Albert A.         |
      | 1861-09-30 | Alberdina  | kof      | 98       | NULL     | Gdansk, Danzig (PL) | Wolkammer Albert Augustinus |
      | 1870-03-08 | Alberdina  | brik     | 222      | NULL     | NULL                | Wolkammer Albert Augustinus |
      | 1875-09-22 | Alberdina  | bark     | 309      | NULL     | Oostzee             | Wolkammer Augustinus        |
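To make the ship-identification task on slide 22 concrete, here is a minimal sketch of a name-based matcher over the rows of the table. It is not the baseline algorithm from the talk, whose details the slides do not give. The similarity threshold, the use of the captain's surname, and all function names are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Rows from the slide's table: (date, ship name, ship type, captain).
RECORDS = [
    ("1852-02-27", "Alberdiena", "kof", "Wolkammer Albert Augustinus"),
    ("1852-07-31", "Alberdina", "kof", "Wolkammer Albert A."),
    ("1861-09-30", "Alberdina", "kof", "Wolkammer Albert Augustinus"),
    ("1870-03-08", "Alberdina", "brik", "Wolkammer Albert Augustinus"),
    ("1875-09-22", "Alberdina", "bark", "Wolkammer Augustinus"),
]

NAME_THRESHOLD = 0.85  # assumed cut-off, not taken from the talk


def name_similarity(a, b):
    """Case-insensitive string similarity between two ship names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def surname(captain):
    """Assumes the 'Surname Given-names' order seen in the table."""
    return captain.split()[0].lower()


def same_ship(r1, r2):
    """Candidate match: similar ship name and same captain surname."""
    return (
        name_similarity(r1[1], r2[1]) >= NAME_THRESHOLD
        and surname(r1[3]) == surname(r2[3])
    )


def group_records(records):
    """Greedy grouping: attach each record to the first group whose first member matches."""
    groups = []
    for rec in records:
        for group in groups:
            if same_ship(group[0], rec):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


for group in group_records(RECORDS):
    print([rec[0] for rec in group])
```

This sketch puts all five rows in one group, which shows why name and captain alone are not enough: the type changes from kof to brik to bark and the size grows from 98 to 309, so the later rows may well describe a different vessel with the same name. Features such as size, type and destinations, listed on the slide, would be needed to separate them.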
  23. Linking to Historical newspapers – Andrea Bravo Balado
      • Using existing data about ships to link to news items in a collection of historical newspapers
      • Performing limited information extraction to enrich existing records
      • Features: ship name, time intervals, captain’s names, ship type, named entities, keywords, background knowledge
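A rough sketch of how three of the features on slide 23 (ship name, captain's name, time interval) could be combined to propose links between a ship record and news items. This is not the method from the talk. The scoring rule, the minimum score, the voyage interval and the news items are all invented for illustration; only the ship name and captain surname come from the ship-identification table.

```python
from datetime import date

# Ship name and captain surname come from the ship-identification table;
# the interval is an assumed value for illustration.
RECORD = {
    "ship_name": "Alberdina",
    "captain_surname": "Wolkammer",
    "start": date(1861, 9, 1),
    "end": date(1861, 10, 31),
}

# Invented news items, not taken from the KB newspaper collection.
ARTICLES = [
    {"published": date(1861, 10, 5),
     "text": "Arrived: Alberdina, Capt. Wolkammer, from Danzig."},
    {"published": date(1890, 3, 2),
     "text": "Grain prices rose sharply at the market this week."},
]


def link_score(record, article):
    """Count how many of three features agree: ship name, captain surname, date interval."""
    text = article["text"].lower()
    score = 0
    if record["ship_name"].lower() in text:
        score += 1
    if record["captain_surname"].lower() in text:
        score += 1
    if record["start"] <= article["published"] <= record["end"]:
        score += 1
    return score


def candidate_links(record, articles, min_score=2):
    """Keep articles on which at least min_score features agree (assumed cut-off)."""
    return [a for a in articles if link_score(record, a) >= min_score]


for article in candidate_links(RECORD, ARTICLES):
    print(article["published"], article["text"])
```

Exact substring matching is the weakest part of this sketch: historical spelling variants such as Alberdiena and Alberdina would need fuzzy matching, and the remaining features on the slide (ship type, named entities, keywords) are not used here.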
  24. Current status
      • Input data set: Noordelijke Monsterrollen
      • “Semi-supervised learning”
        – Multiple versions of algorithm
        – Evaluation done by expert (Jur Leinenga)
      • Current version: 94% precision; 9,739 records have 1+ links
      Example: http://purl.org/collections/nl/dss/mdb/aanmonstering-del_gem-1879-101
  25. Short demo
      http://semanticweb.cs.vu.nl/dss/home
  26. “To do”
      • Example application (map)
      • Query interface
      • Provenance
        – How to represent (un)certainty for graphs?
      • Link records to source images
      • Infrastructure @ Huygens ING
      • Link to other VU Hist data sources!
        – DATATHON 2-4-2014!
  27. DataLab
      Questions?
      Victor de Boer - WAI - 17-3-2014
