Linked Humanities data


  1. Linked Humanities Data: The Next Frontier? A Case-Study in Historical Census Data. Albert Meroño-Peñuela, Knowledge Representation & Reasoning Group, 29-10-2012
  2. The Dutch historical censuses (1795-1971)
  3. The Dutch historical censuses (1795-1971)
  4. The Dutch historical censuses (1795-1971)
       • Population, Houses and Occupation censuses
       • 507 Excel files
       • 2,288 tables
       • 33,283 annotated cells
  5. Heterogeneity: structural
  6. Heterogeneity: semantic
       • Variable meaning
           – Plaatselijke indeling (local division) / Kom, buiten de kom (town centre, outside the town centre) + Wijk (district) + Naam (name) / Plaats (place)
           – Variable design (age 14-18, 19-20 vs. 14-15, 16-20)
       • Variable values
           – RomschKatholik, RomsKatholic, VaticanChristelijk
           – Change in municipalities, occupations
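The age-range mismatch on this slide can be made concrete with a small sketch. It is not taken from the talk: the function name `common_bins` and the assumption that each scheme is a contiguous list of inclusive ranges over the same span are illustrative only. The idea is that counts from two censuses are directly comparable only on bins whose upper bounds appear in both designs.

```python
def common_bins(scheme_a, scheme_b):
    """Return the finest age bins that both schemes can be aggregated to.

    Each scheme is a list of inclusive (low, high) ranges, assumed to be
    contiguous and to cover the same overall span.
    """
    if scheme_a[0][0] != scheme_b[0][0] or scheme_a[-1][1] != scheme_b[-1][1]:
        raise ValueError("schemes must cover the same overall age span")
    # An upper bound is usable only if both schemes close a bin there.
    shared_uppers = sorted({hi for _, hi in scheme_a} & {hi for _, hi in scheme_b})
    bins, low = [], scheme_a[0][0]
    for high in shared_uppers:
        bins.append((low, high))
        low = high + 1
    return bins


census_a = [(14, 18), (19, 20)]  # age design from one census (from the slide)
census_b = [(14, 15), (16, 20)]  # age design from another census (from the slide)
print(common_bins(census_a, census_b))  # [(14, 20)]
```

For the two designs on the slide, the only shared bin is 14-20, so any finer comparison would require estimating or inferring the underlying distribution.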
  7. (Current) Harmonization
       • Manually create a (more general) translation table using standard classification systems (CS)
           – Map occupation literals to HISCO codes
           – Map municipality literals to AC codes
       • Cons
           – Expensive
           – Loss of detail/specificity
           – Process is non-repeatable
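As a rough illustration of what a translation table does, the sketch below looks up source literals in a dictionary and reports the ones it cannot map. Everything here is an assumption for illustration: the function `harmonize`, the table contents, and in particular the codes, which are placeholders and not real HISCO or AC codes.

```python
def harmonize(literals, translation_table):
    """Split literals into (mapped, unmapped) using a lookup table."""
    mapped, unmapped = {}, []
    for literal in literals:
        key = literal.strip().lower()  # tolerate case and whitespace noise only
        if key in translation_table:
            mapped[literal] = translation_table[key]
        else:
            unmapped.append(literal)  # left for a human to resolve
    return mapped, unmapped


# Hypothetical table: the codes are placeholders, NOT real HISCO codes.
occupation_table = {"shoemaker": "OCC-001", "roof repairer": "OCC-002"}

mapped, unmapped = harmonize(
    ["Shoemaker", " roof repairer", "Tile settler"], occupation_table
)
print(mapped)    # literals that found a code
print(unmapped)  # literals that still need a manual decision
```

The unmapped list is where the cost noted on the slide shows up: every literal that falls through has to be resolved by hand, and mapping several literals to one code discards their differences.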
  8. Additional requirements
       • Errors: non-destructive update of values
       • Provenance: record who did what, when, why
       • Data model: do not commit to a specific one
       • Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL)
       • Publication: open data for researchers
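The first two requirements (non-destructive updates and provenance) can be sketched together as an append-only correction log. This is a minimal sketch under assumed names: the classes `Cell` and `Correction`, the author identifier, and the harmonized label "Roman Catholic" are all hypothetical and not part of the project described in the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    value: str           # the corrected value
    author: str          # who made the change
    reason: str          # why it was made
    timestamp: datetime  # when it was made


@dataclass
class Cell:
    original: str
    corrections: list = field(default_factory=list)

    def correct(self, value, author, reason, timestamp=None):
        """Append a correction; the original value is never overwritten."""
        stamp = timestamp or datetime.now(timezone.utc)
        self.corrections.append(Correction(value, author, reason, stamp))

    @property
    def current(self):
        """Latest corrected value, or the original if nothing was corrected."""
        return self.corrections[-1].value if self.corrections else self.original


cell = Cell(original="RomschKatholik")
# "Roman Catholic" and "editor-1" are hypothetical illustration values.
cell.correct("Roman Catholic", author="editor-1", reason="spelling variant")
print(cell.current)   # Roman Catholic
print(cell.original)  # RomschKatholik
```

Because corrections are only ever appended, any earlier state of the cell can be reconstructed and each change can be traced to an author, a time, and a reason.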
  9. Census RDF: architecture
       • RDF Data Cube Vocabulary (cell data)
       • D2S Vocabulary (layout data)
       • Open Annotation Core Data Model (annotation data)
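To give a feel for the cell-data layer, the sketch below serializes one table cell as an RDF Data Cube observation in N-Triples, using only the Python standard library. The `qb:` namespace, `qb:Observation`, and `qb:dataSet` come from the RDF Data Cube Vocabulary; the `http://example.org/census/` namespace, the dimension and measure URIs, and the sample values are assumptions made for illustration and do not reflect the actual census dataset.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/census/"  # hypothetical namespace


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return str(text).replace("\\", "\\\\").replace('"', '\\"')


def observation_triples(obs_id, dataset_id, dimensions, population):
    """Return N-Triples lines describing one cell as a qb:Observation."""
    subject = f"<{EX}obs/{obs_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{EX}dataset/{dataset_id}> .",
    ]
    # Dimension properties are hypothetical; a real cube declares them in its DSD.
    for name, value in sorted(dimensions.items()):
        lines.append(f'{subject} <{EX}dimension/{name}> "{escape(value)}" .')
    lines.append(
        f'{subject} <{EX}measure/population> "{int(population)}"^^<{XSD_INTEGER}> .'
    )
    return lines


# Hypothetical cell: dataset name, dimensions, and count are made up.
for line in observation_triples(
    "1", "occupations", {"occupation": "Shoemaker", "ageRange": "14-18"}, 120
):
    print(line)
```

Plain string formatting keeps the example dependency-free; in practice an RDF library would handle serialization and escaping more robustly.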
  10. Census RDF: cell data
  11. Census RDF: layout data
  12. Census RDF: annotation data
  13. Querying the RDF’d census
  14. Not ready-to-publish RDF
       • Disconnected graphs (but 279,136 possible variable mappings!)
       • Complex & non-homogeneous SPARQL queries
       • Contradictory annotation statements
       • Drifted concepts
           – Tile settler -> roof repairer
           – Shoemaker (works with leather) -> shoemaker (owns a company)
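One of the problems listed above, contradictory annotation statements, lends itself to a simple check. The sketch below assumes, purely for illustration, that an annotation is a `(cell, value)` pair and that two annotations contradict each other when they assign different values to the same cell; the cell identifiers and values are hypothetical.

```python
from collections import defaultdict


def find_contradictions(annotations):
    """Return {cell: sorted distinct values} for cells with conflicting annotations."""
    by_cell = defaultdict(set)
    for cell, value in annotations:
        by_cell[cell].add(value)
    return {
        cell: sorted(values) for cell, values in by_cell.items() if len(values) > 1
    }


# Hypothetical annotations: cell identifiers and values are made up.
annotations = [
    ("table1!B4", "occupation"),
    ("table1!B4", "municipality"),
    ("table1!C4", "age range"),
    ("table1!C4", "age range"),  # a duplicate is not a contradiction
]
print(find_contradictions(annotations))
# {'table1!B4': ['municipality', 'occupation']}
```

A check of this kind only surfaces the conflicts; deciding which statement is right remains a manual or provenance-based step.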
  15. New challenges
       • Dynamic ontologies
           – Different concept formalizations depending on the time frame
           – Subjective definitions (contested concepts)
       • Partitions and counting
           – Cannot merge counts of non-aligned concepts
           – Infer individuals?
       • Format round-tripping
           – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss
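The round-tripping concern can be demonstrated with the simplest of these formats. The sketch below writes a small table to CSV and reads it back using Python's standard `csv` module; the table contents are hypothetical. The text survives, but the integer count comes back as a string, which is a small instance of the data loss the slide refers to.

```python
import csv
import io


def csv_round_trip(rows):
    """Write rows to an in-memory CSV and read them back."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    return [row for row in csv.reader(buffer)]


# Hypothetical table: the municipality, age range, and count are made up.
table = [["municipality", "age", "count"], ["Amsterdam", "14-18", 120]]
restored = csv_round_trip(table)

print(restored[1])        # ['Amsterdam', '14-18', '120']
print(restored == table)  # False: 120 came back as the string '120'
```

A lossless round trip therefore needs type information carried outside the CSV itself, for example in a schema or in the RDF representation.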
  16. Thank you! Questions, suggestions?
       http://cedar-project.nl/
       http://www.data2semantics.org/
