Linked Humanities data
Usage Rights

© All Rights Reserved

    Linked Humanities data: Presentation Transcript

    • Linked Humanities Data: The Next Frontier? A Case-Study in Historical Census Data Albert Meroño-Peñuela Knowledge Representation & Reasoning Group 29-10-2012
    • The Dutch historical censuses (1795-1971)
    • The Dutch historical censuses (1795-1971)
        • Population, Houses and Occupation censuses
        • 507 Excel files
        • 2,288 tables
        • 33,283 annotated cells
    • Heterogeneity: structural
    • Heterogeneity: semantic
        • Variable meaning
            – Plaatselijke indeling / Kom, buiten de kom + Wijk + Naam / Plaats
            – Variable design (age 14-18, 19-20 vs. 14-15, 16-20)
        • Variable values
            – RomschKatholik, RomsKatholic, VaticanChristelijk
            – Change in municipalities, occupations
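The age-range mismatch on this slide can be made concrete with a small sketch. It assumes each census reports contiguous, inclusive age bins over the same overall range; the function name `shared_boundaries` is hypothetical and not part of the project. It keeps only the cut points that both binnings share, which are the only places where counts can be compared without guessing.

```python
def shared_boundaries(bins_a, bins_b):
    """Return the coarsest age ranges that both binnings can be summed into.

    Assumption: both inputs are contiguous, inclusive (low, high) bins
    covering the same overall age range.
    """
    # A cut after age `high` is usable only if both binnings end a bin there.
    ends_a = {high for _, high in bins_a}
    ends_b = {high for _, high in bins_b}
    common_ends = sorted(ends_a & ends_b)

    start = min(low for low, _ in bins_a)
    merged = []
    for end in common_ends:
        merged.append((start, end))
        start = end + 1
    return merged


# The two variable designs quoted on the slide.
census_a = [(14, 18), (19, 20)]
census_b = [(14, 15), (16, 20)]
common = shared_boundaries(census_a, census_b)
print(common)
```

For the two designs on the slide, the only shared range is 14-20, so any comparison has to give up the finer age detail of both sources.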
    • (Current) Harmonization
        • Manually create a (more general) translation table using standard classification systems (CS)
            – Map occupation literals to HISCO codes
            – Map municipality literals to AC codes
        • Cons
            – Expensive
            – Loss of detail/specificity
            – Process is not repeatable
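A translation table of the kind described above might look like the following minimal sketch. The table contents, the `OCC-...` codes, and the function names are all hypothetical placeholders; they are not real HISCO codes and not the project's actual tooling. The sketch also returns the literals it could not map, since those are what makes the manual process expensive.

```python
# Hypothetical translation table: normalized occupation literal -> code.
# The codes are placeholders for illustration, NOT real HISCO codes.
OCCUPATION_TABLE = {
    "shoemaker": "OCC-001",
    "roof repairer": "OCC-002",
}


def normalize(literal):
    """Lower-case the literal and collapse runs of whitespace."""
    return " ".join(literal.lower().split())


def harmonize(literals, table):
    """Split literals into those the table can map and those it cannot."""
    mapped, unmapped = {}, []
    for literal in literals:
        code = table.get(normalize(literal))
        if code is None:
            unmapped.append(literal)
        else:
            mapped[literal] = code
    return mapped, unmapped


mapped, unmapped = harmonize(
    ["Shoemaker", "  SHOEMAKER ", "tile settler"], OCCUPATION_TABLE
)
print(mapped)
print(unmapped)
```

Note that both spellings of "shoemaker" collapse to one code: this is the loss of detail listed under the cons.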
    • Additional requirements
        • Errors: non-destructive update of values
        • Provenance: record who did what, when, why
        • Data model: do not commit to a specific one
        • Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL)
        • Publication: open data for researchers
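The first two requirements (non-destructive updates and provenance) can be sketched together as an append-only correction log. This is only an illustration under assumed names: the `Cell` and `Correction` classes, the example values, and the editor identifier are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    """One provenance record: the new value plus who, why, and when."""
    value: int
    who: str
    why: str
    when: datetime


@dataclass
class Cell:
    """A census cell whose original value is never overwritten."""
    original: int
    corrections: list = field(default_factory=list)

    def correct(self, value, who, why, when=None):
        # Append instead of overwriting, so every earlier state stays visible.
        when = when or datetime.now(timezone.utc)
        self.corrections.append(Correction(value, who, why, when))

    @property
    def current(self):
        # The latest correction wins; fall back to the original value.
        return self.corrections[-1].value if self.corrections else self.original


# Hypothetical example values.
cell = Cell(original=1523)
cell.correct(1532, who="editor-1", why="digits transposed in the source table")
print(cell.original, cell.current)
```

Because corrections are only appended, the original reading and the full editing history remain available to later researchers.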
    • Census RDF: architecture
        • RDF Data Cube Vocabulary (cell data)
        • D2S Vocabulary (layout data)
        • Open Annotation Core Data Model (annotation data)
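As a rough illustration of the cell-data layer, the sketch below serializes one census cell as an RDF Data Cube observation in N-Triples, using only the standard library. `qb:Observation` and `qb:dataSet` are terms of the RDF Data Cube Vocabulary; the `http://example.org/census/` namespace, the dimension and measure names, and the values are assumptions made for this example and are not the project's actual URIs.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/census/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def observation_triples(obs_id, dataset_id, dimensions, population):
    """Return N-Triples lines describing one cell as a qb:Observation."""
    subject = iri(EX + obs_id)
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
        f"{subject} {iri(QB + 'dataSet')} {iri(EX + dataset_id)} .",
    ]
    # Dimension properties and values are hypothetical example terms.
    for prop, value in sorted(dimensions.items()):
        lines.append(f"{subject} {iri(EX + prop)} {iri(EX + value)} .")
    # The measure: the count stored in the cell, as a typed literal.
    lines.append(
        f'{subject} {iri(EX + "population")} "{population}"^^{iri(XSD_INTEGER)} .'
    )
    return lines


triples = observation_triples(
    "obs1", "dataset1", {"municipality": "place1", "occupation": "shoemaker"}, 42
)
print("\n".join(triples))
```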
    • Census RDF: cell data
    • Census RDF: layout data
    • Census RDF: annotation data
    • Querying the RDF’d census
    • Not ready-to-publish RDF
        • Disconnected graphs (but 279,136 possible variable mappings!)
        • Complex & non-homogeneous SPARQL queries
        • Contradictory annotation statements
        • Drifted concepts
            – Tile settler -> roof repairer
            – Shoemaker (works with leather) -> shoemaker (owns a company)
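One way to surface contradictory annotation statements is to group annotations by the cell and property they describe and flag any group with more than one distinct value. The sketch below assumes each property should have a single value per cell, which the slides do not state; the cell identifiers and the tuple layout are hypothetical.

```python
from collections import defaultdict


def find_contradictions(annotations):
    """Return {(cell, property): [values]} where annotators disagree.

    Assumption: a property should carry exactly one value per cell.
    """
    values_by_target = defaultdict(set)
    for cell, prop, value in annotations:
        values_by_target[(cell, prop)].add(value)
    return {
        target: sorted(values)
        for target, values in values_by_target.items()
        if len(values) > 1
    }


# Hypothetical annotations: (cell id, property, value).
annotations = [
    ("table1/B4", "occupation", "shoemaker"),
    ("table1/B4", "occupation", "roof repairer"),
    ("table1/B5", "occupation", "shoemaker"),
    ("table1/B5", "occupation", "shoemaker"),
]
conflicts = find_contradictions(annotations)
print(conflicts)
```

Repeated identical statements (cell B5 here) are not reported; only genuine disagreement is.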
    • New challenges
        • Dynamic ontologies
            – Different concept formalizations depending on the time frame
            – Subjective definitions (contested concepts)
        • Partitions and counting
            – Cannot merge counts of non-aligned concepts
            – Infer individuals?
        • Format round-tripping
            – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss
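The idea of time-dependent concept definitions, and why counts of non-aligned concepts cannot be merged, can be sketched as a lookup keyed by census year. The two senses of "shoemaker" come from the drifted-concepts slide, but the year boundaries below are invented for illustration (the slides do not say when the drift happened), and the function names are hypothetical.

```python
# Hypothetical periods: the 1899/1900 split is an assumption for this example.
DEFINITIONS = {
    "shoemaker": [
        (1795, 1899, "works with leather"),
        (1900, 1971, "owns a company"),
    ],
}


def meaning(concept, year, definitions=DEFINITIONS):
    """Return the sense of a concept in a given census year, or None."""
    for start, end, sense in definitions.get(concept, []):
        if start <= year <= end:
            return sense
    return None


def can_merge(concept, year_a, year_b):
    """Counts are comparable only if the concept has the same sense in both years."""
    sense_a = meaning(concept, year_a)
    sense_b = meaning(concept, year_b)
    return sense_a is not None and sense_a == sense_b


print(meaning("shoemaker", 1850))
print(can_merge("shoemaker", 1850, 1950))
```

A check like `can_merge` would run before any aggregation across census years, so that counts are summed only when the label refers to the same concept.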
    • Thank you! Questions, suggestions?
        http://cedar-project.nl/
        http://www.data2semantics.org/