SlideShare a Scribd company logo
1 of 16
Linked Humanities Data:
   The Next Frontier?
 A Case-Study in Historical Census Data


            Albert Meroño-Peñuela
  Knowledge Representation & Reasoning Group
                  29-10-2012
The Dutch historical censuses
                     (1795-1971)




29-10-2012           Linked Humanities Data: The Next Frontier?   2
The Dutch historical censuses
                     (1795-1971)




29-10-2012           Linked Humanities Data: The Next Frontier?   3
The Dutch historical censuses
                     (1795-1971)

• Population,
  Houses and
  Occupation
  censuses
• 507 Excel files
• 2,288 tables
• 33,283
  annotated cells

29-10-2012           Linked Humanities Data: The Next Frontier?   4
Heterogeneity: structural




29-10-2012         Linked Humanities Data: The Next Frontier?   5
Heterogeneity: semantic
• Variable meaning
      – Plaatselijke indeling / Kom, buiten de kom + Wijk +
        Naam / Plaats
      – Variable design (age 14-18, 19-20 vs. 14-15, 16-20)
• Variable values
      – RomschKatholik, RomsKatholic, VaticanChristelijk
      – Change in municipalities, occupations



29-10-2012           Linked Humanities Data: The Next Frontier?   6
(Current) Harmonization
• Manually create a (more general) translation
  table using standard CS
      – Map occupation literals with HISCO codes
      – Map municipality literals with AC codes
• Cons
      – Expensive
      – Detail/specificity loss
      – Process is non-repeatable

29-10-2012           Linked Humanities Data: The Next Frontier?   7
Additional requirements
• Errors: non-destructive update of values
• Provenance: record who did what, when, why
• Datamodel: do not commit to a specific one
• Linkage: enrich the dataset by linking it to
  others (e.g. labour strikes, book publications
  in NL)
• Publication: open data for researchers


29-10-2012        Linked Humanities Data: The Next Frontier?   8
Census RDF: arch

   • RDF Data Cube
     Vocabulary (cell data)
   • D2S Vocabulary (layout
     data)

   • Open Annotation Core
     Data Model (annotation
     data)




29-10-2012                Linked Humanities Data: The Next Frontier?   9
Census RDF: cell data




29-10-2012       Linked Humanities Data: The Next Frontier?   10
Census RDF: layout data




29-10-2012        Linked Humanities Data: The Next Frontier?   11
Census RDF: annotation data




29-10-2012          Linked Humanities Data: The Next Frontier?   12
Querying the RDF’d census




29-10-2012         Linked Humanities Data: The Next Frontier?   13
Not ready-to-publish RDF
• Disconnected graphs (but 279,136 possible variable
  mappings!)
• Complex & non-homogeneous SPARQL queries
• Contradictory annotation statements
• Drifted concepts
      – Tile settler -> roof repairer
      – Shoemaker (works with leather) -> shoemaker (owns a
        company)



29-10-2012            Linked Humanities Data: The Next Frontier?   14
New challenges
• Dynamic ontologies
      – Different concept formalizations depending on the
        time frame
      – Subjective definitions (contested concepts)
• Partitions and counting
      – Cannot merge counts of non aligned concepts
      – Infer individuals?
• Format round-tripping
      – On-demand XLS, CSV, RDF, RDB conversions with(out)
        data loss

29-10-2012            Linked Humanities Data: The Next Frontier?   15
Thank you!
Questions, suggestions?

     http://cedar-project.nl/
http://www.data2semantics.org/

More Related Content

Similar to Linked Humanities data

CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataCEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataPRELIDA Project
 
MDST 3703 F10 Seminar 11
MDST 3703 F10 Seminar 11MDST 3703 F10 Seminar 11
MDST 3703 F10 Seminar 11Rafael Alvarado
 
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...Universidade Nova de Lisboa
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataKostis Kyzirakos
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataKostis Kyzirakos
 
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data CubeLSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data CubeAlbert Meroño-Peñuela
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Linked Open Data in Romania
Linked Open Data in RomaniaLinked Open Data in Romania
Linked Open Data in RomaniaVlad Posea
 
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...Digital Classicist Seminar Berlin
 
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...Marcus Smith
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Decentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic WebDecentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic Webhala Skaf
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataMarin Dimitrov
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval GESIS
 

Similar to Linked Humanities data (20)

CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical DataCEDAR & PRELIDA Preservation of Linked Socio-Historical Data
CEDAR & PRELIDA Preservation of Linked Socio-Historical Data
 
MDST 3703 F10 Seminar 11
MDST 3703 F10 Seminar 11MDST 3703 F10 Seminar 11
MDST 3703 F10 Seminar 11
 
Data Driven Ontology Practices: The Real world objects of Ordnance Survey Ir...
Data Driven Ontology Practices: The Real world objects of  Ordnance Survey Ir...Data Driven Ontology Practices: The Real world objects of  Ordnance Survey Ir...
Data Driven Ontology Practices: The Real world objects of Ordnance Survey Ir...
 
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...
Jordi Martí-Henneberg, Luís Espinha da Silveira, Daniel Alves & Josep Puig,To...
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 
POSTDATA: Towards publishing European Poetry as Linked Open Data
POSTDATA: Towards publishing European Poetry as Linked Open DataPOSTDATA: Towards publishing European Poetry as Linked Open Data
POSTDATA: Towards publishing European Poetry as Linked Open Data
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data CubeLSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
 
Semantic Technologies for Cultural Heritage
Semantic Technologies for Cultural HeritageSemantic Technologies for Cultural Heritage
Semantic Technologies for Cultural Heritage
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Linked Open Data in Romania
Linked Open Data in RomaniaLinked Open Data in Romania
Linked Open Data in Romania
 
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
 
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...
Real-time Visualisation of Cultural Heritage and Environmental Archaeology Da...
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Statistical data in RDF
Statistical data in RDFStatistical data in RDF
Statistical data in RDF
 
Decentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic WebDecentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic Web
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
CBS CEDAR Presentation
CBS CEDAR PresentationCBS CEDAR Presentation
CBS CEDAR Presentation
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
 
Open statistics Belgium
Open statistics BelgiumOpen statistics Belgium
Open statistics Belgium
 

More from Albert Meroño-Peñuela

List.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsList.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsAlbert Meroño-Peñuela
 
Modelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyModelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyAlbert Meroño-Peñuela
 
Making social science more reproducible by encapsulating access to linked data
Making social science more reproducible by encapsulating access to linked dataMaking social science more reproducible by encapsulating access to linked data
Making social science more reproducible by encapsulating access to linked dataAlbert Meroño-Peñuela
 
What can I expect from an academic career? Valuable skills
What can I expect from an academic career? Valuable skillsWhat can I expect from an academic career? Valuable skills
What can I expect from an academic career? Valuable skillsAlbert Meroño-Peñuela
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAlbert Meroño-Peñuela
 
One Score To Rule Them All: Semantics in Music Notation
One Score To Rule Them All: Semantics in Music NotationOne Score To Rule Them All: Semantics in Music Notation
One Score To Rule Them All: Semantics in Music NotationAlbert Meroño-Peñuela
 
Repeatable Semantic Queries for the Linked Data Agnostic
Repeatable Semantic Queries for the Linked Data AgnosticRepeatable Semantic Queries for the Linked Data Agnostic
Repeatable Semantic Queries for the Linked Data AgnosticAlbert Meroño-Peñuela
 
The Statistics of Stairway to Heaven: A Semantic Story About Digital Humanities
The Statistics of Stairway to Heaven: A Semantic Story About Digital HumanitiesThe Statistics of Stairway to Heaven: A Semantic Story About Digital Humanities
The Statistics of Stairway to Heaven: A Semantic Story About Digital HumanitiesAlbert Meroño-Peñuela
 
grlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Datagrlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked DataAlbert Meroño-Peñuela
 
grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsAlbert Meroño-Peñuela
 
How does a knowledge graph sound like? (or: music is a graph)
How does a knowledge graph sound like? (or: music is a graph)How does a knowledge graph sound like? (or: music is a graph)
How does a knowledge graph sound like? (or: music is a graph)Albert Meroño-Peñuela
 
Non-Temporal Orderings for Extensional Concept Drift
Non-Temporal Orderings for Extensional Concept DriftNon-Temporal Orderings for Extensional Concept Drift
Non-Temporal Orderings for Extensional Concept DriftAlbert Meroño-Peñuela
 
Detecting and Reporting Extensional Concept Drift in Statistical Linked Data
Detecting and Reporting Extensional Concept Drift in Statistical Linked DataDetecting and Reporting Extensional Concept Drift in Statistical Linked Data
Detecting and Reporting Extensional Concept Drift in Statistical Linked DataAlbert Meroño-Peñuela
 

More from Albert Meroño-Peñuela (18)

List.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsList.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF Lists
 
Modelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyModelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic Study
 
Making social science more reproducible by encapsulating access to linked data
Making social science more reproducible by encapsulating access to linked dataMaking social science more reproducible by encapsulating access to linked data
Making social science more reproducible by encapsulating access to linked data
 
What can I expect from an academic career? Valuable skills
What can I expect from an academic career? Valuable skillsWhat can I expect from an academic career? Valuable skills
What can I expect from an academic career? Valuable skills
 
The MIDI Linked Data Cloud
The MIDI Linked Data CloudThe MIDI Linked Data Cloud
The MIDI Linked Data Cloud
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked Data
 
One Score To Rule Them All: Semantics in Music Notation
One Score To Rule Them All: Semantics in Music NotationOne Score To Rule Them All: Semantics in Music Notation
One Score To Rule Them All: Semantics in Music Notation
 
Repeatable Semantic Queries for the Linked Data Agnostic
Repeatable Semantic Queries for the Linked Data AgnosticRepeatable Semantic Queries for the Linked Data Agnostic
Repeatable Semantic Queries for the Linked Data Agnostic
 
The Statistics of Stairway to Heaven: A Semantic Story About Digital Humanities
The Statistics of Stairway to Heaven: A Semantic Story About Digital HumanitiesThe Statistics of Stairway to Heaven: A Semantic Story About Digital Humanities
The Statistics of Stairway to Heaven: A Semantic Story About Digital Humanities
 
grlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Datagrlc: Bridging the Gap Between RESTful APIs and Linked Data
grlc: Bridging the Gap Between RESTful APIs and Linked Data
 
grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIs
 
Historical Reasoning on the Web
Historical Reasoning on the WebHistorical Reasoning on the Web
Historical Reasoning on the Web
 
How does a knowledge graph sound like? (or: music is a graph)
How does a knowledge graph sound like? (or: music is a graph)How does a knowledge graph sound like? (or: music is a graph)
How does a knowledge graph sound like? (or: music is a graph)
 
What Is Linked Historical Data?
What Is Linked Historical Data?What Is Linked Historical Data?
What Is Linked Historical Data?
 
Non-Temporal Orderings for Extensional Concept Drift
Non-Temporal Orderings for Extensional Concept DriftNon-Temporal Orderings for Extensional Concept Drift
Non-Temporal Orderings for Extensional Concept Drift
 
Detecting and Reporting Extensional Concept Drift in Statistical Linked Data
Detecting and Reporting Extensional Concept Drift in Statistical Linked DataDetecting and Reporting Extensional Concept Drift in Statistical Linked Data
Detecting and Reporting Extensional Concept Drift in Statistical Linked Data
 
Semantic Web for the Humanities
Semantic Web for the HumanitiesSemantic Web for the Humanities
Semantic Web for the Humanities
 
Linked Census Data
Linked Census DataLinked Census Data
Linked Census Data
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Linked Humanities data

  • 1. Linked Humanities Data: The Next Frontier? A Case-Study in Historical Census Data Albert Meroño-Peñuela Knowledge Representation & Reasoning Group 29-10-2012
  • 2. The Dutch historical censuses (1795-1971) 29-10-2012 Linked Humanities Data: The Next Frontier? 2
  • 3. The Dutch historical censuses (1795-1971) 29-10-2012 Linked Humanities Data: The Next Frontier? 3
  • 4. The Dutch historical censuses (1795-1971) • Population, Houses and Occupation censuses • 507 Excel files • 2,288 tables • 33,283 annotated cells 29-10-2012 Linked Humanities Data: The Next Frontier? 4
  • 5. Heterogeneity: structural 29-10-2012 Linked Humanities Data: The Next Frontier? 5
  • 6. Heterogeneity: semantic • Variable meaning – Plaatselijke indeling / Kom, buiten de kom + Wijk + Naam / Plaats – Variable design (age 14-18, 19-20 vs. 14-15, 16-20) • Variable values – RomschKatholik, RomsKatholic, VaticanChristelijk – Change in municipalities, occupations 29-10-2012 Linked Humanities Data: The Next Frontier? 6
  • 7. (Current) Harmonization • Manually create a (more general) translation table using standard CS – Map occupation literals with HISCO codes – Map municipality literals with AC codes • Cons – Expensive – Detail/specificity loss – Process is non-repeatable 29-10-2012 Linked Humanities Data: The Next Frontier? 7
  • 8. Additional requirements • Errors: non-destructive update of values • Provenance: record who did what, when, why • Datamodel: do not commit to a specific one • Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL) • Publication: open data for researchers 29-10-2012 Linked Humanities Data: The Next Frontier? 8
  • 9. Census RDF: arch • RDF Data Cube Vocabulary (cell data) • D2S Vocabulary (layout data) • Open Annotation Core Data Model (annotation data) 29-10-2012 Linked Humanities Data: The Next Frontier? 9
  • 10. Census RDF: cell data 29-10-2012 Linked Humanities Data: The Next Frontier? 10
  • 11. Census RDF: layout data 29-10-2012 Linked Humanities Data: The Next Frontier? 11
  • 12. Census RDF: annotation data 29-10-2012 Linked Humanities Data: The Next Frontier? 12
  • 13. Querying the RDF’d census 29-10-2012 Linked Humanities Data: The Next Frontier? 13
  • 14. Not ready-to-publish RDF • Disconnected graphs (but 279,136 possible variable mappings!) • Complex & non-homogeneous SPARQL queries • Contradictory annotation statements • Drifted concepts – Tile settler -> roof repairer – Shoemaker (works with leather) -> shoemaker (owns a company) 29-10-2012 Linked Humanities Data: The Next Frontier? 14
  • 15. New challenges • Dynamic ontologies – Different concept formalizations depending on the time frame – Subjective definitions (contested concepts) • Partitions and counting – Cannot merge counts of non aligned concepts – Infer individuals? • Format round-tripping – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss 29-10-2012 Linked Humanities Data: The Next Frontier? 15
  • 16. Thank you! Questions, suggestions? http://cedar-project.nl/ http://www.data2semantics.org/