SPARQLasto
Auke Rijpma (UU)
(CC-BY-SA)
DH BeNeLux 2017
Utrecht University
Clariah datahub example
• Try to construct some queries to get a feel for
interacting with Clariah Structured Data Hub.
• Use Catasto, famous dataset, made by David Herlihy
and Christiane Klapisch-Zuber.
• Fiscal census for 1427 Tuscany, covering 60k+
households and 270k+ individuals.
• Covering such fiscal matters as asset ownership,
occupations, etc., but also some basic demographic
information.
[Figure: SAMPLE CODING FORM. Recoverable field labels: Ser., Hhold No., Loc., Name, Father's, Family; Source: Vol., Pp., K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax; Ser. & Hhold No. (As above), Members (1-6), Cd.]
Catasto datasets
• Early versions: error-prone fixed-width (fwf) files.
• More recent versions offer tabular data.
• Mix of household and individual data in rows:
need to know whether e.g. A11 will exist for a
given household.
• Early versions strictly numeric except hhh-names.
• Hard to browse and hard to interpret results.
Catasto as linked data
• New data model:
• individuals (rdf:type) inHousehold household
• observations (age, occupation, sex, marital
status, relation to head) for individuals
• households householdMember individual
• observations (fiscal, occupation, house)
• Codebook included using prefLabel
Browse
• Find links and other long, hard-to-type things at
goo.gl/pwnTZo.
• Browse the new data at
<http://data.socialhistory.org/resource/catasto/household/2222>
• Try to find some individuals there.
• Try to find the meaning of the codes of a variable
like METIER (occupation) or maritalStatus.
SPARQL and triples
• The basic unit in linked data and in linked data (SPARQL) queries is
the triple.
• subject - predicate - object
• So here for example:
• individual - age - 75
• household - privateInvestments - 5000
• household(head) - occupation - Barbiere
• individual:4_11 - inHousehold - household:4
SPARQL and triples
• SPARQL queries are made with similar triple statements.
• Each part of a statement is either a URI: <http://…/…>
• Or a literal: "something"
• Put a question mark (?) in front of a name to make that part of the
statement a variable that can match anything.
• Specify part of the statement as a URI or literal to fix it.
• FROM specifies the named graph that holds the
statements.
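A minimal sketch of fixing one part of a statement and leaving the rest open. The subject is the household URI from the Browse slide; the predicate and object are variables. FROM is left out because the graph name is not given here, so this is an illustration rather than the workshop's own query.

```sparql
# Subject is fixed to one household; predicate and object may be anything.
SELECT ?pred ?obj
WHERE {
  <http://data.socialhistory.org/resource/catasto/household/2222> ?pred ?obj .
}
LIMIT 10
```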
Query basics
• The basic starting query asks for all triples by
entering all three parts of the statement as variables.
• SELECT * to select all
• ?sub ?pred ?obj
• LIMIT 10 to go easy on the server.
• http://yasgui.org/short/rkQeY_vEZ
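Put together, the pieces above could look like the sketch below. It is not necessarily identical to the query behind the short link, and it omits FROM because the graph name is not stated on the slide.

```sparql
# All three positions are variables, so any triple matches.
SELECT *
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 10   # go easy on the server
```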
Query basics: DISTINCT
• Putting DISTINCT after SELECT returns only the unique
results, dropping duplicates.
• Write a query to see all the predicates in the Catasto:
• http://yasgui.org/short/ry8iLdPNb
• Write a query to see all the possible codes for the
METIER predicate:
• http://yasgui.org/short/SytvcOD4W
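Two hedged sketches for these exercises, not necessarily the queries behind the short links. Without a FROM clause or another restricting pattern, the first one lists predicates from the whole endpoint rather than only the Catasto, hence the LIMIT.

```sparql
# Unique predicates: only ?pred is selected, duplicates are dropped.
SELECT DISTINCT ?pred
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 100
```

The second sketch assumes the METIER predicate lives under the catasto/dimension/ namespace, which is how the prefix slides suggest it is named.

```sparql
# Unique values that occur as object of the METIER predicate.
SELECT DISTINCT ?code
WHERE {
  ?sub <http://data.socialhistory.org/resource/catasto/dimension/METIER> ?code .
}
```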
Query basics: PREFIXes
• Writing out full URIs all the time is tedious and error-prone.
• Make your life easier by adding prefixes.
• PREFIX name: <uri goes here>
• Usage in the query is name:FINAL_BIT_OF_STATEMENT.
• Replace everything before “METIER” in previous query
by a sensible prefix.
• http://yasgui.org/short/S1SYjOwNb
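A sketch of what the prefixed form might look like. The prefix name catdim is the one suggested on the next slide; any name would work as long as the declaration and the usage match.

```sparql
# The prefix stands in for everything before "METIER" in the full URI.
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT DISTINCT ?code
WHERE {
  ?sub catdim:METIER ?code .
}
```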
Query basics: PREFIXes
• Useful prefixes for today:
• rdf (pre-added)
• skos (simple knowledge organisation scheme)
• Yasgui autocompletes prefixes it knows.
• catasto:
• <http://data.socialhistory.org/resource/catasto/>
• catdim:
• <http://data.socialhistory.org/resource/catasto/dimension/>
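Written out as a query header, these prefixes would look as follows. The rdf and skos namespaces are the standard W3C ones; the small query underneath is only there to make the block a complete query and is an illustration, not part of the workshop material.

```sparql
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/>
PREFIX catdim:  <http://data.socialhistory.org/resource/catasto/dimension/>

# Show ten things together with their rdf:type.
SELECT ?thing ?type
WHERE {
  ?thing rdf:type ?type .
}
LIMIT 10
```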
Query basics: summarise
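• Add COUNT after SELECT to count how often a
matching statement occurs in the data.
• Some endpoints group the counts automatically by the other variables in the query.
• Adding GROUP BY at the end makes the grouping explicit; standard
SPARQL 1.1 requires it when other variables are selected next to the count.
• Count the number of household (heads) in each
occupational category.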
• http://yasgui.org/short/HyCsnuvVb
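A hedged sketch of such a count, not necessarily the query behind the short link. It assumes catdim:METIER is attached to the household (head), as in the triple examples earlier, and it writes the GROUP BY out explicitly so it does not rely on endpoint-specific behaviour. The variable name ?n is arbitrary.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# One row per occupation code, with the number of households carrying it.
SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
```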
Codebook access
• The codebook is an integrated part of the data.
• Explore with skos:prefLabel
• Because the Clariah hub uses the CSVW standard, each
file has its own unique graph.
• Either add graph names (there are a lot!) or remove
the FROM statement to search the entire hub.
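If the graph names are not known, one way to discover them is the standard GRAPH keyword with a variable. The sketch below assumes the endpoint exposes named graphs to queries and that catdim:METIER is the predicate of interest; it returns the graphs that contain at least one METIER statement.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Which named graphs hold METIER statements?
SELECT DISTINCT ?graph
WHERE {
  GRAPH ?graph {
    ?sub catdim:METIER ?code .
  }
}
LIMIT 100
```

A graph URI found this way can then be used in a FROM clause.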
Ordering results
• Use ORDER BY or ORDER BY DESC() at the end of
the query to sort the results.
• Place the previous results in a sensible order
• http://yasgui.org/short/BJzFetvEb
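As a sketch, an occupation count sorted from most to least frequent could look like this. The predicate catdim:METIER on households and the variable name ?n are assumptions carried over from the earlier slides.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation .
}
GROUP BY ?occupation
ORDER BY DESC(?n)   # largest categories first; ORDER BY ?n sorts ascending
```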
Codebook access
• Careful! You need some triple statement that limits the query to
the right graphs, or you'll be flooded with results.
• Use LIMIT 100 for safety as well.
• Add meaningful labels to the occupation count query.
• To do this, you'll need to add a query line.
• Queries with multiple query lines require each line to end
with a dot.
• http://yasgui.org/short/rkeLktDNZ
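A sketch of a labelled count, under the assumption that each occupation code is a URI that carries a skos:prefLabel in the codebook. The METIER line is what keeps the label lookup tied to the Catasto; both lines end with a dot.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?occupation ?label (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation .   # restricts the query to Catasto occupations
  ?occupation skos:prefLabel ?label .      # looks up the codebook label
}
GROUP BY ?occupation ?label
ORDER BY DESC(?n)
LIMIT 100
```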
Your turn
• Now build something from the ground up.
• Get the ages for individuals (use limit 10 at first).
• http://yasgui.org/short/rJZe-KDEb
• Then make a population distribution:
• http://yasgui.org/short/rkErbKwEZ
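Two hedged sketches for this exercise, not necessarily the queries behind the short links. Both assume that age is stored under catdim:age; check the actual predicate name in the browser first.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Ages of ten individuals.
SELECT ?individual ?age
WHERE {
  ?individual catdim:age ?age .
}
LIMIT 10
```

Counting individuals per age then gives the population distribution:

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Number of individuals at each age.
SELECT ?age (COUNT(?individual) AS ?n)
WHERE {
  ?individual catdim:age ?age .
}
GROUP BY ?age
ORDER BY ?age
```

If the ages turn out to be stored as plain strings, ORDER BY ?age sorts them as text rather than as numbers, and a cast such as xsd:integer(?age) would be needed.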
Your turn
• Use catasto/dimension:relationToHead (not actually to head) and
catasto/dimension:sex (explore using brwsr) to find couples in the
catasto.
• Calculate the age difference between them
• http://yasgui.org/short/rJgIcFPNZ
• What do you notice?
• Can you extend the query to see if this varies by socio-economic group?
• http://yasgui.org/short/BkMA9YP4Z
• http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)

More Related Content

What's hot

File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerAvjinder (Avi) Kaler
 
Implementing IDR in __alloc_fd()
Implementing IDR in __alloc_fd()Implementing IDR in __alloc_fd()
Implementing IDR in __alloc_fd()
Sandhya Bankar
 
How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0
Norvald Ryeng
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic Statistics
Avjinder (Avi) Kaler
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus Technologies
 
Tech Talk - JPA and Query Optimization - publish
Tech Talk  -  JPA and Query Optimization - publishTech Talk  -  JPA and Query Optimization - publish
Tech Talk - JPA and Query Optimization - publish
Gleydson Lima
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...shravanthium111
 
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
Amazon Web Services
 
1 Installing & getting started with R
1 Installing & getting started with R1 Installing & getting started with R
1 Installing & getting started with R
naroranisha
 
Hashing gt1
Hashing gt1Hashing gt1
Hashing gt1
Gopi Saiteja
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
Julian Hyde
 
LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0
Norvald Ryeng
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table ExpressionsMySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions
oysteing
 

What's hot (15)

File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder Kaler
 
Implementing IDR in __alloc_fd()
Implementing IDR in __alloc_fd()Implementing IDR in __alloc_fd()
Implementing IDR in __alloc_fd()
 
How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic Statistics
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
 
Tech Talk - JPA and Query Optimization - publish
Tech Talk  -  JPA and Query Optimization - publishTech Talk  -  JPA and Query Optimization - publish
Tech Talk - JPA and Query Optimization - publish
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
 
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
How Instacart’s Catalog Flourished While Hyper-Growing (ANT328-S) - AWS re:In...
 
1 Installing & getting started with R
1 Installing & getting started with R1 Installing & getting started with R
1 Installing & getting started with R
 
Hashing gt1
Hashing gt1Hashing gt1
Hashing gt1
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table ExpressionsMySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions
 

Similar to Rijpma's Catasto meets SPARQL dhb2017_workshop

MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
Manyi Lu
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
Rrubaa Panchendrarajan
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
Prof. Wim Van Criekinge
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
Tanel Poder
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
Enkitec
 
AAT LOD Microthesauri
AAT LOD MicrothesauriAAT LOD Microthesauri
AAT LOD Microthesauri
Marcia Zeng
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Michael Rys
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
bigdatagurus_meetup
 
2CPP16 - STL
2CPP16 - STL2CPP16 - STL
2CPP16 - STL
Michael Heron
 
Sql 2016 - What's New
Sql 2016 - What's NewSql 2016 - What's New
Sql 2016 - What's New
dpcobb
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
Spark Summit
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
Grant Fritchey
 
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptxShshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
086ChintanPatel1
 
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
Neo4j
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTP
Bob Ward
 
How Clean is your database? Data scrubbing for all skills sets
How Clean is your database? Data scrubbing for all skills setsHow Clean is your database? Data scrubbing for all skills sets
How Clean is your database? Data scrubbing for all skills sets
Chad Petrovay
 
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
Hironori Washizaki
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
SQL Performance Solutions: Refactor Mercilessly, Index Wisely
SQL Performance Solutions: Refactor Mercilessly, Index WiselySQL Performance Solutions: Refactor Mercilessly, Index Wisely
SQL Performance Solutions: Refactor Mercilessly, Index WiselyEnkitec
 
API-Testing-SOAPUI-1.pptx
API-Testing-SOAPUI-1.pptxAPI-Testing-SOAPUI-1.pptx
API-Testing-SOAPUI-1.pptx
amarnathdeo
 

Similar to Rijpma's Catasto meets SPARQL dhb2017_workshop (20)

MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
AAT LOD Microthesauri
AAT LOD MicrothesauriAAT LOD Microthesauri
AAT LOD Microthesauri
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
2CPP16 - STL
2CPP16 - STL2CPP16 - STL
2CPP16 - STL
 
Sql 2016 - What's New
Sql 2016 - What's NewSql 2016 - What's New
Sql 2016 - What's New
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptxShshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
Shshsjsjsjs-4 - Copdjsjjsjsjsjakakakaaky.pptx
 
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTP
 
How Clean is your database? Data scrubbing for all skills sets
How Clean is your database? Data scrubbing for all skills setsHow Clean is your database? Data scrubbing for all skills sets
How Clean is your database? Data scrubbing for all skills sets
 
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
Pairwise Coverage-based Testing with Selected Elements in a Query for Databas...
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
SQL Performance Solutions: Refactor Mercilessly, Index Wisely
SQL Performance Solutions: Refactor Mercilessly, Index WiselySQL Performance Solutions: Refactor Mercilessly, Index Wisely
SQL Performance Solutions: Refactor Mercilessly, Index Wisely
 
API-Testing-SOAPUI-1.pptx
API-Testing-SOAPUI-1.pptxAPI-Testing-SOAPUI-1.pptx
API-Testing-SOAPUI-1.pptx
 

More from Richard Zijdeman

Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven
Richard Zijdeman
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Richard Zijdeman
 
grlc. store, share and run sparql queries
grlc. store, share and run sparql queriesgrlc. store, share and run sparql queries
grlc. store, share and run sparql queries
Richard Zijdeman
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
Richard Zijdeman
 
Toogdag 2017
Toogdag 2017Toogdag 2017
Toogdag 2017
Richard Zijdeman
 
Historical occupational classification and occupational stratification schemes
Historical occupational classification and occupational stratification schemesHistorical occupational classification and occupational stratification schemes
Historical occupational classification and occupational stratification schemes
Richard Zijdeman
 
Basic introduction into R
Basic introduction into RBasic introduction into R
Basic introduction into R
Richard Zijdeman
 
Labour force participation of married women, US 1860-2010
Labour force participation of married women, US 1860-2010Labour force participation of married women, US 1860-2010
Labour force participation of married women, US 1860-2010
Richard Zijdeman
 
Advancing the comparability of occupational data through Linked Open Data
Advancing the comparability of occupational data through Linked Open DataAdvancing the comparability of occupational data through Linked Open Data
Advancing the comparability of occupational data through Linked Open Data
Richard Zijdeman
 
work in a globalized world
work in a globalized worldwork in a globalized world
work in a globalized world
Richard Zijdeman
 
The Structured Data Hub in 2019
The Structured Data Hub in 2019The Structured Data Hub in 2019
The Structured Data Hub in 2019
Richard Zijdeman
 
Examples of digital history at the IISH
Examples of digital history at the IISHExamples of digital history at the IISH
Examples of digital history at the IISH
Richard Zijdeman
 
Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)
Richard Zijdeman
 
Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)
Richard Zijdeman
 
Introduction into R for historians (part 1: introduction)
Introduction into R for historians (part 1: introduction)Introduction into R for historians (part 1: introduction)
Introduction into R for historians (part 1: introduction)
Richard Zijdeman
 
Historical occupational classification and stratification schemes (lecture)
Historical occupational classification and stratification schemes (lecture)Historical occupational classification and stratification schemes (lecture)
Historical occupational classification and stratification schemes (lecture)
Richard Zijdeman
 
Using HISCO and HISCAM to code and analyze occupations
Using HISCO and HISCAM to code and analyze occupationsUsing HISCO and HISCAM to code and analyze occupations
Using HISCO and HISCAM to code and analyze occupations
Richard Zijdeman
 
Csdh sbg clariah_intr01
Csdh sbg clariah_intr01Csdh sbg clariah_intr01
Csdh sbg clariah_intr01
Richard Zijdeman
 

More from Richard Zijdeman (18)

Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 
grlc. store, share and run sparql queries
grlc. store, share and run sparql queriesgrlc. store, share and run sparql queries
grlc. store, share and run sparql queries
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
Toogdag 2017
Toogdag 2017Toogdag 2017
Toogdag 2017
 
Historical occupational classification and occupational stratification schemes
Historical occupational classification and occupational stratification schemesHistorical occupational classification and occupational stratification schemes
Historical occupational classification and occupational stratification schemes
 
Basic introduction into R
Basic introduction into RBasic introduction into R
Basic introduction into R
 
Labour force participation of married women, US 1860-2010
Labour force participation of married women, US 1860-2010Labour force participation of married women, US 1860-2010
Labour force participation of married women, US 1860-2010
 
Advancing the comparability of occupational data through Linked Open Data
Advancing the comparability of occupational data through Linked Open DataAdvancing the comparability of occupational data through Linked Open Data
Advancing the comparability of occupational data through Linked Open Data
 
work in a globalized world
work in a globalized worldwork in a globalized world
work in a globalized world
 
The Structured Data Hub in 2019
The Structured Data Hub in 2019The Structured Data Hub in 2019
The Structured Data Hub in 2019
 
Examples of digital history at the IISH
Examples of digital history at the IISHExamples of digital history at the IISH
Examples of digital history at the IISH
 
Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)Introduction into R for historians (part 4: data manipulation)
Introduction into R for historians (part 4: data manipulation)
 
Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)Introduction into R for historians (part 3: examine and import data)
Introduction into R for historians (part 3: examine and import data)
 
Introduction into R for historians (part 1: introduction)
Introduction into R for historians (part 1: introduction)Introduction into R for historians (part 1: introduction)
Introduction into R for historians (part 1: introduction)
 
Historical occupational classification and stratification schemes (lecture)
Historical occupational classification and stratification schemes (lecture)Historical occupational classification and stratification schemes (lecture)
Historical occupational classification and stratification schemes (lecture)
 
Using HISCO and HISCAM to code and analyze occupations
Using HISCO and HISCAM to code and analyze occupationsUsing HISCO and HISCAM to code and analyze occupations
Using HISCO and HISCAM to code and analyze occupations
 
Csdh sbg clariah_intr01
Csdh sbg clariah_intr01Csdh sbg clariah_intr01
Csdh sbg clariah_intr01
 

Recently uploaded

Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf

Rijpma's Catasto meets SPARQL dhb2017_workshop

  • 1. SPARQLasto Auke Rijpma (UU) (CC-BY-SA) DH BeNeLux 2017 Utrecht University
  • 2. Clariah datahub example • Try to construct some queries to get a feel for interacting with the Clariah Structured Data Hub. • Use the Catasto, a famous dataset made by David Herlihy and Christiane Klapisch-Zuber. • Fiscal census of Tuscany in 1427, covering 60k+ households and 270k+ individuals. • Covers fiscal matters such as asset ownership and occupations, but also some basic demographic information.
  • 3. Sample coding form (scanned image). Household record fields: Ser., Hhold No., Loc., Name, Father's, Family. Source fields: Vol., Pp., K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax. Members record fields: Ser. & Hhold No. (as above), Cd., Members.
  • 7. Catasto datasets • Early versions were error-prone fixed-width (fwf) files. • More recent versions offer tabular data. • Rows mix household and individual data: you need to know whether e.g. A11 will exist for a given household. • Early versions are strictly numeric except for the names of household heads. • Hard to browse and hard to interpret results.
  • 8. Catasto as linked data • New data model: • individuals (rdf:type) inHousehold household • observations (age, occupation, sex, marital status, relation to head) for individuals • households householdMember individual • observations (fiscal, occupation, house) for households • Codebook included using prefLabel
  • 9. Browse • Find links and other long, hard-to-type things at goo.gl/pwnTZo. • Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222> • Try to find some individuals there. • Try to find the meaning of the codes of a variable like METIER (occupation) or maritalStatus.
  • 10. SPARQL and triples • The basic unit in linked data, and in linked data (SPARQL) queries, is the triple. • subject - predicate - object • So here for example: • individual - age - 75 • household - privateInvestments - 5000 • household (head) - occupation - Barbiere • individual:4_11 - inHousehold - household:4
  • 11. SPARQL and triples • SPARQL queries are made with similar triple statements. • Each part of a statement is either a URI: <http://…/…> • Or a literal: "something" • Put a question mark ? in front of a name to make that part a variable that can match anything. • Specify part of the statement as a URI or literal to fix it. • FROM specifies the named graph that holds the statements.
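
A minimal sketch of how these pieces combine. It assumes that METIER lives under the catasto/dimension/ namespace, and the graph name after FROM is a placeholder, since the real graph names are not given on the slide. The predicate is fixed as a URI, while subject and object are left as variables.

```sparql
# Fixed predicate, variable subject and object.
SELECT ?household ?code
FROM <http://example.org/graph/catasto>   # placeholder graph name (assumption)
WHERE {
  ?household <http://data.socialhistory.org/resource/catasto/dimension/METIER> ?code
}
LIMIT 10
```

With the placeholder graph this returns nothing; substitute a real graph name or drop the FROM line to search every graph the endpoint exposes by default.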
  • 12. Query basics • The basic starting query asks for all triples by entering all three parts of the statement as variables. • SELECT * to select all • ?sub ?pred ?obj • LIMIT 10 to go easy on the server. • http://yasgui.org/short/rkQeY_vEZ
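
Put together, the starting query could look like the sketch below. The FROM clause is left out here because the graph name is not given on the slide, so the query runs against whatever the endpoint treats as its default graph.

```sparql
# Ask for any ten triples: all three positions are variables.
SELECT *
WHERE {
  ?sub ?pred ?obj
}
LIMIT 10
```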
  • 13. Query basics: DISTINCT • Putting DISTINCT after SELECT returns only unique results, getting rid of duplicates. • Write a query to see all the predicates in the Catasto: • http://yasgui.org/short/ry8iLdPNb • Write a query to see all the possible codes for the METIER predicate: • http://yasgui.org/short/SytvcOD4W
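
One possible way to list the predicates, as a sketch rather than the query behind the short link. Because no graph name is known here, it restricts the results with a FILTER on the Catasto base URI instead of a FROM clause; that restriction is an assumption about how the predicates are named.

```sparql
# Unique predicates whose URI starts with the Catasto base URI.
SELECT DISTINCT ?pred
WHERE {
  ?sub ?pred ?obj .
  FILTER(STRSTARTS(STR(?pred), "http://data.socialhistory.org/resource/catasto/"))
}
LIMIT 100
```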
  • 14. Query basics: PREFIXes • Writing out full URIs all the time isn't fun and is error-prone. • Make your life easier by adding prefixes. • PREFIX name: <uri goes here> • Usage in the query is name:FINAL_BIT_OF_STATEMENT. • Replace everything before "METIER" in the previous query with a sensible prefix. • http://yasgui.org/short/S1SYjOwNb
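
A sketch of the METIER query with a prefix. It assumes the full predicate URI is http://data.socialhistory.org/resource/catasto/dimension/METIER, so that everything before "METIER" can be declared once as catdim:.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# All distinct occupation codes used with the METIER predicate.
SELECT DISTINCT ?code
WHERE {
  ?household catdim:METIER ?code
}
LIMIT 100
```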
  • 15. Query basics: PREFIXes • Useful prefixes for today: • rdf (pre-added) • skos (Simple Knowledge Organization System) • Yasgui autocompletes prefixes it knows. • catasto: • <http://data.socialhistory.org/resource/catasto/> • catdim: • <http://data.socialhistory.org/resource/catasto/dimension/>
  • 16. Query basics: summarise • Add COUNT after SELECT to count how often a value occurs in the data. • Automatically grouped by other variables in the query. • Can also add GROUP BY at the end to • Count the number of households (heads) in each occupational category. • http://yasgui.org/short/HyCsnuvVb
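
A sketch of the occupation count, assuming catdim:METIER is attached to households. Standard SPARQL 1.1 requires an explicit GROUP BY when a plain variable is selected next to an aggregate, even if some endpoints are more lenient, so it is written out here. The names ?occupation and ?n are chosen for illustration.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Number of households per occupation code.
SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation
}
GROUP BY ?occupation
```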
  • 17. Codebook access • The codebook is an integrated part of the data. • Explore it with skos:prefLabel. • Because the Clariah hub uses the CSVW standard, each file has its own unique graph. • Either add graph names (there are a lot!) or remove the FROM statement to search the entire hub.
  • 18. Ordering results • Use ORDER BY or ORDER BY DESC() at the end of the query to sort the results. • Place the previous results in a sensible order • http://yasgui.org/short/BJzFetvEb
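
As a sketch, the occupation count sorted from most to least common. It again assumes catdim:METIER is the occupation predicate; ORDER BY DESC(?n) goes after GROUP BY.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Households per occupation code, largest groups first.
SELECT ?occupation (COUNT(?household) AS ?n)
WHERE {
  ?household catdim:METIER ?occupation
}
GROUP BY ?occupation
ORDER BY DESC(?n)
```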
  • 19. Codebook access • Careful! You need some sort of triple statement that limits the search to the right graphs or you'll be flooded with results. • Add LIMIT 100 for safety as well. • Add meaningful labels to the occupation count query. • To do this, you'll need to add a query line. • Queries with multiple query lines require each line to end with a dot. • http://yasgui.org/short/rkeLktDNZ
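
A sketch of the labelled count. It assumes the occupation codes are URIs that carry a skos:prefLabel in the codebook. The METIER line is what ties the label lookup to Catasto codes, and each of the two query lines ends with a dot.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>

# Households per occupation, with the codebook label for each code.
SELECT ?occupation ?label (COUNT(?household) AS ?n)
WHERE {
  ?household  catdim:METIER  ?occupation .
  ?occupation skos:prefLabel ?label .
}
GROUP BY ?occupation ?label
ORDER BY DESC(?n)
LIMIT 100
```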
  • 20. Your turn • Now build something from the ground up. • Get the ages of individuals (use LIMIT 10 at first). • http://yasgui.org/short/rJZe-KDEb • Then make a population distribution: • http://yasgui.org/short/rkErbKwEZ
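
A sketch of the population distribution, assuming ages are stored with a predicate named catdim:age (the name is a guess based on the data model described earlier). The first step would be the WHERE pattern alone with LIMIT 10; counting individuals per age then gives the distribution.

```sparql
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>

# Number of individuals at each age.
SELECT ?age (COUNT(?individual) AS ?n)
WHERE {
  ?individual catdim:age ?age
}
GROUP BY ?age
ORDER BY ?age
```

If the ages are stored as plain strings rather than numbers, ORDER BY sorts them as text; casting with xsd:integer(?age) would be one way around that.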
  • 21. Your turn • Use catasto/dimension:relationToHead (not actually to head) and catasto/dimension:sex (explore using brwsr) to find couples in the Catasto. • Calculate the age difference between them. • http://yasgui.org/short/rJgIcFPNZ • What do you notice? • Can you extend the query to see if this varies by socio-economic group? • http://yasgui.org/short/BkMA9YP4Z • http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)