
Rijpma's Catasto meets SPARQL dhb2017_workshop

These slides are from Auke Rijpma, who presented the Catasto meets SPARQL workshop. Everything is in beta, so let us know when something breaks (Twitter: @rlzijdeman).

Published in: Science

  1. SPARQLasto. Auke Rijpma (UU), CC-BY-SA. DH BeNeLux 2017, Utrecht University
  2. Clariah datahub example
     • Try to construct some queries to get a feel for interacting with the Clariah Structured Data Hub.
     • Use the Catasto, a famous dataset made by David Herlihy and Christiane Klapisch-Zuber.
     • A fiscal census of Tuscany in 1427, covering 60k+ households and 270k+ individuals.
     • It covers fiscal matters such as asset ownership and occupations, but also some basic demographic information.
  3. Sample coding form (scanned image of the original coding sheet; legible field headings: Ser. Hhold No., Loc., Name, Father's, Family; Source: Vol., Pp.; K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax; Ser. & Hhold No., Members (1-6), Cd.)
  4. Catasto datasets
     • Early versions: error-prone fixed-width (fwf) files.
     • More recent versions offer tabular data.
     • Household and individual data are mixed in the rows: you need to know whether, e.g., A11 will exist for a given household.
     • Early versions were strictly numeric, except for the household-head names.
     • Hard to browse and to interpret results.
  5. Catasto as linked data
     • New data model:
       • individuals (rdf:type), linked to their household via inHousehold
         • observations for individuals (age, occupation, sex, marital status, relation to head)
       • households, linked to their members via householdMember
         • observations for households (fiscal, occupation, house)
     • Codebook included using prefLabel
  6. Browse
     • Find links and other long, hard-to-type things at goo.gl/pwnTZo.
     • Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222>.
     • Try to find some individuals there.
     • Try to find the meaning of the codes of a variable like METIER (occupation) or maritalStatus.
  7. SPARQL and triples
     • The basic unit of linked data, and of linked-data (SPARQL) queries, is the triple:
       • subject - predicate - object
     • So here, for example:
       • individual - age - 75
       • household - privateInvestments - 5000
       • household (head) - occupation - Barbiere
       • individual:4_11 - inHousehold - household:4
  8. SPARQL and triples
     • SPARQL queries are made with similar triple statements.
     • Each part of a statement is either a URI: <http://…/…>
     • or a literal: “something”.
     • Put a question mark (?) in front of a name to let that part of the statement be anything (a variable).
     • Specify part of the statement as a URI or literal to fix it.
     • FROM specifies the named graph that contains the statements.
  9. Query basics
     • The basic starting query asks for all triples by making all three parts of the statement variables:
       • SELECT * to select everything
       • WHERE { ?sub ?pred ?obj } as the triple pattern
       • LIMIT 10 to go easy on the server.
     • http://yasgui.org/short/rkQeY_vEZ
  10. Query basics: DISTINCT
     • Putting DISTINCT after SELECT returns only unique results, removing duplicates.
     • Write a query to see all the predicates in the Catasto:
       • http://yasgui.org/short/ry8iLdPNb
     • Write a query to see all the possible codes for the METIER predicate:
       • http://yasgui.org/short/SytvcOD4W
  11. Query basics: PREFIXes
     • Writing out full URIs all the time is tedious and error-prone.
     • Make your life easier by adding prefixes:
       • PREFIX name: <uri goes here>
     • In the query, use name:FINAL_BIT_OF_STATEMENT.
     • Replace everything before “METIER” in the previous query with a sensible prefix.
     • http://yasgui.org/short/S1SYjOwNb
  12. Query basics: PREFIXes
     • Useful prefixes for today:
       • rdf (pre-added)
       • skos (Simple Knowledge Organization System)
       • catasto: <http://data.socialhistory.org/resource/catasto/>
       • catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
     • Yasgui autocompletes prefixes it knows.
  13. Query basics: summarise
     • Add COUNT after SELECT to count how often a statement (triple) occurs in the data.
     • Results are automatically grouped by the other variables in the query (standard SPARQL 1.1 requires an explicit GROUP BY for this).
     • You can also add GROUP BY at the end to set the grouping explicitly.
     • Count the number of households (heads) in each occupational category.
     • http://yasgui.org/short/HyCsnuvVb
  14. Codebook access
     • The codebook is an integrated part of the data.
     • Explore it with skos:prefLabel.
     • Because the Clariah hub uses the CSVW standard, each file has its own unique graph.
     • Either add the graph names (there are a lot!) or remove the FROM statement to search the entire hub.
  15. Ordering results
     • Use ORDER BY or ORDER BY DESC() at the end of the query to sort the results.
     • Put the previous results in a sensible order.
     • http://yasgui.org/short/BJzFetvEb
  16. Codebook access
     • Careful! You need some triple statement that limits the query to the right graphs, or you'll be flooded with results.
     • Use LIMIT 100 for safety as well.
     • Add meaningful labels to the occupation count query.
     • To do this, you'll need to add a triple pattern.
     • In queries with several triple patterns, separate the patterns with a dot (the dot after the last one is optional).
     • http://yasgui.org/short/rkeLktDNZ
  17. Your turn
     • Now build something from the ground up.
     • Get the ages of individuals (use LIMIT 10 at first):
       • http://yasgui.org/short/rJZe-KDEb
     • Then make a population distribution:
       • http://yasgui.org/short/rkErbKwEZ
  18. Your turn
     • Use catdim:relationToHead (which is not actually relative to the head) and catdim:sex (explore them using brwsr) to find couples in the Catasto.
     • Calculate the age difference between them:
       • http://yasgui.org/short/rJgIcFPNZ
     • What do you notice?
     • Can you extend the query to see whether this varies by socio-economic group?
       • http://yasgui.org/short/BkMA9YP4Z
       • http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)
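A minimal Python sketch of the data model on slides 5 and 7, written as plain (subject, predicate, object) tuples instead of with an RDF library. The household URI pattern follows the browse link on slide 6, but the individual URI pattern and the placement of inHousehold, householdMember, age and privateInvestments in the catdim namespace are assumptions for illustration; check the real URIs with brwsr.

```python
# Slide-5 data model as (subject, predicate, object) tuples.
# ASSUMPTION: the individual URI pattern and the catdim namespace for these
# four predicates are illustrative, not the hub's confirmed URIs.
CATASTO = "http://data.socialhistory.org/resource/catasto/"
CATDIM = CATASTO + "dimension/"

HOUSEHOLD = CATASTO + "household/4"
INDIVIDUAL = CATASTO + "individual/4_11"  # hypothetical URI pattern

TRIPLES = [
    (INDIVIDUAL, CATDIM + "inHousehold", HOUSEHOLD),      # individual -> household
    (HOUSEHOLD, CATDIM + "householdMember", INDIVIDUAL),  # household -> individual
    (INDIVIDUAL, CATDIM + "age", 75),                     # individual-level observation
    (HOUSEHOLD, CATDIM + "privateInvestments", 5000),     # household-level observation
]


def objects(subject, predicate, triples):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(HOUSEHOLD, CATDIM + "householdMember", TRIPLES))
print(objects(INDIVIDUAL, CATDIM + "age", TRIPLES))  # [75]
```

Because the link is stored in both directions, you can start from either the household or the individual.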
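The variable mechanism from slides 8 and 9 can be mimicked in a few lines of Python: terms starting with ? bind to whatever sits in that position, fixed terms must match exactly, and LIMIT truncates the result list. This is a toy evaluator over hypothetical short names (hh:4, ind:4_11), not how the hub's SPARQL endpoint works internally; QUERY only shows the corresponding SPARQL text.

```python
# Toy matcher for a single SPARQL triple pattern.
# Terms starting with "?" are variables; anything else is a fixed URI or literal.
# The short names ("hh:4", "age", ...) are hypothetical stand-ins for full URIs.
from itertools import islice

GRAPH = [
    ("ind:4_11", "inHousehold", "hh:4"),
    ("ind:4_11", "age", 75),
    ("hh:4", "privateInvestments", 5000),
    ("hh:4", "occupation", "Barbiere"),
]

# Roughly what the first query on slide 9 sends to the endpoint.
QUERY = "SELECT * WHERE { ?sub ?pred ?obj } LIMIT 10"


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple):
    """Bind the pattern's variables to the triple's terms, or return None."""
    bindings = {}
    for pat, val in zip(pattern, triple):
        if is_var(pat):
            if bindings.get(pat, val) != val:
                return None  # a repeated variable must bind to the same value
            bindings[pat] = val
        elif pat != val:
            return None  # a fixed term must match exactly
    return bindings


def select(pattern, graph, limit=None):
    """Return bindings for every matching triple, stopping after `limit` rows."""
    matches = (match(pattern, t) for t in graph)
    hits = (b for b in matches if b is not None)
    return list(islice(hits, limit))


print(select(("?sub", "?pred", "?obj"), GRAPH, limit=2))
print(select(("?ind", "age", "?age"), GRAPH))  # [{'?ind': 'ind:4_11', '?age': 75}]
```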
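A toy illustration of DISTINCT (slide 10) on a handful of hypothetical triples. The helper keeps the first occurrence of each value; SPARQL itself guarantees no particular result order unless you add ORDER BY.

```python
# Toy version of SELECT DISTINCT ?pred: keep each value once.
# The triples, the sex codes and the METIER code "occ:A" are hypothetical.
GRAPH = [
    ("ind:1_1", "age", 40),
    ("ind:1_2", "age", 35),
    ("ind:1_1", "sex", "M"),
    ("ind:1_2", "sex", "F"),
    ("hh:1", "METIER", "occ:A"),
    ("hh:2", "METIER", "occ:A"),
]

QUERY = "SELECT DISTINCT ?pred WHERE { ?sub ?pred ?obj }"


def distinct(values):
    """Drop duplicates, keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(values))


all_predicates = [p for _, p, _ in GRAPH]
print(all_predicates)            # 6 rows, with repeats
print(distinct(all_predicates))  # ['age', 'sex', 'METIER']
print(distinct(o for _, p, o in GRAPH if p == "METIER"))  # ['occ:A']
```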
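PREFIX declarations (slides 11 and 12) are textual abbreviations for URI stems. The sketch below expands prefixed names and writes the PREFIX header for a query. The catasto and catdim URIs come from slide 12 and rdf and skos are the standard W3C namespaces; that METIER lives under catdim is an assumption.

```python
# What a PREFIX declaration does: a short name stands in for a URI stem.
# ASSUMPTION: METIER is a catdim (dimension) predicate.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "catasto": "http://data.socialhistory.org/resource/catasto/",
    "catdim": "http://data.socialhistory.org/resource/catasto/dimension/",
}


def expand(prefixed_name):
    """Turn 'catdim:METIER' into the full URI, as a SPARQL engine would."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def with_prefixes(body, used):
    """Prepend PREFIX lines for the prefixes a query body uses."""
    header = "\n".join(f"PREFIX {p}: <{PREFIXES[p]}>" for p in used)
    return header + "\n" + body


query = with_prefixes("SELECT DISTINCT ?code WHERE { ?sub catdim:METIER ?code }", ["catdim"])
print(query)
print(expand("catdim:METIER"))
```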
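For the occupation count on slide 13, a query along the lines of QUERY below may work, though it has not been tested against the hub; the explicit GROUP BY keeps it valid in standard SPARQL 1.1. The Python lines compute the same grouping on toy data. The predicate catdim:METIER and the codes occ:A and occ:B are assumed placeholders.

```python
# Toy equivalent of counting household heads per occupation code.
# ASSUMPTIONS: METIER hangs off the household (head), sits in the catdim
# namespace, and "occ:A"/"occ:B" are hypothetical placeholder codes.
from collections import Counter

GRAPH = [
    ("hh:1", "catdim:METIER", "occ:A"),
    ("hh:2", "catdim:METIER", "occ:B"),
    ("hh:3", "catdim:METIER", "occ:A"),
    ("ind:4_11", "catdim:age", 75),  # not an occupation: ignored by the pattern
]

QUERY = """
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
SELECT ?occ (COUNT(?hh) AS ?n)
WHERE { ?hh catdim:METIER ?occ }
GROUP BY ?occ
"""

# GROUP BY ?occ with COUNT(?hh): one row per occupation code with its frequency.
counts = Counter(o for s, p, o in GRAPH if p == "catdim:METIER")
print(dict(counts))  # {'occ:A': 2, 'occ:B': 1}
```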
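Adding labels (slides 14 and 16) means adding a second triple pattern that shares the variable ?occ with the first, which makes the query a join. The sketch spells out that join as a nested loop in Python. The codes and labels are hypothetical placeholders, catdim:METIER is an assumed predicate name, and the query text is untested against the hub.

```python
# Two-pattern query: occupation codes joined to their codebook labels.
# ASSUMPTION: catdim:METIER is the occupation predicate; codes and labels
# below are hypothetical placeholders, not real Catasto values.
from collections import Counter

GRAPH = [
    ("hh:1", "catdim:METIER", "occ:A"),
    ("hh:2", "catdim:METIER", "occ:B"),
    ("hh:3", "catdim:METIER", "occ:A"),
    ("occ:A", "skos:prefLabel", "Label A"),
    ("occ:B", "skos:prefLabel", "Label B"),
]

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
SELECT ?label (COUNT(?hh) AS ?n)
WHERE {
  ?hh catdim:METIER ?occ .
  ?occ skos:prefLabel ?label .
}
GROUP BY ?label
LIMIT 100
"""


def join(graph):
    """Nested-loop join: the shared variable ?occ links the two patterns."""
    rows = []
    for hh, p1, occ in graph:
        if p1 != "catdim:METIER":
            continue
        for code, p2, label in graph:
            if p2 == "skos:prefLabel" and code == occ:
                rows.append({"hh": hh, "occ": occ, "label": label})
    return rows


counts = Counter(row["label"] for row in join(GRAPH))
print(dict(counts))  # {'Label A': 2, 'Label B': 1}
```

Without the first pattern, the label pattern alone would match every prefLabel in the hub, which is why slide 16 warns about being flooded with results.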
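Sorting as on slide 15, sketched on hypothetical (label, count) rows: ORDER BY DESC(?n) corresponds to a descending sort on the count.

```python
# Toy ORDER BY DESC(?n): sort (label, count) rows by count, largest first.
# The rows are hypothetical query results, not real Catasto figures.
rows = [("Label A", 2), ("Label B", 5), ("Label C", 2)]

# In SPARQL this would be appended as: ORDER BY DESC(?n)
by_count_desc = sorted(rows, key=lambda row: row[1], reverse=True)

# Python's sort is stable, so ties keep their input order; SPARQL does not
# guarantee a tie order, so add a second key (ORDER BY DESC(?n) ?label) if it matters.
print(by_count_desc)  # [('Label B', 5), ('Label A', 2), ('Label C', 2)]
```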
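A possible route for slide 17: select the ages, then count individuals per age. catdim:age is an assumed predicate name (look up the real one with brwsr), and the ages are toy values. The 10-year banding at the end is an optional extra, not something the slide prescribes.

```python
# Ages per individual, then a population (age) distribution.
# ASSUMPTIONS: the age predicate is catdim:age; the ages are made-up toy values.
from collections import Counter

GRAPH = [
    ("ind:1_1", "catdim:age", 40),
    ("ind:1_2", "catdim:age", 35),
    ("ind:1_3", "catdim:age", 8),
    ("ind:2_1", "catdim:age", 75),
    ("ind:2_2", "catdim:age", 3),
]

QUERY = """
PREFIX catdim: <http://data.socialhistory.org/resource/catasto/dimension/>
SELECT ?age (COUNT(?ind) AS ?n)
WHERE { ?ind catdim:age ?age }
GROUP BY ?age
ORDER BY ?age
"""

ages = [o for s, p, o in GRAPH if p == "catdim:age"]
by_age = Counter(ages)  # one row per exact age, as the query above returns

# Optional: 10-year bands (0-9, 10-19, ...) give a smoother age profile.
by_band = Counter(age // 10 * 10 for age in ages)
print(sorted(by_band.items()))  # [(0, 2), (30, 1), (40, 1), (70, 1)]
```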
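A sketch of the couple logic on slide 18 in Python. To keep it short, each person is flattened into one row instead of several triples. The relation codes rel:husband and rel:wife and the sex codes M and F are hypothetical stand-ins for the real relationToHead and sex codes, and the ages are toy values. The inner join on household means that people without a matching spouse drop out, just as rows without a match for every triple pattern drop out in SPARQL.

```python
# Pair spouses within a household and compute husband-minus-wife age difference.
# ASSUMPTIONS: "rel:husband"/"rel:wife" are hypothetical relationToHead codes,
# sex codes are "M"/"F", at most one couple per household, toy ages.
PEOPLE = [
    # (individual, household, relationToHead, sex, age)
    ("ind:1_1", "hh:1", "rel:husband", "M", 40),
    ("ind:1_2", "hh:1", "rel:wife", "F", 30),
    ("ind:2_1", "hh:2", "rel:husband", "M", 52),
    ("ind:2_2", "hh:2", "rel:wife", "F", 34),
    ("ind:3_1", "hh:3", "rel:husband", "M", 60),  # no wife recorded: no couple
]


def couples(people):
    """Match husbands and wives of the same household and subtract their ages."""
    men = {hh: age for _, hh, rel, sex, age in people if rel == "rel:husband" and sex == "M"}
    women = {hh: age for _, hh, rel, sex, age in people if rel == "rel:wife" and sex == "F"}
    # Only households present in both dicts yield a couple (an inner join).
    return {hh: men[hh] - women[hh] for hh in men.keys() & women.keys()}


diffs = couples(PEOPLE)
print(sorted(diffs.items()))             # [('hh:1', 10), ('hh:2', 18)]
print(sum(diffs.values()) / len(diffs))  # mean age gap: 14.0
```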
