Advancing the comparability of occupational data through Linked Open Data

Occupations are a crucial resource for historical research in a wide variety of fields. This presentation shows the size of the error introduced when combining data coded with the two major classification schemes, OCCHISCO and HISCO. It then shows how Linked Data can be used to circumvent this and similar issues.

Published in: Science

  1. Advancing the comparability of occupational data through Linked Open Data. Richard Zijdeman [richard.zijdeman at iisg.nl], Kathrin Dentler, Rinke Hoekstra, Albert Meroño-Peñuela. HISCO workshop, Historical Population Database of Transylvania, Cluj, Romania, June 18, 2016
  2. "... it is market position, and especially position in the occupational division of labour, which is fundamental to the generation of structured inequalities. The life chances of individuals and families are largely determined by their position in the market and occupation is taken to be its central indicator ..." (Rose and Harrison, 2010)
  3. Occupations are important both as dependent variables (occupational attainment studies) and as independent variables (occupational stratification studies) in research on educational and occupational status attainment, health, voting, consumption, marriage, etc. (Ganzeboom, 2008)
  4. Occupations are one of the few indicators of social position that are available: • in large quantities • for different time periods • for various societies • at the individual level (the smallest level of detail)
  5. Lack of comparability • Many different occupational classifications • Differences between mobility studies could result from different classification methods (Kaelble, 1985) • [Image: Charles Booth (1886-1903)]
  6. HISCO • Historical International Standard Classification of Occupations • Compiled by a large number of institutes • Based on the ILO's ISCO-68 • Occupational titles retrieved from registers • 1675 occupational codes
  7. Current solution: a two-step procedure that codes into the concept first • Step 1: classify occupational titles into the concept (HISCO) • Step 2: link a measure of stratification to the concept (e.g. SOCPO, HISCAM)
  8. New problems: (1) Which concept? • HISCO (Historical International Standard Classification of Occupations) • OCCHISCO • PST (2) Not all measures link to all concepts • e.g. no link between OCCHISCO and HISCAM (3) Adaptability of concepts (new versions)
  9. Is this a substantive problem? Illustrative example: • Take a subset of identical occupational titles from NAPP and HISCO • Link these occupations to HISCAM • for HISCO, directly, using the scores provided by the HISCAM developers • for OCCHISCO, indirectly, through a mapping
  10. [Diagram: occupations, OCCHISCO, HISCO, HISCAM and a crosswalk linking them] A crosswalk like this is necessary, e.g., for a comparison between Norway and the Netherlands.
  11. [Figure]
  12. [Figure]
  13. So yes, this is problematic • 41% of the explained variance is 'lost' • Cf. regression models, whose explained variance is usually not above 30% • HISCAM is often used both as a dependent and as an independent variable
  14. New problems (repeated from slide 8)
  15. Towards a solution • Linked Data (Berners-Lee, 2006) • Identify resources (books, respondents, etc.) with URIs • Present those URIs as resolvable URLs • Describe resources using so-called 'triples'
  16. An example of a triple: Margaret [resource] works as [property] miner [value]
  17. Miner [resource] is of type [property] occupation [value]
  18. Combining both triples: Margaret works as miner; miner is of type occupation
  19. [Diagram: the resource 'miner' with the properties has occhisco, has hisco and has hiscam, pointing to the values 71105, 71120 and 50.56]
  20. Provenance [Diagram: an occupational title and its source, coded as PST: 123, OCCHISCO: 123, HISCO: 12345 and HISCO: 54321, with HISCAM: 88 linked through wasDerivedFrom and codedByMappingFile]
  21. HISCO vocabulary
  22. • hisco:entry for ‘occupational titles’ • Transitivity between category, unit group, minor group and major group
  23. Case study: DBpedia • Structured data behind Wikipedia • Information on all kinds of topics, including occupations • Add HISCO codes to DBpedia occupations • Let's try to do this live: http://yasgui.org/short/VJfZvnx6x
  24. Caveats • We have not tested the technique on a really large scale (e.g. NAPP data) • Sharing code remains a collective action problem (but less of a coordination problem)
  25. Conclusions: Linked Data • enhances comparative occupational research • makes heterogeneity in coding practices visible
  26. Outlook • Linkage to texts (occupations in newspapers) • Linkage to public resources such as Wikipedia • Combining machine learning and Linked Data for automated occupational coding
  27. Thank you. richard.zijdeman@iisg.nl
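The two-step procedure on slide 7 can be sketched as two chained lookups. This is a minimal illustration, not the authors' tooling: the table names, the normalisation step and every code and score are hypothetical placeholders, not real HISCO or HISCAM values. It only shows that a failure at either step leaves a title without a score.

```python
"""Two-step coding: title -> concept (HISCO) -> stratification measure (HISCAM).

All codes and scores are hypothetical placeholders, not real HISCO/HISCAM values.
"""
from typing import Optional

# Step 1: classify occupational titles into the concept (hypothetical entries).
TITLE_TO_HISCO: dict[str, str] = {
    "miner": "12345",
    "day labourer": "54321",
}

# Step 2: link the concept to a stratification measure (hypothetical scores).
HISCO_TO_HISCAM: dict[str, float] = {
    "12345": 88.0,
    "54321": 60.0,
}


def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so spelling variants share an entry."""
    return " ".join(title.lower().split())


def hiscam_for_title(title: str) -> Optional[float]:
    """Return the HISCAM score for a title, or None if either step fails."""
    hisco = TITLE_TO_HISCO.get(normalise(title))
    if hisco is None:
        return None  # step 1 failed: the title has not been coded yet
    return HISCO_TO_HISCAM.get(hisco)  # None if step 2 has no link


print(hiscam_for_title("  Miner "))  # 88.0
print(hiscam_for_title("weaver"))    # None
```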
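Slide 10's indirect route, in which OCCHISCO-coded data reach HISCAM only through a crosswalk, can be sketched as a chained lookup. The crosswalk rows, codes and scores are invented for illustration. The sketch assumes a crosswalk into HISCO and shows why it matters: a one-to-many row yields several candidate scores, so some extra choice (averaging, picking one) is needed that direct HISCO coding avoids.

```python
"""Indirect linkage OCCHISCO -> HISCO -> HISCAM through a crosswalk.

Codes, crosswalk rows and scores are hypothetical placeholders.
"""

# Crosswalk: one OCCHISCO code may correspond to several HISCO codes.
OCCHISCO_TO_HISCO: dict[str, list[str]] = {
    "12300": ["12345"],           # one-to-one
    "45600": ["54321", "54322"],  # one-to-many: the crosswalk is ambiguous
}

HISCO_TO_HISCAM: dict[str, float] = {
    "12345": 88.0,
    "54321": 60.0,
    "54322": 48.0,
}


def hiscam_candidates(occhisco: str) -> list[float]:
    """All HISCAM scores reachable from an OCCHISCO code via the crosswalk."""
    return [
        HISCO_TO_HISCAM[hisco]
        for hisco in OCCHISCO_TO_HISCO.get(occhisco, [])
        if hisco in HISCO_TO_HISCAM
    ]


for code in ("12300", "45600", "99999"):
    scores = hiscam_candidates(code)
    if len(scores) == 1:
        status = "unique"
    elif scores:
        status = "ambiguous"
    else:
        status = "unlinked"
    print(code, scores, status)
```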
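Slide 13 does not spell out how the 41% is computed. Assuming it is the share of variance in the directly linked HISCAM scores that the crosswalked scores fail to explain, i.e. 1 − R² with R² the squared Pearson correlation between the two score lists, the arithmetic can be sketched as follows. The paired scores are made up and are not meant to reproduce the slide's figure.

```python
"""Share of variance in directly linked HISCAM scores explained by crosswalked ones.

Assumes the comparison is a squared Pearson correlation; the paired scores
below are made-up numbers for illustration only.
"""


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both score lists must vary")
    return sxy * sxy / (sxx * syy)


direct = [88.0, 60.0, 52.0, 70.0, 45.0]         # HISCAM via HISCO (hypothetical)
via_crosswalk = [80.0, 55.0, 60.0, 66.0, 50.0]  # HISCAM via OCCHISCO crosswalk

r2 = r_squared(direct, via_crosswalk)
print(f"explained: {r2:.0%}, lost: {1 - r2:.0%}")
```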
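Slides 16 to 18 chain two triples, with the value of the first (miner) acting as the resource of the second. A minimal sketch with plain Python tuples; the example.org URIs and the worksAs property are hypothetical, while rdf:type is the standard RDF typing property.

```python
"""Triples as (subject, predicate, object) tuples identified by URIs.

The http://example.org/ URIs are hypothetical; rdf:type is the standard
RDF typing property.
"""
from typing import NamedTuple

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


graph = {
    # "Margaret works as miner": resource, property, value
    Triple(EX + "person/margaret", EX + "worksAs", EX + "occupation/miner"),
    # "miner is of type occupation": the previous value is now the resource
    Triple(EX + "occupation/miner", RDF_TYPE, EX + "Occupation"),
}


def objects(g: set[Triple], subject: str, predicate: str) -> set[str]:
    """Values linked to a subject through a given property."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


for occ in objects(graph, EX + "person/margaret", EX + "worksAs"):
    print(occ, "->", objects(graph, occ, RDF_TYPE))
```

A real project would more likely use an RDF library and a triple store; plain tuples are used here only to keep the sketch dependency-free.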
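Slide 19 attaches codes from several schemes, plus a HISCAM score, to a single occupational-title resource. In the sketch below, the property names echo the slide, but the URIs are hypothetical and the values are placeholders. It shows the benefit: data coded in OCCHISCO and data coded in HISCO reach the same HISCAM score without a separate crosswalk.

```python
"""One occupational-title resource linked to several classification schemes.

Property names (hasOcchisco, hasHisco, hasHiscam) echo the slide; the URIs
and all code/score values are hypothetical placeholders.
"""
EX = "http://example.org/"

triples = [
    (EX + "title/miner", EX + "hasOcchisco", "12300"),
    (EX + "title/miner", EX + "hasHisco", "12345"),
    (EX + "title/miner", EX + "hasHiscam", "88.0"),
]


def hiscam_from_code(scheme_property: str, code: str) -> list[float]:
    """Find titles carrying `code` in a scheme, then read their HISCAM scores."""
    titles = {s for s, p, o in triples if p == scheme_property and o == code}
    return [float(o) for s, p, o in triples if s in titles and p == EX + "hasHiscam"]


# Data coded in OCCHISCO and data coded in HISCO reach the same score.
print(hiscam_from_code(EX + "hasOcchisco", "12300"))  # [88.0]
print(hiscam_from_code(EX + "hasHisco", "12345"))     # [88.0]
```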
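The provenance idea on slide 20 can be sketched by following derivation links back from a HISCAM score to the original entry. prov:wasDerivedFrom is a real W3C PROV-O property. The ex:codedByMappingFile property, the mapping-file identifier, the derivation chain itself and the node identifiers (loosely based on the slide's placeholder codes) are hypothetical.

```python
"""Tracing where a HISCAM score came from via provenance triples.

prov:wasDerivedFrom is the W3C PROV-O property; ex:codedByMappingFile and all
identifiers are hypothetical.
"""
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
EX = "http://example.org/"
CODED_BY = EX + "codedByMappingFile"

triples = {
    (EX + "hiscam/88", PROV_DERIVED, EX + "hisco/12345"),
    (EX + "hisco/12345", PROV_DERIVED, EX + "occhisco/123"),
    (EX + "hisco/12345", CODED_BY, EX + "mapping/occhisco-hisco"),
    (EX + "occhisco/123", PROV_DERIVED, EX + "title/entry-1"),
}


def lineage(node: str) -> list[str]:
    """Follow wasDerivedFrom links back to the original entry.

    Follows the first parent only, so it assumes each node has one parent.
    """
    chain = [node]
    while True:
        parents = [o for s, p, o in triples if s == chain[-1] and p == PROV_DERIVED]
        if not parents:
            return chain
        if parents[0] in chain:
            raise ValueError("cycle in provenance links")
        chain.append(parents[0])


for step in lineage(EX + "hiscam/88"):
    files = [o for s, p, o in triples if s == step and p == CODED_BY]
    print(step, files)
```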
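Slide 22 describes transitivity between category, unit, minor and major group. This means a query at group level also finds records coded at category level. The sketch assumes the HISCO layout in which the first one, two and three digits of a five-digit code give the major, minor and unit group. HISCO's combined major group 0/1 is not special-cased. In RDF this would typically be expressed with a transitive property such as skos:broaderTransitive, but the slide does not say which property the HISCO vocabulary uses. Codes and record names are placeholders.

```python
"""Transitive 'broader' links in the HISCO hierarchy.

Assumes the layout category (5 digits) -> unit group (3) -> minor group (2)
-> major group (1). Codes and record names are placeholders.
"""
from typing import Optional


def broader(code: str) -> Optional[str]:
    """Direct parent group of a code under the assumed digit layout."""
    parent_length = {5: 3, 3: 2, 2: 1}.get(len(code))
    return code[:parent_length] if parent_length else None


def all_broader(code: str) -> list[str]:
    """Transitive closure of `broader`: every group the code belongs to."""
    groups = []
    parent = broader(code)
    while parent is not None:
        groups.append(parent)
        parent = broader(parent)
    return groups


# Records coded at category level, queried at minor-group level.
records = {"entry_1": "12345", "entry_2": "12390", "entry_3": "54321"}
minor_group = "12"
matches = [name for name, code in records.items() if minor_group in all_broader(code)]
print(all_broader("12345"))  # ['123', '12', '1']
print(matches)               # ['entry_1', 'entry_2']
```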
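For the DBpedia case study on slide 23, queries can be sent to DBpedia's public SPARQL endpoint over HTTP. The query below is an illustrative guess that lists the most frequent values of dbo:occupation. It is not the query behind the yasgui.org link. The sketch only builds the request and makes no network call.

```python
"""Build a GET request for the public DBpedia SPARQL endpoint.

The query is illustrative only; dbo:occupation is a DBpedia ontology property.
No network call is made here.
"""
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?occupation (COUNT(?person) AS ?n) WHERE {
  ?person dbo:occupation ?occupation .
}
GROUP BY ?occupation
ORDER BY DESC(?n)
LIMIT 10"""


def sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(QUERY)
print(req.get_method(), req.full_url[:60])
```

Passing `req` to `urllib.request.urlopen` and reading the response with `json.load` would return results in the standard SPARQL JSON format. Adding HISCO codes to DBpedia occupations would then be a matter of publishing new triples that point at those occupation URIs.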
