Leveraging the DDI Model for Linked Statistical Data
 in the Social, Behavioural, and Economic Sciences

                                    DC 2012
                                       05.09.2012

                Thomas Bosch                              Richard Cyganiak
    GESIS – Leibniz Institute for the Social   Digital Enterprise Research Institute,
             Sciences, Germany                                 Ireland
          thomas.bosch@gesis.org                        richard@cyganiak.de


             Joachim Wackerow                            Benjamin Zapilko
    GESIS – Leibniz Institute for the Social   GESIS – Leibniz Institute for the Social
             Sciences, Germany                          Sciences, Germany
       joachim.wackerow@gesis.org                  benjamin.zapilko@gesis.org
Agenda




What is DDI?
• DDI (Data Documentation Initiative)
   • Established international standard for the documentation and
     management of data from the social, behavioral, and economic
     sciences
   • Data model for statistical data
• Supports the entire research data lifecycle
• Focus on microdata
• Structured, high-quality metadata
   • Enables secondary analysis without the need to contact the primary
     researcher
• Enables the re-use of metadata of existing studies for
  designing new studies
• Currently specified in XML Schema

How was the DDI Ontology developed?

• DDI subset
   • of the most important DDI elements
• Use cases
   • Experts in the statistics domain formulated the use cases they consider
     most significant for solving frequent problems
   • Most important use case: discover microdata connected with multiple
     studies
• Convert existing DDI-XML documents to DDI-RDF automatically
   • Direct mapping
   • Generic mapping (Bosch and Mathiak, 2011)
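
The idea of a direct mapping from DDI-XML to DDI-RDF can be sketched roughly as follows. This is only an illustration under stated assumptions, not the mapping used by the authors: the XML snippet is a heavily simplified, hypothetical stand-in for real DDI-XML, and the namespaces `http://example.org/id/` and `http://example.org/ddi#` as well as the generated property names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified DDI-like XML; real DDI-XML uses
# namespaced elements and a much richer structure.
DDI_XML = """
<Study id="study1">
  <Title>Example Survey 2010</Title>
  <Variable id="v1">
    <Name>AGE</Name>
    <Label>Age of respondent</Label>
  </Variable>
</Study>
""".strip()

BASE = "http://example.org/id/"    # hypothetical base URI for resources
VOCAB = "http://example.org/ddi#"  # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(element, parent_uri=None, triples=None):
    """Map each element carrying an id to a resource and each
    id-less child element to a literal-valued property."""
    if triples is None:
        triples = []
    uri = BASE + element.get("id")
    # The element name becomes the class of the resource.
    triples.append((uri, RDF_TYPE, VOCAB + element.tag))
    if parent_uri is not None:
        # Link the parent resource to this nested resource.
        triples.append((parent_uri, VOCAB + "has" + element.tag, uri))
    for child in element:
        if child.get("id") is not None:
            direct_mapping(child, uri, triples)
        else:
            # Leaf element: property named after the lower-cased tag.
            triples.append((uri, VOCAB + child.tag.lower(), child.text.strip()))
    return triples


triples = direct_mapping(ET.fromstring(DDI_XML))
for s, p, o in triples:
    print(s, p, o)
```

Because the rules depend only on the XML structure (element names and ids), the same function could be applied to any document of this simplified shape without per-document configuration, which is the point of an automatic mapping.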



Why DDI as Linked Data?

• Currently no such ontology available
• To increase visibility of data holdings using mainstream Web
  technologies
• To open DDI to the Linked Data community
• To process DDI-RDF with standard RDF tools
• To link DDI-RDF to other RDF data
• To better identify opportunities for merging datasets
• To enable inferencing
• To research microdata within the LOD cloud


What other metadata standards
         and vocabularies are used?

•   Dublin Core Metadata Element Set, Version 1.1
•   DCMI Metadata Terms
•   SKOS
•   SDMX  RDF Data Cube Vocabulary
•   ISO/IEC 11179
•   ISO 19115




Discovery Use Case
•   Which studies are connected with a specific coverage consisting of the 3
    dimensions: time, country, and subject?
•   What questions with a specific question text are contained in the study
    questionnaire?
•   What questions are connected with a concept with a specific label?
•   What questions are combined with a variable with an associated coverage
    consisting of the 3 dimensions time, country, and subject?
•   What concepts are linked to particular variables or questions?
•   What representation does a specific variable have?
•   What codes and what categories are part of this representation?
•   What variable label does a variable with a particular variable name have?
•   What is the maximum value of a certain variable?
•   What are the absolute and relative frequencies of a specific code?
•   What data files contain the entire dataset?
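
To make one of the questions above concrete ("What questions are connected with a concept with a specific label?"), here is a minimal sketch that answers it over a small in-memory set of triples. It is not the ontology's actual query interface: the `ex:` identifiers, the property names `ex:hasConcept` and `ex:questionText`, and the sample data are all hypothetical. Only the use of a SKOS preferred label for concepts follows from SKOS being among the reused vocabularies.

```python
# Hypothetical property names; the slides do not list the ontology's terms.
HAS_CONCEPT = "ex:hasConcept"
QUESTION_TEXT = "ex:questionText"
PREF_LABEL = "skos:prefLabel"  # SKOS is among the reused vocabularies

# Invented sample data: three questions linked to two concepts.
TRIPLES = [
    ("ex:q1", QUESTION_TEXT, "How old are you?"),
    ("ex:q1", HAS_CONCEPT, "ex:c_age"),
    ("ex:q2", QUESTION_TEXT, "In which year were you born?"),
    ("ex:q2", HAS_CONCEPT, "ex:c_age"),
    ("ex:q3", QUESTION_TEXT, "What is your highest degree?"),
    ("ex:q3", HAS_CONCEPT, "ex:c_edu"),
    ("ex:c_age", PREF_LABEL, "Age"),
    ("ex:c_edu", PREF_LABEL, "Education"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def questions_for_concept_label(triples, label):
    """Find the texts of all questions whose concept carries the given label."""
    texts = []
    for concept, _, _ in match(triples, p=PREF_LABEL, o=label):
        for question, _, _ in match(triples, p=HAS_CONCEPT, o=concept):
            for _, _, text in match(triples, s=question, p=QUESTION_TEXT):
                texts.append(text)
    return texts


print(questions_for_concept_label(TRIPLES, "Age"))
```

With real DDI-RDF data the same three-step join (label to concept, concept to question, question to text) would typically be expressed as a single SPARQL query against a triple store rather than as nested loops.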
study | coverage




instrument | question | concept




values | value labels




variable | descriptive statistics




logical dataset | dataset | data file




conceptual model




Acknowledgements
•   Archana Bidargaddi (NSD - Norwegian Social Science Data Services, Norway)
•   Franck Cotton (INSEE - Institut National de la Statistique et des Études
    Économiques, France)
•   Richard Cyganiak (DERI - Digital Enterprise Research Institute, Ireland)
•   Daniel Gilman (BLS - Bureau of Labor Statistics, USA)
•   Marcel Hebing (SOEP - German Socio-Economic Panel Study, Germany)
•   Larry Hoyle (University of Kansas, USA)
•   Jannik Jensen (DDA - Danish Data Archive, Denmark)
•   Stefan Kramer (CISER - Cornell Institute for Social and Economic Research, USA)
•   Amber Leahey (Scholars Portal Project - University of Toronto, Canada)
•   Abdul Rahim (Metadata Technologies Inc., USA)
•   John Shepherdson (UK Data Archive, UK)
•   Dan Smith (Algenta Technologies Inc., USA)
•   Humphrey Southall (Department of Geography, University of Portsmouth, UK)
•   Wendy Thomas (MPC - Minnesota Population Center, USA)
•   Johanna Vompras (Bielefeld University Library, Germany)
Thank you for your attention!




