Leveraging the DDI Model for Linked Statistical Data
 in the Social, Behavioural, and Economic Sciences

           Workshop on Semantic Statistics
                     15.10.2012 – 19.10.2012



                          Thomas Bosch

                              M.Sc. (TUM)
                         postgraduate student
                 http://boschthomas.blogspot.com
            GESIS - Leibniz Institute for the Social Sciences
Agenda




Why DDI as Linked Data?

• Currently no such ontology available
• To increase visibility of data holdings using mainstream Web
  technologies
• To open DDI to the Linked Data community
• To process DDI-RDF by RDF tools
• To link DDI-RDF to other RDF data
• To better identify opportunities for merging datasets
• To enable inferencing
• To research microdata within the LOD cloud


How was the DDI Ontology developed?

• DDI subset
   • of the most important DDI elements
• Use cases
   • Experts in the statistics domain formulated the use cases they
     consider most significant for solving frequent problems
   • Most important use case: discover microdata connected with multiple
     studies
• Convert existing DDI-XML documents to DDI-RDF automatically
   • Direct mapping
   • Generic mapping (Bosch and Mathiak, 2011)
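
A direct mapping from DDI-XML to RDF can be sketched as follows. This is a minimal illustration, not the mapping described in Bosch and Mathiak (2011): the XML fragment is a heavily simplified, hypothetical stand-in for real DDI-XML, the base URI and ontology namespace are placeholders, and the "has" + element name property convention is an assumption made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified DDI-like XML. Real DDI-XML uses namespaces
# and a far richer structure.
DDI_XML = """
<Study id="study1">
  <Title>Example Survey 2012</Title>
  <Variable id="v1">
    <Name>AGE</Name>
    <Label>Age of respondent</Label>
  </Variable>
  <Variable id="v2">
    <Name>SEX</Name>
    <Label>Sex of respondent</Label>
  </Variable>
</Study>
"""

BASE = "http://example.org/"      # placeholder base URI (assumption)
DDI = "http://example.org/ddi#"   # placeholder ontology namespace (assumption)


def direct_mapping(xml_text):
    """Map elements carrying an id to resources and leaf children to literals."""
    triples = []

    def visit(element, parent_uri=None):
        uri = BASE + element.attrib["id"]
        triples.append((uri, "rdf:type", DDI + element.tag))
        if parent_uri is not None:
            # Link parent and child with a property derived from the tag name.
            triples.append((parent_uri, DDI + "has" + element.tag, uri))
        for child in element:
            if "id" in child.attrib:
                visit(child, uri)
            else:
                # Leaf elements become literal-valued properties.
                text = (child.text or "").strip()
                triples.append((uri, DDI + child.tag.lower(), text))

    visit(ET.fromstring(xml_text))
    return triples


triples = direct_mapping(DDI_XML)
for s, p, o in triples:
    print(s, p, o)
```

Each (subject, predicate, object) tuple stands for one RDF statement; a real conversion would serialize these with an RDF library rather than print them.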



Discovery Use Case
•   Which studies are connected with a specific coverage consisting of the 3
    dimensions: time, country, and subject?
•   What questions with a specific question text are contained in the study
    questionnaire?
•   What questions are connected with a concept with a specific label?
•   What questions are combined with a variable with an associated coverage
    consisting of the 3 dimensions time, country, and subject?
•   What concepts are linked to particular variables or questions?
•   What representation does a specific variable have?
•   What codes and what categories are part of this representation?
•   What variable label does a variable with a particular variable name have?
•   What's the maximum value of a certain variable?
•   What are the absolute and relative frequencies of a specific code?
•   What data files contain the entire dataset?
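
Two of the questions above can be sketched as lookups over a small set of triples. This is only an illustration in plain Python: all resource and property names (study1, hasCoverage, temporal, spatial, subject, name, label) are hypothetical placeholders, not terms of the DDI ontology, and a real system would more likely run SPARQL queries against an RDF store.

```python
# All identifiers below are hypothetical placeholders, not DDI ontology terms.
TRIPLES = {
    ("study1", "hasCoverage", "cov1"),
    ("cov1", "temporal", "2010"),
    ("cov1", "spatial", "DE"),
    ("cov1", "subject", "employment"),
    ("study2", "hasCoverage", "cov2"),
    ("cov2", "temporal", "2010"),
    ("cov2", "spatial", "FR"),
    ("cov2", "subject", "employment"),
    ("study1", "hasVariable", "v1"),
    ("v1", "name", "AGE"),
    ("v1", "label", "Age of respondent"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def studies_with_coverage(time, country, subject):
    """Which studies have a coverage matching all three dimensions?"""
    found = set()
    for study, predicate, coverage in TRIPLES:
        if predicate != "hasCoverage":
            continue
        if (time in objects(coverage, "temporal")
                and country in objects(coverage, "spatial")
                and subject in objects(coverage, "subject")):
            found.add(study)
    return sorted(found)


def label_for_variable_name(name):
    """What variable label does a variable with this variable name have?"""
    return sorted(
        label
        for variable, predicate, value in TRIPLES
        if predicate == "name" and value == name
        for label in objects(variable, "label")
    )


print(studies_with_coverage("2010", "DE", "employment"))
print(label_for_variable_name("AGE"))
```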
study | coverage




instrument | question | concept




values | value labels




variable | descriptive statistics
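
The descriptive statistics named in the discovery questions (maximum value of a variable, absolute and relative frequencies of a code) can be illustrated with a small computation. The coded values are invented for this sketch and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical coded responses for one variable (codes 1, 2, and 3).
values = [1, 2, 2, 1, 1, 3, 2, 1]

# Absolute frequency: how often each code occurs.
absolute = Counter(values)

# Relative frequency: share of each code among all responses.
total = len(values)
relative = {code: count / total for code, count in absolute.items()}

# Maximum value of the variable.
maximum = max(values)

for code in sorted(absolute):
    print(code, absolute[code], relative[code])
print("maximum:", maximum)
```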




logical dataset | dataset | data file




conceptual model




Open Issues
•   DDI Ontology URL and Prefix
•   DC namespace
•   Naming Conventions
•   Cardinalities
•   Consistency Check
•   Universe vs. Coverage
•   DescriptiveStatistics
•   Study Groups
•   Classes
•   Datatype Properties
•   Object Properties
Thank you for your attention!




