DDI Discovery Vocabulary

DDI Lifecycle – Moving Forward
21.10.2012 – 26.10.2012

Thomas Bosch
M.Sc. (TUM)
postgraduate student
http://boschthomas.blogspot.com
GESIS - Leibniz Institute for the Social Sciences

Agenda




Why DDI as Linked Data?

• Currently, no DDI ontology is available
• To increase the visibility of data holdings by using mainstream Web
  technologies
• To open DDI to the Linked Data community
• To make DDI-RDF processable by standard RDF tools
• To link DDI-RDF to other RDF data
• To better identify opportunities for merging datasets
• To enable inferencing
• To support research on microdata within the Linked Open Data (LOD) cloud


How was the DDI Ontology developed?

• DDI subset
   • consisting of the most important DDI elements
• Use cases
   • Experts in the statistics domain formulated the use cases they consider
     most significant for solving frequent problems
   • Most important use case: discovering microdata connected with multiple
     studies
• Automatic transformation of existing DDI-XML documents into DDI-RDF
   • Direct mapping
   • Generic mapping (Bosch and Mathiak, 2011)
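The slides do not show the mapping itself. Purely as an illustration, the following is a minimal Python sketch of what a direct, element-by-element mapping from DDI-XML to RDF could look like. It assumes a simplified, namespace-free input loosely modelled on DDI-Codebook `<var>`/`<labl>` elements, a hypothetical target namespace (`http://example.org/disco#`) and hypothetical property names (`name`, `label`), and it emits N-Triples lines. It is not the author's converter, nor the generic mapping of Bosch and Mathiak (2011).

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL target namespace and base URI; the published DDI-RDF
# vocabulary uses its own namespace and term names.
DISCO = "http://example.org/disco#"
BASE = "http://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified, namespace-free input loosely modelled on DDI-Codebook
# <var>/<labl> elements (illustrative only, not a valid DDI instance).
SAMPLE_XML = """
<codeBook>
  <dataDscr>
    <var ID="V1" name="age"><labl>Age of respondent</labl></var>
    <var ID="V2" name="sex"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>
"""


def literal(text: str) -> str:
    """Serialize a Python string as a plain N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def map_variables(xml_text: str) -> list[str]:
    """Directly map every <var> element to N-Triples statements."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for var in root.iter("var"):
        subject = f"<{BASE}{var.get('ID')}>"
        # Each <var> becomes an instance of the (hypothetical) Variable class.
        triples.append(f"{subject} <{RDF_TYPE}> <{DISCO}Variable> .")
        # The name attribute becomes a literal-valued property.
        triples.append(f"{subject} <{DISCO}name> {literal(var.get('name', ''))} .")
        # The child <labl> element, if present, becomes the variable label.
        label = var.findtext("labl")
        if label is not None:
            triples.append(f"{subject} <{DISCO}label> {literal(label.strip())} .")
    return triples


if __name__ == "__main__":
    for triple in map_variables(SAMPLE_XML):
        print(triple)
```

A direct mapping like this hard-codes one rule per source element. That keeps the output predictable, but every change to the XML structure requires a change to the code.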



Discovery Use Case
•   Which studies are connected with a specific coverage consisting of the
    three dimensions time, country, and subject?
•   Which questions with a specific question text are contained in the study
    questionnaire?
•   Which questions are connected with a concept that has a specific label?
•   Which questions are linked to a variable whose coverage consists of the
    three dimensions time, country, and subject?
•   Which concepts are linked to particular variables or questions?
•   What representation does a specific variable have?
•   Which codes and categories are part of this representation?
•   What label does a variable with a particular variable name have?
•   What is the maximum value of a certain variable?
•   What are the absolute and relative frequencies of a specific code?
•   Which data files contain the entire dataset?
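Once DDI metadata are available as RDF, each of these questions can be answered by matching graph patterns. The sketch below shows two of them on a tiny in-memory list of triples, using wildcard pattern matching and a simple join. The prefixed names (`ex:V1`, `disco:name`, `disco:label`, `disco:questionText`) and the sample data are hypothetical placeholders, not the published DDI-RDF terms. In practice, such questions would typically be posed as SPARQL queries against a triple store.

```python
from typing import Optional

# A triple is (subject, predicate, object); all terms are plain strings here.
Triple = tuple[str, str, str]

# HYPOTHETICAL sample data: "ex:" resources and "disco:" terms are
# placeholders, not the published DDI-RDF vocabulary.
GRAPH: list[Triple] = [
    ("ex:V1", "rdf:type", "disco:Variable"),
    ("ex:V1", "disco:name", "age"),
    ("ex:V1", "disco:label", "Age of respondent"),
    ("ex:V2", "rdf:type", "disco:Variable"),
    ("ex:V2", "disco:name", "sex"),
    ("ex:V2", "disco:label", "Sex of respondent"),
    ("ex:Q1", "rdf:type", "disco:Question"),
    ("ex:Q1", "disco:questionText", "How old are you?"),
]


def match(graph: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def label_for_variable_name(graph: list[Triple], name: str) -> list[str]:
    """What label does a variable with a particular variable name have?

    For name "age" this corresponds roughly to the SPARQL pattern
    (hypothetical terms):
      SELECT ?label WHERE { ?v disco:name "age" ; disco:label ?label . }
    """
    labels = []
    # First pattern finds the variable; second pattern joins on it.
    for var, _, _ in match(graph, p="disco:name", o=name):
        labels.extend(o for _, _, o in match(graph, s=var, p="disco:label"))
    return labels


def questions_with_text(graph: list[Triple], text: str) -> list[str]:
    """Which questions have a specific question text?"""
    return [s for s, _, _ in match(graph, p="disco:questionText", o=text)]


if __name__ == "__main__":
    print(label_for_variable_name(GRAPH, "age"))
    print(questions_with_text(GRAPH, "How old are you?"))
```

The nested loop in `label_for_variable_name` is the simplest form of a join between two triple patterns. A real triple store would use indexes instead of scanning the whole list.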
study | coverage




instrument | question | concept




values | value labels




variable | descriptive statistics




logical dataset | dataset | data file




conceptual model




Thank you for your attention!



