LIBER: e-Science Workshop
   Rob Grim
   e-Science Coordinator, Tilburg University
   Executive Manager, Open Data Foundation (ODaF)
   December 5th, Bristol 2011
e-Science, Research Data and Libraries

Overview of this presentation:
1.    Open Data Foundation (ODaF)
2.    e-Science
3.    Research Data Life Cycle: Data Documentation Initiative (DDI 3)
4.    Technology for Statistical Data and Metadata Exchange (SDMX)
5.    Role of Libraries


Main issues of my talk:
•    What kind of problems can be solved with metadata management?
•    How and where can metadata management help libraries to support
     research?
•    What sort of data services could libraries develop?


                          LIBER e-Science Workshop   14-12-2011         2
What is ODaF?
 The Open Data Foundation (ODaF) is a non-profit organization
promoting the adoption of global metadata standards and the development
of open-source solutions for the management and use of statistical data.

 We focus on improving data and metadata accessibility and overall
quality in support of research, policy making, and transparency, in the
fields of Social, Behavioral and Economic sciences.

ODaF is heavily involved in developing and promoting SDMX and DDI 3
Why ODaF?
 The Open Data Foundation (ODaF) was established to fill a gap in the
area of statistical data and metadata management in Social, Behavioral
and Economic sciences (SBE).

    The adoption of metadata specifications (DC, DDI, SDMX, ISO/IEC
     11179, ISO 19115) has been impaired by the LACK OF TOOLS and agreed
     guidelines for their use.


 Building such tools requires the coordination of strong information
technology and cross-domain expertise that is NOT typically a function of
these agencies. This is not for lack of interest: it is simply not their
mandate, mission or responsibility.
What does ODaF do?
1. Support and coordinate the development of open-source
   tools for management of statistical data and metadata
2. Provide technical assistance to agencies for the adoption of
   metadata specifications, best practices in data
   management, and capacity building
3. Provide access to public metadata collections and registries
4. Promote international cooperation and address global issues
5. Develop training resources and reference materials
6. Provide web-based facilities to foster the dialog between
   various communities
Adopters/Interest in SDMX
  1. European Central Bank (ECB)
  2. International Monetary Fund (IMF)
  3. United Nations (MDG, WHO, UNESCO)
  4. World Bank (WB)
  5. UNESCO (Education)
  6. More than 100 National Statistical Offices (NSOs)

Adopters/Interest in DDI 3
  1. Australian Bureau of Statistics
  2. CESSDA partners
  3. OECD
  4. Research Data Centers (CentERdata)
e-Science and Research Data

1. e-Science is about
      Digital Curation
      Automated Capture
      Tools Development
      Machine actionable!
2. Three characteristics of the “Digital Revolution”:
      More Data
      Data Sharing
      Data Life Cycle
3. Metadata management is a critical issue for all of these!




DDI 3 Lifecycle Model




Structure of the Generic Statistical Business
Process Model (GSBPM)

[Figure: GSBPM structure, showing process phases and, beneath them, sub-processes with descriptions]

Source: Steven Vale, UNECE, 2010
DDI 3 Use Cases

•   Study design/survey instrumentation
•   Questionnaire generation/data collection and processing
•   Data recoding, aggregation and other processing
•   Data dissemination/discovery
•   Archival ingestion/metadata value-add
•   Question/concept/variable banks
•   DDI for use within a research project
•   Capture of metadata regarding data use
•   Metadata mining for comparison, etc.
•   Generating instruction packages/presentations


DDI 3 Perspective

[Diagram labels: Media/Press, General Public, Academic, Policy Makers, Government,
Sponsors, Business, Producers, Users, Archivists]

                                              Source: Pascal Heus, ODaF
DDI 3 Technical Overview
    • DDI 3 is composed of several schemas
        • Use only what you need!
        • Schemas represent modules, sub-modules
          (substitutions), reusable, external schemas
•     archive                            •   instance
•     comparative                        •   logicalproduct
•     conceptualcomponent                •   ncube_recordlayout
•     datacollection                     •   physicaldataproduct
•     dataset                            •   physicalinstance
•     dcelements                         •   proprietary_record_layout (beta)
•     DDIprofile                         •   reusable
•     ddi-xhtml11                        •   simpledc20021212
•     ddi-xhtml11-model-1                •   studyunit
•     ddi-xhtml11-modules-1              •   tabular_ncube_recordlayout
•     group                              •   xml
•     inline_ncube_recordlayout          •   set of xml schemas to support xhtml



                                     Source: Arofan Gregory/Wendy Thomas
Data Set Structure: Concepts

[Diagram labels: Stock/Flow, Country, Unit Multiplier, Unit, Time/Frequency, Topic]

             Computers need the structure of data:
             • Concepts
             • Code lists
             • Data values
             • How these fit together
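
A minimal sketch of what "structure of data" could look like to a program, assuming a hypothetical four-part key (Frequency, Country, Topic, Stock/Flow). The names `Dimension`, `STRUCTURE` and `validate_key` are illustrative and not part of SDMX; the concept names come from the slide, while the codes and labels are chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One concept plus the code list of values it may take."""
    concept: str
    codes: dict  # code -> human-readable label


# Hypothetical structure definition: concepts in the order they appear
# in a key. Codes and labels are illustrative, not an official code list.
STRUCTURE = [
    Dimension("Frequency", {"A": "Annual", "Q": "Quarterly"}),
    Dimension("Country", {"ZA": "South Africa"}),
    Dimension("Topic", {"B": "Bank Loans"}),
    Dimension("Stock/Flow", {"1": "Stocks", "2": "Flows"}),
]


def validate_key(key, structure=STRUCTURE):
    """Return a list of problems; an empty list means the key fits."""
    parts = key.split(",")
    if len(parts) != len(structure):
        return [f"expected {len(structure)} codes, got {len(parts)}"]
    return [
        f"{dim.concept}: unknown code {code!r}"
        for dim, code in zip(structure, parts)
        if code not in dim.codes
    ]
```

With such a definition a machine can reject a malformed key before any data is exchanged: `validate_key("Q,ZA,B,1")` returns an empty list, while a key with an unknown country code is reported as a problem.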
Data Makes Sense
          Q,ZA,B,1,1999-06-30=16547

                   Quarterly, South Africa,
                   Bank Loans, Stocks,
                   for 30 June 1999

           16547
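
The decoding on this slide can be sketched in a few lines of Python. This is an illustration only: `CODE_LISTS` and `decode_observation` are hypothetical names, and the order of concepts in the key (Frequency, Country, Topic, Stock/Flow, then the period) is assumed from the slide's own reading of the example.

```python
# Code lists in the order the codes appear in the key (an assumption).
# Only the codes used on the slide are listed; real code lists are larger.
CODE_LISTS = [
    ("Frequency", {"Q": "Quarterly"}),
    ("Country", {"ZA": "South Africa"}),
    ("Topic", {"B": "Bank Loans"}),
    ("Stock/Flow", {"1": "Stocks"}),
]


def decode_observation(line):
    """Turn 'code,code,...,period=value' into a labelled dictionary."""
    key, value = line.split("=")
    *codes, period = key.split(",")
    decoded = {
        concept: labels[code]
        for (concept, labels), code in zip(CODE_LISTS, codes)
    }
    decoded["Period"] = period
    decoded["Value"] = int(value)
    return decoded


print(decode_observation("Q,ZA,B,1,1999-06-30=16547"))
```

Without the code lists the string is just a row of characters; with them, each code resolves to a label a person (or another system) can interpret.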
Libraries and Research Data Involvement



   Four key areas of activity:


   1. Data Availability
   2. Data Discovery Services
   3. Access and Accessibility
   4. Delivery Services


Data Availability      Data Discovery         Access and              Delivery
                                              Accessibility
---------------------  ---------------------  ----------------------  ------------------
Registries             Research data          Metadata management     Enhanced
                       portals                tools (distributed      Publications
                                              access, secured access
                                              to data structures)

Data Archiving         Subject repositories   Research Data           Data Publications
(Repositories)                                Warehousing             and Data Journals

Collection building    Resource Aggregation   Data Curation           Supplementary
(application of        (Disciplinary)                                 materials;
ontologies) +                                                         “Dark Archive
                                                                      Materials”

Locally produced or    Metadata Mining        Data Security and       Data Dissemination
reused research data   (“mash ups”)           Data Privacy; Digital
                                              Rights Management
                                              (DRM)
Library and IT Services, Tilburg University


1. Research data services: registering, archiving, accessibility
2. Link publications, research data and supplementary materials
3. Data discovery services: subject portals (e.g. European Values Study)
4. Lobby to value research data as scientific output
5. Lobby for a generally adopted research data policy




Disclaimer




    “No one, including NSF is quite sure what is
    meant by DATA MANAGEMENT Or PLAN.”

       Christine Borgman (DCC, Chicago, 2010)

    Thanks for your attention!


Editor's Notes

  • #3 Two issues that are key to supporting research are the research data life cycle and the challenges and hindrances to research data sharing. I use the term e-Science interchangeably with e-Research. Jim Gray: e-Science is where IT meets science.
  • #8 Now let's see how this works. Provide these examples after point 3:
    More data: metadata content caching is used for optimizing data retrieval queries in wireless networks.
    Data sharing: the economic crises of recent decades have been the main stimulus for building a global infrastructure for the exchange of statistical data and for raising awareness of data quality (IMF DQAS).
    Data life cycle: transparency, science as a social contract, but also relevant to verification, replication, documentation and reuse of data.
    Reference: Liu, Zheng, Liu, Wang and Chen. Metadata guided evaluation of resource-constrained queries in content caching based wireless networks. Wireless Networks, 17:1833-1850. Springer.
  • #9 DDI 3 takes the data life cycle as a starting point. --- MOST COMPLEX STANDARD OUT THERE --- 
  • #10 Do we need such complex standards?
  • #15 Big Data, Cloud Computing? Capture? Curation? Tool development
  • #38 Functional perspective on the services that libraries might be willing to provide. See also: E. Harold (IBM column).
  • #39 Excluding: licensed content; structured data; discovery LOD
  • #40 Mention: projects ODAP, EO, CARDS
  • #46 Pires, C.M. Information infrastructure(s) for the ERA. Bonn: Knowledge Exchange Strategy Forum, 8 October 2010.
  • #49 Commissie de Ling: lost credit. e-Science: where IT meets science. e-Science: curation, capture, tooling. DDI 3, SDMX: complex standard. Why use it?