Two issues that are key to supporting research are the research data life cycle and the challenges and hindrances to research data sharing. I use the term e-Science interchangeably with e-Research. Jim Gray: e-Science is where IT meets science.
Now let's see how this works.
Provide these examples after 3:
More data: metadata content caching is used to optimize data retrieval queries in wireless networks. Reference: Metadata guided evaluation of resource-constrained queries in content caching based wireless networks.
Data sharing: the economic crises of recent decades have been the main stimulus for building a global infrastructure for the exchange of statistical data and for raising awareness of data quality. IMF DQAS.
Data life cycle: transparency, science as a social contract, but also relevant to verification, replication, documentation and reuse of data.
Literature: Liu, Zheng, Liu, Wang and Chen. Metadata guided evaluation of resource-constrained queries in content caching based wireless networks. Wireless Networks, 17:1833-1850. Springer.
DDI 3 takes the data life cycle as its starting point. It is the most complex standard out there.
Do we need such complex standards?
Big Data, Cloud Computing? Capture? Curation? Tool development.
Functional perspective on the services that libraries might be willing to provide. See also: E. Harold (IBM column).
1. Pires, C.M., Information infrastructure(s) for the ERA. 2010: Bonn: Knowledge Exchange Strategy Forum, 8 October 2010.
Commissie de Ling: lost credit.
e-Science: where IT meets science.
e-Science: curation, capture, tooling.
DDI 3. SDMX: complex standard. Why use it?
e-Science, Research Data and Libraries
LIBER: e-Science Workshop
Rob Grim
e-Science Coordinator, Tilburg University
Executive manager, Open Data Foundation (ODaF)
December 5th, Bristol 2011
e-Science, Research Data and Libraries
Overview of this presentation:
1. Open Data Foundation (ODaF)
2. e-Science
3. Research Data Life Cycle: Data Documentation Initiative (DDI 3)
4. Technology for Statistical Data and Metadata Exchange (SDMX)
5. Role of Libraries
Main issue of my talk:
• What kind of problems can be solved with metadata management?
• How and where can metadata management help libraries to support research?
• What sort of data services could libraries develop?
LIBER e-Science Workshop 14-12-2011
What is ODaF?
The Open Data Foundation (ODaF) is a non-profit organization promoting the adoption of global metadata standards and the development of open-source solutions for the management and use of statistical data.
We focus on improving data and metadata accessibility and overall quality in support of research, policy making, and transparency, in the fields of Social, Behavioral and Economic sciences.
ODaF is heavily involved in developing and promoting SDMX and DDI 3.
Why ODaF?
The Open Data Foundation (ODaF) was established to fill a gap in the area of statistical data and metadata management in Social, Behavioral and Economic sciences (SBE).
The adoption of metadata specifications (DC, DDI, SDMX, ISO/IEC 11179, ISO 19115) has been impaired by the LACK OF TOOLS and agreed guidelines for their use.
Building such tools requires the coordination of strong information technology and cross-domain expertise that is NOT typically a function of these agencies. This is not for lack of interest: it is simply not their mandate, mission or responsibility.
What does ODaF do?
1. Support and coordinate the development of open-source tools for management of statistical data and metadata
2. Provide technical assistance to agencies for the adoption of metadata specifications, best practices in data management, and capacity building
3. Provide access to public metadata collections and registries
4. Promote international cooperation and address global issues
5. Develop training resources and reference materials
6. Provide web-based facilities to foster the dialog between various communities
Adopters/Interest in SDMX
1. European Central Bank (ECB)
2. International Monetary Fund (IMF)
3. United Nations (MDG, WHO, UNESCO)
4. World Bank (WB)
5. UNESCO (Education)
6. > 100 National Statistical Offices (NSOs)
Adopters/Interest in DDI 3
1. Australian Bureau of Statistics
2. CESSDA partners
3. OECD
4. Research Data Centers (CentERdata)
e-Science and Research Data
1. e-Science is about:
• Digital Curation (machine actionable!)
• Automated Capture
• Tools Development
2. Three characteristics of the “Digital Revolution”:
• More Data
• Data Sharing
• Data Life Cycle
3. Metadata management is a critical issue to all of these!
DDI 3 Lifecycle Model
Structure of the Generic Statistical Business Process Model (GSBPM)
[Figure labels: Process, Phases, Sub-processes (Descriptions)]
Source: Steven Vale, UNECE, 2010
DDI 3 Use Cases
• Study design/survey instrumentation
• Questionnaire generation/data collection and processing
• Data recoding, aggregation and other processing
• Data dissemination/discovery
• Archival ingestion/metadata value-add
• Question/concept/variable banks
• DDI for use within a research project
• Capture of metadata regarding data use
• Metadata mining for comparison, etc.
• Generating instruction packages/presentations
DDI 3 Perspective
[Figure labels: Media/Press, General Public, Academic, Policy Makers, Government, Sponsors, Business, Producers, Users, Archivists]
Source: Pascal Heus, ODaF
DDI 3 Technical Overview
• DDI 3 is composed of several schemas
• Use only what you need!
• Schemas represent modules, sub-modules (substitutions), reusable, external schemas
Schemas:
• archive
• comparative
• conceptualcomponent
• datacollection
• dataset
• dcelements
• DDIprofile
• ddi-xhtml11
• ddi-xhtml11-model-1
• ddi-xhtml11-modules-1
• group
• inline_ncube_recordlayout
• instance
• logicalproduct
• ncube_recordlayout
• physicaldataproduct
• physicalinstance
• proprietary_record_layout (beta)
• reusable
• simpledc20021212
• studyunit
• tabular_ncube_recordlayout
• xml
• set of xml schemas to support xhtml
Source: Arofan Gregory/Wendy Thomas
Data Set Structure: Concepts
[Figure labels: Stock/Flow, Country, Unit Multiplier, Unit, Time/Frequency, Topic, values]
Computers need structure of data:
• Concepts
• Code lists
• Data
• How these fit together
Data Makes Sense
Q,ZA,B,1,1999-06-30=16547
Quarterly, South Africa, Bank Loans, Stocks, for 30 June 1999
16457
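To illustrate how concepts and code lists fit together, here is a minimal Python sketch that decodes the series key shown on this slide. It is not an SDMX implementation: the dimension names, the `CODE_LISTS` table and the `decode` function are hypothetical, and the code lists contain only the four codes the slide itself spells out.

```python
from dataclasses import dataclass

# Order of the dimensions in the key, as read off the slide's decoding.
# These names are illustrative labels, not official SDMX concept identifiers.
DIMENSIONS = ["frequency", "country", "topic", "stock_flow"]

# Hypothetical code lists: only the codes that appear on the slide.
CODE_LISTS = {
    "frequency": {"Q": "Quarterly"},
    "country": {"ZA": "South Africa"},
    "topic": {"B": "Bank Loans"},
    "stock_flow": {"1": "Stocks"},
}


@dataclass
class Observation:
    labels: dict   # dimension name -> human-readable label
    period: str    # time period exactly as written in the key
    value: int     # the observed value


def decode(line: str) -> Observation:
    """Split 'codes...,period=value' and look each code up in its code list."""
    key, raw_value = line.split("=")
    *codes, period = key.split(",")
    if len(codes) != len(DIMENSIONS):
        raise ValueError(
            f"expected {len(DIMENSIONS)} dimension codes, got {len(codes)}"
        )
    labels = {}
    for name, code in zip(DIMENSIONS, codes):
        if code not in CODE_LISTS[name]:
            raise ValueError(f"unknown code {code!r} for dimension {name!r}")
        labels[name] = CODE_LISTS[name][code]
    return Observation(labels, period, int(raw_value))


obs = decode("Q,ZA,B,1,1999-06-30=16547")
print(obs.labels)
print(obs.period, obs.value)
```

Without the code lists and the dimension order, the same line is just a string of characters to a computer. The structure metadata is what turns it into a quarterly observation for South Africa.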
Libraries and Research Data Involvement
Four key areas of activity:
1. Data Availability
2. Data Discovery Services
3. Access and Accessibility
4. Delivery Services
Data Availability
• Registries
• Data Archiving (Repositories)
• Collection building (application of ontologies)
• Locally produced or reused research data
Data Discovery
• Research data portals
• Subject repositories
• Resource Aggregation
• Metadata Mining (“mash ups”)
Access and Accessibility
• Metadata management tools (distributed access, secured access to data structures)
• Research Data Warehousing
• Data Curation
• Data Security and Data Privacy
• Digital Rights Management (DRM)
Delivery
• Enhanced Publications
• Data Publications and Data Journals
• Supplementary materials + (Disciplinary) “Dark Archive Materials”
• Data Dissemination
Library and IT Services, Tilburg University
1. Research data services: registering, archiving, accessibility
2. Link publications, research data and supplementary materials
3. Data discovery services: subject portals (European Values Study)
4. Lobby to value research data as scientific output
5. Lobby for a generally adopted research data policy
Disclaimer
“No one, including NSF, is quite sure what is meant by DATA MANAGEMENT or PLAN.”
Christine Borgman (DCC, Chicago, 2010)
Thanks for your attention!