
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017


Presented at the April 2017 ANDS Tech Talk


  1. ADA, DDI and the Data Lifecycle
     Dr. Steve McEachern, Director, ADA
     ANDS Tech Talk, April 2017
  2. ADA in Brief
     • The Social Science Data Archive (now ADA) was established in 1981, housed in the Research School of Social Sciences at ANU, with a mission to collect and preserve Australian social science data on behalf of the social science research community.
     • The Archive holds over 5,000 datasets from around 1,500 studies, including national election studies, public opinion polls, social attitudes surveys, censuses, aggregate statistics, administrative data and many other sources.
     • Data holdings are sourced from the academic, government and private sectors.
  3. The Data Documentation Initiative standard
  4. About DDI
     • A structured metadata specification of and for the community
     • Two major development lines (XML Schemas):
       – DDI Codebook
       – DDI Lifecycle
     • Additional specifications:
       – Controlled vocabularies
       – RDF vocabularies for use with Linked Data
     • A model-based version is in development, with serialisations in XML and RDF, including support for provenance and process models
     • Managed by the DDI Alliance
  5. DDI-Codebook
     • XML-based, first published in 2000
     • Four sections:
       1. Document description: characteristics of the DDI XML document itself
       2. Study description: characteristics of the study (project) that the DDI is describing, including related materials (documents associated with the project, such as questionnaires and codebooks)
       3. File description: characteristics of the physical data files
       4. Variable description: characteristics of the variables in the data file
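The four sections map directly onto the top-level elements of a DDI-Codebook instance. A minimal sketch, parsed here with Python's standard library: the element names (codeBook, docDscr, stdyDscr, fileDscr, dataDscr) follow the DDI-C 2.x schema, but the study content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hedged sketch of a minimal DDI-Codebook instance. Element names follow
# the DDI-C 2.x schema; the study values are invented for illustration.
ddi = """
<codeBook>
  <docDscr><citation><titlStmt><titl>DDI document for Example Study</titl></titlStmt></citation></docDscr>
  <stdyDscr><citation><titlStmt><titl>Example Election Study</titl></titlStmt></citation></stdyDscr>
  <fileDscr><fileTxt><fileName>example.sav</fileName></fileTxt></fileDscr>
  <dataDscr>
    <var name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(ddi)
sections = [child.tag for child in root]
print(sections)  # the four top-level sections, in document order
```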
  6. DDI Lifecycle Model: Metadata Reuse
  7. Why can DDI Lifecycle do more?
     • It is machine-actionable, not just documentary
     • It is more complex, with a tighter structure
     • It manages metadata objects through a structured identification and reference system that allows sharing between organisations
     • It has greater support for related standards
     • It enables reuse of metadata within the lifecycle of a study and between studies
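The identification and reference system can be illustrated in miniature: DDI Lifecycle items carry agency/id/version identifiers, and studies hold references to items rather than copies of them. The registry and URNs below are invented for the sketch; real identifiers follow the DDI Alliance's URN specification.

```python
# Miniature illustration of DDI Lifecycle-style reuse: metadata items get
# agency/id/version identifiers, and studies reference them rather than
# duplicating them. Registry contents and URNs are invented for this sketch.
registry = {
    "urn:ddi:au.edu.ada:SexVar:1.0": {
        "label": "Sex of respondent",
        "codes": {1: "Male", 2: "Female"},
    }
}

def resolve(urn):
    """Look up a metadata item by its identifier."""
    return registry[urn]

# Two studies reference the same variable definition instead of copying it,
# so a correction to the registry item propagates to both.
study_2016 = {"variables": ["urn:ddi:au.edu.ada:SexVar:1.0"]}
study_2019 = {"variables": ["urn:ddi:au.edu.ada:SexVar:1.0"]}
shared = resolve(study_2016["variables"][0])
print(shared["label"])  # prints "Sex of respondent"
```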
  8. Managing and Depositing Data: ADA and DDI
  9. Approach
     • Core archive website
     • Sub-archives focussed on specialised thematic or methodological areas
     • “Add-on” systems for complex analysis or visualisation tasks:
       – Nesstar
       – GIS
       – Longitudinal visualisation: Panemalia
       – Historical census data
  10. OAIS architecture
  11. Data deposit: ADAPT
  12. Archival processing
     A manual system with some automation tools:
     1. Deposit:
       – Review of ADAPT submission
       – Storage via ADAPT to file store
     2. Data processing:
       – File format conversion (usually to SPSS for processing)
       – Privacy/confidentiality review
       – Data cleaning (in consultation with the depositor)
     3. Metadata processing:
       – DDI-C metadata creation in Nesstar Publisher
     4. Publishing:
       – Archival storage and access format creation
       – Data publication to the Nesstar server
       – Metadata publication to Nesstar and the ADA CMS
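The four stages amount to a linear pipeline applied to each deposited study. A toy sketch of that flow, with all function and field names invented for illustration:

```python
# Toy sketch of the four archival stages as a linear pipeline over a deposit.
# All function and field names are invented; each stage stands in for the
# manual and automated steps listed above.
def deposit(study):
    study["stored"] = True          # ADAPT review + storage to file store
    return study

def process_data(study):
    study["format"] = "SPSS"        # conversion, privacy review, cleaning
    return study

def process_metadata(study):
    study["metadata"] = "DDI-C"     # created in Nesstar Publisher
    return study

def publish(study):
    study["published"] = True       # Nesstar server + ADA CMS
    return study

study = {"title": "Example Study"}
for stage in (deposit, process_data, process_metadata, publish):
    study = stage(study)
print(study["published"])  # prints True
```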
  13. The ADA study page
     Study information is available through the tabs at the top of the study page:
     • Study: information including the investigators, abstract, sample, data collection methods and access requirements
     • Variables: a list of the variables available in a quantitative dataset
     • Related Materials: additional documentation, links and other related studies (e.g. others in the series) that may interest you
     The study page is also the access point for the ADA Nesstar system, for:
     • analysis of quantitative data online
     • download of data to your own computer
  14. The ADA Study Page
  15. Future plans: Dataverse
     • “Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.
     • A Dataverse repository is the software installation, which then hosts multiple dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, dataverses may also contain other dataverses.”
  16. Harvard Dataverse
  17. Features
     • One installation, multiple logins
     • Multiple hosting options: bare metal, VMware, AWS, OpenStack, …
     • Login options: native, ORCID, Shibboleth, …
     • API and GUI access
     • Client libraries: R, Python, Java
     • OAI-PMH harvesting
     • Open and restricted data access
     • New implications for data archiving, curation, management and dissemination
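As an example of the API access, the Dataverse native API exposes search over plain HTTP (GET /api/search). A minimal sketch of constructing such a request URL, with a hypothetical host; the endpoint path and the q/type parameters follow the Dataverse Search API:

```python
from urllib.parse import urlencode

# Minimal sketch of a Dataverse Search API request (GET /api/search).
# The host is hypothetical; the endpoint and the q/type parameters follow
# the Dataverse native Search API.
def search_url(base_url, query, **params):
    """Build a Search API URL for a Dataverse installation."""
    return f"{base_url}/api/search?" + urlencode({"q": query, **params})

# Example: search a (hypothetical) installation for election-study datasets.
url = search_url("https://dataverse.example.edu", "election study", type="dataset")
print(url)
```

In a real client one would issue the GET with a library such as requests and read the JSON response; the same installation can also be harvested via its OAI-PMH endpoint.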
  18. Questions? Steven McEachern