
Ada slide presentation rsc day_feb2017_v2



Introduction to the Australian Data Archive by Steven McEachern, presented at the Research Support Community Day 2017


  1. 1. Introduction to ADA: The Australian Data Archive as a Trusted Repository for Research Data Dr. Steve McEachern Director, ADA 2017 Research Support Community Day Colombo Theatre, UNSW 13 February, 2017
  2. 2. ADA in Brief • The Social Science Data Archive (now ADA) was set up in 1981, housed in the Research School of Social Sciences, with a mission to collect and preserve Australian social science data on behalf of the social science research community. • The Archive holds over 5000 datasets from around 1500 studies, including national election studies, public opinion polls, social attitudes surveys, censuses, aggregate statistics, administrative data and many other sources. • Data holdings are sourced from academic, government and private sectors.
  3. 3. So what is a data archive? • ‘A “trusted system” that provides... an accessible and comprehensive service empowering researchers to locate, request, retrieve and use data resources in a simple, seamless and cost effective way, while at the same time protecting the privacy, confidentiality and intellectual property rights of those involved.’ Social Sciences and Humanities Research Council of Canada. “National Data Archive Consultation Final Report: Building Infrastructure for Access to and Preservation of Research Data in Canada” URL: [20 November 2003].
  4. 4. ADA Subarchives • Social Science – predominantly survey or polling based quantitative social science data • Historical – an archive of Australian census data tables from 1834 to the present day • Indigenous – a thematic archive bringing together research data about Aboriginal and Torres Strait Islanders • Longitudinal – major longitudinal cohort and panel surveys of the Australian population • Qualitative – a new collection which provides specialist data archiving and access services to qualitative researchers • Crime & Justice – major collections of data in crime, law and justice, including criminal justice administrative data • International – a central point of access for links to international data sources around the world
  5. 5. ADA Data Holdings ADA data holdings cover a wide variety of subject areas: • Ageing • Business and management • Census data • Culture • Demography • Drugs, alcohol and tobacco • Economics • Education, employment and work • Environment, Conservation, Land use • Family studies • Foreign affairs • Gambling • Health • Housing • Law, Crime, Courts • Mass media, communication and language • Migration, immigration and multiculturalism • Politics and elections • Public opinion and social attitudes • Psychology • Quality of life • Science, Technology • Social welfare • Sociology • Tourism, recreation and leisure • Travel and transport
  6. 6. Example studies • Australian Survey of Social Attitudes (ANU, UWA, UQ, …) • Longitudinal Surveys of Australian Youth (NCVER) • Australian Election Studies (ANU, QUT) • ANUPolls, Morgan Gallup Polls, Age Polls, Lowy polls (1947 – Present) • Colonial census tables and images, 1838-1901 (ABS) • Census tabulations, 1966 – Present (ABS) • National Drug Strategy Household Survey, 1994 – Present (AIHW) • Australian Workplace Relations Survey, 1990, 1995, 2014 (forthcoming) - Dept of Employment • Negotiating the Life Course (ANU, AIFS, UQ)
  7. 7. Forthcoming • Longitudinal studies: – Department of Social Services: HILDA, LSAC, LSIC, BNLA – National Centre for Vocational Education Research: LSAY new wave – Department of Health: Australian Longitudinal Studies on Women's Health (ALSWH) and Men's Health (Ten to Men) – Bruce: Child Support study • Exercise, Recreation and Sport Survey 2001-2010 (Australian Sports Commission) • Giving Australia survey (DSS)
  8. 8. The ADA website
  9. 9. The ADA Study Page
  10. 10. Dataset study pages Study information is based on the DDI-C (Data Documentation Initiative) standard, and includes: • Study: information including the investigators, abstract, sample, data collection methods, and access requirements. • Variables: a list of variables available in a quantitative dataset • Related Materials: additional documentation (reports, questionnaires, technical information), links and other related studies (eg. others in the series) that may interest you
  11. 11. Who uses ADA? • 2016: – 12,000 online analyses (usually crosstabulations) – 1,100 data file downloads • Registrations: – Approx. 1000 new users each year • User types (share of analyses / share of downloads): – Undergraduates: 41% / 16% – Postgraduates: 33% / 40% – Researchers: 11% / 40% – Others (media, government, NGO, etc.): 15% / 4% • Institution types (approx.): – Australian universities: 70% – International universities: 15% – Government departments and agencies: 10% – Other: 5%
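As a worked example, the usage percentages on slide 11 can be turned into approximate counts. This is only a sketch: it assumes the user-type shares apply to the 2016 totals on the same slide, which the slide implies but does not state, and the function name is illustrative.

```python
# Figures taken from slide 11; applying the shares to the 2016 totals
# is an assumption made for this illustration.
TOTAL_ANALYSES = 12000
TOTAL_DOWNLOADS = 1100

# user type -> (share of online analyses in %, share of downloads in %)
USER_SHARES = {
    "Undergraduates": (41, 16),
    "Postgraduates": (33, 40),
    "Researchers": (11, 40),
    "Others": (15, 4),
}


def estimated_counts(shares=USER_SHARES):
    """Convert percentage shares into approximate (analyses, downloads) counts."""
    return {
        user: (
            round(TOTAL_ANALYSES * analysis_pct / 100),
            round(TOTAL_DOWNLOADS * download_pct / 100),
        )
        for user, (analysis_pct, download_pct) in shares.items()
    }


counts = estimated_counts()
for user, (analyses, downloads) in counts.items():
    print(f"{user}: ~{analyses} analyses, ~{downloads} downloads")
```

Read this way, undergraduates account for the largest share of online analyses, while postgraduates and researchers together account for most downloads.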
  12. 12. Data dissemination options
  13. 13. The ADA study page Study information is available through the tabs at the top of the study: • Study: information including the investigators, abstract, sample, data collection methods, and access requirements. • Variables: a list of variables available in a quantitative dataset • Related Materials: additional documentation, links and other related studies (eg. others in the series) that may interest you The study page is also the access point for the ADA Nesstar system, for: • Analysis of quantitative data online, • Download of data to your own computer. Note: you will need to log in to your ADA user account in order to access the Nesstar system.
  14. 14. Types of access • Browse (viewing metadata): – Open access • Analyse (Online analysis): free user registration – General access studies: Free access for registered users – Restricted studies: User still requires approval to access • Data download: – For unrestricted data: submit a user request, and sign ADA general user undertaking (reviewed by ADA staff) – For restricted data: restricted access request form and specific user undertaking (reviewed by ADA and depositor of data) – Special access: depends on the particular access requirements
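The access rules on slide 14 can be sketched as a small lookup function. This is an illustration of the rules as listed, not ADA's actual implementation; the function name and the wording of each step are hypothetical, and "special access" is left out because the slide says it depends on the particular study.

```python
def access_requirements(action, restricted=False):
    """Return the steps slide 14 lists for an action on a study.

    action: "browse", "analyse" or "download"
    restricted: True if the study is a restricted study
    """
    if action == "browse":
        # Viewing metadata is open access.
        return ["none (open access)"]
    if action == "analyse":
        steps = ["free user registration"]
        if restricted:
            # Restricted studies still need approval after registration.
            steps.append("approval to access the study")
        return steps
    if action == "download":
        if restricted:
            return [
                "restricted access request form",
                "specific user undertaking",
                "review by ADA and the depositor of the data",
            ]
        return [
            "user request",
            "ADA general user undertaking",
            "review by ADA staff",
        ]
    raise ValueError(f"unknown action: {action!r}")


print(access_requirements("download", restricted=True))
```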
  15. 15. Browsing: The ADA Study Page
  16. 16. Exploring data in Nesstar • The information about the study (from the ADA study page) is also available in Nesstar. Click on the Dataset icon to explore the study. • For quantitative analysis, you can also view basic statistics and charts for individual variables in this section, by exploring the Variables tab
  17. 17. Exploring variables in Nesstar
  18. 18. Creating a cross-tabulation
  19. 19. Downloading data • Nesstar is also used as the ADA data download system, to export the data files for the study to your own computer. • To download data, you need to have been approved for download access for the study you are interested in. • This can be done by submitting a Request for Data Access: – a) from the “Request Analysis and Download access” link from a study page, OR – b) from your personal User page • This request then goes to the ADA User Services team for approval. • Once your download access has been approved, you will receive an email notification from ADA, and a link to the study will be added to your User Page.
  20. 20. Managing and Depositing Data: ADA and DDI
  21. 21. Data deposit: ADAPT
  22. 22. Archival processing Manual system with some automation tools 1. Deposit: – Review of ADAPT submission – Storage via ADAPT to file store 2. Data processing: – File format conversion (usually to SPSS for processing) – Privacy/confidentiality review – Data cleaning (in consultation with depositor) 3. Metadata processing: – DDI-C metadata creation in Nesstar Publisher 4. Publishing: – Archival storage and access format creation – Data publication to Nesstar server – Metadata publication to Nesstar and ADA CMS
  23. 23. Future directions
  24. 24. Future trends • Mandated rather than recommended data archiving – How do we scale? – Looking at self-deposit systems • Open access to data as the default – Government: PM&C Open Data Policy, – Research: Horizon2020, ESRC, NSF, ARC/NHMRC?? • Broader range of data types available – Qualitative data: YES – Social media data: • Raw feed (firehose): NO • Processed data: ??? (how to support access) – Administrative data: ??? • Broader range of users of that data – Different disciplines: health, environment, comp. sci. – Different users: public/media/government – Different geographies: internationally
  25. 25. Core needs for social science data • Collection • Preservation • Integration • Analysis • Dissemination
  26. 26. ADA trusted digital repository project • Funded by ANDS 2016-17 • Aims: – Completion of the Data Seal of Approval self-assessment and certification process • 16 requirements • Assessment on a 0-4 scale • All requirements must be at least a 1 – Implementation of improvements to ADA systems and procedures to improve certification assessment – Review of the DSA certification process and criteria to assess suitability for the Australian research data environment
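The scoring rule on slide 26 (16 requirements, each assessed on a 0-4 scale, none below 1) can be expressed as a simple check. This is a minimal sketch with hypothetical function names; it models only the threshold stated on the slide, not the full DSA review process.

```python
N_REQUIREMENTS = 16           # number of DSA requirements (from the slide)
MIN_LEVEL, MAX_LEVEL = 0, 4   # assessment scale (from the slide)
PASS_LEVEL = 1                # every requirement must be at least a 1


def validate(scores):
    """Raise ValueError unless scores is 16 values on the 0-4 scale."""
    if len(scores) != N_REQUIREMENTS:
        raise ValueError(f"expected {N_REQUIREMENTS} scores, got {len(scores)}")
    if any(not MIN_LEVEL <= s <= MAX_LEVEL for s in scores):
        raise ValueError("scores must be between 0 and 4")


def meets_minimum(scores):
    """True if every requirement is assessed at level 1 or above."""
    validate(scores)
    return all(s >= PASS_LEVEL for s in scores)


def below_minimum(scores):
    """Return the 1-based numbers of requirements scored below level 1."""
    validate(scores)
    return [i for i, s in enumerate(scores, start=1) if s < PASS_LEVEL]


example = [4] * 15 + [0]   # made-up self-assessment, for illustration only
print(meets_minimum(example), below_minimum(example))
```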
  27. 27. DSA requirements • “Fundamental to the following guidelines are five criteria, that together determine whether or not the digital research data may be qualified as sustainably archived: – The research data can be found on the Internet. – The research data are accessible, while taking into account relevant legislation with regard to personal information and intellectual property of the data. – The research data are available in a usable format. – The research data are reliable. – The research data can be referred to.” • 2013/09/27/dsa-booklet_1_june2010.pdf
  28. 28. The guidelines • “The associated guidelines relate to the implementation of these criteria and focus on three stakeholders: the data producer, the data repository and the data consumer. 1. The data producer is responsible for the quality of the digital research data. 2. The data repository is responsible for the quality of storage and availability of the data and data management. 3. The data consumer is responsible for the quality of use of the digital research data.” – 7/dsa-booklet_1_june2010.pdf • Guidelines: eDRSTE53bDUwd28/view
  29. 29. Repositories and archives project • With UNSW Library (Maude Frances) • Exploring mechanisms for deposit and preservation of data through repository to the data archive • Questions we are exploring: – Where should we deposit the data? – Who should store the data? – What metadata should we collect? – Who should manage the metadata? – How to transfer content (data and metadata) between repository and archive? – How to determine the “source of truth”? (e.g. who should mint the DOI?)
  30. 30. ADA Dataverse • Redevelopment of our database and website infrastructure – New website – New data catalogue • New functionality: – Self-deposit of data – Open data access – API access (both for deposit and access, e.g. through R) – Shibboleth authentication • Currently in early testing – For completion in 2017 (probably Q3) • Functionality intended to support additional DSA requirements
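To illustrate the kind of API access slide 30 mentions, the sketch below builds a request URL for the Dataverse Search API (`/api/search` with the `q`, `type` and `per_page` parameters). The host name is a placeholder, and whether the ADA installation exposes this endpoint is an assumption. The slide names R as one client; Python is used here instead, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, per_page=10):
    """Build a Dataverse Search API URL that looks for datasets.

    base_url is the root of a Dataverse installation; the value used
    below is a placeholder, not ADA's real address.
    """
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


# Hypothetical host, for illustration only.
url = build_search_url("https://dataverse.example.org", "election study")
print(url)
```

The resulting URL could be fetched with any HTTP client; the response format and any authentication requirements depend on the installation.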
  31. 31. ADA Dataverse
  32. 32. Questions? Steven McEachern
  33. 33. Data documentation standards
  34. 34. DDI-Codebook • Two flavours of DDI – Codebook and Lifecycle • Focus on DDI-C, four sections: 1. Document description: characteristics of the DDI XML document itself 2. Study description: characteristics of the Study (project) that the DDI is describing (including Related Materials: documents associated with the project, such as questionnaires, codebooks, etc.) 3. File description: characteristics of the physical data files 4. Variable description: characteristics of the variables in the data file
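The four DDI-C sections on slide 34 correspond to four top-level elements of a DDI Codebook document: `docDscr`, `stdyDscr`, `fileDscr` and `dataDscr`. The sketch below builds an empty skeleton with Python's standard library. It is illustrative only: the XML namespace and most required child elements are omitted, so the output is not schema-valid DDI, and the title and variable names are made up.

```python
import xml.etree.ElementTree as ET


def ddi_codebook_skeleton(title, variable_names):
    """Build a bare DDI-Codebook-style tree with the four main sections."""
    root = ET.Element("codeBook")

    # 1. Document description: about the DDI XML document itself.
    ET.SubElement(root, "docDscr")

    # 2. Study description: about the study the document describes.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # 3. File description: about the physical data files.
    ET.SubElement(root, "fileDscr")

    # 4. Variable description: one <var> element per variable.
    data = ET.SubElement(root, "dataDscr")
    for name in variable_names:
        ET.SubElement(data, "var", name=name)

    return root


# Hypothetical study title and variables, for illustration only.
root = ddi_codebook_skeleton("Example Survey", ["age", "sex"])
print(ET.tostring(root, encoding="unicode"))
```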
  35. 35. Dublin Core • Type • Format • Identifier • Source • Language • Relation • Coverage • Rights • Title • Creator • Subject • Description • Publisher • Contributor • Date
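The fifteen Dublin Core elements on slide 35 can be used as a simple checklist for a metadata record. The sketch below separates the keys of a record into recognised Dublin Core elements and everything else; the function name and the example record are hypothetical.

```python
# The fifteen elements listed on the slide, in lower case.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def split_dc_record(record):
    """Split a dict into (Dublin Core fields, sorted list of unknown keys)."""
    known = {k: v for k, v in record.items() if k.lower() in DC_ELEMENTS}
    unknown = sorted(k for k in record if k.lower() not in DC_ELEMENTS)
    return known, unknown


# Made-up record, for illustration only.
record = {
    "Title": "Example Survey",
    "Creator": "Example Investigator",
    "Date": "2017",
    "SampleSize": 1200,  # not a Dublin Core element
}
known, unknown = split_dc_record(record)
print(known)
print(unknown)
```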
  36. 36. DCAT (W3C) DCAT standard is relatively simple, and includes four basic objects: • Dataset: “a collection of data, published or curated by a single agent, and available for access or download in one or more formats” • Data catalog(ue): “a curated collection of metadata about datasets” • Catalog(ue) record: “a record in a data catalog, describing a single dataset” • Distribution: “represents a specific available form of a dataset” • Key object for SRC is the Dataset – others are distribution-related
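The four DCAT objects on slide 36 and the relationships between them can be sketched with plain data classes. This is a simplified model for illustration, not the W3C vocabulary itself: the attribute names are hypothetical, and the catalogue here reaches its datasets only through its records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """A specific available form of a dataset (e.g. one file format)."""
    format: str
    access_url: str


@dataclass
class Dataset:
    """A collection of data published or curated by a single agent."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class CatalogRecord:
    """A record in a data catalogue, describing a single dataset."""
    dataset: Dataset
    modified: str


@dataclass
class Catalog:
    """A curated collection of metadata about datasets."""
    title: str
    records: List[CatalogRecord] = field(default_factory=list)

    def datasets(self):
        """Return the datasets described by this catalogue's records."""
        return [record.dataset for record in self.records]


# Made-up values, for illustration only.
survey = Dataset(
    title="Example Survey",
    publisher="Example Archive",
    distributions=[Distribution("SPSS", "https://example.org/survey.sav")],
)
catalog = Catalog("Example Catalogue", [CatalogRecord(survey, "2017-02-13")])
print([d.title for d in catalog.datasets()])
```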
  37. 37. ADA systems architecture
  38. 38. Approach • Core archive website: – • Sub-archives focussed on specialised thematic or methodological areas - eg. • “Add-on” systems for complex analysis or visualisation tasks: – Nesstar – GIS: – Longitudinal visualisation: Panemalia – Historical census data:
  39. 39. OAIS architecture
  40. 40. Data sharing policies in Australia
  41. 41. Policy trends in data access • Mandated rather than recommended data archiving • Open access to data as the default (NSF, Office of the President, .uk) • Broader range of data types available • Broader range of users of that data
  42. 42. Policy drivers • Funders: Return on investment: – Government data: Treasury, PM&C – Research data: ARC/NHMRC, Horizon 2020 • Journal publishers: Reputation: – Open access journals (e.g. PLOS One) and – For-profit publishers (e.g. Nature, Science, Elsevier) concerned about loss of credibility from fraudulent research • Learned societies and disciplines: Good science AND reputation: – American Political Science Association: DA-RT initiative – American Economic Association:
  43. 43. Government data • Australia: Australian Government Public Data Policy Statement – The Australian Government commits to optimise the use and reuse of public data; to release non-sensitive data as open by default; and to collaborate with the private and research sectors to extend the value of public data for the benefit of the Australian public. – Public data includes all data collected by government entities for any purposes including; government administration, research or service delivery. – Non-sensitive data is anonymised data that does not identify an individual or breach privacy or security requirements. – vt_public_data_policy_statement_1.pdf
  44. 44. Research data • Australian Code for the Responsible Conduct of Research • (Joint ARC/NHMRC publication) • Section 2: Management of research data and primary materials • Then provides related links to ethics statements and similar
  45. 45. ACRCR Section 2: Responsibilities of Institutions Section 2.1.1: In general, the minimum recommended period for retention of research data is 5 years from the date of publication. However, in any particular case, the period for which data should be retained should be determined by the specific type of research. For example: • for short-term research projects that are for assessment purposes only, such as research projects completed by students, retaining research data for 12 months after the completion of the project may be sufficient • for most clinical trials, retaining research data for 15 years or more may be necessary • for areas such as gene therapy, research data must be retained permanently (eg patient records) • if the work has community or heritage value, research data should be kept permanently at this stage, preferably within a national collection.
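The retention periods quoted on slide 45 can be summarised as a lookup table. This is a sketch only: the category keys are hypothetical labels, `None` stands for permanent retention, and the starting point of each period differs (publication date for the general case, project completion for student projects), as noted in the comments.

```python
# Minimum retention periods in years, as quoted from Section 2.1.1.
# None means the data should be kept permanently.
MIN_RETENTION_YEARS = {
    "general": 5,                   # from the date of publication
    "student_assessment": 1,        # 12 months after project completion
    "clinical_trial": 15,           # "15 years or more"
    "gene_therapy": None,           # retained permanently
    "community_or_heritage": None,  # kept permanently
}


def retention_years(research_type):
    """Return the minimum retention period in years, or None for permanent.

    Falling back to the general 5-year period for unlisted types is an
    assumption of this sketch; the Code says the period should be
    determined by the specific type of research.
    """
    return MIN_RETENTION_YEARS.get(research_type, MIN_RETENTION_YEARS["general"])


for kind in ("clinical_trial", "gene_therapy", "survey"):
    years = retention_years(kind)
    print(kind, "permanent" if years is None else f"at least {years} years")
```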
  46. 46. ARC statement "Researchers and institutions have an obligation to care for and maintain research data in accordance with the Australian Code for the Responsible Conduct of Research (2007). The ARC considers data management planning an important part of the responsible conduct of research and strongly encourages the depositing of data arising from a Project in an appropriate publicly accessible subject and/or institutional repository"
  47. 47. ANDS suggest three questions 1. Where will your research data be stored at completion of the project? 2. What access will you provide to the data set on completion of the project? 3. How will you enable others to reuse your research data?
  48. 48. Horizon 2020 • -funding-guide/cross-cutting-issues/open-access-data-management/open-access_en.htm • (All grants): Develop a data management plan (DMP) within 6 months of commencement of project • Pilot program (2014-17): – Deposit research data described in DMP, preferably in a research data repository – As far as possible, projects must then take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data. – Guidelines recommend FAIR principles
  49. 49. FAIR principles • Findable • Accessible • Interoperable • Reusable • Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018 doi: 10.1038/sdata.2016.18 (2016).