Winter school in research data science research data management - final

Research Data Management 1 - presented at the Winter School in Research Data Science. 14-15 June 2018

Published in: Education

  1. 1. Winter School in Research Data Science Research Data Management I 14-15 June 2018 University of Queensland © The authors This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
  2. 2. Introductions Natasha Simons Program Leader, Skills Policy and Resources ANDS, Nectar, RDS & Industry Fellow, The University of Queensland Professor Ginny Barbour Director, Australasian Open Access Strategy Group, Professor Library and OREI, QUT
  3. 3. Research Data Management ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  4. 4. Your experience ● Have you ever used someone else’s data? ● Have you ever shared your data with someone else? ● Where (if anywhere) have you published your data? ● What do you consider to be the biggest challenge in managing your data?
  5. 5. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  6. 6. What is Research Data? Research data means: data in the form of facts, observations, images, computer program results, recordings, measurements or experiences on which an argument, theory, test or hypothesis, or another research output is based. Data may be numerical, descriptive, visual or tactile. It may be raw, cleaned or processed, and may be held in any format or media. But this is only one definition of many…. Photo by rawpixel on Unsplash
  7. 7. What is Research Data? Any definition of research data is likely to depend on the context in which the question is asked. http://www.ands.org.au/guides/what-is-research-data Photo by h heyerlein on Unsplash
  8. 8. What’s Research Data Management? Research Data Management covers the planning, collecting, organising, managing, storage, security, backing up, preserving, and sharing your data. It ensures that research data are managed according to legal, statutory, ethical and funding body requirements. Source: UQ LibGuide Any research will require some level of data management. Photo by imgix on Unsplash
  9. 9. Why should you care about RDM? Good data management can: • Increase the efficiency of your research • Help guarantee the quality and authenticity of your data • Enable the exposure of your research outcomes through collaboration and dissemination • Provide for the reproducibility of experimental and computational outcomes • Facilitate the validation and verification of results. Photo by Jaron Nix on Unsplash
  10. 10. More publishers require data A condition of publication in a Nature journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications.
  11. 11. More funders require data “We want the research we fund – like publications, data, software and materials – to be open and accessible, so it can have the greatest possible impact” – Wellcome Trust https://wellcome.ac.uk/what-we-do/topics/data-sharing NHMRC’s Australian Code for the Responsible Conduct of Research: includes the proper management and retention of the research data. Australian Research Council (ARC) application forms (Discovery; Linkage) have a short section where you are required to provide an outline of your data management plan. ANDS Guide: ARC applications – filling in the data management section http://www.ands.org.au/guides/arc-guide-to-filling-in-the-dm-section
  12. 12. More government policies on data The main purpose of the site is to encourage public access to and reuse of public data. It was created following the Government’s Declaration of Open Government and as a response to the Government 2.0 Taskforce Report.
  13. 13. More institutional policies on data The University of Sydney RDM Policy - http://sydney.edu.au/policies/showdoc.aspx?recnum=PDOC2013/337&RendNum=0 - and RDM Procedures - http://sydney.edu.au/policies/showdoc.aspx?recnum=PDOC2014/366
  14. 14. More researchers care about data sharing Figshare open data survey of researchers 2017: • 82% aware of open data sets • 80% willing to reuse open data sets in own research • 60% routinely share their data (frequently or sometimes) • 21% have never made a data set openly available • 74% are now curating their data for sharing • 77% value a data citation the same as an article Science, Digital (2017): The State of Open Data 2017 Report - Infographic. figshare. https://doi.org/10.6084/m9.figshare.5519155.v1 pp. 7-11
  15. 15. More researchers are sharing their data More than two thirds of Wiley researchers reported they are now sharing their data. Though this varies geographically and across research disciplines we are seeing that more researchers are sharing their data and taking efforts to make it reproducible. Wiley Global Data Sharing Infographic June 2017. https://authorservices.wiley.com/author-resources/Journal-Authors/licensing-open-access/open-access/data-sharing.html
  16. 16. Data sharing models https://vimeo.com/125783029
  17. 17. Why should you share your data? Read Nature Blog: 10 Reasons to share your data https://www.natureindex.com/news-blog/ten-reasons-to-share-your-data Then class discussion: ● What do you think of the arguments put forward here? ● Do you agree? Disagree? ● Have you had a different experience?
  18. 18. Key messages • Any definition of research data is likely to depend on the context in which the question is asked. • Any research will require some level of data management. • Good data management can increase the efficiency of your research and enable the exposure of your research outcomes through collaboration and dissemination • More publishers, funders, governments and institutions require data management and sharing • More researchers care about data sharing and are sharing their own data • Not all data can be shared. Data can be open, shared or closed.
  19. 19. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● Managing personal and sensitive data ● How to manage your working data ● Maximizing your research (data) impact ● Bringing it all together
  20. 20. Data Management Planning How to get there from here IMT Sue Cook| Data Librarian
  21. 21. [Figure labels: “your data”, “FAIR data”]
  22. 22. Planning in order to get there “A Data Management Plan (DMP) typically outlines what research data will be created during the course of a research project and how it will be created, plans for sharing and preserving the data and any restrictions that may need to be applied.” http://www.ands.org.au/working-with-data/data-management/data-management-plans http://www.ands.org.au/guides/data-management-plans
  23. 23. Just some of the sorts of RDM decisions that need to be planned: • What data will be created • What policies will apply to the data • Who will own and have access to the data • What data management practices will be used • What facilities and equipment will be required • Who will be responsible for each of these activities • And … – Much more
  24. 24. https://www.nature.com/articles/d41586-018-03065-z
  25. 25. Example funder policies • Australian funding example - ARC “Researchers must outline briefly in their Proposal how they plan to manage research data arising from a Project” Section A11.5.2 • International examples: • National Science Foundation “Proposals submitted … must include a supplementary document of no more than two pages labeled "Data Management Plan".” • Wellcome Trust “Data sharing plans should address seven key questions as clearly and concisely as possible, as noted in the Trust's Guidance for researchers: Developing a data management and sharing plan”
  26. 26. Example DMPs to look at • Australian http://projects.ands.org.au/policy.php • International http://www.ands.org.au/working-with-data/data-management/data-management-plans#International-2 • Discipline specific http://www.ands.org.au/working-with-data/data-management/data-management-plans#Discipline_specific_DMPs-3 • Published https://riojournal.com/browse_journal_articles.php?form_name=filter_articles&sortby=0&journal_id=17&search_in_=0&section_type%5B%5D=231
  27. 27. Example: CSIRO Research Data Planner 1. Constraints and obligations 2. Access 3. Description 4. Processes 5. Storage and compute https://confluence.csiro.au/display/RDM/Research+Data+Planner
  28. 28. Tools Available • Research Data Management Organiser RDMO https://rdmorganiser.github.io/en/ • DMPonline https://dmponline.dcc.ac.uk/ • DMP Tool https://dmptool.org/ • DMPonline/DMPtool https://github.com/DMPRoadmap • Prototypes: • Data Stewardship Wizard https://dmp.fairdata.solutions/ • Institutional example: • Research Data Manager (UQRDM) https://research.uq.edu.au/project/research-data-manager-uqrdm
  29. 29. Final word: DMP future directions • Live DMPs • maDMPs • Exposing DMPs • Standards for DMPs • Force11 FAIR DMPs group
  30. 30. DMP Common Standards - Outputs • Common data model for machine-actionable DMPs • to model information from standard DMPs • NOT a template • NOT a questionnaire • modular design – core set of elements – domain specific extensions • Reference implementations • ready to use models – JSON, XML, RDF, etc. • Guidelines for adoption of the common data model • requirements for supporting systems • pilot studies www.rd-alliance.org - @resdatall Status: Recognised & Endorsed
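The common data model on this slide is about DMPs that software can read, with a core set of elements plus domain-specific extensions. A minimal sketch of that idea in Python follows. The field names (`title`, `contact`, `datasets`, `extensions`) and both helper functions are hypothetical and are not taken from the RDA common standard, which defines its own elements.

```python
import json

# Hypothetical core fields, chosen only for illustration.
CORE_FIELDS = ("title", "contact", "datasets")


def build_dmp(title, contact, datasets):
    """Return a DMP as a plain dictionary: a small core plus an extension slot."""
    return {
        "dmp": {
            "title": title,
            "contact": contact,
            "datasets": datasets,
            "extensions": {},  # placeholder for domain-specific additions
        }
    }


def missing_core_fields(dmp):
    """List the core fields that are absent or empty, so a tool can flag them."""
    body = dmp.get("dmp", {})
    return [name for name in CORE_FIELDS if not body.get(name)]


example = build_dmp(
    title="Example project DMP",
    contact="researcher@example.org",
    datasets=[{"title": "Survey responses", "licence": "CC-BY-4.0"}],
)

# Serialising to JSON is what makes the plan exchangeable between systems.
serialised = json.dumps(example, indent=2, sort_keys=True)
```

Because the plan is structured data rather than a filled-in form, another system could load `serialised`, check it with `missing_core_fields`, and act on the result.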
  31. 31. Australasian DMP Interest Group ● Formed early 2017 ● Facilitated by ANDS ● Seeks to answer questions about Data Management Plans ● Brings the Australian and New Zealand community together to discuss DMP tools and approaches ● Links into international DMP developments, in particular through involvement in the Research Data Alliance DMP IG ● Meets every 2 months online ● Held a workshop at the eResearch Australasia conference http://www.ands.org.au/partners-and-communities/ands-communities/dmps-interest-group
  32. 32. UQ’s Research Data Manager Demonstration Ms Sandrine Kingston-Ducrot RDM Project Manager Office of the Deputy Vice-Chancellor (Research) | The University of Queensland | Brisbane Queensland 4072 | Australia Telephone +61 7 336 58094 | email s.ducrot@research.uq.edu.au | web www.uq.edu.au/research Dr Andrew Janke Project Lead | UQ Research Data Manager (UQRDM) Informatics Fellow | National Imaging Facility (NIF) Systems Architect | Research Data Services (RDS) Senior Research Fellow | Centre for Advanced Imaging (CAI) orcid.org/0000-0003-0547-5171 | au.linkedin.com/in/ajanke | github.com/andrewjanke +61 7 3365 3392 | +61 402 700 883 | andrew.janke@uq.edu.au | www.cai.uq.edu.au 57-416 | The University of Queensland | Brisbane Australia 4072 | https://goo.gl/KxWIqG
  33. 33. https://www.youtube.com/watch?v=1GYMB8QdT60
  34. 34. FAIR ● Findable ● Accessible ● Interoperable ● Reusable
  35. 35. Setting the Scene... https://goo.gl/cAzYpj
  36. 36. Open Data/Data sharing - FAIR precursors Blogs.nature.com. (2018). [online] Available at: http://blogs.nature.com/naturejobs/files/2017/06/mat1.jpg [Accessed 23 Mar. 2018].
  37. 37. The FAIR Principles Wilkinson, M., Dumontier, M., Aalbersberg, I., Appleton, G., Axton, M., & Baak, A. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. Retrieved 27 March 2018, from https://www.nature.com/articles/sdata201618 Commons.wikimedia.org. (2016). File:FAIR data principles.jpg - Wikimedia Commons. [online] Available at: https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg [Accessed 10 Apr. 2018]. “As open as possible, as closed as necessary”
  38. 38. A tool to assess how FAIR data are https://www.ands-nectar-rds.org.au/fair-tool
  39. 39. Activity - FAIR data assessment Find a dataset that relates to your work using one of the following resources, and assess it against the principles using the ANDS FAIR tool https://www.ands-nectar-rds.org.au/fair-tool Research Data Australia - https://researchdata.ands.org.au/ CSIRO Data Access Portal - https://data.csiro.au/ DataCite - https://search.datacite.org/ Discipline-specific repositories https://www.re3data.org/browse/by-subject/ A repository of your choosing.
  40. 40. F is for Findable To be Findable: F1. (meta)data are assigned a globally unique and eternally persistent identifier. F2. data are described with rich metadata. F3. (meta)data are registered or indexed in a searchable resource. F4. metadata specify the data identifier. http://www.uniprot.org/uniprot/P98161
  41. 41. What is Metadata? Ontotext.com. (2018). [online] Available at: https://ontotext.com/wp-content/uploads/2017/02/Metadata_01-768x384.png [Accessed 23 Mar. 2018]. “Metadata, you see, is really a love note – it might be to yourself, but in fact it’s a love note to the person after you, or the machine after you, where you’ve saved someone that amount of time to find something by telling them what this thing is.” Cit. Jason Scott’s Weblog
  42. 42. Metadata types Ontotext.com. (2018). [online] Available at: https://ontotext.com/wp-content/uploads/2017/02/Types-of-Metadata_03-768x384.png [Accessed 23 Mar. 2018]. ● Metadata standards
  43. 43. Metadata standards Fairsharing.org. (2018). FAIRsharing. [online] Available at: https://fairsharing.org/educational/# [Accessed 28 Mar. 2018]. http://www.dcc.ac.uk/resources/subject-areas/biology
  44. 44. Findable - metadata standards Examples: • Dublin Core http://dublincore.org/ • Darwin Core • ANZLIC • Marine Community Profile • VO https://www.ands.org.au/working-with-data/metadata
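As an illustration of a metadata standard in use, the sketch below writes a few Dublin Core elements as XML with Python's standard library. The element names (`title`, `creator`, `identifier`, `rights`) come from the Dublin Core element set; the wrapper element `metadata`, the helper name and all of the values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Serialise a dict of Dublin Core element names and values to XML text."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Invented example values; the identifier is a placeholder, not a real dataset.
record = dublin_core_record(
    {
        "title": "Example rainfall observations",
        "creator": "Example, Researcher",
        "identifier": "https://example.org/dataset/123",
        "rights": "https://creativecommons.org/licenses/by/4.0/",
    }
)
```

Using a shared element set means a harvester that understands Dublin Core can read the title, creator and licence without knowing anything else about the project.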
  45. 45. Resources on Findable ● DOI/Handle minting ● Metadata Standards Directories ● Research Data Australia ● Re3Data ● Thing 4: Data Discovery ● Thing 8: Citation metrics for data ● Thing 11: What's my metadata schema? ● www.ands.org.au/fair
  46. 46. A is for Accessible TO BE ACCESSIBLE: A1 (meta)data are retrievable by their identifier using a standardized communications protocol. A1.1 the protocol is open, free, and universally implementable. A1.2 the protocol allows for an authentication and authorization procedure, where necessary. A2 metadata are accessible, even when the data are no longer available.
  47. 47. Resources on Accessible ● Med.data materials ● Australian Data Access conditions ● ANDS Sensitive data materials ● Discovering data services ● Data Services Interest Group ● Thing 10: Sharing sensitive data ● Thing 19: APIs and apps ● www.ands.org.au/fair
  48. 48. Persistent Identifiers - The problem...
  49. 49. What are Persistent Identifiers? ● A persistent identifier (PiD) is a long–lasting reference to a digital resource ● Usually has two parts: ○ A unique identifier (ensures the provenance of a digital resource) ○ Location for the resource over time (ensures that the identifier resolves to the correct location)
  50. 50. Digital Object Identifiers (DOIs) ● DOIs can be created for data sets and associated outputs (e.g. literature, workflows, algorithms, software etc) - DOIs for data are equivalent with DOIs for other scholarly publications ● DOIs enable accurate data citation and bibliometrics (both metrics and altmetrics) ● Resolvable DOIs provide easy online access to research data for discovery, attribution and reuse ● DOIs are a persistent identifier and as such carry expectations of curation, persistent access and rich metadata
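A resolvable DOI is turned into a web address by prefixing it with the `https://doi.org/` resolver. The sketch below tidies up the ways a DOI is commonly written and builds that address; the function names are hypothetical, and the example DOI is the FAIR principles paper cited in these slides. No network request is made.

```python
from urllib.parse import quote

# Common ways a DOI is written in reference lists and web pages.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw):
    """Strip resolver prefixes and whitespace; DOIs are case-insensitive."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip().lower()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw):
    """Return the resolver URL that should lead to the resource's landing page."""
    return "https://doi.org/" + quote(normalise_doi(raw), safe="/")


fair_paper = doi_url("doi: 10.1038/sdata.2016.18")
```

Citing the resolver URL rather than the publisher's current web address is what keeps the reference working if the resource moves.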
  51. 51. Why use PiDs? ● PiDs play a key role in the discoverability, accessibility and reproducibility of research ● Persistent identifiers solve the problem of the persistence of cited resource, particularly in the scholarly literature ● Some persistent identifiers (e.g. DOIs), have an added value in discoverability, making digital objects findable and reusable in multiple scholarly resources
  52. 52. Findable Global, persistent ID Metadata Searchable
  53. 53. Accessible (metadata) • Metadata is valuable in itself, when planning research, especially replication studies. • But it doesn’t replace the original data. https://retractionwatch.com/2016/12/02/stolen-data-prompts-science-flag-debated-study-fish-plastics/ “the theft of the computer on which the raw data for the paper were stored. These data were not backed up on any other device nor deposited in an appropriate repository.”
  54. 54. Globally unique & persistent identifiers (PIDs) • Book > International Standard Book Number (ISBN) • Research article / data / software > Digital Object Identifier (DOI) Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3: 160018 doi: 10.1038/sdata.2016.18 • Person > Open Researcher & Contributor ID (ORCID)
  55. 55. Activity - ORCID profile ORCID has recently emerged as the preferred identifier for people by a range of Australian universities, funders and publishers worldwide. Choose from two activities: Option 1 - Don’t have an ORCID? 1. Follow steps 1 and 2 at https://orcid.org/ 2. When you’re done, add your ORCID to your email signature, LinkedIn profile, and any other places (e.g. blog) 3. Send your new ORCID number to a colleague and ask for some feedback on your profile Option 2 - Already have an ORCID? Read https://orcid.org/blog/2015/07/23/six-things-do-now-you've-got-orcid-id and take a moment to update your profile.
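When an ORCID iD is copied into an email signature or a metadata record, a mistyped digit can be caught before it spreads, because the last character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below shows that check; the function names are hypothetical, and the iD used in the example is the one printed on slide 32.

```python
def orcid_check_character(first_fifteen):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in first_fifteen:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the string has 16 characters (ignoring hyphens) and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


slide_32_orcid_ok = is_valid_orcid("0000-0003-0547-5171")
```

The check only confirms that the iD is well formed; it does not confirm that the iD belongs to the person named.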
  56. 56. Accessible Standard access protocols (e.g. https, sftp, Web API) Explicit access conditions
  57. 57. Discovery Mechanisms Pixabay.com. (2018). Free Image on Pixabay - Needle, Hay, Needle In A Haystack. [online] Available at: https://pixabay.com/en/needle-hay-needle-in-a-haystack-1419606/ [Accessed 4 Apr. 2018]. ● Google ● Ask a colleague ● Find link to data in a journal article ● Data journals ● Database registries e.g. re3data ● Open data portals e.g. data.gov ● Institutional repositories ● Data / Discipline repositories e.g. Dryad ● Project website ● Library catalogues, databases ● Data discovery aggregators like Research Data Australia Where do you look for Data...
  58. 58. Research Data Australia https://researchdata.ands.org.au/
  59. 59. Bio-related Repositories ● https://www.nature.com/sdata/policies/repositories#life ● Gene Expression Omnibus: https://www.ncbi.nlm.nih.gov/geo/ ● Array Express: https://www.ebi.ac.uk/arrayexpress/ ● ELIXIR Deposition Databases for Biomolecular Data: https://www.elixir-europe.org/platforms/data/elixir-deposition-databases
  60. 60. Interoperable To be interoperable the data will need to use community agreed formats, language and vocabularies. The metadata will also need to use community agreed standards and vocabularies, and contain links to related information using identifiers.
  61. 61. Interoperable I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles. I3. (meta)data include qualified references to other (meta)data.
  62. 62. Metadata use a formal, accessible and shared language Ontologies An ontology is a formal, explicit specification of a shared conceptualisation (Studer 1998) Formal: Machine readable Explicit specification: It explicitly defines concept, relations, attributes and constraints Shared: It is accepted by a group Conceptualisation: An abstract model of a phenomenon
  63. 63. Examples of Ontologies ● Gene Ontology Consortium (GO) ● The Sequence Ontology (SO) ● The Generic Model Organism Project (GMOD) ● Ontology for Biomedical Investigation
  64. 64. Controlled Vocabularies ● A controlled vocabulary reflects agreement on terminology used to label concepts. ● When research communities agree to use common language for the concepts in datasets, then the discovery, linking, understanding and reuse of research data are improved. http://www.ands.org.au/online-services/research-vocabularies-australia
  65. 65. Metadata use vocabularies that follow FAIR principles The controlled vocabulary used to describe data sets needs to be documented and resolvable using globally unique and persistent identifiers. This documentation needs to be easily findable and accessible by anyone who uses the data set.
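One way to picture a controlled vocabulary with resolvable identifiers is a lookup from agreed labels to identifiers, applied to the free-text terms in a data set. The sketch below assumes a tiny invented vocabulary; the `example.org` identifiers, the labels and the function name are all hypothetical and do not come from Research Vocabularies Australia.

```python
# Hypothetical vocabulary: identifier -> agreed label.
HABITAT_VOCAB = {
    "https://example.org/vocab/habitat/forest": "forest",
    "https://example.org/vocab/habitat/wetland": "wetland",
    "https://example.org/vocab/habitat/grassland": "grassland",
}


def map_to_vocabulary(labels, vocabulary):
    """Match free-text labels to vocabulary identifiers, ignoring case and padding.

    Returns (matched, unknown): a dict of label -> identifier, and a list of
    labels that are not in the vocabulary and need a human decision.
    """
    lookup = {label.lower(): uri for uri, label in vocabulary.items()}
    matched, unknown = {}, []
    for label in labels:
        uri = lookup.get(label.strip().lower())
        if uri is None:
            unknown.append(label)
        else:
            matched[label] = uri
    return matched, unknown


matched, unknown = map_to_vocabulary(["Forest", " wetland", "swamp"], HABITAT_VOCAB)
```

Storing the identifier alongside (or instead of) the label is what lets two data sets that both say "forest" be linked with confidence that they mean the same thing.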
  66. 66. Research Vocabularies Australia https://vocabs.ands.org.au/
  67. 67. Interoperable • Formats (open standards) • Data types (defined and used consistently) • Transfer (exchange between systems) • Rules (schemas, vocabularies) • Linked data models and cross-referencing
  68. 68. Reusable Reusable data should maintain its initial richness. For example, it should not be diminished for the purpose of explaining the findings in one particular publication. It needs a clear machine readable licence and provenance information on how the data was formed. It should also have discipline-specific data and metadata standards to give it rich contextual information that will allow for reuse. From: https://www.ands.org.au/working-with-data/fairdata
  69. 69. Reusable R1.1. (meta)data are released with a clear and accessible data usage license. R1.2 (meta)data are associated with their provenance. R1.3. (meta)data meet domain-relevant community standards.
  70. 70. License If data is not licensed no-one else can use it. In Australia, no licence is regarded as the same as 'all rights reserved', confining any reuse to very limited circumstances. From: http://www.ands.org.au/working-with-data/publishing-and-reusing-data/licensing-for-reuse https://www.youtube.com/embed/SHR1EJ0kQ3g?rel=0
  71. 71. Creative Commons Because: ● Title “Creative Commons 10th Birthday Celebration San Francisco” ● Author “tvol” – linked to his profile page ● Source “Creative Commons 10th Birthday Celebration San Francisco” – linked to original Flickr page ● License “CC BY 2.0” – linked to license deed How you attribute authors of the CC works will depend on whether you modify the content, if you create a derivative, if there are multiple sources, etc. Source: https://creativecommons.org/use-remix/get-permission/ “Creative Commons 10th Birthday Celebration San Francisco” by tvol is licensed under CC BY 2.0
  72. 72. Creative Commons
  73. 73. License chooser
  74. 74. CC-BY SA You are free to: ● Share — copy and redistribute the material in any medium or format ● Adapt — remix, transform, and build upon the material for any purpose, even commercially. Under the following terms: ● Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. ● ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
  75. 75. CC Australia - License explainer
  76. 76. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  77. 77. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  78. 78. Are you being FAIR to the future you? In 5 years' time will my research data be: • Findable – A top drawer of USB drives and sticks isn’t always a good data archive • Accessible – My new desktop doesn't have a DVD drive or what was the password on that encrypted data drive? • Interoperable – Wonder where I put my old copy of that software that compiles this binary data file? • Reusable – How accurate was that sensor network I used to gather these observations? Am I allowed to reuse this data? FAIR - Working Data | John Morrissey
  79. 79. FAIR Working Data Findable by whom? How? Minimum viable metadata? • Standardized naming conventions for folders and files • Consider using Readme.txt files to describe content? Maybe you could include metadata.txt or metadata.json files embedded in folders • Think about what persistent identifiers are useful in your project. • Do you need a basic registry to manage metadata?
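The two suggestions on this slide, a naming convention and a metadata.json file in each folder, can be sketched in a few lines of Python. The convention checked here (`YYYY-MM-DD_short-description_vNN.ext`) and the metadata fields are assumptions chosen for illustration; a project would agree its own.

```python
import json
import re
from pathlib import Path

# Hypothetical convention: date, lowercase hyphenated description, two-digit version.
NAME_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_[a-z0-9]+(?:-[a-z0-9]+)*_v\d{2}\.[a-z0-9]+")


def follows_convention(filename):
    """True if the file name matches the agreed pattern exactly."""
    return NAME_PATTERN.fullmatch(filename) is not None


def write_folder_metadata(folder, description, creator, licence):
    """Write a metadata.json describing the files currently in the folder."""
    folder = Path(folder)
    record = {
        "description": description,
        "creator": creator,
        "licence": licence,
        "files": sorted(
            p.name for p in folder.iterdir()
            if p.is_file() and p.name != "metadata.json"
        ),
    }
    (folder / "metadata.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

For example, `follows_convention("2018-06-14_site-survey_v01.csv")` is true, while `follows_convention("final FINAL data.xlsx")` is false.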
  80. 80. FAIR Working Data Accessible by: Whom? How? What? • How will you manage identity and access control? • Shared storage resources – where? • Will you use simple storage or a higher level platform like a shared eLab notebook or database? • What categories of data will you hold/share and which data assets need to be kept long term?
  81. 81. FAIR Working Data Interoperable: • What are the key standards currently applied to the project's domain/s? • Are my data producing assets standards compliant? Do they need to be? What do I have to do to convert my data assets to the correct format? • Do we have a set of vocabularies we want to use within our project? Where are they? • Who can help me with my standards compliance work? (Librarians? IT Specialists? Information Management Specialists?)
  82. 82. FAIR Working Data Reusable: • Agree on a licensing framework before the project starts producing data • What data assets need to be preserved long term? • What data assets will we publish? • Where will we publish? • Who has contributed to the data asset and how will they be represented when the data is published? • Who will manage the long-term data archive?
  83. 83. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  84. 84. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  85. 85. What are personal and sensitive data? Privacy Act 1988 Personal information Sensitive information Health information Sensitive data “data that can be used to identify an individual, species, object, process or location that introduces a risk of discrimination, harm or unwanted attention.” Guide to Publishing and Sharing Sensitive Data http://www.ands.org.au/guides/sensitivedata
  86. 86. Why it matters http://www.abc.net.au/news/2018-03-18/cambridge-analytica-suspended-by-facebook/9560272 https://www.smh.com.au/national/guilty-health-department-breached-privacy-laws-publishing-data-of-2-5m-people-20180329-p4z6wf.html
  87. 87. Ethics • Informed consent • A key principle of ethics is to avoid harm • Can be achieved by removing/minimising sensitivity • De-identifying data if possible (and the meaning is not lost in the process) • Conditions around access to data (mediated access, 5 safes) • Ethics committee approval needs to cover consent and access conditions • See also ANDS’ medical webinar series http://www.ands.org.au/working-with-data/sensitive-data/medical-and-health/webinars-health-and-medical
  88. 88. Informed consent for data sharing 1. Avoid precluding data de-identification, publication and sharing 2. State possibility of future data publication 3. State conditions of access 4. Document consent with collected data to inform subsequent users Example wording available in ANDS Guide to Publishing and Sharing Sensitive Data
  89. 89. Data de-identification • Identifiable*: identity of an individual can be reasonably ascertained • Re-identifiable*: possible to re-identify an individual • Non-identifiable*: no specific individual can be identified De-identification/Anonymisation http://www.ands.org.au/working-with-data/sensitive-data/de-identifying-data * Terms from National Statement on Ethical Conduct in Human Research 2007 (Updated May 2015)
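The step from identifiable to re-identifiable data can be sketched as follows: direct identifiers are dropped, the participant ID is replaced by a keyed code, and exact age is coarsened into a band. The field names, the list of direct identifiers and the ten-year band are assumptions for illustration. Because anyone holding the key can regenerate the codes, the output is re-identifiable rather than non-identifiable in the terms used on this slide.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record, secret_key):
    """Return a copy with identifiers removed, a keyed pseudonym and an age band."""
    dropped = DIRECT_IDENTIFIERS | {"participant_id", "age"}
    cleaned = {k: v for k, v in record.items() if k not in dropped}
    # Keyed hash: the same participant always gets the same code,
    # but the code cannot be recomputed without the key.
    cleaned["pseudonym"] = hmac.new(
        secret_key.encode("utf-8"),
        str(record["participant_id"]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()[:16]
    cleaned["age_band"] = age_band(record["age"])
    return cleaned
```

Whether the remaining fields still allow someone to be recognised depends on the data set, which is why the slide's caveat about not losing meaning, and the access conditions discussed next, still apply.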
  90. 90. What about sharing data that can’t be de-identified? healthtalkaustralia.org Informed consent / mediated access
  91. 91. Mediated access • The metadata is openly available but the data is not • Access mediated through • The researcher • The research team • The repository • A data access committee
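Mediated access, as described on this slide, separates what everyone can see (the metadata) from what requires a request (the data). A minimal sketch of that separation is below; the record layout, the `access` values and the function name are hypothetical.

```python
def public_view(record):
    """Return what an anonymous visitor sees for a catalogue record.

    Metadata is always shown. The data location is shown only for open
    records; otherwise the visitor is told who mediates access.
    """
    view = {"metadata": record["metadata"], "access": record["access"]}
    if record["access"] == "open":
        view["data_url"] = record["data_url"]
    else:
        view["request_access_from"] = record["mediator"]
    return view


# Invented example record for a sensitive interview collection.
interviews = {
    "metadata": {"title": "Patient interviews", "creator": "Example Research Team"},
    "access": "mediated",
    "data_url": "https://example.org/restricted/interviews.zip",
    "mediator": "data access committee",
}

visible = public_view(interviews)
```

The data set stays findable and citable because its description is public, while the decision to release the data itself rests with the named mediator.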
  92. 92. Resources for medical and health data ands.org.au/working-with-data/sensitive-data ands.org.au/medical Publishing and sharing sensitive data Guide Data sharing considerations for Human Research Ethics Committees Guide De-identification Guide
  93. 93. Activity Look at the consent forms • UK Data Archive sample consent form • Global Alliance for Genomics and Health consent tools (focus on Section C) • Health Science Alliance Biobank Consent • https://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf (bottom of page 13) Discuss between groups some of the good and bad points of the consent form you examined.
  94. 94. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  95. 95. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  96. 96. What is impact? ● Public ○ Saving lives ○ Protecting the environment and wildlife ○ Supporting the economy ○ Influencing public policy ○ Educational ● Private ○ For you, your collaborators or university
  97. 97. Principles for maximizing your research (data) impact Highest impact will come from data that are: ● High quality ● Well described (FAIR) ● Citable ● Linked ● Open?
  98. 98. How to maximise impact of data ● Think early about what you are trying to achieve with your data ○ Will it be used for a single or multiple research projects? ○ Will you want to share/collaborate beyond your current colleagues? ○ How will it relate to publications? ○ Could it have wider (including commercial) impact?
  99. 99. Impact - global, specific purpose “The arguments for sharing data, and the consequences of not doing so, have been thrown into stark relief by the Ebola and Zika outbreaks. In the context of a public health emergency of international concern, there is an imperative on all parties to make any information available that might have value in combating the crisis.” Wellcome Trust 2016, reissued 2018
  100. 100. Impact - global, wide purpose “CERN Open Data provides content for both education and research. We aim to support high-school students and teachers as well as university students and professors.”
  101. 101. Impact - regional, social “Data on Western Australian children’s health, learning, development and social characteristics will be mapped using geospatial technology so that community leaders and service providers can identify the priority issues for their children.”
  102. 102. Impact - regional, economic “This project is delivering new genetic knowledge directly assisting the breeding of better mungbean varieties for Australian growers.”
  103. 103. Impact through collaboration The PetaJakarta Data Sharing project: “aims to promote national and international research collaboration through the sharing of data related to the response of the city of Jakarta to flooding during the 2014/2015 monsoon season.” Watch: the video Info: http://www.petajakarta.org/banjir/en/
  104. 104. Impact through publication ● Many journals now require data sharing ● This initially began because of concerns about reproducibility ● Researchers now also use data sharing to maximise the impact of their work 39% of researchers: Sharing data “Increases the impact and visibility of my research”
  105. 105. Policies may vary, even within a publisher https://www.springernature.com/gp/authors/research-data-policy/data-policy-types/12327096
  106. 106. Maximising impact & retaining control at publication Data Availability: Data are available from the Ecosounds Acoustic Workbench. There are 1200 links provided in the supporting information "S4 File: Sample minutes", that provide access to the data used in this research. The audio files can be accessed through the following links. The project URL for the Ecosounds Acoustics Workbench is https://www.ecosounds.org/projects/1029/sites/1192 and https://www.ecosounds.org/projects/1029/sites/1193 Additionally, our data is backed up on QUT’s own HPC storage.
  107. 107. Appropriate sharing to maximise impact Full access to some data Moderated access to other data
  108. 108. Impact through data sharing is not new
  109. 109. The nuts and bolts of maximising impact ● High quality data and metadata ○ Persistent identifiers ● Consistent mechanisms for citing ● Thing 7: Data citation for access & attribution
  110. 110. Maximising impact: Cite it right Noble, T., Williams, B., & Mundree, S. (2017). Next generation mungbean SNP markers (Version 1). Queensland University of Technology. https://doi.org/10.4225/09/59b8a393e44f9 ● author/s ● year of publication ● title ● publisher (for data, this is often the archive where it is housed) ● edition or version ● access information (a URL, DOI or other persistent identifier)
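The citation elements listed on this slide can be assembled programmatically. The following is a minimal sketch, not a replacement for a citation style guide: the `DatasetCitation` class and the `join_authors` and `format_citation` helpers are hypothetical names introduced here for illustration, and the layout simply mirrors the mungbean example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements named on the slide."""
    authors: List[str]             # author/s, already formatted as "Surname, I."
    year: int                      # year of publication
    title: str                     # title of the dataset
    publisher: str                 # for data, often the archive where it is housed
    identifier: str                # URL, DOI or other persistent identifier
    version: Optional[str] = None  # edition or version, if any


def join_authors(authors: List[str]) -> str:
    """Join author names with commas and an ampersand before the last one."""
    if not authors:
        raise ValueError("a citation needs at least one author")
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_citation(c: DatasetCitation) -> str:
    """Build a citation string in the same layout as the slide's example."""
    version_part = f" ({c.version})" if c.version else ""
    return (
        f"{join_authors(c.authors)} ({c.year}). "
        f"{c.title}{version_part}. {c.publisher}. {c.identifier}"
    )


# The dataset cited on the slide, expressed as structured metadata.
example = DatasetCitation(
    authors=["Noble, T.", "Williams, B.", "Mundree, S."],
    year=2017,
    title="Next generation mungbean SNP markers",
    publisher="Queensland University of Technology",
    identifier="https://doi.org/10.4225/09/59b8a393e44f9",
    version="Version 1",
)

citation_text = format_citation(example)
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easier to check that none of the six elements is missing before the data is published.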
  111. 111. Cite it right https://citation.crosscite.org
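For scripted use, a formatted citation can also be requested directly from the DOI resolver through DOI content negotiation, which the registration agencies behind this kind of formatter support. The sketch below only builds the request. The helper names `build_citation_request` and `fetch_citation` are hypothetical, and the `apa` style name is an assumption about which citation style you want. Whether a given DOI returns a formatted citation depends on its registration agency.

```python
import urllib.parse
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare (but do not send) a content-negotiation request for a DOI.

    The Accept header asks the resolver for a formatted bibliography
    entry instead of redirecting to the dataset's landing page.
    """
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8").strip()


# Build the request for the mungbean dataset cited on the previous slide.
request = build_citation_request("10.4225/09/59b8a393e44f9")
```

Calling `fetch_citation("10.4225/09/59b8a393e44f9")` would send the request. Check the returned text against your target style before using it in a publication.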
  112. 112. Key messages ● Maximising impact starts when you start collecting data ● Think about what impact means for you ● How will you share your data? ● There are tools available to maximise impact, especially to get the citation right
  113. 113. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  114. 114. Research Data Management: ● Introduction to data management ● Data management planning ● Making your data FAIR (Findable, Accessible, Interoperable and Reusable) ● How to manage your working data ● Managing personal and sensitive data ● Maximizing your research (data) impact ● Bringing it all together
  115. 115. Bringing it all Together ● How do you feel about Research Data Management now? ● Are there areas where you feel you need more information? ● Do you know what impact you want to get from your data? ● What tips have you got for others?
  116. 116. With the exception of third party images or where otherwise indicated, this work is licensed under the Creative Commons 4.0 International Attribution Licence. ANDS, Nectar and RDS are supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program (NCRIS). Thank you natasha.simons@ands.org.au @n_simons ginny.barbour@qut.edu.au @ginnybarbour
