20160414 23 Research Data Things

  1. 23 Research Data Things • Research Data Coordinator: Katina Toufexis
  2. How do I access the 23 Things?
     Overview
     • ands.org.au/23-things
     How often do I need to do a Thing?
     • 1 Thing is released each week
     • Complete at your own pace
     UWA participation
     • Monthly catch-ups in BJM, which include a national ANDS webinar
     • Discussions in our Google+ UWA-only group
  3. Calendar of Events
  4. Schedule for 2016 (weeks beginning Monday):
     • 29 Feb: Kick-off Webinar (Tue 1 Mar)
     • 7 Mar: Thing 1 - Getting started with data
     • 14 Mar: Thing 2 - Issues in research data management
     • 21 Mar: Thing 3 - Data in the research lifecycle
     • 28 Mar: Easter week
     • 4 Apr: Thing 4 - Repositories for data discovery
     • 11 Apr: Webinar catch-up (Tue 12 Apr)
     • 18 Apr: Thing 5 - Repositories for data sharing
     • 25 Apr: Thing 6 - Long-lived data: curation & preservation
     • 2 May: Thing 7 - Data citation for access & attribution
     • 9 May: Thing 8 - Citation metrics for data
     • 16 May: Thing 9 - Licensing data for reuse
     • 23 May: Webinar catch-up (Tue 24 May)
     • 30 May: Thing 10 - Sharing sensitive data
     • 6 Jun: Thing 11 - What's my schema?
     • 13 Jun: Thing 12 - Vocabularies for data description
     • 20 Jun: Thing 13 - Walk the crosswalk
     • 27 Jun: Webinar catch-up (Tue 28 Jun)
     • 4 Jul: Thing 14 - Identifiers and linked data
     • 18 Jul: Thing 15 - Data management plans
     • 25 Jul: Thing 16 - What are publishers & funders saying about data?
     • 1 Aug: Webinar catch-up (Tue 2 Aug)
     • 8 Aug: Thing 17 - Data literacy & outreach
     • 15 Aug: Thing 18 - Data interviews: talk the talk
     • 22 Aug: Thing 19 - Exploring APIs and Apps
     • 29 Aug: Thing 20 - Do it with data!
     • 5 Sep: Webinar catch-up (Tue 6 Sep)
     • 19 Sep: Thing 21 - Tools of the trade
     • 26 Sep: Thing 22 - What's in a name?
     • 3 Oct: Thing 23 - Making connections
     • 17 Oct: Webinar catch-up (Tue 18 Oct)
  5. Calendar of Events: catch-up webinars and topic review at BJM each month in 2016 • 1 March • 12 April • 24 May • 28 June • 2 August • 6 September • 18 October
  6. ANDS Website
  7. ANDS Meetup site
  8. UWA-only Discussion Group: invitations have been sent out
  9. Getting started with research data • What "research data" are we talking about? (Thing 1)
  10. • Open up this record of research data collected during a CSIRO voyage in 2007 that explored the sea floor (i.e. the benthic zone) of Marmion Lagoon, just off Perth. (Thing 1)
  11. • Click on the Data tab. (Thing 1)
  12. • Protein Data Bank (Thing 1)
  13. • Your favourite research data tech or software story or experience (e.g. did you compete in GovHack 2015, or in Science Hackfest Melbourne, 4-6 March 2016?) • A software tool or service for research data you think others might be interested in • A question or research data problem to crowdsource a solution for (Thing 1)
  14-15. Thing 1
  16. Issues in research data management: Research data is for everyone. Governments and universities all around Australia and the world are now encouraging researchers to manage their data better so that others can use it. Research data might be critical to solving the big questions of our time, but so much data are being lost or poorly managed. (Thing 2)
  17. Issues in research data management • https://www.youtube.com/watch?v=66oNv_DJuPc • As you watch the cartoon, note the data management mistakes that interest or appal you. (Thing 2)
  18. Issues in research data management: "Big Data" is a term we're hearing with increasing frequency. Data management for Big Data brings many complexities: citing dynamic data, software, high-volume computation, storage costs, transfer of petabytes of data, preservation, provenance and more. • Read this post and presentation titled "Big Data: The 5Vs Everyone Must Know". (Thing 2)
  19-25. Issues in research data management (Thing 2)
  26. Thing 3
  27. • Laboratory notebooks are used by researchers to formally record their research activities. As research has become increasingly digital and collaborative, the utility of traditional hard-copy lab notebooks has been challenged. Not surprisingly, then, electronic lab notebooks (ELNs) have emerged as an alternative. • Effective data management for constantly updated data, such as that within ELNs, is a real challenge for projects that wish to publish their data during the project. (Thing 2: Issues in research data management)
  28. • Definition: Electronic lab notebook (ELN) software allows scientists to access, search and share the results of their experiments. An ELN is essentially a computer program meant to replace traditional paper laboratory notebooks, so that scientists and researchers can search their records more easily and have more efficient means to back up and copy their data onto other electronic devices. ELNs encourage collaboration, as multiple researchers or scientists can view lab data at the same time. ELNs can also work alongside other research instruments so that additional data can be incorporated quickly and efficiently. ELNs should be supported by strong security measures to ensure that the data, and the researchers’ process of creating the data, are not jeopardized in any way. Additionally, ELNs should be flexible enough to change if a particular research process is altered or new data is required. This flexibility is best addressed when developing the specific software for an ELN. (Thing 2: Issues in research data management)
  29. • "International team of scientists open sources search for malaria cure": an article about how an international team of scientists and citizen scientists is using open source ELNs to speed up the search for a cure for malaria. (Thing 2: Issues in research data management)
  30-31. • You can see their open ELNs here (Thing 2: Issues in research data management)
  32. Thing 2: Issues in research data management. http://www.wellcome.ac.uk/News/Media-office/Press-releases/2016/WTP060169.htm
  33. Data in the research lifecycle • Data and its management change over time. Here we look at data and research lifecycles and make connections between them. • Data often have a longer lifespan than the research project that creates them. • Follow-up projects may analyse or add to the data, and data may be reused by other researchers. • Journal publishers are increasingly mandating that the data underpinning a journal article be retained and made accessible for the long term. (Thing 3)
  34-40. Data in the research lifecycle • A data lifecycle shows the different phases a dataset goes through as the research project moves from "having a brilliant idea" to "making ground-breaking discoveries" to "telling the world about it". (Thing 3) http://www.data-archive.ac.uk/create-manage/life-cycle
  41. Data in the research lifecycle • Digital Curation Centre • Take a look at the DCC Curation Lifecycle Model, which concentrates on preservation and curation within data management. (Thing 3) http://www.dcc.ac.uk/resources/curation-lifecycle-model
  42. Data in the research lifecycle • What could we add? (Thing 3) http://www.dcc.ac.uk/resources/curation-lifecycle-model
  43. Data in the research lifecycle (Thing 3) http://www.dcc.ac.uk/resources/curation-lifecycle-model
  44. Data in the research lifecycle (Thing 3)
  45. Data in the research lifecycle (Thing 3) http://www.library.uwa.edu.au/research/services
  46. Data Discovery • Repositories enable discovery of data by publishing data descriptions ("metadata") about the data they hold, much as a library catalogue describes the materials held in a library. • Most repositories provide access to the data itself, but not always. (Thing 4)
  47. Data Discovery • Data portals or aggregators draw together research data records from a number of repositories. • e.g. Research Data Australia (RDA) aggregates records from over 100 Australian research repositories. • https://researchdata.ands.org.au/measuring-effects-human-leptonychotes-weddellii/640511/ (Thing 4)
  48-51. Data Discovery (Thing 4)
  52. Data Discovery • What data repositories exist, and how are Australian researchers sharing their data? • Start by going to re3data.org (Thing 4)
  53-54. Data Discovery • There are 61 repositories listed for Australia. (Thing 4)
  55. Data Discovery: What makes a "good" data repository? (Thing 4)
  56. Data Discovery: DCC checklist for evaluating data repositories. What does this checklist cover and what does it exclude? Choosing a long-term service to look after data means asking questions similar to those you ask when choosing a publisher: ‘if I hand this over, will they review it, safeguard the content, and make sure it is accessible for as long as it is of value?’ This checklist relates these questions to the following key considerations: 1. Is a reputable repository available? 2. Will it take the data you want to deposit? 3. Will it be safe in legal terms? 4. Will the repository sustain the data value? 5. Will it support analysis and track data usage? See more at: http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data#1 (Thing 4)
  57. Contacts: UWA 23 Things Coordinators • Caroline Clark caroline.clark@uwa.edu.au • Nola Steiner nola.steiner@uwa.edu.au • Katina Toufexis katina.toufexis@uwa.edu.au

Editor's Notes

  • Data are distinct pieces of information, usually formatted in a special way. Strictly speaking, data is the plural of datum, a single piece of information. In practice, however, people use data as both the singular and plural form of the word. In database management systems, data files are the files that store the database information.
    Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. The word “data” is used throughout this site to refer to research data.
    Research data can be generated for different purposes and through different processes, and can be divided into different categories. Each category may require a different type of data management plan.
    Observational: data captured in real-time, usually irreplaceable. For example, sensor data, survey data, sample data, neurological images.
    Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
    Simulation: data generated from test models where model and metadata are more important than output data. For example, climate models, economic models.
    Derived or compiled: data is reproducible but expensive. For example, text and data mining, compiled database, 3D models.
    Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.
    Research data may include all of the following:
    Text or Word documents, spreadsheets
    Laboratory notebooks, field notebooks, diaries
    Questionnaires, transcripts, codebooks
    Audiotapes, videotapes
    Photographs, films
    Test responses
    Slides, artifacts, specimens, samples
    Collection of digital objects acquired and generated during the process of research
    Data files
    Database contents including video, audio, text, images
    Models, algorithms, scripts
    Contents of an application such as input, output, log files for analysis software, simulation software, schemas
    Methodologies and workflows
    Standard operating procedures and protocols
    The following research records may also be important to manage during and beyond the life of a project:
    Correspondence including electronic mail and paper-based correspondence
    Project files
    Grant applications
    Ethics applications
    Technical reports
    Research reports
    Master lists
    Signed consent forms
  • See the different formats data come in.

  • Choose one of the 4 specialised data repositories below, or find another data repository of interest, particularly one in a discipline you are unfamiliar with, and spend some time browsing your chosen repository to get a feel for the data available.

    Think about how the data here differs from data you are familiar with. Consider, for example, format, size and access method.

    Share an idea about how cross-disciplinary research could be affected by disciplinary data conventions, and one way cross-disciplinary data access can be facilitated.
  • The researcher could have copied the data from the USB stick to a shareable storage option such as AARNet's Cloudstor (first 100 GB free). As the software was no longer supported, the researcher could have extracted, or tried to extract, the data from the proprietary format into another machine-readable format (e.g. CSV or XML). The researcher could also have copied the data from the USB stick to a secure, backed-up system (most institutions have such systems).

    The researcher's continued reluctance to share the data he had collected, and his repeated assertions that all the information about the data was in his journal article, underlined, for me anyway, his basic lack of understanding of the value and usefulness of his data to other researchers. He failed utterly to consider that others may not only want to view his data, but actually make use of it in their own research. This led on to all the mistakes he made, such as: failing to abide by his publisher's open access mandate, putting the data on a USB stick without making any copies and then losing it, and, my favorite, not labeling his fields with useful names and then forgetting what they measured. I suspect this basic lack of understanding is one of the biggest barriers to the practice of good research data management.

    - Not abiding by funder and publisher rules and regulations for retaining and sharing research data
    - Not using open source software that can reduce some of these problems
    - Not having reader-friendly data, with legends, guidelines and keyword definitions
    - Not checking whether the research is replicable before publishing the article
    - Not storing the data in a way appropriate to the subject area and to the size of the data the project produces
    - Not using research infrastructure available to academics, either through the university or research institute or online
    - Not engaging productively with the wider research community to expand the boundaries of science

    * No catalogue of what the data actually is and what the columns in a spreadsheet represent
    * No software that will read the data files
    * Researchers' general reluctance to share data, thinking that all the information anyone would or should ever need is in the article
    * Using USB hard drives to store the data, and then only having one copy

    How to avoid it (good data management):
    * Multiple copies
    * Secure, backed-up storage
    * Abiding by publisher and funder mandates
    * Sharing on Research Data Australia or in a data repository
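    Two of the fixes discussed above, moving data out of a proprietary format into CSV and giving fields useful names, can be combined in one step. The Python sketch below is a minimal illustration under stated assumptions, not an ANDS or UWA procedure: it assumes the records have already been read out of the original software into dictionaries, and the field codes (f1, f2, f3), column names, descriptions and output file names are all hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical mapping from cryptic field codes to descriptive column names
# and definitions. In practice this comes from whoever collected the data.
CODEBOOK = {
    "f1": ("site_id", "Sampling site identifier"),
    "f2": ("depth_m", "Water depth at the sampling point, in metres"),
    "f3": ("temp_c", "Water temperature, in degrees Celsius"),
}


def export_with_codebook(rows, out_dir):
    """Write rows (dicts keyed by field codes) to data.csv with descriptive
    headers, plus codebook.csv explaining what each column means."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = [name for name, _ in CODEBOOK.values()]

    with open(out / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        for row in rows:
            # Rename each code to its descriptive column name; a KeyError
            # here flags a field that is missing from the codebook.
            writer.writerow({CODEBOOK[k][0]: v for k, v in row.items()})

    with open(out / "codebook.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["column", "original_code", "description"])
        for code, (name, desc) in CODEBOOK.items():
            writer.writerow([name, code, desc])

    return out / "data.csv", out / "codebook.csv"
```

    For example, export_with_codebook([{"f1": "ML-01", "f2": 12.5, "f3": 21.3}], "export") writes export/data.csv with the header site_id,depth_m,temp_c alongside a codebook.csv, so the meaning of each field travels with the data instead of living only in the researcher's memory.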


  • This article uses the 5Vs (volume, variety, velocity, veracity and value) as a framework for managing big data more successfully.
  • In late November 2012, the Open Source Malaria (OSM) team gained a new member who lived and worked almost 1700 kilometers away from the synthetic chemistry hub at the University of Sydney. Of course, collaboration across continents is not unusual for scientists, but until recently, recruitment in less than 140 characters certainly was.
    Patrick Thompson—who’d just submitted his PhD thesis at the University of Edinburgh—responded to a Twitter request for synthetic help on an important new target for the team. True to his promise, Patrick later delivered several compounds for biological testing in Dundee, Scotland. Although it turned out that the molecules Patrick made weren’t so good at killing the malaria parasite, these "negative" results provided invaluable data for the team.
    Patrick’s contribution would not have been possible in a regular drug discovery program. Veiled in secrecy and often complicated by patents and intellectual property issues, chemists aren’t always the best at sharing their results, at least not until they are published in peer reviewed journals—and sometimes after significant cherry picking. This means that lots of data, especially "negative" data often only resides in piles of dusty paper lab notebooks, hidden from all but the immediate scientific community.
    Avoiding the loss of vast quantities of data is just one of the reasons behind the formation of the OSM team. The open source drug discovery project commenced in 2011, when Matthew Todd’s lab received funding from the Medicines for Malaria Venture (MMV) and then from the Australian Research Council in the form of a linkage grant. GlaxoSmithKline (GSK), a leading pharmaceutical company, had just published a revolutionary paper containing potential antimalarial medicines and placed the information into the public domain. This open GSK data was the initial impetus behind the OSM project and led to the team synthesizing and evaluating three different series of compounds.
    The laws of open science
    The OSM project operates along very similar lines to traditional medicinal chemistry projects in that the team is looking for an antimalarial drug candidate suitable for Phase 1 clinical trials. However, the day to day running of the project works quite differently and is probably most clearly defined by the team’s commitment to The Six Laws of Open Science:
    First law: All data are open and all ideas are shared
    Second Law: Anyone can take part at any level
    Third Law: There will be no patents
    Fourth Law: Suggestions are the best form of criticism
    Fifth Law: Public discussion is much more valuable than private email
    Sixth Law: An open project is bigger than, and is not owned by, any given lab
    The team uses online electronic lab notebooks (ELNs) to record all experimental procedures and data. This means that anyone with access to the Internet can search for information from the project. All data, results and conclusions are posted in real time, even when things don’t quite turn out as planned! As the team processes and uploads raw data to the ELN, other scientists are free to compare their own data, or to draw conclusions different from those of the OSM team, and to provide feedback in the comments section below each ELN entry. This means that people can actually use the data generated by the OSM team for whatever purpose they wish. The transparent nature of the project also means that there should be less room for error and that results could be easily reproduced in other laboratories.
    The team is ardently opposed to patents, meaning they may need to navigate murky waters if and when they discover an excellent drug candidate. For decades, patents have been an essential part of the process required for bringing new medicines to the market, but the OSM team hopes to change this model.
    "There's a growing number of people questioning whether we need patents for the development of some drugs. Penicillin and the polio vaccine didn't need them. Maybe new medicines for malaria don't either," said Matthew Todd.
    Malaria is a catastrophic disease that mainly affects the world’s poorest people and so it is the ideal starting point for an open source drug discovery effort. New medicines for malaria have to be affordable and ideally administered in a single dose. Attempting to profit from those in dire need of life saving medicine would be morally reprehensible, and therefore the team believes it’s time to throw patents out the window and encourage scientists to work together and openly in order to cure malaria as expediently as possible.
    Coordinating in the open
    The team uses G+, Twitter, and Facebook as social media platforms for the discussion of results, promotion of the science and also (as in the case of Patrick and some other key members of the team) for recruitment of new members. GitHub has proven to be a valuable tool for project organization and discussion. The team avoids email as much as possible in order to facilitate open discussion and garner input from a variety of experts.
    Both members of the core team and volunteers regularly update and maintain the project wiki for use by OSM, the wider scientific community, and, of course, interested members of the public. This is just one area of the project where non-specialists are able to contribute and free up the chemists so that they can spend more time at the bench making compounds.
    Achieving success
    Another great success story for open science and OSM is the collaboration established with a group of 40 Lawrence University undergraduate students. The team at Sydney developed a robust method for the synthesis of a particular family of compounds, which was followed by Stefan Debbert’s lab class using different combinations of related starting materials. The class made lots of new molecules, learned how to prove the structure and purity of their offerings and had fun along the way. They evaluated the molecules for their activity against the malaria parasite and posted all experimental data to the project’s ELN.
    There are marked differences between open source science and the original open source movement, but scientists certainly have a great deal to learn from the software community. Open science removes the traditional hierarchy of research and encourages scientists of all levels—student or professor—to engage and contribute. Synthetic chemists need more than just a computer and access to the Web, and of course not just anyone has access to a lab and the skills required to make molecules. However, the OSM team is trying to lower the barrier to participation, while still conducting science of the highest standard. Until open science is just called "science," accelerating the discovery of a cure for malaria and encouraging others to work more openly are the true measures of success for an initiative such as OSM.
  • DATA
    Data, any information in binary digital form, is at the centre of the Curation Lifecycle.
    This includes:
    Digital Objects: simple digital objects (discrete digital items such as text files, image files or sound files, along with their related identifiers and metadata) or complex digital objects (discrete digital objects made by combining a number of other digital objects, such as websites).
    Databases: structured collections of records or data stored in a computer system.
     
    FULL LIFECYCLE ACTIONS
    Description and Representation Information: Assign administrative, descriptive, technical, structural and preservation metadata, using appropriate standards, to ensure adequate description and control over the long-term. Collect and assign representation information required to understand and render both the digital material and the associated metadata.
    Preservation Planning: Plan for preservation throughout the curation lifecycle of digital material. This would include plans for management and administration of all curation lifecycle actions.
    Community Watch and Participation: Maintain a watch on appropriate community activities, and participate in the development of shared standards, tools and suitable software.
    Curate and Preserve: Be aware of, and undertake, management and administrative actions planned to promote curation and preservation throughout the curation lifecycle.
    SEQUENTIAL ACTIONS
    Conceptualise: Conceive and plan the creation of data, including capture method and storage options.
    Create or Receive: Create data including administrative, descriptive, structural and technical metadata. Preservation metadata may also be added at the time of creation.
    Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata.
    Appraise and Select: Evaluate data and select for long-term curation and preservation. Adhere to documented guidance, policies or legal requirements.
    Ingest: Transfer data to an archive, repository, data centre or other custodian. Adhere to documented guidance, policies or legal requirements.
    Preservation Action: Undertake actions to ensure long-term preservation and retention of the authoritative nature of data. Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats.
    Store: Store the data in a secure manner adhering to relevant standards.
    Access, Use and Reuse: Ensure that data is accessible to both designated users and reusers, on a day-to-day basis. This may be in the form of publicly available published information. Robust access controls and authentication procedures may be applicable.
    Transform: Create new data from the original, for example:
    by migration into a different format, or
    by creating a subset, by selection or query, to create newly derived results, perhaps for publication

    OCCASIONAL ACTIONS
    Dispose: Dispose of data that has not been selected for long-term curation and preservation, in accordance with documented policies, guidance or legal requirements. Typically data may be transferred to another archive, repository, data centre or other custodian. In some instances data is destroyed. The data's nature may, for legal reasons, necessitate secure destruction.
    Reappraise: Return data which fails validation procedures for further appraisal and re-selection.
    Migrate: Migrate data to a different format. This may be done to accord with the storage environment or to ensure the data's immunity from hardware or software obsolescence.

    See more at: http://www.dcc.ac.uk/resources/curation-lifecycle-model
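    The Preservation Action and Reappraise steps both depend on knowing whether stored data is still intact, and the same question applies to every backup copy. One common technique is a fixity check: record a checksum for each file when it is ingested, then recompute and compare later. The Python sketch below uses SHA-256 from the standard library's hashlib; the manifest layout (a dict from relative path to hex digest) and the function names are assumptions made for illustration, not part of the DCC model.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root):
    """Record a checksum for every file under root (e.g. at ingest)."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Recompute checksums and return the recorded files that are missing
    or have changed. An empty list means every file still matches."""
    root = Path(root)
    failures = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            failures.append(rel)
    return failures
```

    Running make_manifest once at ingest, then verify against the primary store and each backup copy on a schedule, would flag any file that has gone missing or changed; such a file could be restored from an intact copy or sent back for reappraisal.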
  • Thing 3 asks us to:
    Share a comment about a modification or addition you would include to make this model contextualised to your situation.

    One of the comments in the meetup included:
  • Clickable links to their resources
  • Our equivalent
  • Have a close look at the record to see the ways the Australian Antarctic Division has made this record discoverable and accessible.  

    Citation info

    Licencing info

    Note how many times this dataset has been cited and how to cite this data.  We will look at data citation in more detail in Thing 7. 

  • Citation Info at UWA
  • Licencing Info at UWA
  • This doesn’t present all the research data repositories Australia has to offer: is anything missing?
