Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Critical Thinking: Data Management in Research - Why and How?

1,029 views

Published on

> What is Data Management and why should it concern you?
> ETH regulations, intellectual property, privacy and access rights
> Data exercise - what it takes to understand someone's data
> Long term preservation of data
> Data management plans
> Data mangement tools incl. SIS and intro to ELN openBIS

Published in: Education
  • Be the first to comment

  • Be the first to like this

Critical Thinking: Data Management in Research - Why and How?

  1. 1. || Ana Sesartic & Matthias Töwe Caterina Barillari Digital Curation Office Scientific IT-Services 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 1 Data Management in Research – Why and How?
  2. 2. ||  from the  Digital Curation Office at ETH-Library  Scientific IT-Services  sharing a scientific background ourselves  here to discuss data management as part of your research  to learn more about your needs in the process  and to motivate you to think critically about the chances and limitations of data management and re-use. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 2 Nice to meet you, we’re…
  3. 3. ||  Your scientific background  Motivation to attend this course 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 3 Shortly tell us something about… © icon 54 from Noun Project
  4. 4. || What is data management and why should it concern you? ETH regulations • Intellectual property, privacy and access rights Exercise • What does it take to understand someone’s data Active data management Long term preservation of data  coffee break  Data management plans Data management tools • Incl. intro to openBIS ELN-LIMS (SIS) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 4 Today’s Programme
  5. 5. || What is data management and why should it concern you? Introduction 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 5 Digital Research Data Hypothesis / Research question Data capture or -collection Analysis and interpretation SynthesisPublication Access and verification Re-use
  6. 6. || “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” Digital Curation Centre What is data? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 6 Slide adapted from the PrePARe Project
  7. 7. ||  Manage your data while doing research  Share, publish, preserve your data for others – and for yourself ! 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 7 Two major issues
  8. 8. ||  Data management is a general term covering how you organize, structure, store, and care for the information used or generated during a research project  It includes:  How you deal with information on a day-to-day basis over the lifetime of a project  What happens to data in the longer term – what you do with it after the project concludes 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 8 What is data management?
  9. 9. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 9 Manage the transition from private to public (Derived from Andrew Treloar’s Diagram: Private Research , Shared Research , Publication , and the Boundary Transitions, http://andrew.treloar.net/research/diagrams/data_curation_continuum.pdf, 2012)
  10. 10. ||  Data is usually created without its publication in mind  Research data requires comprehensive documentation  Only technical metadata can be extracted later, but little to no documentation of content and context can be added 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 10 Constraints for preservation and sharing
  11. 11. || Why manage your research data? Or, on carrots (benefits) and sticks (requirements) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 11
  12. 12. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 12 Why spend time and effort on this?  So you can work efficiently and effectively  Preserve data that cannot be replicated (e.g. observational data)  Avoid redundant data creation/collection  Highlight patterns or connections that might otherwise be missed  To enable data re-use and sharing – even for yourself  Facilitate collaboration in your group and globally  Raise your impact factor: your data can be cited  You are the experts who can make informed decisions!  To meet funders’ and institutional requirements  SNF asks for data management plans as of October 2017  EU Horizon 2020 asking for data management plans  Keep work in accordance to good scientific practice, transparency and validity  You may be able to influence the discussion in your community, in your institution and with funders
  13. 13. || ETH regulations, intellectual property, privacy and access rights 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 13
  14. 14. || https://itsecurity.ethz.ch/en/#/manage_your_data 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 14 Recent Overview
  15. 15. ||  At the ETH Zurich research is founded on intellectual honesty. Researchers […] are committed to scientific integrity and truthfulness in research and peer review.  https://www.ethz.ch/content/dam/ethz/main/research/ pdf/forschungsethik/Broschure.pdf 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 15 Guidelines for Research Integrity
  16. 16. ||  All steps in the treatment of primary data (statistical analyses, reorganizations, etc.) must be documented in a form appropriate to the discipline in question (e.g. laboratory logs, other data carriers) in such a way as to ensure that the results obtained from the primary data can be reproduced completely.  The project management is responsible for data management (data collection, storage, data access, compliance with data protection requirements, etc.). In particular, it should ensure that, following completion of the project, the data and materials are retained for the period prescribed in the discipline, and are duly destroyed within the period prescribed by law, if appropriate.  From: https://www.ethz.ch/content/dam/ethz/main/research/pdf/forschungsethik/Broschure.pdf 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 16 Article 11. Collection, documentation and storage of primary data
  17. 17. ||  Project Members:  adhere to the principles of good scientific practice and the guidelines for Research Integrity at ETH.  All steps of treatment of primary data must be documented and results must be reproducible.  Project Manager:  responsible for execution of a scientific project and data management (data collection, storage, data access, compliance with data protection requirements...).  Ensures that all research project participants are aware of the guidelines.  Determines together with the professor, which departed project members should retain access to the primary data or materials. From: https://www.ethz.ch/content/dam/ethz/main/research/pdf/forschungsethik/Broschure.pdf 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 17 Roles and Responsibilities
  18. 18. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 18 Do you know where your data is and who has access to it? http://fsfe.org/nocloud
  19. 19. ||  The removal of sensitive data from ETH Zurich (e.g. research data subject to contractual confidentiality with third parties, important ETH Zurich business data such as financial data, personal employee or student data, reports) is not permitted. ETH Zurich must retain access to and control over such data at all times.  The use of cloud and social media services (e.g. Facebook, Google, Dropbox) in research, for exchange with researchers at other universities, or in teaching for exchange with students (lecture folders, etc.) is permitted as long as no sensitive ETH Zurich data are affected and no third party rights, in particular privacy or intellectual property rights, are infringed. Links: https://www1.ethz.ch/id/documentation/rechtliches/Merkblatt_Cloud_Computing_MA.pdf https://www1.ethz.ch/id/documentation/rechtliches/leaflet_example_cloud_EN.pdf 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 19 Cloud Computing@ ETH Zurich Rules and Regulations © Symbolon from Noun Project
  20. 20. ||  […] all ETH members […] are required to integrate the general conditions and internal directives into the work process.  In the research context, the project manager plays an active role in guiding and monitoring junior scientists. In particular, he or she is responsible for making sure that everyone involved in the project is aware of the research integrity guidelines.  Junior scientists are given appropriate guidance.  Primary data is carefully archived.  From: https://rechtssammlung.sp.ethz.ch/Dokumente/133en.pdf 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 20 Compliance Guide
  21. 21. ||  People-related data need to be preserved according to Swiss data protection law  Appropriate anonymization might be required  The deletion of individual datasets must be possible at all times  The study subjects need to sign a declaration of consent 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 21 Privacy © Hea Poh Lin from Noun Project
  22. 22. ||  Intellectual Property Rights (IPR) can be defined as rights acquired over any work created or invented with the intellectual effort of an individual. These rights are time-limited and do not require explicit assertion.  Owners of works are granted exclusive rights  to publish the work  to license the work’s distribution to others  to sue in case of unlawful or deceptive copying  It may be possible to waive these rights through a public declaration similar to a licence 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Page 22 Intellectual Property Rights © Arthur Shlain from Noun Project
  23. 23. ||  Respect the rights of others  Third parties  Individuals you work with  In case of doubt: seek permission even when a CC-licence is assigned  Note that according to ETH law, ETH reserves most immaterial rights in works by its employees. When in doubt, contact ETH transfer (www.transfer.ethz.ch)  Make sure you keep sufficient rights  Eg. for Open Access Publishing (green path)  Eg. with respect to patent applications: ETH transfer (www.transfer.ethz.ch) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 23 What you need to consider
  24. 24. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 24 share-alike by non-derivative Some rights reserved share non-commercial public domainremix
  25. 25. || www.dcc.ac.uk/resources/how-guides/license-research-data 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 25 Licensing research data Outlines pros and cons of each approach and gives practical advice on how to implement your licence CREATIVE COMMONS LIMITATIONS NC Non-Commercial What counts as commercial? SA Share Alike Reduces interoperability ND No Derivatives Severely restricts use Horizon 2020 guidelines point to OR
  26. 26. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 26 Benefits of data sharing © Neil Chue Hong http://dx.doi.org/10.6084/m9.figshare.942289
  27. 27. || “In genomics research, a large-scale analysis of data sharing shows that studies that made data available in repositories received 9% more citations, when controlling for other variables; and that whilst self-reuse citation declines steeply after two years, reuse by third parties increases even after six years.” (Piwowar and Vision, 2013) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 27 Benefits of Open Data: Impact and Longevity Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, http://repository.jisc.ac.uk/5662/1/KE_report- incentives-for-sharing-researchdata.pdf
  28. 28. || Exercise: What it takes to understand someone’s data 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 28 ©TheUpturnedMicroscope
  29. 29. || 1. What information is needed to understand your data? 2. What information do you expect from metadata in your field? Is this sufficient for you to work with others’ data? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 29 Discuss
  30. 30. ||  Data, metadata and context are needed to properly understand a data set.  Data management does not start with your own data, but also includes a critical view on other people’s data you use:  Do you understand how they were produced?  Do you have enough information on evaluating their reliability?  Are you comfortable with using data without talking to its producers?  Will you know in a few months time which data you re-used from other researchers?  Do you know how to cite the data you use? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 30 Critically re-thinking the (re-)use of data
  31. 31. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 31
  32. 32. || Active research data management 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 32 Picture by Mushonz (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
  33. 33. || Research data workflow in experimental labs Sample preparation Measurement Analysis Publication 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 33 ?
  34. 34. || What is active research data management? ARDM is the process of annotating, storing and backing up all data when it is generated, throughout the life-span of a project. Why is active research data management important? 1. Experimental results must be reproducible. 2. Data annotation is necessary for reproducibility. 3. Think long-term! When data is generated, we know everything about it. As time goes by, it becomes harder to retrieve information, if no annotation/storage mechanism is in place. 4. Having an annotation/storage mechanism in place simplifies the process of writing papers and PhD thesis. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 34
  35. 35. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 35 Data types in experimental labs Meeting notes Presentations Literature .... Generic files Descriptions of experimental procedures Protocols List of all materials used in lab (e.g chemicals, bio- samples, equipment) Materials Data generated by processing of raw data Processed dataData generated by measuring instruments Raw data Data generated by analysis of raw/processed data with different software/scripts Analyzed data Scripts used for data processing/ analysis Scripts
  36. 36. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 36 The “data spread” Local HD NAS CDS LTS Generic files Protocols Materials Raw data Processed data Analyzed data Scripts
  37. 37. || What do we need to consider for managing research data? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Experimental description / notes Date Title Materials Methods Analysis ResultsData management Management of samples and protocols 37  As part of ARDM a connection between all these parts needs to be established.  This connection is key to enable reproducibility of results.
  38. 38. || 1. Data management 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Files/folders naming conventions  Backup Data management platform  Structured organization of data  Data is annotated with metadata  Searchable  One central location  Backup 38
  39. 39. || 2. Experimental description / notes 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Date Title Materials Methods Analysis Results Paper laboratory notebook Electronic laboratory notebook 39
  40. 40. || 3. Management of samples and protocols 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Excel/Word Databases Laboratory Information Management System (LIMS)  Not scalable  No sharing  No search  Scalable  Sharing  Search functionality  Can be integrated with measuring instruments 40
  41. 41. || Electronic Laboratory Notebooks  ELNs are the digital version of the classic paper laboratory notebooks, used to record experimental lab procedures  Several open-source and commercial ELNs are available:  market analysis based on LIMSwiki.org (74 commercials + 11 open source)  There are different types of ELNs:  Generic note keeping applications (e.g. OneNote, Evernote, etc)  Scientific online applications (e.g. SciNote, Benchling, etc)  Combined data management + note keeping applications  Combined data management + note keeping + LIMS applications 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari ELN and data management platforms are not the same! 41
  42. 42. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 42 How can different lab notebook types help us to keep track of our data?
  43. 43. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 43 1. Paper lab notebooks ? Exp. description Local HD NAS CDS LTS Generic files Protocols Materials Raw data Processed data Analyzed data Scripts
  44. 44. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 44 2. Generic note-keeping applications Local HD NAS CDS LTS Scripts Raw data Processed data Analyzed data Central local storage Cloud Generic files Protocols ? Materials Date Title Materials Methods Analysis Results
  45. 45. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 45 3. Scientific online ELNs Generic files Protocols (Materials) Cloud ? Local HD NAS CDS LTS Scripts Raw data Processed data Analyzed data (Materials) Date Title Materials Methods Analysis Results
  46. 46. || Central local storage/cloud 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 46 4. Data management + note keeping Generic files Scripts Analyzed data Processed data Date Title Materials Methods Analysis Results Materials Protocols Raw data
  47. 47. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 47 5. Data management + note keeping + LIMS Central local storage/cloud Generic files Protocols Materials Raw data Analyzed data Processed data Date Title Materials Methods Analysis Results Scripts
  48. 48. || Remarks  There are several options available  There is no “best for all”, it all depends on your research field, data types and research workflow 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 48 WORK  ARDM requires WORK, but it is an investment for the future!
  49. 49. || The FAIR data principles The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, Issue 3, 2016. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 49 CC-BY-SA 4.0 Sangya Pundir https://upload.wikimedia.org/wikipedia/commons/a/aa/FAIR_data_principles.jpg ARDM can help you in making your data FAIR
  50. 50. || Long term preservation of data And how to prepare for it 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 50
  51. 51. || What does long term mean?  Different time horizons and purposes  Keeping data for at least ten years to ensure accountability if results are challenged (as defined in the ETH “Guidelines for Research Integrity”)  Potentially unlimited retention of data with permanent value (e.g. long running series of observational data)  Permanent retention of published data which is considered as part of the scientific record and is expected to remain available just like articles and journals are  In general “long term” signifies any time period which spans technological changes in the way data is being used 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 51
  52. 52. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 52 Old files? Cartoon «Old Files», taken from: http://xkcd.com/1360/  How old is your oldest file? Is it in use?  Age of files need not be a problem in itself…  …but obsolescence is  How much are you willing to invest to recover unreadable files?  Files in regular use will not become obsolete unnoticed
  53. 53. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 53 What is needed to ensure re-usability of data? Data Curation Content Preservation Bitstream Preservation What? Why? Who? Ensure intellectual re-usability Ensure technical re-usability Ensure technical stability Adapted after Jens Ludwig, Wissgrid Data Producers ETH-Bibliothek IT-Services ETH Zurich
  54. 54. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 54 What is needed to ensure re-usability of data? Data Curation Content Preservation Bitstream Preservation What? Why? Who? Ensure intellectual re-usability Ensure technical re-usability Ensure technical stability Adapted after Jens Ludwig, Wissgrid Data Producers ETH-Bibliothek IT-Services ETH Zurich You!
  55. 55. ||  Proper data management or its absence determine if preservation of data will be possible  For a period of ten years, data management alone might suffice, but thinking further ahead is useful  If data is to be kept and used for longer periods, further measures apply:  Data should be as self-contained as possible, including documentation of any tools used or better: the tools themselves; remember e.g. including reference outputs for model algorithms  More care is required in the choice and use of file formats 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 55 How does this relate to data management?
  56. 56. ||  Open standards (non proprietary)  If proprietary, convert or if not possible include data viewer  Well documented  Widely used and supported by many tools  Uncompressed (or at least losslessly compressed)  Unencrypted  When in doubt, keep original and create a copy in an open or exchange format  Don’t rely on file extensions  Consider that data might be used in different operating systems 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 56 Preferences for file formats
  57. 57. ||  Images: uncompressed TIFF; JPEG2000  Text: ASCII, including XML etc.  Add encoding information and dependencies such as stylesheets or TeX- libraries  Text (page based): PDF/A1-b, (PDF)  Data from spreadsheets: CSV  Spreadsheets: (CSV), (ODF, OOXML) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 57 Examples
  58. 58. ||  This does not mean you «must not» keep data in other formats  Just be aware that proprietary or undocumented formats (even your own!) might cause trouble in the future  Think about adding an alternative format (yes, redundantly) for a proprietary one…  …and add any context information you yourself would like to have on your own formats in a few years time in a readme-file, an accompanying document or as metadata 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 58 Note
  59. 59. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 59 "A story told in file names": Source: http://www.phdcomics.com/comics/archive.php ?comicid=1323 Copyright: Jorge Cham Does this remind you of something?
  60. 60. ||  Keep stuff together that belongs together  Keep path names short  File names should  Reflect content and be unique  Use only ASCII characters (no diacritic characters) For further file and folder organisation tips, see:  http://www.data.cam.ac.uk/data-management-guide/organising- your-data  http://www.wur.nl/en/Expertise-Services/Data-Management- Support-Hub/Browse-by-Subject/Organising-files-and-folders.htm  http://datalib.edina.ac.uk/mantra/organisingdata/ 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 60 Try this instead… © Wageningen University
  61. 61. || Data Management Planning 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 61
  62. 62. || A brief plan written at the start of a project and updated during its course to define:  What data will be collected or created?  How the data will be documented and described?  Where the data will be stored?  Who will be responsible for data security and backup?  Which data will be shared and/or preserved?  How the data will be shared and with whom? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 62 What is a Data Management Plan (DMP)?
  63. 63. || DMPs are increasingly required with grant applications (eg. by the SNSF from October 2017 onwards), but are useful whenever researchers are creating data. They can help researchers to:  Make informed decisions to anticipate & avoid problems  Develop procedures early on for consistency  Ensure data are accurate, complete, reliable and secure  Avoid duplication, data loss and security breaches  Save time and effort to make their lives easier! 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 63 Why develop a DMP?
  64. 64. ||  Supports you in the creation of a DMP or in discussing data management in general  Covers general planning and the phases of the data life-cycle, from data collection and creation to data sharing and long-term management  Special sections cover documentation and metadata, file formats, storage, ethical and intellectual property issues  http://bit.ly/rdmchecklist 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 64 What to do? Data Management Checklist ETH / EPFL
  65. 65. || https://dmponline.dcc.ac.uk/ The DMPOnline tool by the UK Digital Curation Centre helps you create Horizon 2020 compliant data management plans, by answering a questionnaire, ensuring that your scientific data are:  Findable  Accessible  Interoperable to specific quality standards  Reusable beyond the original purpose for which it was collected Collection of DMP examples: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 65 DMPOnline
  66. 66. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 66 Data should be FAIR CC-BY-SA 4.0 Sangya Pundir https://upload.wikimedia.org/wikipedia/commons/a/aa/FAIR_data_principles.jpg
  67. 67. ||  From the start of data production, data publishing should be considered and prepared – if appropriate  Complying with requirements from third parties (funders, universities, community standards, law)  Use of comprehensible, documented structures  Use of transparent, speaking or documented naming conventions (e.g. with key to abbreviations); names should not exceed reasonable limits  Defined limits: 255 character in path and file names / no special characters such as spaces, / . @ ä, etc.  Versioning, supporting an easy selection of data e.g. for publication or further processing  Try to make your data as self-contained as possible  Avoid unintentional redundancies 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 67 What to do? - Data production and filing
  68. 68. ||  Self-critical question:  What must data look like to enable us to re-use it with scientific conviction and trust into its quality and correctness?  Is this true for our own data? What is missing?  Tasks for group leaders  Agree on binding rules  Define data management responsible (DMR) within the group  Discuss and document rules (in writing) with DMR 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 68 What to do? Research Group Policy
  69. 69. || Tools 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 69
  70. 70. ||  Versioning: How do you currently handle it? What works well? What went wrong?  Naming conventions: Do you have any? Which rules apply?  Sharing: Which tools or services do you use? What are your experiences?  Literature Management: Which tools do you use? What are their pros and cons? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 70 Group discussion: current practice
  71. 71. ||  Where will your data reside?  Which legislation applies, e.g. in terms of data protection?  Is the service sustainable?  Do you trust the provider?  Who else can access and use which of your data?  How can you get your data back?  Is a certain license required?  Are there immediate or longer term costs? 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 71 Criteria for chosing services and tools © Jorgen Stamp
  72. 72. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 72 Repositorien und Registries http://www.re3data.org http://datadryad.org https://zenodo.org http://figshare.com https://www.openaire.eu/search/data-providers (Only partially recommendable as according to their Terms of Use, figshare is allowed to delete data anytime and without notice.)
  73. 73. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 73 Deposit in a repository – but in which one? http://databib.org http://service.re3data.org/search
  74. 74. || Open Access Infrastructure for Research in Europe  Aggregates data on OA publications  Mines and enriches its content by linking things together  Provides services & APIs e.g. to generate publication lists  www.openaire.eu 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 74 OpenAIRE
  75. 75. || Only conditionally recommended  Data stored in EU/USA  Security regulations only partially fulfilled  Never store sensitive / private data there! Recommended  Data stored in Switzerland  Security regulations fulfilled 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 75 Collaboration - Sharing https://www.dropbox.com https://www.switch.ch/drive/ https://www.switch.ch/filesender https://cifex.ethz.ch/ https://polybox.ethz.ch https://www.wetransfer.com
  76. 76. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 76 Collaboration - Organisation https://www.openproject.org http://www.redmine.org https://trello.com https://slack.com https://tagpacker.com https://asana.com
  77. 77. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 77 Collaboration – Versioning https://subversion.apache.org https://github.com https://bitbucket.orghttps://www1.ethz.ch/id/services/list/sharepoint/sharepoint_services_EN (Sharepoint is only meant to be used for versioning of documents, not of research data!)
  78. 78. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 78 Collaboration - Writing https://www.overleaf.com https://www.authorea.com https://atlas.oreilly.com https://hypothes.is https://evernote.com http://simplenote.com https://www.onenote.com https://www1.ethz.ch/id/services/list/ sharepoint/sharepoint_services_EN
  79. 79. || www.zotero.org 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 79 Collaboration – Reference Management www.mendeley.com endnote.com http://www.jabref.org www.citeulike.org www.bibsonomy.org
  80. 80. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 80 A few known ELN Solutions in use at ETH https://benchling.com http://labcollector.com http://findingsapp.com
  81. 81. || Scientific IT-Services & their tools https://sis.id.ethz.ch 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 81
  82. 82. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 82 ID-SIS role in research data management ID-SIS provides:  Consulting in active data management  Software solutions (openBIS)  Data management services https://sis.id.ethz.ch/researchdatamanagement/ Contact: sis.helpdesk@ethz.ch
  83. 83. ||  openBIS is a data management platform developed by ID-SIS  Under active development since 2007 (CISD until 2013, ID-SIS afterwards)  Originally developed for management of life sciences data within SystemsX projects  Generic underlying structure makes it amenable to be used in other disciplines  Currently used in several labs and facilities at ETH and outside 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 83 openBIS facts
  84. 84. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 84 openBIS in a nutshell Workflow manager
  85. 85. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 85 openBIS for life sciences HCS Microscopy Proteomics Sequencing
  86. 86. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 86 openBIS for life sciences ELN-LIMS Interface developed in collaboration with biologists at D-BSSE and D-BIOL, to improve usability of the system as Electronic Laboratory Notebook and Laboratory Information Management System
  87. 87. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 87 openBIS ELN-LIMS Two versions are available:  ELN for life sciences. Default fields suitable for most biological labs are predefined (e.g. antibodies, plasmids, chemicals, etc). These can be customized.  Generic ELN. Only minimal fields are predefined. Need to be customized by users.
  88. 88. ||  Project goal:  evaluate effectiveness of educational materials on students across Switzerland.  openBIS used in addition to another database to store data on students, educational materials and test results.  Custom user interface developed for this purpose. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 88 openBIS use-case in Institute of Behavioural Sciences (IFV)
  89. 89. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 89 openBIS use-case for physicists in University of Geneva  Project goal:  Build 20 cameras for the cherenkov telescopes which analyze particle shower in the atmosphere produced by gamma radiations in space.  Needs:  Database to keep track of all components used to build the cameras and all technical data + quality control reports associated with them.  openBIS fits the needs and it is currently in test phase.
  90. 90. || Laboratory Notebook & Inventory Manager 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 90 How can openBIS ELN-LIMS help you? Samples Protocols Experiment Description Raw Data Analysis Scripts Results openBIS ELN-LIMS is an integrated: Date Title Material s Methods Analysis Results Inventory management system Notebook Data management system
  91. 91. || Samples Protocols Experiment Description Raw Data Analysis Scripts Results Laboratory Notebook & Inventory Manager 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 91 How can openBIS ELN-LIMS help you? All entities can be connected to each other, so it is possible to have the full history of published (and unpublished) results. Date Title Material s Methods Analysis Results Inventory management system Notebook Data management system
  92. 92. || Samples Protocols Experiment Description Raw Data Analysis Scripts Results Laboratory Notebook & Inventory Manager 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 92 How can openBIS ELN-LIMS help you? Result X Matlab script X Raw data X Protocol 1, 2, 3 Sample A, B, C, D Important for reproducibility! All entities can be connected to each other, so it is possible to have the full history of published (and unpublished) results.
  93. 93. || openBIS ELN-LIMS features Relationships Storage manager Plasmid mapsBLAST search Import/Export File upload Big data Instrument integration 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 93
  94. 94. || openBIS ELN-LIMS features Users need to authenticate to the system to login. Users can have different roles: Instance admin, Space admin, Space User, Space observer, Instance observer. Data is stored in “datasets”, which are immutable. Modified data must be uploaded in a new dataset. All modifications made to any entity in the system are recorded in the database. Space admins and Instance admins are allowed to delete entities. This permission can be completely removed upon request. A record of deleted entities can be kept in the database. Datasets can be archived to affordable tape storage when no longer needed. They can be recovered from this storage when needed. It is possible to export the complete lab notebook or parts of it. User roles Change log Deletion ExportData preservation Data archiving 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 94
  95. 95. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 95 Recent developements 1. Integration with Jupyterhub  Interactive lab notebook useful for exploratory data analysis  Interactive programming (R, Python, Shell, Scala, etc)  Programs, results and documentation are kept all together 2. External data management  Large amounts of data (several hundreds of TBs)  Large data stored in different places OK! Download Data Analyze Data Create Graphs Upload Results ?
  96. 96. ||  openBIS software installed on a virtual machine provided by ID-SD.  Data stored on NAS share provided by departmental IT.  Software maintenance.  Initial training.  Continuous support. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 96 Basic DM services provided by ID-SIS
  97. 97. ||  Database customization.  Migration of existing databases.  Instrument integration for direct data upload.  Upload of existing historic raw data. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 97 Additional DM services provided by ID-SIS upon request
  98. 98. || 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 98 openBIS ELN-LIMS website https://openbis-eln-lims.ethz.ch
  99. 99. || Further Services at ETH Zurich 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 99
  100. 100. ||  ETH E-Collection (http://e-collection.library.ethz.ch/index.php?lang=en)  ETH E-Citations (http://e-citations.ethbib.ethz.ch/index.php?lang=en)  ETH Data-Archive (http://www.library.ethz.ch/Digital-Curation)  Long term preservation of data  Not for mass storage and active data  Open Access (http://www.library.ethz.ch/en/Open-Access) including payment of Article Processing Charges with a range of publishers  DOI registration (http://www.library.ethz.ch/DOI-Desk-EN)  ORCID (http://www.library.ethz.ch/en/ORCID) 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 100 Services at ETH Library Will be merged into «Research Collection» and allow publication of documents and data as of May 2017
  101. 101. || IT Services  Storage provisioning, usually via your IT Support Group  HSM (Hierarchical Storage Management) http://www1.ethz.ch/id/services/list/storage/nas  LTS (Long-Term Storage) http://www1.ethz.ch/id/services/list/storage/LTS  openBIS ELN-LIMS https://openbis-eln-lims.ethz.ch/ ETH-Transfer https://www.ethz.ch/en/the-eth-zurich/organisation/staff-units/eth-transfer.html  Software disclosure workflow with ETH Data-Archive  Advice on Intellectual Property, Patents, Licensing of Software etc. 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 101 IT services and ETH transfer
  102. 102. ||  Training courses and workshops on information research, reference management, data management, scientific writing and open access by the ETH-Library: http://www.library.ethz.ch/en/Services/Training-courses-guided-tours  Courses offered by the ETH Information Center for Chemistry/Biology/Pharmacy: http://www.infozentrum.ethz.ch/en/whats-up/events/  Further topics on demand 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 102 Trainings
  103. 103. ||  Think about what you do!  Start early  Agree on clean concepts and simple tools  You do not need the latest sophisticated apps  Talk to colleagues  Check what your local service providers can offer  «Keep it as simple as possible – but distrust it!» 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 103 Take home message
  104. 104. || Dr. Matthias Töwe Head Digital Curation ETH-Bibliothek Rämistrasse 101 8092 Zurich 044 632 60 32 matthias.toewe@library.ethz.ch 26.04.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 104 Thank you! Are there any questions? Dr. Ana Sesartic Digital Curation ETH-Bibliothek Rämistrasse 101 8092 Zurich 044 632 73 76 ana.sesartic@library.ethz.ch www.library.ethz.ch/Digital-Curation data-archive@library.ethz.ch Dr. Caterina Barillari ID Scientific IT Services Mattenstrasse 22 4058 Basel 061 387 31 29 caterina.barillari@id.ethz.ch Digital Curation Scientific IT Services https://www1.ethz.ch/id/about/sections/sis

×