CC BY-SA 4.0
OPEN DATA
sharing the main actor of a scientific story
pcmasuzzo
24 October 2016, Paola Masuzzo
What exactly is open data?
Why should you make your data open?
How can you make your data open?
My open data story
What exactly is open data?
Open data implies freedom to access, use
and re-use for any purpose
http://opendefinition.org/od/; http://opendefinition.org/licenses/
Many licenses conform to the Open Definition, for example:
CC0 waiver
https://creativecommons.org/publicdomain/zero/1.0/
CC BY (Attribution only)
https://creativecommons.org/licenses/by/4.0/
CC BY-SA (Attribution ShareAlike)
https://creativecommons.org/licenses/by-sa/4.0/
Why should you make your data open?
Research data need to be treated as
first-class citizens in science
Vines et al., Current Biology, 2014; image courtesy Auke Herrema
Data should themselves be considered the primary output of research
One could just argue that data produced
with public funds belong to the public
Image courtesy Auke Herrema
But there are so many more great reasons
for data to be open
• develop new analysis methods
• improve research practices
• guarantee data preservation
• reduce the cost of science
• engage with citizens
• increase visibility and collaborations
• enhance reproducibility
• ask new questions
• advance science
(Diagram axes: science-driven vs. society-driven motivations; benefits for data users vs. data producers)
Open data means more hands at work, more
brain power and faster innovations
Gina Kolata, The New York Times, 2010; SCIENCEMAG 2016 - Williamson et al., 2016
Open data creates a culture of transparency
and potentially discourages fraud
Wicherts et al., PloS one, 2011
“Willingness to share research data is related to the strength of
the evidence and the quality of reporting of statistical results”
Open data means more reproducibility and
better research practices
Monya Baker, Nature, 2016; image courtesy Auke Herrema
Open data also means visibility and a higher
chance of being cited
Piwowar et al., PeerJ, 2013 (citation advantage)
How can you make your data open?
The Panton Principles are a pretty good
starting point
1. When publishing data, make an explicit and robust statement of your
wishes.
2. Use a recognized copyright waiver or license that is appropriate for data.
3. If you want your data to be effectively used and added to by others, it
should be open as defined by the Open Knowledge/Data Definition—in
particular, non-commercial and other restrictive clauses should not be used.
4. Explicit dedication of data underlying published science into the public
domain via PDDL (http://opendatacommons.org/licenses/pddl/1-0/) or
CCZero (http://creativecommons.org/publicdomain/zero/1.0/) is strongly
recommended and ensures compliance with both the Science Commons
Protocol for Implementing Open Access Data and the Open Knowledge/Data
Definition.
http://pantonprinciples.org
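As a rough illustration of principles 1, 2 and 4, the following Python sketch attaches an explicit public-domain dedication to a small metadata record. The two URLs are the ones cited above; the dictionary keys, field names, function name and example values are hypothetical and do not follow any particular metadata standard.

```python
import json

# URLs are the ones cited in the Panton Principles text above.
# The keys and field names are hypothetical, not a standard schema.
PUBLIC_DOMAIN_DEDICATIONS = {
    "PDDL-1.0": "http://opendatacommons.org/licenses/pddl/1-0/",
    "CC0-1.0": "http://creativecommons.org/publicdomain/zero/1.0/",
}


def make_dataset_record(title, creator, license_id):
    """Build a small metadata record carrying an explicit license statement."""
    if license_id not in PUBLIC_DOMAIN_DEDICATIONS:
        # A missing or unrecognized license is treated as "not open".
        raise ValueError(f"no recognized public-domain dedication: {license_id!r}")
    return {
        "title": title,
        "creator": creator,
        "license": {"id": license_id, "url": PUBLIC_DOMAIN_DEDICATIONS[license_id]},
    }


# Hypothetical example values, for illustration only.
record = make_dataset_record("Example cell migration data set", "Example Lab", "CC0-1.0")
print(json.dumps(record, indent=2))
```

Refusing to build a record without a recognized dedication is one possible design choice: it forces the statement of wishes to be explicit instead of implied.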
Many repositories are available for uploading
research materials and data
You certainly don’t need to know more than
1,500 repositories by heart
https://biosharing.org/databases/
There are also a number of data journals
Making data available is only one half of the
open data equation
intelligent access to the data and interoperability are crucial
Wilkinson et al., 2016, Scientific Data; https://www.force11.org/group/fairgroup
My open data story
Cell migration experiments are complex and
produce diverse and rich data sets
Servier Medical Art, CC-BY 3.0; Cell Image Library, CC-BY 3.0
• paper laboratory notebooks
• electronic laboratory notebooks
• spreadsheets
• text files
• protocols
• papers...
• raw files
• XML files
• proprietary microscope or acquisition software files: ND2 for Nikon, LIF for Leica, OIB or OIF for Olympus, LSM or ZVI for Zeiss
• OME-TIFF
• image files with pixel values and metadata
• png, jpeg, tiff, avi
• text files describing processing algorithms
• text files describing extracted features
• graphs, plots
• analysis pipelines
• text files describing computational algorithms...
sample preparation → image acquisition → image processing → data analysis
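To give a feel for how heterogeneous the acquisition step alone is, here is a minimal Python sketch that maps the proprietary file extensions listed above to their microscope vendor. The extension-to-vendor pairs come from the slide; the function name and the example file names are hypothetical.

```python
from pathlib import Path

# Extension-to-vendor pairs as listed on the slide above.
PROPRIETARY_FORMATS = {
    ".nd2": "Nikon",
    ".lif": "Leica",
    ".oib": "Olympus",
    ".oif": "Olympus",
    ".lsm": "Zeiss",
    ".zvi": "Zeiss",
}


def microscope_vendor(filename):
    """Return the vendor for a proprietary acquisition file, or None otherwise."""
    suffix = Path(filename).suffix.lower()
    return PROPRIETARY_FORMATS.get(suffix)


# Hypothetical file names, for illustration only.
for name in ["scratch_assay.ND2", "timelapse.lif", "stack.ome.tiff"]:
    print(name, "->", microscope_vendor(name))
```

Open formats such as OME-TIFF fall through to `None` here, since only the vendor-specific formats are in the table.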
CellMissy is our open-source tool for cell
migration data management and analysis
(Figure: Experiment Manager, Data Loader and Data Analyzer modules; collective and single-cell migration; wound and cells at 0, 3h, 6h)
Masuzzo et al., Bioinformatics, 2013; https://github.com/compomics/cellmissy
CellMissy enables data and metadata
exchange
(Figure: exchange between lab A and lab B)
This is one file in CellMissy! (≈10 MB)
But we can easily extend this concept
to a bigger scale
(Figure: Data Repository and Local Software)
And so we did it!
Cell migration workshop, Ghent, March 2014; Masuzzo et al., Trends in Cell Biology, 2015
An open data exchange ecosystem for cell
migration research is now on its way
Masuzzo et al., Trends in Cell Biology, 2015
Image courtesy Auke Herrema
Thank you!

Opening webinar for Open Access Week 2016

Editor's Notes

  • #6 People who use the data must credit whoever published or generated the data (attribution); copies or adaptations of the data must be released similarly as open data (share-alike).
  • #14 There is little point in opening up data if it is not used; it does not intrinsically lead to better science in and of itself, although it could be argued that the open publication of datasets will directly discourage fraud.
  • #15 More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. Those are some of the telling figures that emerged from Nature's survey of 1,576 researchers who took a brief online questionnaire on reproducibility in research. Methods, code - raw data not available
  • #16 It would be useful to evaluate the reuse of current open data, but evidence is limited due to issues in tracking data citations. However, it does appear that publicly sharing your data increases citation rate, at least in cancer microarray experiments, which is positive encouragement that open biological data is being reused.
  • #18 Note: no license does NOT mean that your data is open!
  • #22 Data must be well described before others can use it and benefit from it. Scientists who share data in a reusable manner deserve credit through citable publications.