This document provides an overview of open research data, including definitions, licensing, standards, and history. It defines open data as data that anyone can freely access, use, modify, and share with few restrictions. For data to be truly open, it recommends using a CC0 public domain waiver or an attribution-only licence. It discusses problems with non-commercial and no-derivatives restrictions, and gives guidance on technical aspects such as recommended file formats and standards. It briefly summarizes the history of data sharing, from centralized data centres through online supplementary data to emerging data paper journals. The key messages are that data should be FAIR (Findable, Accessible, Interoperable, Reusable) and that open data benefits both the researchers who share it and the wider community who re-use it.
1. Open Research Data:
Licensing | Standards | Future
Ross Mounce (@RMounce)
Natural History Museum, London
British Ecological Society
Open Data & Reproducibility
Workshop, London, 2015-04-21
XKCD 1179 on ISO 8601
2. bit.ly/opendataintro
These slides are a re-spin of my longer
OpenCon 2014 deck, on Slideshare here:
All my textual content is licensed under the
Creative Commons Attribution License 4.0 (CC BY), unless otherwise indicated
3. Outline
● What is open data?
● A short history of data sharing
● Supplementary data needs to die
● FAIR data as a 1st-class research output
4.
5. By sharing data we can see further
Data (& code) are the building blocks of science. Shared, re-used data allow us to test hypotheses more rigorously; "to see further" ...and to do it all more quickly and easily.
6. What exactly is open data?
Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).
From http://opendefinition.org/; see http://opendefinition.org/od/ for more detail.
7. Legally, what is open data?
There are many Open Knowledge Definition (OKD) conformant licences, including (but not limited to):
● CC0 waiver: http://creativecommons.org/publicdomain/zero/1.0/
● CC BY (Attribution only): https://creativecommons.org/licenses/by/4.0/
● CC BY-SA (Attribution-ShareAlike): https://creativecommons.org/licenses/by-sa/4.0/
See here for the comprehensive list: http://opendefinition.org/licenses/
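The licence you choose can also be recorded in machine-readable form alongside the data, so that tools and aggregators can detect it automatically. A minimal sketch in Python, assuming the Frictionless Data `datapackage.json` convention; the dataset name, file name, and metadata values are hypothetical:

```python
import json

# Hypothetical dataset metadata declaring an OKD-conformant licence.
# The "licenses" layout follows the Frictionless Data datapackage.json
# convention; names and paths here are illustrative only.
metadata = {
    "name": "example-occurrence-records",
    "licenses": [
        {
            "name": "CC0-1.0",
            "path": "https://creativecommons.org/publicdomain/zero/1.0/",
            "title": "CC0 1.0 Universal",
        }
    ],
    "resources": [{"path": "occurrences.csv", "format": "csv"}],
}

# Write the metadata next to the data files it describes.
with open("datapackage.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```

A harvester that finds `datapackage.json` beside the data can then read the licence programmatically rather than guessing from free text.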
8. CC0 should be the default for data
CC0 is the default for data at, and strongly recommended for data by, a number of journals and organisations (logos not shown).
Hrynaszkiewicz & Cockerill (2012) BMC Research Notes: Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals.
9. Not all Creative Commons licences are 'open'
● NC -- You "may not use this work for commercial purposes". Work under this licence cannot be used for any and every purpose, therefore it is not open. This can have significant, often unexpected negative impacts on potential re-use.
● ND -- "No Derivative Works". Work under this licence cannot be adapted when it is re-used. Not very helpful for research!
● NC & ND -- An extremely restrictive re-use licence: neither commercial purposes nor adaptations are allowed.
KEY PAPER: Hagedorn et al (2011) ZooKeys: Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information.
10. Non-open licencing causes real problems for research & education
The Creative Commons non-commercial (-NC) restriction is poorly defined in most jurisdictions, and even more poorly understood by many of its users. "Non-commercial" != "non-profit".
A) Non-commercial actually excludes many teaching purposes: in the UK, university students typically pay expensive tuition fees to attend. University teaching is thus often a commercial activity, and -NC restricted materials cannot be used to teach students in these circumstances.
B) Licence incompatibility: NC licences are not compatible with the licences used on major collaboration platforms like Wikipedia or Wikimedia Commons.
C) Non-commercial organizations (e.g. Deutschlandradio) have been successfully sued for re-using CC BY-NC content without permission.
Klimpel (2012) Consequences, risks and side-effects of the license module "non-commercial use only - NC".
11. Real problems of non-open data: GBIF & biodiversity data
Desmet, P. (2013) Showing you this map of aggregated bullfrog occurrences would be illegal. http://peterdesmet.com/posts/illegal-bullfrogs.html
12. Open data in scholarship, and beyond
The open data movement is much broader than just academia/research. It has been successful & popular in areas like open government data:
● For transparency, detecting & discouraging corruption
● For releasing social & commercial value (governments collect a lot of data already, why not make wider use of it, at little or no extra cost?)
● For participatory governance: citizens can be more informed, a "read/write" society
Each of these has clear parallels with open research data: transparency & fraud detection, extra value through research data re-use, participatory citizen science.
Some text adapted from http://opengovernmentdata.org/
13. Open data in scholarship, and beyond
Similarly, and with some overlap to open research data, there's the open GLAM movement (GLAM = Galleries, Libraries, Archives & Museums). In this case, their data is typically collections metadata, but also digital images of their collections.
See http://openglam.org/ for more.
14. Technical aspects of open data
So, you understand the importance of licensing... What next?
How best can we make our data openly available?
Where should I upload to?
What format(s) should I make the data available in?
15. Data Standards & Data File Formats
Adhere to existing standards, if possible!
xkcd 927 on standards
16. Data Standards & Data File Formats
Take note of community standards, e.g. the Bermuda Principles for sharing DNA sequence data:
● Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours)
● Immediate publication of finished annotated sequences
● Aim to make the entire sequence freely available in the public domain
17. Data Standards & Data File Formats
If there are no formally agreed community standards, canvass the community to create/formalise a standard.
e.g. Best Practices for Data Sharing in Phylogenetic Research: Cranston et al (2014) PLOS Currents Tree of Life
e.g. The 1st Open Economics International Workshop (Cambridge, 2013), bringing together academic economists from around the world to discuss data sharing in economics research.
18. Data Standards & Data File Formats
If there are multiple, competing file formats:
Opt for file formats based on open standards (https://en.wikipedia.org/wiki/Open_standard) and avoid proprietary formats (https://en.wikipedia.org/wiki/Proprietary_format). (Example format logos not shown.)
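As a concrete illustration of preferring open formats: tabular results can be written as plain CSV, readable by any tool, instead of a proprietary binary spreadsheet. A minimal Python sketch; the column names and values are hypothetical:

```python
import csv

# Hypothetical specimen measurements. Writing them as CSV (an open,
# text-based format) keeps the data readable without proprietary software.
rows = [
    {"specimen_id": "NHM-001", "length_mm": 42.0},
    {"specimen_id": "NHM-002", "length_mm": 37.5},
]

with open("measurements.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["specimen_id", "length_mm"])
    writer.writeheader()   # column names as the first row
    writer.writerows(rows)
```

The resulting file opens equally well in a text editor, a spreadsheet, R, or Python, which is exactly the interoperability that open standards aim for.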
19.
20. Data Standards & Data File Formats
A real example: the recent creation of a new data standard for the exchange of 3-dimensional reconstructions of objects from tomographic imaging data: the SPIERS software plus the VAXML data standard.
Sutton et al (2012) SPIERS and VAXML: A software toolkit for tomographic visualisation and a format for virtual specimen interchange. Palaeontologia Electronica.
21. A super brief, eclectic
history of scientific data
sharing
22. Centralised Data Centres
for specific data types
The Cambridge Crystallographic Data Centre, est. 1965
It maintains the Cambridge Structural Database **
** Not open data sensu stricto …some types of users/uses are charged
23. Data Sharing (by snail mail)
e.g. “The full profile listings are on floppy disks
which are available upon request”
Fernholz et al (1989) A survey of measurements and measuring
techniques in rapidly distorted compressible turbulent boundary layers.
24. Bilofsky & Burks (1988)
Nucleic Acids Research v16 n5
“The author will provide the
accession number to the
PROCEEDINGS [PNAS]
office to be included in a
footnote to the published
paper.”
25. Supplementary Data (Online)
[ journal-hosted ]
Chen et al (1999)
Fluorescence Polarization in
Homogeneous Nucleic Acid
Analysis. Genome Research
“Numerical values for the
data are available as online
supplementary material at
http://www.genome.org.”
26. http://treebase.org/, est. 1994
Not all databases succeed.
Build it, and they may not come...
Of phylogenetic analyses published in 2010,
only ~4% of them have data available for re-use
Stoltzfus et al 2012. Sharing and re-use of phylogenetic trees
(and associated data) to facilitate synthesis. BMC Research Notes
27. “Each custodian of data on plant traits will retain the right to be informed of
any TRY activity that may involve his/her data, and will have the opportunity to
negotiate whether his/her data can be used, and whether general
guidelines of authorship need to be modified in that particular case
Custodians retain the rights to withdraw their data at any time.”
Not all databases provide open data
https://www.try-db.org/TryWeb/Submission.php
Recommended reading:
http://danielfalster.com/blog/2013/08/23/making-a-case-for-a-fully-open-trait-database/
28. Supp. Data Needs to Die
From the 1990s to the 2010s, online supplementary data was used as a way of
dumping data online in an ad hoc manner... It was available *shrugs*
Traditional, journal-hosted supplementary files bury data. Additional files are
bunged online with little or no additional metadata describing them.
Thus typically, SI isn't searchable. That's a huge problem
Data should be FAIR:
Findable, Accessible, Interoperable, Re-usable
It should be findable independent of the research article
https://www.force11.org/group/fairgroup
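To make data findable independent of the article, it needs its own machine-readable metadata record. Here is a minimal sketch; the field names are illustrative (loosely modelled on common repository metadata), and the DOI is hypothetical, not a real deposit.

```python
import json

# A minimal, machine-readable description of a dataset, so the data can be
# indexed, found, and cited on its own. Field names are illustrative, not
# a formal schema; the identifier below is a made-up example.
record = {
    "identifier": "doi:10.5061/dryad.example",  # hypothetical DOI
    "title": "Measurements underlying Figure 2",
    "creators": ["Mounce, R."],
    "license": "CC0-1.0",
    "format": "text/csv",
    "keywords": ["phylogenetics", "morphometrics"],
}

metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

A repository exposing records like this lets search services index the dataset directly, rather than leaving it buried as an unlabelled supplementary file.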
29. Supp. Data Often Neglected
Publisher neglect (Wiley) meant this
paper was online for a week without
the crucial spreadsheet file the
entire article was describing!!!
Deeply embarrassing.
N.B. This has happened at many other journals too.
30. Where to upload FAIR open data?
GenBank,
SRA,
...and thousands more!
http://www.crystallography.net/
32. Intelligent data papers allow databases
to automatically pull in your data
Many publishers (e.g. Pensoft) intelligently
mark up data papers so that the data can be
automatically ingested into appropriate databases
on the day of publication!
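The idea behind such markup can be sketched in a few lines. This is a minimal, hypothetical illustration only: the element names below are borrowed from Darwin Core terms for biodiversity records, not from any particular publisher's actual production schema.

```python
import xml.etree.ElementTree as ET

# Build a tiny machine-readable record of the kind a database could ingest
# automatically. Element names are illustrative (Darwin Core-style terms);
# real data papers use formal community schemas agreed with the database.
record = ET.Element("dataRecord")
ET.SubElement(record, "scientificName").text = "Quercus robur"
ET.SubElement(record, "decimalLatitude").text = "52.205"
ET.SubElement(record, "decimalLongitude").text = "0.119"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the record is explicit, structured markup rather than free prose, a harvester can parse it without human intervention.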
33. Data sharing benefits authors & re-users
Piwowar HA, Vision TJ. (2013)
Data reuse and the open data
citation advantage. PeerJ
1:e175
“...open data citation benefit for this sample to be 9%”
relative to papers providing no public data, for gene expression microarray data
10.7717/peerj.175/fig-2
See also previous work by
Piwowar:
10.1371/journal.pone.0000308
34. Those who share data, do better science
Wicherts, J. M., Bakker, M. & Molenaar, D. (2011)
Willingness to share research data is related to the
strength of the evidence and the quality of reporting of
statistical results. PLoS ONE 6, e26828.
http://dx.doi.org/10.1371/journal.pone.0026828
The authors examined psychological papers for the quality of statistical
reporting & asked the authors of those papers for the full data underlying
the reported results. Generally, those who shared, had more statistically
robust, reproducible results.
35. “Email the author for data” - doesn't work
Wicherts JM, Borsboom D,
Kats J, Molenaar D (2006)
The poor availability of
psychological research
data for reanalysis.
American Psychologist 61: 726–728
A well-known problem, which
I myself have also faced
many times!!!
Many legacy journals
unfortunately still pretend
that “email the author” is
acceptable.
36. Best practice open data is time consuming
(but still worth the extra effort!)
Emilio M. Bruna recently provided an estimate of the amount of
time it took him to prepare & upload open data related to
publication to figshare & dryad.
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
11 hours & $90 (for Dryad)
Providing open-source code was the most time consuming part (25.5 hours),
and Open Access publication the most expensive ($600).
37. Not all data should be open!
Intelligent openness is required
– Royal Society report
Obviously, there are some types of data which
should NOT be made mandatorily open, e.g.
sensitive medical data.
However, with informed consent,
if patients really want to, they should be
allowed to publish their own medical data.
38. Other exceptions to the open default
Sensitive species conservation data
e.g. exact geocoordinates of home range
Certain species of wild orchids, cacti & carnivorous plants
are highly endangered by illegal harvesting.
Publishing the exact geolocation data of the remaining
populations of commercially-desirable, endangered
species is a really dumb thing to do.
Such data is typically held privately in databases (not
publicly available).
39. The 5 stars of open data
Most research data would get
ZERO stars (not available online),
or just ONE star
http://5stardata.info/
40. 3-star open research data is achievable
This is what research data publication
should be aiming for in the short term.
Publishing .csv / non-proprietary open data is
NOT actually that hard!
http://5stardata.info/
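To back up the claim that this is easy: a minimal sketch (assuming Python is available) showing that publishing a table as three-star, non-proprietary CSV needs only the standard library.

```python
import csv
import os
import tempfile

# A small results table; three-star open data just means publishing this
# as structured, non-proprietary plain text (CSV) rather than e.g. .xls.
# Taxon names and counts are made-up illustrative values.
rows = [
    ("taxon", "character_count"),
    ("Tyrannosaurus", 84),
    ("Allosaurus", 79),
]

path = os.path.join(tempfile.mkdtemp(), "matrix.csv")
with open(path, "w", newline="") as fh:
    csv.writer(fh).writerows(rows)

# Anyone can read it back with no proprietary software:
with open(path, newline="") as fh:
    recovered = list(csv.reader(fh))

print(recovered)
```

That round trip is the whole job: no special tooling, and the result opens in any spreadsheet program, text editor, or scripting language.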
41. Further Reading
1. Editor’s Introduction - Samuel A. Moore
2. Open Content Mining - Peter Murray-Rust, Jennifer C. Molloy, Diane Cabell
3. The Need to Humanize Open Science - Eric C. Kansa
4. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières - Unni Karunakara
5. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models - Antony J. Williams, John Wilbanks, Sean Ekins
6. Open Data in the Earth and Climate Sciences - Sarah Callaghan
7. Open Minded Psychology - Wouter van den Bos, Mirjam Jenny, Dirk Wulff
8. Open Data in Health Sciences - Tom Pollard
9. Open Research Data in Economics - Velichka Dimitrova
10. Open Data and Palaeontology - Ross Mounce
Open Access Book, CC BY
Published by Ubiquity Press
42. Confirmed speakers include: Michael Eisen & Patrick Brown
( 2 out of 3 of the co-founders of PLOS )
opencon2015.org @open_con #opencon2015
Last year: Washington DC, with Day 3 advocacy at the NIH and US Senate
This year: Brussels, with Day 3 advocacy at the European Commission
43. Further Reading
● The Open Data Handbook - http://opendatahandbook.org/
● 5 star Open Data - http://5stardata.info/
● Science as an open enterprise (2012) A Royal Society report
● Caetano, D. S. & Aisenberg, A. 2014 Forgotten treasures: the fate of data in animal behaviour studies. Animal Behaviour
Data sharing in phylogenetics
● Magee et al 2014 The Dawn of Open Access to Phylogenetic Data. PLOS ONE
● Drew et al 2013 Lost Branches on the Tree of Life. PLOS Biology
● Stoltzfus et al 2012 Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes
On licensing & legal issues with re-use
● Hagedorn et al 2011 Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys
● Mounce 2012 Life as a palaeontologist: Academia, the Internet and Creative Commons. Palaeontology Online
● Klimpel, P. 2012 Consequences, Risks, and side-effects of the license module Non-Commercial – NC [PDF]
44. Further Reading
● Murray-Rust, P. Open data in science. Serials Review 34, 52-64 (2008). http://dx.doi.org/10.1016/j.serrev.2008.01.001
● Leonelli, S., Smirnoff, N., Moore, J., Cook, C. & Bastow, R. Making open data work for plant scientists. Journal of Experimental Botany 64, 4109-4117 (2013). http://dx.doi.org/10.1093/jxb/ert273
● Hrynaszkiewicz, I. & Cockerill, M. Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 5, 494 (2012). http://dx.doi.org/10.1186/1756-0500-5-494
● Boulton, G., Rawlins, M., Vallance, P. & Walport, M. Science as a public enterprise: the case for open data. The Lancet 377, 1633-1635 (2011). http://dx.doi.org/10.1016/s0140-6736(11)60647-8
● Parr, C. S. Open sourcing ecological data. BioScience 57, 309-310 (2007). http://dx.doi.org/10.1641/b570402
● Poisot, T., Mounce, R. & Gravel, D. Moving toward a sustainable ecological science: don't let data go to waste! Ideas in Ecology and Evolution 6 (2013). http://dx.doi.org/10.4033/iee.2013.6b.14.f