Data as a research output and a research asset: the case for Open Science/Simon Hodson
Data as a research output and a research asset:
the case for Open Science
Simon Hodson, Executive Director, CODATA
www.codata.org
National Dialogue on Mainstreaming Open Data Access and Use in Uganda
Skyz Hotel, Naguru Kampala
25 April 2018
INTERNATIONAL DATA WEEK
IDW 2018
Gaborone, Botswana: 5-8 November 2018
Information: http://internationaldataweek.org/
Deadline for abstracts, 31 May:
https://www.scidatacon.org/IDW2018/
Why Open Science / FAIR Data?
• Good scientific practice depends on communicating the evidence.
• Open research data are essential for reproducibility, self-correction.
• Academic publishing has not kept up with age of digital data.
• Danger of a replication / evidence / credibility gap.
• Boulton: to fail to communicate the data that supports scientific
assertions is malpractice
• Open data practices have transformed certain areas of research.
• Genomics and related biomedical sciences; crystallography;
astronomy; areas of earth systems science; various disciplines using
remote sensing data…
• FAIR data helps use of data at scale, by machines, harnessing
technological potential.
• Research data often have considerable potential for reuse,
reinterpretation, use in different studies.
• Open data foster innovation and accelerate scientific discovery through
reuse of data within and outside the academic system.
• Research data produced by publicly funded research are a public
asset.
What is Open Science? (1)
Open access to research literature.
Data that is as Open as possible, as closed as necessary.
FAIR Data (Findable, Accessible, Interoperable,
Reusable).
Data is a recognised and important output of research.
A culture and methodology of open discussion and
enquiry (including methodology, lab notebooks, pre-
prints).
Data, code and analysis processes are shared for
reproducibility.
Engagement with society and the economy in research
activities (citizen science, co-design / transdisciplinary
research, interface between research, development and
innovation).
What is Open Science? (2)
Open Science is not just Open Access + Open Data.
Individuals, institutions and the science system benefit
from putting research outputs (including data) in the
open: shop window and repository of all research
outputs.
Important role of open processes, open data and
reproducibility / replicability.
Role of AI / Machine Learning: analysis at scale.
Open innovation and transdisciplinary research.
The Open Science ethos and co-design help build
collaboration between research institutions, societal
groups, government agencies, third sector and industry.
Developments: Journal Policies
Dryad Joint Data Archiving Policy, Feb 2010: http://datadryad.org/jdap
This journal requires, as a condition for publication, that data supporting the results in the
paper should be archived in an appropriate public archive, such as GenBank, TreeBASE,
Dryad, or the Knowledge Network for Biocomplexity.
PLOS Data Availability Policy, revised Feb 2014:
http://www.plosone.org/static/policies.action#sharing
PLOS journals require authors to make all data underlying the findings described in their
manuscript fully available without restriction, with rare exceptions.
Springer Nature initiative to standardise policies:
http://www.springernature.com/gp/group/data-policy/policy-types
RDA Interest Group developing standardised journal data policies.
Developments: Donor Policies
Bill and Melinda Gates Foundation, Open Access and Open Data Policy
https://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
‘Data Underlying Published Research Results Will Be Accessible and Open Immediately. The
foundation will require that data underlying the published research results be immediately accessible
and open. This too is subject to the transition period and a 12-month embargo may be applied.’
MSF Data Sharing Policy: http://www.msf.org/en/msf-data-sharing-policy
‘MSF recognizes the ethical imperative it has to share its data openly, transparently and in a timely
manner for the greater public health good.’
Appropriate restrictions for consent, privacy, etc.
European Commission Data Policy: ‘as open as possible, as closed as necessary’, FAIR Data
Wellcome Trust: strong support for Open Data sharing, with appropriate restrictions.
80% of ecology data irretrievable after 20 years
(516 studies)
Vines TH et al. (2013) Current Biology DOI: 10.1016/j.cub.2013.11.014
Good Data Management /
Data Science Saves Time!
Lowndes et al., 2017, Our path to better science in less time
using open data science tools, Nature Ecology and Evolution,
http://dx.doi.org/10.1038/s41559-017-0160
Ocean Health Index Project
Data Revolution:
A World that Counts!
Creating a world that counts: Mobilising the Data
Revolution for Sustainable Development.
To meet the new sustainability goals ‘there is an urgent need to
mobilise the data revolution for all people and the whole planet in
order to monitor progress, hold governments accountable and
foster sustainable development.’
Without immediate action, gaps between developed and
developing countries, between information-rich and information-
poor people, and between the private and public sectors will
widen, and risks of harm and abuses of human rights will grow.
Data quality and integrity
Data disaggregation (no-one should be invisible)
Data timeliness
Data transparency and openness
Data usability and curation
Data protection and privacy
Data governance and independence
Data resources and capacity
Data rights
The Value of Open Data Sharing
Report by CODATA for GEO, the Group on Earth
Observations.
Provides a concise, accessible, high level synthesis of key
arguments and evidence of the benefits and value of
open data sharing.
Particular, but not exclusive, reference to Earth
Observation data.
Benefits in the areas of:
Economic Benefits
Social Welfare Benefits
Research and Innovation Opportunities
Education
Governance
Available at http://dx.doi.org/10.5281/zenodo.33830
GEO DSWG is building on this work with further
examples: would be valuable to work with this
community.
The Case for Open Data
in a Big Data World
• Science International Accord on Open Data in a Big Data
World: http://www.science-international.org/
• Supported by four major international science
organisations.
• Presents a powerful case that the profound
transformations mean that data should be:
• Open by default: as open as possible, as closed as
necessary
• Intelligently open, FAIR data
• Lays out a framework of principles, responsibilities and
enabling practices for how the vision of Open Data in a
Big Data World can be achieved.
• Campaign for endorsements: over 150 organisations so
far.
• Please consider endorsing the Accord:
http://www.science-international.org/#endorse
Framework for Regional, National and
Institutional Data Strategies
National / Institutional Open Science and FAIR Data Strategy
Consultative forum, stakeholder engagement.
Open data policies and guidance at national and institutional
level.
Clarify the boundaries of open (particularly privacy, IPR).
Clarify the data in scope, guidelines on selection.
Develop incentives and reward systems.
Mechanisms (infrastructure and policy) to ensure
concurrent publication of data as research output.
Data ‘publication’ and citations of data included in
assessment of research contribution.
Promotion of data skills:
Essential data skills for researchers.
Develop skills and competencies for data stewards, data
scientists.
Framework for Regional, National and
Institutional Data Strategies
Scope, roadmap and implement data infrastructure.
Key components of national and regional
infrastructure (network / NREN, economies of scale
for storage and compute).
Development of regional, national and institutional
infrastructure(s) for research collaboration and data
stewardship/RDM, generic research
platforms/environments, trusted digital repositories.
Collaborative infrastructures for certain research
disciplines, nationally, regionally to pool expertise and
lower costs.
International infrastructure / data ecosystem
components: permanent identifiers, metadata
standards.
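As an illustration of what permanent identifiers and metadata standards can mean in practice, the following is a minimal sketch of a machine-checkable dataset record. It is not taken from any CODATA tool: the `DatasetRecord` class, the `fair_gaps` function, the choice of fields and the simple DOI pattern are all assumptions made for this example, and the format and licence values in the usage lines are hypothetical. Only the DOI and title are those of the GEO report cited earlier in this deck.

```python
import re
from dataclasses import dataclass, field

# Simplified DOI shape ("10.<registrant>/<suffix>"); an assumption for this
# sketch, not a full validation of the DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a research dataset."""
    title: str
    identifier: str = ""   # persistent identifier, e.g. a DOI
    access_url: str = ""   # where the data (or its landing page) can be reached
    data_format: str = ""  # declared format, to support interoperability
    licence: str = ""      # reuse conditions
    keywords: list = field(default_factory=list)


def fair_gaps(record):
    """Return a list of FAIR-related gaps found in the record (empty if none)."""
    gaps = []
    if not DOI_PATTERN.match(record.identifier):
        gaps.append("findable: no DOI-style persistent identifier")
    if not record.keywords:
        gaps.append("findable: no keywords")
    if not record.access_url.startswith(("http://", "https://")):
        gaps.append("accessible: no access URL")
    if not record.data_format:
        gaps.append("interoperable: no declared data format")
    if not record.licence:
        gaps.append("reusable: no licence stated")
    return gaps


# Usage: the format, licence and keywords below are illustrative values only.
report = DatasetRecord(
    title="The Value of Open Data Sharing",
    identifier="10.5281/zenodo.33830",
    access_url="http://dx.doi.org/10.5281/zenodo.33830",
    data_format="application/pdf",
    licence="CC-BY-4.0",
    keywords=["open data", "earth observation"],
)
undescribed = DatasetRecord(title="Field measurements on a hard drive")

print(fair_gaps(report))
print(fair_gaps(undescribed))
```

A check like this only covers the presence of metadata fields. Real assessments of FAIRness would rely on community metadata standards and on repository services rather than on a hand-written list of fields.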
Data is difficult: benefits and challenges
Open and FAIR data is essential for transparency and reproducibility; for taking advantage of
analysis at scale; and for tackling major interdisciplinary challenges that require integration of data
from many resources. It also has significant economic and other societal benefits, including
encouraging partnerships between research, government, innovation and development.
But…
Research funders and research performing institutions will have to invest in data
infrastructure.
Essential to consider the cost of data stewardship and dissemination as part of the total cost
of doing research.
Data description, definitions and ontologies, data management require significant effort.
Requires new data skills…
Requires a change in culture, new processes and activities…
Open Science and FAIR Data:
Benefits for Stakeholders
Government and Innovation / Development
Increased impact from investment in activities relating to data; economic, innovation and research
benefits.
Partnerships for research, development and innovation around co-design, Open Science and FAIR
data.
Research Institutions:
Development of data capacity and data skills;
Not losing valuable data (stored on hard drives, not annotated or reusable);
Shop window of research activities and expertise (Open Access, Open Data / FAIR Data)
Capacity to build research schools around data assets and skills, attract international collaboration
and investment.
Build case for ‘data sovereignty’, data (re-)patriation.
Researchers:
Increased data skills, expertise in FAIR data builds competitive edge.
Citation advantage of Open Access / Open Data.
Culture of certain research disciplines is already strongly in favour of Open Data / Open Science.
Vision and Mission of an
African Open Science Platform
African scientists are at the cutting edge of contemporary, data-intensive science as a
fundamental resource for a modern society.
A digital ecosystem with four complementary aims governed by a set of common principles
and practices:
1. A virtual space for scientists to find, deposit, manage, share and reuse data, software
and metadata;
2. A means of continually developing capacities at all levels of national science systems
and amongst professionals and their institutions operating in the public and private
domain;
3. A basis for multi-stakeholder consortia that wish to utilise powerful digital tools in
addressing major common problems, and for work in the trans-disciplinary mode;
4. A forum for exchange of ideas, best practices and opportunities amongst Platform
partners and with the international data-science community.
5. An African Data Science Institute, to advance the frontiers of data science and provide
support for interdisciplinary research domains where there are particularly strong data
assets in Africa.
Simon Hodson
Executive Director CODATA
www.codata.org
http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org
Email: simon@codata.org
Twitter: @simonhodson99
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris,
Thank you for your attention!
CODATA-RDA School of Research
Data Science
• Contemporary research – particularly
when addressing the most significant,
interdisciplinary research challenges –
increasingly depends on a range of skills
relating to data.
• These skills include the principles and
practice of Open Science; research data
management and curation, how to
prepare a data management plan and to
annotate data; software and data
carpentry; principles and practices of
visualisation; data analysis, statistics and
machine learning; use of computational
infrastructures. The ensemble of these
skills, relating to data in research, can
usefully be called ‘Research Data Science’.
CODATA-RDA School of
Research Data Science
• Annual foundational school at ICTP, Trieste (with the
objective to build a network of partners, train-the-
trainers).
• Advanced workshops, ICTP, Trieste, following the
foundational school.
• National or regional schools, organised with local
partners.
2018
• Next #DataTrieste Summer School, 6-17 August 2018.
• Next #DataTrieste Advanced Workshops 20-24 August
2018.
• Call for applications, deadline 21 May:
http://www.codata.org/datatrieste2018
• Schools in Brisbane (UQ and Australian Academy of
Sciences); ICTP Kigali (October); ICTP São Paulo
(December)
DataTrieste Film on Vimeo: https://vimeo.com/232209813
Editor's Notes
CODATA was established by the International Council for Science to promote the availability and quality of data for all areas of research.
CODATA has three strategic priority areas (please consult the CODATA Strategy and Prospectus for more information):
promoting data principles, policies and practice: recent work includes a survey of research data policies, a report on the value of open data sharing for GEO, the promotion of data citation and the Science International Accord on Open Data in a Big Data World, which has been endorsed by IUCr.
advancing the frontiers of data science: this is done through Task Groups and Working Groups; by means of the Data Science Journal, relaunched with Ubiquity Press; and through regular conferences (henceforth we intend to organise a CODATA Conference in odd years and International Data Week, with RDA and WDS, in even years).
mobilising data capacity (with particular attention to strategies, skills and ‘soft’ infrastructure in LMICs): through the initiative for a foundational curriculum for research data science (research data science summer schools), the regular Open Data Training Workshops hosted by CODATA China and the capacity building element of initiatives like the African Open Science Platform.
Gaborone International Convention Centre (GICC)
Open Science, Open Data and FAIR data have become internationally important for very good reasons.
The custom has been peer-to-peer sharing. That is ineffective over the long run, as shown by a recent study by Tim Vines and colleagues.
CODATA made a major contribution to the debate through the Science International Accord on Open Data in a Big Data World. Excellent IUCr position paper in response. Welcome endorsements from other organisations.
Data is difficult and the contribution of those who curate data needs to be recognised and rewarded. Data citation is a necessary but not sufficient part of this.